Evaluation of analytical methods used for regulation of foods and drugs

William Horwitz

Regulations

Bureau of Foods HFFJ, Food and Drug Administration, Washington, D.C. 20204

Although the Association of Official Analytical Chemists (AOAC) has been evaluating and approving methods of analysis for almost 100 years, there is practically no discussion in the Journal of the Association of Official Analytical Chemists of the criteria for determining which methods should be approved for regulatory use. These decisions are usually made on the basis of a method's performance in interlaboratory collaborative studies.

John Mandel of the National Bureau of Standards, in his 1981 Shewhart medal address (1), pointed out that the basic objective of conducting interlaboratory tests is not to detect the known statistically significant differences among laboratories: “The real aim is to achieve the practical interchangeability of test results.” Interlaboratory tests are conducted to determine how much allowance must be made for variability among laboratories in order to make the values interchangeable.

An irreducible difference exists between supposedly identical measurements made in different laboratories. This point was recently demonstrated by a group of New Zealand government laboratories attempting to minimize the discrepancies in blood alcohol values between laboratories. The laboratories went to great pains to discover every source of error, even to the extreme of moving analysts from one laboratory to another. They found that an analyst's intra-analyst variability increases when he or she is moved to a different laboratory environment. They concluded that the only way to eliminate interlaboratory variability was to conduct all analyses

Presented at the 95th Annual Meeting of the Association of Official Analytical Chemists, Washington, D.C., Oct. 19, 1981.

This article not subject to U.S. copyright. Published 1981 American Chemical Society

in a single laboratory and presumably by the same analyst. Under our legal system this solution is impossible, since defendants accused by laboratory evidence have a constitutional right to produce rebuttal evidence from any laboratory of their choice. Therefore, the important question to be answered in the evaluation of methods of analysis is how much allowance must be made for between-laboratory variability in interpreting the values produced by different laboratories. If the variability or error produced by the method is excessive (that is, it does not permit effective regulation as required by the statute), the method must be judged unacceptable for the intended purpose. The purpose of this paper is to


the study procedure has provided the essential data for developing this information.

Method Characteristics

Methods are usually evaluated on the basis of three characteristics: reliability, applicability, and practicability. For our present purpose, reliability is the overriding consideration. In general, when a need exists for a method, we have to accept any reasonable degree of reliability. Applicability to a wide range of sample types and practicability with respect to cost, time, and training constraints both assume greater importance when there are several competing methods.

The important aspects of reliability, listed in their approximate order of importance for most purposes, are:

Reproducibility, or total between-laboratory precision. This is the measure of the ability of different laboratories to check each other. It is the overall measure of variability, including the within-laboratory component.

Repeatability, or within-laboratory precision. This is the measure of the ability of a laboratory (or analyst) to check itself.

Systematic error or bias (sometimes also called “accuracy or inaccuracy”). This is the difference between the value(s) obtained and the true, assigned, or consensus value(s).

Specificity (when required). This is the ability of the method to measure what it is intended to measure.

Limit of reliable measurement (when required). This is the smallest amount (or concentration) of a material that can be measured with a stated degree of confidence.

Figure 1. The general curve relating interlaboratory coefficients of variation (expressed as powers of two on the right) with concentration (expressed as powers of 10) along the horizontal center axis

ANALYTICAL CHEMISTRY, VOL. 54, NO. 1, JANUARY 1982

Which of these factors is most important depends upon the purpose for which the data will be used. In regulatory analysis, analytical values are used for three major purposes: to survey a field to determine the extent of a problem; to monitor trends to determine if any corrective action has to be taken; and to determine compliance with an economic or legal specification. A different emphasis on the various method characteristics is required to accomplish each purpose.

In surveying a field, the normal variability of the measurement of a commodity or the environment is usually so large


that a high degree of accuracy and precision is not an important requirement. The averaging of numerous imprecise determinations often provides a surprisingly good mean. Sometimes all that is needed is to differentiate samples with “none” of the analyte from those that contain “significant” amounts.

In monitoring trends, the systematic error, as long as it is constant, is not important. The precision must be good enough to detect when a “significant” difference occurs.

In compliance activities a high degree of accuracy (as lack of bias) and precision is required at the specification level, unless the specification is based upon the method itself, in which case only precision is pertinent. The precision requirement may decrease as the distance from the specification value increases. When the “no residue” requirements of the Federal Food, Drug, and Cosmetic Act are involved, specificity and limit of reliable measurement are the most important considerations.

In practical work, other requirements come into play. In surveys, the need to analyze many samples makes a rapid method a necessity; in monitoring, repeated sampling of the same population is important; in compliance, practicality, although important, is secondary to reliability. Therefore, it is apparent that there is no such thing as a “best” method


since the definition of “best” will vary with the purpose for which the method will be used. Since this purpose is not usually known beforehand, we must usually assume that our primary interest will be in the achievement of a suitable degree of precision and bias; requirements for specificity and limit of reliable measurement will usually be self-evident.

In regulatory work, or even in analyzing for adherence to commercial specifications, between-laboratory variability is the most important factor. Bias can be tolerated. If it is constant, a correction factor can be used. If it is variable, it becomes a component of reproducibility (between-laboratory variability). In fact, Youden (2) equates systematic error to the “true between-laboratory” variability, which in our terminology is the reproducibility adjusted for the within-laboratory variability (repeatability). Bias, as a recovery factor, particularly in modern trace analysis, is generally permitted to seek its own level, provided it is above 60-80% (3).

Interlaboratory Precision

It would appear that any systematic approach to estimating what constitutes a reasonable precision would be an almost impossible task. Methods are composed of almost infinite combinations of dissolution, cleanup, and measurement procedures. These innumerable combinations are applied to pure substances and complex mixtures as solids, liquids, and gases by analysts with various degrees of competency. Yet, despite this complexity, we have found that analytical variability can be summarized (in an oversimplified fashion, to be sure) by plotting the determined mean coefficient of variation (CV), expressed as powers of two, against the analyte level measured, expressed as powers of 10, as shown in Figure 1, taken from our recent paper on quality control (4).
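As a rough illustration of how the curve in Figure 1 is read, the sketch below evaluates the relation stated in the Summary of this article, CV (%) = 2^(1 - 0.5 log C), with C taken as a decimal fraction (1 ppm = 10^-6) and the logarithm taken to base 10, as the 1 ppm reference point implies. The function name is an assumption made for this example, and the values are only as good as the smoothed curve itself.

```python
import math


def general_curve_cv_percent(concentration: float) -> float:
    """Between-laboratory CV (%) from CV = 2 ** (1 - 0.5 * log10(C)).

    `concentration` is a decimal mass fraction: 1.0 is 100%, 1e-6 is 1 ppm,
    1e-9 is 1 ppb. (Hypothetical helper name, for illustration only.)
    """
    if not 0.0 < concentration <= 1.0:
        raise ValueError("concentration must be a fraction in (0, 1]")
    return 2.0 ** (1.0 - 0.5 * math.log10(concentration))


if __name__ == "__main__":
    # Step down two orders of magnitude at a time; the CV doubles each step.
    for exponent in range(0, -11, -2):
        c = 10.0 ** exponent
        print(f"C = 1e{exponent:+03d}  CV = {general_curve_cv_percent(c):6.1f}%")
```

Stepping the concentration by two powers of ten at a time makes the doubling of the CV visible directly, which is the reason the figure uses powers of two on one axis and powers of ten on the other.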
The sources of these data are an examination of over 150 independent AOAC interlaboratory collaborative studies covering numerous AOAC topics, from drug preparations and pesticide formulations on the high end of the concentration scale to aflatoxin contaminants at the low end, with important stops in between at pesticide residue and trace element concentrations. At least five analytical methods (chromatography, atomic absorption spectrometry, spectrophotometry, polarography, and bioassay) are involved. A convenient, easily remembered reference point is that at 1 ppm ($10^{-6}$), the CV is $2^4 = 16\%$. Other points are given in Figure 1. The most important and startling point is that this idealized smoothed curve is independent of the nature of


Figure 2. The performance of laboratories analyzing EPA's quality-control samples for pesticide residues in fat and blood over a 13-year period (4). Horizontal axis: Year.

the analyte or of the analytical technique that was used to make the measurement. This curve is merely a summary of available interlaboratory data, independent of such external influences as sampling and contamination. The significant data points are averages of a number of studies of similar analytes whose CVs may cover a factor of two in either direction; similarly, the concentration range may also extend in both directions by an order of magnitude or so. But in general, the values taken from this curve are indicative of achievable and acceptable performance of an analytical method by different laboratories.

Independent evidence also supports this general precision curve. Quality-control studies of pesticide residue determinations in fat and blood by EPA contractors, summarized in Figure 2 (5), show that the between-laboratory CVs improved with analytical experience, but only to a minimum value approximating the 16% found in the collaborative studies for pesticide residues in food. Similarly, the quality control monitoring of laboratories determining aflatoxin in peanuts for certification purposes (4) gives a value that corresponds to the 32% CV on the general curve for aflatoxin at the 10 ppb concentration level.

Other evidence suggests that this curve may represent a floor for the precision of analytical methods in the interlaboratory environment. In fact, it appears that all methods more or less follow such a curve up to a point where the precision begins to deteriorate at an even faster rate. This phenomenon is shown by methods in the macro scale such as fat in meat (gravimetric) (6) and methyl esters of fatty acids (gas chromatographic) (7), as well as methods in the micro scale such as pesticide residues (gas chromatographic) (8, 9), and trace elements (predominantly atomic absorption) (10). These examples are given in Figures 3-6, with the general precision curve labeled “AOAC” drawn in for reference in Figures 5 and 6. When the specific method curve deviates markedly from the general precision curve, we have obviously gone beyond the limit of reliable measurement for that method.

Figure 3. The interlaboratory coefficient of variation and standard deviation (absolute) of the gravimetric ether extraction method for the determination of fat in meat as a function of the concentration (%) of fat (6)

Figure 4. The interlaboratory coefficient of variation (center curve) and the 95% confidence limits (two outer curves) for the gas chromatographic determination of methyl esters of fatty acids (7)

There is one other useful piece of information that has been extracted from the data used to construct the general precision curve. The precision component due essentially to analysts (within-laboratory error or repeatability) in AOAC studies is usually one-half to two-thirds of the total variability (the combined effect of the within- and between-laboratory variability), as given in Figure 1. Ratios of repeatability to reproducibility considerably less than 0.5 indicate a very personal method. Analysts can check themselves very well but they cannot check other analysts in other laboratories. This situation suggests that the directions require reworking or that the reference standards may differ from laboratory to laboratory. However, this low ratio is also typical of methods requiring considerable personal skill, such as counting filth elements or mold. A very high ratio can indicate that individual analyst replications are so poor that they swamp out the between-laboratory component.

Outliers

Another index that may prove useful for evaluating methods is the percentage of outliers reported in a collaborative study. Every analyst has


implicit faith that if a method is followed exactly, the correct result will automatically be produced. However, our review of several hundred collaborative studies in which the samples were examined as true unknowns reveals that often 5 to 15% of the reported values are statistical outliers: values that are far outside the region where most of the other values reside. Outliers are produced by experienced chemists as well as by novices, at macro as well as at trace levels. Schuller et al. (11), in their review of aflatoxin methods, noted that they had to tolerate a 10% outlier rate in recommending methods for international referee status. In the analysis of moon rocks from the Lunar Analysis Program of the U.S. National Aeronautics and Space Administration, Morrison reported (12) that almost 7% of the values had to be discarded as outliers.

Outliers produced by a single analyst or within a laboratory are usually inconspicuous, since they are either unrecognized from analysis of single samples where there is nothing to compare the result with, or they are eliminated by repetition. In an interlaboratory situation with blind samples, where there is no opportunity to censor the data, outliers are more obvious. In current AOAC collaborative studies, outliers are usually eliminated by the techniques suggested by Youden (2): a ranking test to remove consistently high or low laboratories, followed by the elimination of outlying individual values by a Dixon test involving the deviations of extreme values.

Figure 5. The interlaboratory coefficients of variation for the determination of pesticide residues in butterfat (8) and in wildlife (9) by gas chromatographic methods. Horizontal axis: Concentration (ppm).

Figure 6. The interlaboratory coefficients of variation of trace elements in blood by various methods as calculated from the data reviewed by Versieck and Cornelis (10)

We have only recently realized the importance of outliers in the evaluation of methods of analysis for approval by the AOAC. Outliers are a fact of laboratory life and allowance must be made for them. By definition, they lie at the extreme points of the statistical frequency distributions of a series of analytical values. Therefore, they have a large influence on the magnitude of the indices used to measure the performance of methods. Figures 7 and 8, using the data from the collaborative study of aflatoxin in


green cacao beans (13), illustrate this effect. The repeatability (within-laboratory variability) changes from an unacceptable CV of 50% for the 20 values from 10 laboratories to a marginally acceptable 36% by the omission of one value classified as an outlier by the Dixon test. In this case, as in many others, there is no question about the classification as an outlier since 18 ppb of aflatoxin had been added to each sample. In this study, the mean changed little by the elimination of the outlier: from 16.0 to 14.6 ppb, or expressed in terms of recovery, from 89 to 81%. This particular example shows only a 5% outlier rate. We have seen methods approved by the AOAC with an outlier rate as large as 50%.

There is only one legitimate excuse for elimination of laboratories without a statistical test: intentional or unintentional failure to follow the method. An intentional failure occurs when the specified equipment or reagent is unavailable and failure to substitute will mean dropping out of the study. But


substitution on the basis of “it cannot possibly affect the results” is inexcusable. Collaborative studies are very expensive in terms of time and manpower. Jeopardizing their success with untested changes undermines the entire collaborative study.

Figure 7. Original data from the interlaboratory study of the determination of aflatoxin in cacao beans (13). The two values from each laboratory are plotted horizontally. The circled value is an outlier by the Dixon test. Horizontal axis: Total Aflatoxin (ppb).

Although many AOAC studies show no outliers, a 5 to 15% outlier rate, particularly at the ppm and ppb levels, is not at all unusual. If but one of five or six laboratories required in a minimum statistical pattern turns out to be an outlier by the Youden ranking test, we have reached a 20% rejection point. It appears that we may have to tolerate a 20% outlier rate, because that is the penalty to be paid if we use a minimum number of participating laboratories. This is one of the reasons why having at least 10 laboratories in a study will improve the chances of acquiring adequate data for statistical evaluation.

There is one important statistical problem with outliers: What outlier test should be used? This is a complex statistical problem whose solution depends on the true distribution of values. Chemists seem to have little difficulty in applying intuition and experience to this problem, but many statisticians are appalled at this approach. We hope to apply a number of outlier tests to a number of AOAC collaborative studies to determine if any of the several dozen procedures described in the statistical literature is best suited for application to interlaboratory work.
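The arithmetic behind the 20% rejection point discussed above is simple enough to sketch. The helper below is hypothetical (its name and the chosen laboratory counts are assumptions for illustration); it only shows how losing a single laboratory weighs on a study as the number of participants changes.

```python
def outlier_rate_percent(n_rejected: int, n_total: int) -> float:
    """Percentage of laboratories (or values) rejected as outliers."""
    if n_total <= 0 or not 0 <= n_rejected <= n_total:
        raise ValueError("need 0 <= n_rejected <= n_total and n_total > 0")
    return 100.0 * n_rejected / n_total


if __name__ == "__main__":
    # One rejected laboratory in studies of increasing size.
    for n_labs in (5, 6, 8, 10):
        rate = outlier_rate_percent(1, n_labs)
        print(f"{n_labs:2d} laboratories, 1 rejected: {rate:.1f}%")
```

With five laboratories a single rejection already reaches 20%, while with ten it costs 10% of the data, which is consistent with the recommendation of at least 10 laboratories.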

Figure 8. The data from Figure 7 plotted as a normal frequency distribution with the outlier included (20 points, broken line) and the outlier excluded (18 points, solid line). Horizontal axis: Concentration (ppb).

False Positives and False Negatives

Another potentially useful suitability index for evaluation of methods may be the percentage of false positives and false negatives. False positives (excessively high blanks) may appear when working at any concentration level, but the appearance of both false positives and false negatives is characteristic of trace analysis. These values are not necessarily outliers, but they can be. We have noted in our examination of the available aflatoxin studies that the percentage of false negatives increases much like our CV/concentration curve as the concentration approaches zero. This phenomenon may be more useful for delineating a limit of reliable measurement, i.e., the concentration at which the proportion of false negatives is more than 20% (or some other number), than for evaluating the performance of methods in general.

Bias or Systematic Error

Up to now I have paid little attention to the matter of bias or systematic error because in most cases it takes care of itself. Very few methods are rejected because of low or high recoveries. In the case of macro methods, recovery is frequently very close to theoretical, because these methods are usually based upon stoichiometry or basic physical principles of extraction or separations, and the amount of analyte available for measurement is not


limited. Furthermore, many methods in food chemistry are empirical: they are based upon the faith that other participants will adhere to the specified directions to produce equivalent results. Empirical methods by definition have no bias.

In trace analysis, the precision characteristic usually takes care of recovery, because the random error is often as large as or larger than the systematic error. If the recovery is low but repeatable, as in isotope dilution methods, any recovery is acceptable since correction can be made back to the 100% level. If it is variable, even though within acceptable limits, the correction factor procedure will not do. The method must then be accompanied by sufficient recovery data to indicate the boundaries of performance. In the proposed “SOM” document of the FDA, recovery limits were given as more than 80% for concentrations of 0.1 ppm and above, and more than 60% for lower concentrations (3). Naturally, we would prefer higher recoveries. But these figures do appear to be reasonable in light of actual recoveries under ordinary, and not collaborative, conditions.
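The recovery limits quoted above lend themselves to a small worked check. The sketch below is only an illustration: the function names are hypothetical, and the thresholds are simply the two figures cited from the proposed FDA document (more than 80% at 0.1 ppm and above, more than 60% below). The demonstration reuses the cacao bean figures given earlier, where 18 ppb of aflatoxin was added and means of 16.0 and 14.6 ppb were found.

```python
def recovery_percent(found: float, added: float) -> float:
    """Recovery (%) of an added amount; both arguments in the same units."""
    if added <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * found / added


def minimum_recovery_percent(concentration_ppm: float) -> float:
    """Lower recovery limit cited in the text for a given concentration."""
    return 80.0 if concentration_ppm >= 0.1 else 60.0


def recovery_meets_limit(found: float, added: float, concentration_ppm: float) -> bool:
    """True when the recovery exceeds the cited limit for that level."""
    return recovery_percent(found, added) > minimum_recovery_percent(concentration_ppm)


if __name__ == "__main__":
    added_ppb = 18.0  # spike level from the cacao bean study, i.e. 0.018 ppm
    for mean_ppb in (16.0, 14.6):
        r = recovery_percent(mean_ppb, added_ppb)
        ok = recovery_meets_limit(mean_ppb, added_ppb, added_ppb / 1000.0)
        print(f"mean {mean_ppb} ppb: recovery {r:.0f}%, meets limit: {ok}")
```

Both means fall above the 60% limit that applies at this level, with or without the outlier, in line with the observation that recovery seldom decides whether a method is accepted.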

Summary

The primary objective of interlaboratory studies is to determine if we have achieved interchangeability of test results among laboratories. But interchangeability is a function of the purpose for which the results will be used: to survey a field; to monitor trends; or to determine compliance


with a specification. Each of these purposes places different emphasis upon the characteristics or attributes of methods: their reliability, applicability, and practicability. For regulatory purposes reliability is paramount and the between-laboratory precision is the critical component. In general, this precision can be represented by the following equation:

$$\mathrm{CV}\,(\%) = 2^{(1 - 0.5 \log C)}$$

where C is the concentration expressed as powers of 10 (e.g., 1 ppm = $10^{-6}$). The coefficient of variation doubles for each decrease of concentration of two orders of magnitude. The between-laboratory coefficient of variation at 1 ppm is 16% ($2^4$). The within-laboratory CV should ordinarily be one-half to two-thirds the between-laboratory CV.

Other potential evaluation criteria include an outlier rate of 20% or less and an acceptable level of false positive values and false negative values. Acceptable specificity and limit of reliable measurement may also have to be based on the level of false positives and false negatives. Recovery values ordinarily take care of themselves at macro levels, but at trace levels, 60% at the ppb level and 80% at the 0.1 ppm level may be the lowest acceptable recoveries.
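The ratio of within-laboratory to between-laboratory precision mentioned above can be estimated from a collaborative study in which every laboratory reports replicate results on the same material. The sketch below assumes a balanced design and uses a standard one-way analysis of variance; it is offered as an illustration and is not presented as the AOAC or Youden procedure. The function name and the data are hypothetical.

```python
import math
from statistics import mean


def precision_components(results_by_lab):
    """Estimate repeatability (s_r) and reproducibility (s_R).

    results_by_lab: list of per-laboratory lists, each holding the same
    number (>= 2) of replicate results on the same material.
    """
    p = len(results_by_lab)
    n = len(results_by_lab[0]) if p else 0
    if p < 2 or n < 2 or any(len(r) != n for r in results_by_lab):
        raise ValueError("need >= 2 labs with equal replicate counts (>= 2)")

    lab_means = [mean(r) for r in results_by_lab]
    grand_mean = mean(lab_means)

    # Within-laboratory (repeatability) variance, pooled over laboratories.
    ss_within = sum(
        (x - m) ** 2 for r, m in zip(results_by_lab, lab_means) for x in r
    )
    var_r = ss_within / (p * (n - 1))

    # Between-laboratory mean square, then the "true" between-lab variance
    # (reproducibility adjusted for repeatability), floored at zero.
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (p - 1)
    var_lab = max(0.0, (ms_between - var_r) / n)

    s_r = math.sqrt(var_r)
    s_R = math.sqrt(var_r + var_lab)
    return {
        "mean": grand_mean,
        "s_r": s_r,
        "s_R": s_R,
        "ratio": s_r / s_R if s_R > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical duplicate results from three laboratories.
    out = precision_components([[10.0, 10.2], [11.0, 10.8], [9.5, 9.7]])
    print({k: round(v, 3) for k, v in out.items()})
```

In this made-up data set the laboratories agree closely with themselves but not with each other, so the ratio comes out well below 0.5, the pattern the article associates with a very personal method.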

References

(1) Mandel, J. Quality Progress August 1981, 34-36.
(2) Youden, W. J.; Steiner, E. H. Statistical Manual of the AOAC (1975). Association of Official Analytical Chemists: Washington, D.C.
(3) Fed. Regist. March 20, 1979, 44 (55), 17070-114.
(4) Horwitz, W.; Kamps, L. R.; Boyer, K. W. J. Assoc. Off. Anal. Chem. 1980, 63, 1544-54.
(5) Watts, Randall R. “Proficiency Testing and Other Aspects of a Comprehensive Quality Assurance Program.” In “Optimizing Chemical Laboratory Performance through the Application of Quality Assurance Principles”; Garfield, Frederick M. et al., Eds.; Association of Official Analytical Chemists: Arlington, Va., 1980, pp ...-115.
(6) Pettinati, Julio D.; Swift, Clifton E. J. Assoc. Off. Anal. Chem. 1977, 60, ...
(7) ...
(8) ... Food and Agriculture Organization: Rome, Italy.
(9) Holden, A. V. “The OECD International Co-operative Studies of Organochlorine Residues in Wildlife.” In “Environmental Quality and Safety”; Coulston, F.; Korte, F., Eds.; George Thieme Publishers: Stuttgart; Supplement Vol. ...
(10) Versieck, Jacques; Cornelis, Rita. Anal. Chim. Acta 1980, 118, 217-54.
(11) Schuller, P. L.; Horwitz, W.; Stoloff, L. J. Assoc. Off. Anal. Chem. 1980, 59,