Methods of determining the true accuracy of analytical methods


Anal. Chem. 1989, 61, 398-403


Methods of Determining the True Accuracy of Analytical Methods

Howard Mark* Technicon Industrial Systems, 511 Benedict Avenue, Tarrytown, New York 10591

Karl Norris Instrumentation Research Laboratory, USDA/ARS/SCSI, Building 002, Room 103, Beltsville, Maryland 20705

Philip C. Williams Grain Research Laboratory, Canadian Grain Commission, 1404-303 Main Street, Winnipeg, R3C 369 Canada

Attempts to determine the accuracy of near-infrared methods of analysis led to the investigation of two algorithms for assessing the true (or absolute) accuracy of analytical methods, as the usual methods of determining the accuracy actually measure the combined error of the near-infrared and the reference method against which it is calibrated. One algorithm is suitable when a third method of analysis is available; the other is more appropriate if many different analytical methods are used.

0003-2700/89/0361-0398$01.50/0 © 1989 American Chemical Society

INTRODUCTION

Our interest in methods of determining the accuracy of analytical methods arises from the diversity in analytical uses of near-infrared spectroscopy (NIRS) (1-3). Instruments employing this technology must be calibrated against a reference method of analysis. The calibration process most commonly used is multiple regression analysis. This mathematical procedure provides a number that purports to measure the accuracy (or, strictly speaking, the error) of the calibration results. However, it actually measures the combined error of the NIRS and reference methods. Indeed, the theory of multiple regression states that the dependent variable, which corresponds to the set of reference values, should properly provide most, if not all, of the error of the calibration (4). Thus to date it has been very difficult to ascertain the true accuracy of NIRS analytical instruments or methodologies.

The question of evaluating methods of chemical analysis is not new; chemists have been interested in means of assessing the accuracy of methods of chemical analysis for many years (see ref 5-8 for some early approaches to this problem). These approaches included various ways of comparing the results of different methods. More recently, Youden compiled an excellent treatise on this topic (10). Mandel has developed methods of comparative evaluations that can be applied even in the absence of a calibration (see ref 11 and the citations in that chapter). The majority of the prior work classifies analytical error into two categories: precision (representing

ANALYTICAL CHEMISTRY, VOL. 61, NO. 5, MARCH 1, 1989

all random error) and bias (representing all systematic error). However, with the advent of modern methods of analysis, particularly instrumental methods, there is recognition that bias is only one of many possible systematic errors (12).

An attempt was made to assess the accuracy of several methods of determining the protein content of wheat by using the average of multiple Kjeldahl determinations of protein as the “true” (called the “definitive”) protein value against which other methods could be compared (13). However, this approach has some limitations. While averaging many readings to produce a “definitive” value for the protein content reduces the overall error of these readings, it does not reduce it to zero. Secondly, averaging many readings from one type of analytical method reduces only the contributions to the total error that are random. Simple averaging does not reduce possible systematic errors that affect a methodology. For example, if one sample contains an unsuspected interfering material, the average of many analyses of that sample will converge on the wrong result, no matter how many analyses are included in that average.

On the other hand, comparing results that depend on two different methods gives only the total error of the difference between the combined methods. However, if results from three or more methodologies are available, then there is the possibility of circumventing these limitations and determining the absolute accuracy of each of the given techniques.

THEORY

Two different approaches to determine the absolute accuracy of analytical methods are described here: one approach is more appropriate when four or more results for each specimen are available; the other is suitable when results from exactly three different methods of measurement are used.

Algorithm 1. The first algorithm is suitable for four or more methods: If four or more readings (from different analytical methods) are available for each specimen, then the approach is to average the results from all the different analytical methods to obtain a better estimate of the “true” analyte value than any one method is capable of providing. The phenomena that generate systematic errors in one method will usually not apply to other methods. For example, the Kjeldahl method of protein analysis actually measures nitrogen; however, the measurement is almost invariably reported as protein. Since the truly correct conversion factor to use for any given sample depends upon the amino acid content of the proteins in that particular sample and is usually unknown, conventionally a fixed value, usually the average value for the generic sample type, is used. The conversion factor between the measured nitrogen and reported protein is one of the systematic errors of the Kjeldahl method. This source of error does not apply to the near-IR method of analysis, for example. Therefore the systematic errors of unrelated methods are randomly distributed, and averaging results from many different and unrelated techniques will cause the average systematic error to approach zero, as well as the average random error.

This algorithm is similar to the approach used in performing collaborative studies (10), with an important difference. In a collaborative study, each laboratory uses the same analytical method. Therefore although the error is unknown, it is expected to be the same for each laboratory. On the other hand, when different analytical methods are compared, we do not expect the error to be the same for each method. This complication requires some changes in the way the data are analyzed. In the ordinary collaborative study, where the assumption of uniform error is made, the data are treated as a two-dimensional array, and a two-way analysis of variance is applied (9, 10) (also see ref 15, p 168). But “The assumption would also be wrong if one laboratory [i.e., analytical method; authors] showed more variable replicates than the remainder” (ref 9, p. 76).

Computing the average ordinarily provides an estimate of the “true” analyte value for each sample. However, there is a difficulty associated with its use. If the different methodologies have different accuracies, then a simple averaging of the several results for each sample will not provide a minimum variance estimator for the mean (14). To circumvent this difficulty, the simple average of the different readings for each sample should be replaced with a weighted average (15)

$$\bar{X}_i = \frac{\sum_{j=1}^{m} W_j X_{ij}}{\sum_{j=1}^{m} W_j} \tag{1}$$

where $X_{ij}$ represents the reading of the ith specimen by the jth technique, m is the number of different techniques in use, and $W_j$ is the weighting factor. In order for the weighted average to be the best possible estimate of the true value for the analyte, the correct weighting factor to use for each method is the inverse of the variance (the square of the standard deviation) of the error of the method (16). In nonstatistical language, the weighted mean is more accurate than the ordinary arithmetic mean because the weighted mean gives the more accurate methods greater influence in the determination of the result and therefore arrives at a more accurate estimate of the “true” value for the composition of each specimen. The summations are taken over the different methods used to analyze each specimen.

The standard error of each method is then calculated from the differences between the readings obtained by using that method and the weighted average (which is the best estimator of the “true” constituent value (14)). The first approximation is to use the equation

$$E_j = \sqrt{\frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_i)^2}{n}} \tag{2}$$

where $E_j$ represents the standard deviation of the error of the jth method and the summations are taken over all the readings obtained by using a given method. Attempts to use eq 2 lead to several difficulties. The first difficulty encountered in applying eq 2 is that of accounting for degrees of freedom. The array of raw data, containing m readings on each of n specimens, contains nm degrees of freedom. However, computing the n averages uses up n degrees of freedom so that there are only nm - n = n(m - 1) degrees of freedom available for computing error; these must then be allocated among m different errors so that each error must be computed by using n(m - 1)/m degrees of freedom; eq 2 therefore becomes

$$E_j = \sqrt{\frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_i)^2}{n(m-1)/m}} \tag{3}$$

The second difficulty is that eq 2 (also eq 3) requires the mean value for each sample, which is obtained from eq 1. Thus, the expressions in eq 1 and eq 2 each require the results of the other equation for their evaluation; eq 1 requires the knowledge of the error, $E_j$, in order to compute the weighting factor, and eq 3 requires the weighted mean. Thus, in order to apply these equations to the data, an iterative approach is needed: First the weighted mean is computed for each sample by using eq 1, then the error of each method is computed from eq 2 and these values are used to compute the weighting factors to use in eq 1 on the next iteration. To initiate the computation, the errors (and therefore the weighting factors) are all set to unity at the beginning of the computation.

Still another difficulty arises from the fact that, just as the standard error computed from comparison of results from two methods contains the error contribution from both methods, the difference of any given method from the mean value contains the error of the mean in addition to the error of the method; to compute the error of the method, the error of the mean must be subtracted. Just as the error of the mean of many readings with constant error is well-known to equal the error of the individual readings divided by the square root of the number of readings averaged, so it is possible to determine the error of a weighted mean. The general equation for the error of the weighted mean expressed by eq 1 ($SD_{mean}$) is (16)

$$SD_{mean} = \frac{\sqrt{\sum_{j=1}^{m} W_j^2 E_j^2}}{\sum_{j=1}^{m} W_j} \tag{4}$$

When the inverse values of the standard deviation of the corresponding values are used as the weighting factors, as we are doing here, this expression reduces to (see ref 16, p 135)

$$SD_{mean} = \frac{\sqrt{m}}{\sum_{j=1}^{m} 1/E_j} \tag{5}$$

It is well-known that variances of random independent variables add (17). Applying this to the errors, the variance of each method's error is computed by subtracting the variance due to the error of the mean, computed from eq 5, from the variance for each method computed from eq 3.

The third stage of approximation takes note of the fact that, even if the errors of all the methods are uncorrelated, the error of the set of values representing the weighted means will be more highly correlated with the values from the methods having more weight. In the presence of correlation, a fraction equal to $R^2$ of the weighted mean's error must be added back to the error of each method (see ref 17, p 87). Of course, both considerations can be taken into account at once by subtracting $(1 - R^2)SD_{mean}^2$ from the error of each method.

Algorithm 2. A second method must be used if analyses are available from only three methods. In this case the above procedure is not satisfactory, not only because the mean of each specimen is not as robustly determined as when more methods are available but also because the loss of degrees of freedom amounts to fully one-third of the total, which is excessive. Fortunately, another approach can be used when results from three different analytical methods are available. This is exactly the right number of methods that allow us to apply a nifty result that comes from the mathematical/statistical operation called analysis of variance. This approach does not require the computation of the mean or “true” value of each sample, so that using it does not result in any loss of degrees of freedom. The RMS difference between two sets of results is computed from the equation

$$E_{12} = \sqrt{\frac{\sum_{i=1}^{n} (X_{i1} - X_{i2})^2}{n}} \tag{6}$$

The sum on the right-hand side of eq 6 is the sum square difference (SSD) of the analytical values. Each measured $X_i$ can be written as the sum of the true value $X_{true}$ plus an error $E_i$. Then eq 6 can be rewritten as

$$E_{12} = \sqrt{\frac{\sum_{i=1}^{n} \left[(X_{true} + E_{i1}) - (X_{true} + E_{i2})\right]^2}{n}} \tag{7}$$

This reduces to

$$E_{12} = \sqrt{\frac{\sum_{i=1}^{n} (E_{i1} - E_{i2})^2}{n}} \tag{8}$$

Considering the sum under the radical on the right-hand side of eq 8

$$\sum_{i=1}^{n} (E_{i1} - E_{i2})^2 = \sum_{i=1}^{n} E_{i1}^2 + \sum_{i=1}^{n} E_{i2}^2 - 2\sum_{i=1}^{n} E_{i1}E_{i2} \tag{9}$$

Applying the definition of correlation coefficient (ref 17, p 61) to the errors, we obtain

$$R(E_1,E_2) = \frac{\sum_{i=1}^{n} (E_{i1} - \bar{E}_1)(E_{i2} - \bar{E}_2)}{(n-1)E_1E_2} \tag{10}$$

where $R(E_1,E_2)$ is the correlation coefficient between the two errors, the $E_i$ represent the individual errors, $\bar{E}_1$ and $\bar{E}_2$ are the mean errors, and $E_1$ and $E_2$ represent the standard deviations of the two errors. The mean error is zero, therefore eq 10 reduces to

$$R(E_1,E_2) = \frac{\sum_{i=1}^{n} E_{i1}E_{i2}}{(n-1)E_1E_2} \tag{11}$$

Rearranging eq 11, we obtain

$$\sum_{i=1}^{n} E_{i1}E_{i2} = (n-1)E_1E_2R(E_1,E_2) \tag{12}$$

Substituting the right-hand side of eq 12 for the last term on the right-hand side of eq 9, we obtain

$$\sum_{i=1}^{n} (E_{i1} - E_{i2})^2 = \sum_{i=1}^{n} E_{i1}^2 + \sum_{i=1}^{n} E_{i2}^2 - 2(n-1)E_1E_2R(E_1,E_2) \tag{13}$$

Consequently, if the errors are uncorrelated, so that $R(E_1,E_2) = 0$, then eq 13 reduces to

$$\sum_{i=1}^{n} (E_{i1} - E_{i2})^2 = \sum_{i=1}^{n} E_{i1}^2 + \sum_{i=1}^{n} E_{i2}^2 \tag{14}$$

Squaring eq 8 and substituting eq 14 into it

$$E_{12}^2 = \frac{\sum_{i=1}^{n} E_{i1}^2}{n} + \frac{\sum_{i=1}^{n} E_{i2}^2}{n} \tag{15}$$

Combining eq 15 with eq 6 we arrive at

$$\frac{\sum_{i=1}^{n} (X_{i1} - X_{i2})^2}{n} = \frac{\sum_{i=1}^{n} E_{i1}^2}{n} + \frac{\sum_{i=1}^{n} E_{i2}^2}{n} \tag{16}$$

In words, eq 16 states that the mean square difference between the readings from two methods of analysis is equal to the sum of the error variances of the two methods. Thus, if the two error standard deviations are known, the variances (which are the squares of the standard deviations) can be used to compute the mean square difference or the square root of this, the root-mean-square (RMS) difference. However, as in the data available here, normally it is the RMS difference that can be calculated from the sets of values measured by the different methods of chemical analysis. What is unknown and desired, on the other hand, is the error of each of the methods of chemical analysis. Equation 16 alone does not contain enough information to allow the separation of the RMS difference into the contributions of the two methods.

If the assumption that the mean error is zero is violated, a correction is easily made. By a derivation similar to that shown above it can be demonstrated that a nonzero mean error makes a contribution to the sum-square difference equal to $D^2$, where D represents the mean difference between the errors of the methods (18).

The two terms on the right-hand side of eq 15 are the variances of the individual errors; therefore eq 15 can be rewritten

$$E_{12}^2 = E_1^2 + E_2^2 \tag{17}$$

Thus we have shown that the total error is due to the combined errors of method 1 and method 2; the mathematical/statistical technique used is called analysis of variance. As applied here it shows that $E_{12}$ (under the proper circumstances) can be separated into its two contributions

$$E_{12}^2 = E_1^2 + E_2^2$$

Equation 17, however, does not contain sufficient information to separate the two contributions. If, however, a set of results from a third method is available, two more RMS differences, $E_{13}$ and $E_{23}$, can be computed by using formulas similar to eq 6 but using the differences between methods 1 and 3 and methods 2 and 3, respectively. Analysis of variance can then be applied to these two sets of RMS differences to form the additional two equations

$$E_{13}^2 = E_1^2 + E_3^2 \tag{18}$$

$$E_{23}^2 = E_2^2 + E_3^2 \tag{19}$$
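As a hedged illustration (not the authors' program, which was written in APL), the following Python sketch simulates three methods with independent errors, computes the three RMS differences in the manner of eq 6, and solves eqs 17-19 for the individual error standard deviations. The function names, the simulated error levels, and the closed-form solution (adding two of the equations and subtracting the third) are assumptions introduced here for illustration only.

```python
import math
import random


def rms_difference(a, b):
    """Root-mean-square difference between two sets of readings (as in eq 6)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def solve_three_methods(e12, e13, e23):
    """Solve eqs 17-19 for E1, E2, E3 given the three RMS differences.

    Adding two of the equations and subtracting the third isolates one
    variance, e.g. E12^2 + E13^2 - E23^2 = 2 * E1^2.
    """
    v1 = (e12 ** 2 + e13 ** 2 - e23 ** 2) / 2.0
    v2 = (e12 ** 2 + e23 ** 2 - e13 ** 2) / 2.0
    v3 = (e13 ** 2 + e23 ** 2 - e12 ** 2) / 2.0
    if min(v1, v2, v3) < 0.0:
        # Can happen with correlated errors or too few specimens.
        raise ValueError("negative variance estimate")
    return math.sqrt(v1), math.sqrt(v2), math.sqrt(v3)


def simulate(n, sds, seed):
    """Hypothetical data: n true values, each read by methods with independent errors."""
    rng = random.Random(seed)
    truth = [rng.uniform(10.0, 18.0) for _ in range(n)]
    return [[t + rng.gauss(0.0, sd) for t in truth] for sd in sds]


# Simulated check: the true values are never used in the solution.
true_sds = (0.10, 0.20, 0.30)
m1, m2, m3 = simulate(20000, true_sds, seed=1)
estimated = solve_three_methods(
    rms_difference(m1, m2), rms_difference(m1, m3), rms_difference(m2, m3)
)

# RMS differences reported in the Results section for near-IR, TDA, and KT.
reported = solve_three_methods(0.3037, 0.2662, 0.3221)

print("simulated:", [round(e, 3) for e in estimated])
print("reported data:", [round(e, 4) for e in reported])
```

With the RMS differences reported in the Results section, the sketch gives values close to the errors quoted there; small differences in the last digit are to be expected from rounding of the inputs.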

Equations 17-19 now constitute a set of three equations with three unknowns. The unknowns, $E_1$, $E_2$, and $E_3$, are the errors of the individual analytical methods and can be solved without knowing the “true” values of the constituents in the specimens!

A very important consideration for both ways of determining accuracy is that the errors of the individual methods must be independent, since eq 14 requires the errors to be uncorrelated. A priori considerations based on the nature of the analytical methods involved can help ensure this. For example, it may be expected that the errors from near-infrared, Kjeldahl, and neutron activation protein analysis would be independent, in the same way as near-infrared, oven drying, and Karl-Fischer moisture determination errors would be expected to be independent. On the other hand, if neutron activation and proton activation are both used for protein measurements, or vacuum drying and air-drying methods are both used for moisture determinations, then it is important to verify the independence of the errors.

An important point to note, and we thank the reviewers for pointing out that it was missing in the initial draft, is the fact that both algorithms produce point estimates of the error standard deviations of the methods of chemical analysis under study. Since errors represent the effect of random variables, it is always important to place confidence limits around the point estimates. For algorithm 1, the error of each method is computed with n(m - 1)/m degrees of freedom. Consequently, the error values computed this way will be distributed as $\chi^2$ with that number of degrees of freedom, and the confidence intervals can be computed from standard tables (20).

For algorithm 2, the confidence limits are not so easily determined. Each computed error is subject to the variability of the three values of sum-square-difference that are used in the computation. Each of those values is itself distributed as $\chi^2$ with n degrees of freedom. In principle, propagation of error considerations (ref 11, pp 72-77) could be used to trace the variability of the individual errors through the simultaneous equation computations, and by so doing determine their effect on the final results. In practice, the complexity of the computations makes this procedure unwieldy. We used a simpler approach, which was to do the computations after replacing each of the values of SSD with their respective confidence limits. To obtain the 95% confidence limit for the results, we used the cube of 0.95 (=0.857) to determine the value of $\chi^2$ to use in setting the limits of SSD (21).

EXPERIMENTAL SECTION

The data used in this study were the same data used for the previous investigation of the accuracy of protein in wheat measurements (13) except for the near-IR values, which were measured by using the scanning spectrophotometer at the USDA facility in Beltsville (19). The near-IR results were obtained by averaging five spectral curves for each specimen using one grind per sample, ground with a Udy Cyclotec mill fitted with a 1-mm screen. Therefore, we report the results for comparison of near-infrared, the definitive Kjeldahl, ordinary Kjeldahl, neutron activation analysis (22), proton activation analysis (23), thermal decomposition analysis (24), and Kjel-foss (25) and Kjel-tec (26). For convenience in nomenclature, we abbreviate the names of the various analytical techniques as follows: near-infrared (NIRS), the definitive Kjeldahl (KjD), ordinary Kjeldahl (Kj), neutron activation analysis (NAA), proton activation analysis (PAA), thermal decomposition analysis (TDA), Kjel-tec (KT), and Kjel-foss (KF).

The near-IR protein values were measured by using a calibration for as-is protein, and then converted to dry-basis protein using the simultaneously measured moisture. The values from all the other methods were reported as dry-basis protein. Data from 100 samples of wheat were used for the current computations. The computations were performed on an IBM PC-AT, using the IBM Personal Computer APL language.

RESULTS

Algorithm 1. The results for the eight methods of measuring protein in wheat obtained by using algorithm 1 described in the theory section are listed in Table I. Table I also shows the rate of convergence of the algorithm using these data; the final values for the standard deviations of the several techniques were reached after seven iterations. However, for assurance of convergence, the algorithm was programmed to continue until the largest difference between successive values of the standard deviation of the errors fell below 1E-10. The values of the weighted averages for each sample converged with equivalent rapidity; however, since there were 100 samples in the set, this table is too large to present here.

It is important to assess the independence of the errors of the different methods. For this purpose, the errors of each method were computed as the difference between that method's values for the specimens and the weighted average. The degree of independence was assessed by computing the correlation coefficient between the errors of each pair of methods; these results are presented in Table IIA. These results were compared with standard tables (20), to determine if any of the correlation coefficients represented statistically significant associations between the errors of any pair of methods (a statistically significant value for the correlation coefficient would mean that the errors of the methods involved would be larger than the computed values). However, in attempting this comparison, a difficulty was encountered. The difficulty was that, since many correlation coefficients were being tested simultaneously, the values in the standard


Table I. Computed Standard Deviation of the Errors of the Several Methods of Measuring Protein in Wheat, for Several Iterations of the Algorithm, and the Standard Deviation of the Weighted Mean (a)

iter no.   wgtd mean   NIR      KjD      Kj       NAA      PAA      TDA      KT       KF
1          0.0497      0.1624   0.0942   0.1205   0.1258   0.1103   0.2134   0.2140   0.1836
2          0.0474      0.1600   0.0834   0.1130   0.1191   0.1010   0.2246   0.2259   0.1854
3          0.0470      0.1597   0.0815   0.1121   0.1182   0.0996   0.2268   0.2282   0.1859
4          0.0470      0.1596   0.0811   0.1120   0.1181   0.0994   0.2272   0.2286   0.1861
5          0.0470      0.1596   0.0810   0.1120   0.1181   0.0994   0.2272   0.2286   0.1861
6          0.0470      0.1596   0.0810   0.1120   0.1181   0.0994   0.2272   0.2287   0.1861
7          0.0470      0.1595   0.0810   0.1120   0.1181   0.0994   0.2273   0.2287   0.1861
22         0.0470      0.1595   0.0810   0.1120   0.1181   0.0994   0.2273   0.2287   0.1861

95% Confidence Limits
upper                  0.183    0.0954   0.129    0.136    0.114    0.261    0.262    0.214
lower                  0.136    0.0689   0.0953   0.100    0.0970   0.193    0.195    0.158

(a) Twenty-three iterations were required for the difference between successive values of standard deviation to decrease below 1E-10. The 95% confidence limits for the final result of each method are also shown.
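The iteration traced in Table I can be sketched in a few lines. The Python below is a minimal sketch under stated assumptions, not the authors' APL program: it alternates eq 1 and eq 3 starting from unit errors, uses the inverse of each method's error standard deviation as its weight (an assumption based on the weighting described in the theory section), and omits the later corrections for the error of the weighted mean and for correlation. The simulated data, the function names, and the stopping rule are hypothetical choices made for illustration.

```python
import math
import random


def weighted_means(readings, weights):
    """Eq 1: weighted mean over the m methods, one value per specimen."""
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, row)) / total for row in readings]


def method_errors(readings, means):
    """Eq 3: error SD of each method, using n(m - 1)/m degrees of freedom."""
    n, m = len(readings), len(readings[0])
    dof = n * (m - 1) / m
    return [
        math.sqrt(sum((row[j] - mu) ** 2 for row, mu in zip(readings, means)) / dof)
        for j in range(m)
    ]


def iterate_errors(readings, tol=1e-10, max_iter=200):
    """Alternate eq 1 and eq 3 until successive error estimates agree within tol."""
    errors = [1.0] * len(readings[0])  # all errors (and weights) start at unity
    history = []
    for _ in range(max_iter):
        weights = [1.0 / e for e in errors]  # assumption: inverse-SD weighting
        means = weighted_means(readings, weights)
        new_errors = method_errors(readings, means)
        history.append(new_errors)
        change = max(abs(a - b) for a, b in zip(new_errors, errors))
        errors = new_errors
        if change < tol:
            break
    return errors, history


# Hypothetical data: 5000 specimens read by four methods with independent errors.
rng = random.Random(0)
true_sds = (0.05, 0.10, 0.20, 0.40)
readings = []
for _ in range(5000):
    truth = rng.uniform(10.0, 18.0)
    readings.append([truth + rng.gauss(0.0, sd) for sd in true_sds])

errors, history = iterate_errors(readings)
for k, row in enumerate(history[:7], start=1):
    print(k, [round(e, 4) for e in row])
```

Because the later corrections are omitted, the estimates from this sketch should be read as first approximations only; they are intended to show the shape of the iteration rather than to reproduce the values in Table I.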

Table II. Statistics for the Differences between Each Method's Set of Values for all 100 Samples and the Weighted Averages (a)

A. Correlation Coefficients

       NIR       KjD       Kj        NAA       PAA       TDA       KT        KF
NIR    1.0000    0.2552    -0.3346   -0.2707   -0.2677   -0.0562   -0.0009   -0.2995
KjD    0.2552    1.0000    -0.3910   0.1893    -0.4055   -0.0801   -0.0701   -0.4504
Kj     -0.3346   -0.3910   1.0000    -0.2003   0.3827    -0.2811   -0.2732   -0.0185
NAA    -0.2707   0.1893    -0.2003   1.0000    -0.3317   0.0292    -0.2317   -0.2010
PAA    -0.2677   -0.4055   0.3827    -0.3317   1.0000    -0.2663   -0.2195   -0.0041
TDA    -0.0562   -0.0801   -0.2811   0.0292    -0.2663   1.0000    -0.1571   0.0024
KT     -0.0009   -0.0701   -0.2732   -0.2317   -0.2195   -0.1571   1.0000    -0.0243
KF     -0.2995   -0.4504   -0.0185   -0.2010   -0.0041   0.0024    -0.0243   1.0000

B. Mean Differences of Each Method from Overall Mean (Bias)

[column alignment lost in extraction; values in the order extracted] 0.0428, 0.0340, -0.0167, 0.0030, 0.0083, -0.1407, -0.0833, 0.0628

(a) This is the assessment of the independence of the errors.

tables are not valid, when used as is. However, this phenomenon has been considered in the past and can be compensated for as follows (21): when n statistics are being considered simultaneously, the correct critical level is the value corresponding to a probability level that is the nth root of the desired critical probability (21). There are eight different methods being compared; therefore there are 28 distinct correlation coefficients, so the correct critical value to use corresponds to a probability that is the 28th root of 0.95, which equals 0.9981. Checking ref 13 we find that the closest tabled critical value is for a probability level of 0.9975; the critical value of correlation for 100 samples and a critical probability level of 0.9975 is 0.276.

Table IIA shows that the errors from ordinary Kjeldahl are highly (and significantly) correlated with several of the other methods, surprisingly so with the proton activation analysis. The effect of correlated errors is to reduce the computed value of true error for that method, and the computed error for these two methods is therefore lower than the true value. The actual values of the correlation are small, however, even though real, therefore the decrease in the computed value of the error is minimal and we will ignore it.

The next step is to determine if the standard deviations (or variances) of the different methods are the same. For testing multiple variances we use the method of Bartlett, as described by Hald (see ref 14, p 290). For the standard deviations in Table I, the computed value of $\chi^2$ for these data is 199.97. The critical value of $\chi^2$ with 7 degrees of freedom is 15.51, therefore the different methods do indeed have different accuracies.

Table IIB presents the means of the differences for each method from the weighted mean; this determines if any method has constant systematic error (bias). The standard deviation of the eight different means is 0.0687. Ordinarily, an F-test would be performed by comparing the variance corresponding to this standard deviation with the pooled within-method variance; in such a case the expected value for the standard deviation of the means of the differences would be the pooled standard deviation of the methods divided by the square root of 100. However, since the different methods have different accuracies, it is not valid to pool them. Therefore, instead of pooling, the correct value to test against was estimated by using the median value (0.173). From this, the expected value of the standard deviation of the means is $0.173/100^{1/2}$, which equals 0.0173. Then $F = (0.0687/0.0173)^2 = 15.7$. Critical F(0.95, 6, 100) = 2.2; therefore we conclude that the different methods do indeed exhibit bias between them. We note here that even if the standard deviation of KT, the largest one in Table I, had been used for this test, we would have obtained a statistically significant result (F = 9.02 for this case).

From Table I, we can also see the accuracy to which we know the weighted mean for each sample as calculated by eq 5: the standard error for this weighted mean is 0.0470%. To determine the confidence limits for these errors, we follow the discussion in the “theory” section and use the table values of $\chi^2$ to calculate the confidence limits on the variance. For 100 samples and eight methods we find that the error of each method is determined with 100 × 7/8 ≈ 87 degrees of freedom. From statistical tables (20) we find that the 95% confidence interval for $\chi^2$ with 87 degrees of freedom is 63.1-114.7. This works out to a ratio range of the standard deviation of 0.851-1.152, or a factor of 15% above and below the point estimate. Table I also presents the upper and lower 95% confidence limits for each of the methods.
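The arithmetic in the preceding paragraphs can be checked with a short script. This is an illustrative sketch only: it uses the Wilson-Hilferty approximation for the $\chi^2$ quantiles in place of the printed tables the authors consulted (an assumption made here to avoid external libraries), and all function and variable names are hypothetical.

```python
import math
from statistics import NormalDist


def simultaneous_level(p, k):
    """Per-test probability level when k statistics are tested at once: the kth root of p."""
    return p ** (1.0 / k)


def chi2_quantile(prob, dof):
    """Wilson-Hilferty approximation to a chi-square quantile (stand-in for tables)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


# Number of distinct correlation coefficients among eight methods.
n_methods = 8
n_pairs = n_methods * (n_methods - 1) // 2
level = simultaneous_level(0.95, n_pairs)

# F ratios for the bias test, using the median SD and the KT SD from the text.
n_samples = 100
sd_of_means = 0.0687
f_median = (sd_of_means / (0.173 / math.sqrt(n_samples))) ** 2
f_kt = (sd_of_means / (0.2287 / math.sqrt(n_samples))) ** 2

# 95% interval for chi-square with 87 degrees of freedom, and the
# corresponding ratio range computed as sqrt(chi2 / dof).
dof = 87
chi2_low = chi2_quantile(0.025, dof)
chi2_high = chi2_quantile(0.975, dof)
sd_ratio_low = math.sqrt(chi2_low / dof)
sd_ratio_high = math.sqrt(chi2_high / dof)

print("pairs:", n_pairs, "per-test level:", round(level, 4))
print("F (median):", round(f_median, 2), "F (KT):", round(f_kt, 2))
print("chi-square interval:", round(chi2_low, 1), round(chi2_high, 1))
print("SD ratio range:", round(sd_ratio_low, 3), round(sd_ratio_high, 3))
```

The approximation is adequate for this many degrees of freedom, but tabulated or library quantiles should be preferred wherever exact limits matter.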


Algorithm 2. To illustrate the use of the second algorithm for estimating the absolute error of analytical methods, the results from near-IR, TDA, and KT were used. These were chosen on the basis of our interest in the near-IR results and also on the basis of Table II showing that the errors of the other two methods have minimal intercorrelation with the near-IR results and with each other. Also, the bias of these three methods is low enough that the inclusion of the intermethod bias in the accuracy calculation will be negligible. If the errors of these three methods are designated $E_1$, $E_2$, and $E_3$ respectively, and labeling the RMS differences according to eq 16-18, it is apparent that: $E_{12}$ = 0.3037, $E_{13}$ = 0.2662, $E_{23}$ = 0.3221. Therefore the three equations to be solved are

$$E_1^2 + E_2^2 = 0.09223 \tag{20}$$

$$E_1^2 + E_3^2 = 0.07086 \tag{21}$$

$$E_2^2 + E_3^2 = 0.1037 \tag{22}$$

and solving these equations we find that the three errors are near-IR = 0.1721%, TDA = 0.2501%, KT = 0.2030%. The corresponding confidence limits, found by using the method described in the “theory” section, are as follows: lower confidence limits, 0.075, 0.196, 0.131; upper confidence limits, 0.232, 0.295, 0.256.

The results for these three methods from Table I are in reasonable agreement with the results calculated here. The difference between the two sets of results is well within the confidence interval of either method of determining the absolute accuracy of methods of chemical analysis.

The error of the “definitive” Kjeldahl is considerably larger than would be expected from the procedure used to obtain these values; from the accuracy of ordinary Kjeldahl as computed here (approximately 0.11) and from the accepted value for the error of Kjeldahl protein measurements (0.15-0.20), the value of the definitive Kjeldahl error may be expected to be 0.015-0.02. Clearly Kjeldahl analysis is subject to systematic errors not removed by averaging many readings. One likely source of this systematic error is the use of the fixed conversion factor between nitrogen and protein. The magnitude of the systematic error is, perhaps, surprising.

CONCLUSIONS

Two methods of calculating the accuracy of analytical methods have been presented; applying these methods to several methods for the determination of protein in wheat showed that the different methods do indeed have different accuracies. The accuracy of the near-infrared method, of most interest to us, is competitive with that of the other techniques, being neither the best nor worst. The near-IR method is not quite as good as standard Kjeldahl or activation methods but is superior to automated Kjeldahl methods and to thermal decomposition analysis. The two ways of calculating the true accuracy agreed well with each other, lending support to the validity of both approaches. Extracting the true measurement error from the data also allows determining the existence of systematic error, and when applied to the definitive Kjeldahl data has shown that this method of protein measurement is subject to systematic errors, as has been suspected. These systematic errors are not removed by averaging many readings using the same technique.


RECEIVED for review September 22, 1987. Resubmitted November 11, 1988. Accepted November 23, 1988.