1942
Anal. Chem. 1989, 61, 1942-1945
Robust Statistics and Functional Relationship Estimation for Comparing the Bias of Analytical Procedures over Extended Concentration Ranges Michael Thompson
Department of Chemistry, Birkbeck College, Gordon House, 29 Gordon Square, London W C l H OPP, U.K.
Kane's Kjeklahl proteln data were assembled to compare the efficacy of a copper sulfate catalyzed procedure with that of the standard procedure employing mercury( I I ) oxide. The orlglnal data analysis was carrled out by a conventlonal analysis of variance at each level, after the rejection of outliers. The data have now been reexamlned by a functional relatlonshlp method after the estlmatlon of robust means and standard devlatlons. The comblned methods gave the rapid and unamMguous r e d l that there was no measurable relative blas between the two procedures.
In 1984 Kane (I) reported parallel collaborative trials of two variants of the manual Kjeldahl procedure for the determination of protein in feedstuffs. The same test and reference materials were used in both procedures. The purpose of the trial was to determine whether a variant procedure, which used a copper sulfate catalyst, gave the same results as the standard procedure requiring a mercury(I1) oxide catalyst. The parallel trials were designed in the normal manner, with 22 laboratories participating. Twenty-six test materials were analyzed in each laboratory by each procedure, with blind duplication. Two reference materials were also included, but were analyzed without duplication. The test materials were organized as 13 closely matched Youden pairs, so that a split level interpretation could be undertaken (2). The interpretation reported by Kane was carried out in the normal manner, consisting of a separate one-way analysis of variance for each analytical procedure and each Youden pair, after outliers had been identified and excluded from the data. The purpose of the trial, namely the detection of bias (if any) between the two procedures, was addressed by a separate comparison of the means for each material. The general conclusion was that, as significant differences between procedures were observed in only four of the 26 materials at the 95% confidence level, and those differences were small, the variant procedure with the copper catalyst could be adopted. In this study some newer statistical methods were used for the interpretation of this large suite of published interlaboratory data. The purpose was to test the applicability and power of the functional relationship for a comparison of the results of two analytical procedures carried out on a number of materials of the same class but with widely varying analyte levels. In addition the use of a robust method for estimating means and standard deviations of analytical data was also tested.
STATISTICAL METHODS Robust Mean and Standard Deviation. Robust methods are designed for use with parametric data that might be contaminated with outliers ( 3 ) . The potential applicability to analytical data is therefore obvious. The aim of robust statistics is to describe the parametric or "good" part of the data set. This is similar in intention to the rejection of outliers
before using classical statistics. However, robust methods accomplish this by downweighting rather than rejecting the outlying results. Unlike their classical counterparts, robust estimates cannot be calculated directly from a formula. Nevertheless, they can be readily obtained by numerical methods that can be simply implemented on a microcomputer. Many different robust methods have been proposed, but the one used in this study is based on the "Huber Proposal 2" ( 4 ) . Suppose that the data xl, x 2 , . .. x , come from a Gaussian population N(p,a2) but are contaminated with outliers subsequently. The "Winsorized" values (fi) of the variate are defined as follows:
Ri = x i if Ixi - ml C cs or
Ri = m
+ sign ( x i - m)cs if J x i- m( > cs
(1)
The value of the constant c depends on the expected proportion of outliers. A value of c = 1.5 seems widely accepted, protecting against up to 5% of outliers, and was used in this study. The robust estimates of mean (m)and standard deviation (s) are obtained from the equations
m = average s2 = var
(ei)
(2)
(Ri)/B
(3)
where B is a function of c. However, the Winsorized values are not defined until m and s are evaluated, so eq 1-3 have to be iterated to a suitable degree of convergence. Suitable initial estimates of m and s are
mo = median ( x ) so = 1.483 median (Ixi - median
(XI[)
The function B is defined as
B = E[min (c2,X2)] where X is a standard normal deviate. This function can be evaluated from the equivalent expression
where the probabilities (p) are obtained from tabulations of the area of the Gaussian curve. Some corresponding values of c and B are as follows: c
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
B 0.516 0.578 0.635 0.688 0.736 0.778 0.816 0.849 c
1.8
1.9
2.0
B 0.877 0.900 0.920
Functional Relationship. The obvious course to pursue in this study was to assume a linear relationship between the results of the two procedures. Other types of relationship are sometimes encountered (5), but in this case a glance at the scatter plot (Figure 1) shows that a linear model is a good initial assumption.
0003-2700/89/036 1-1942$0 1.50/0 0 1989 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 61, NO. 17, SEPTEMBER 1, 1989
-
Protein ( o/o 1 Copper Catalyst Figure 1. Comparison of robust means for the two procedures for
1943
A FORTRAN program for the maximum likelihood functional relationship method used in this study can be obtained from Prof. B. D. Ripley, Department of Mathematics, University of Strathclyde, Glasgow G1 l X H , U.K. If there is no overall bias between the two analytical procedures, then E ( a ) = 0 and E(b) = 1. Bias is demonstrated by a significant deviation from these criteria and is formalized by an examination of the values of a/se(a) and (b - l)/se(b), which are distributed approximately according to the t distribution. The adequacy of the linear model can be tested by an examination of the scaled residuals. Ideally they should be randomly positive or negative and show no trend in absolute magnitude when plotted against concentration and be approximately distributed as a sample from a standard normal deviate. Additionally the degree of fit of the model can be gauged by forming the sum of the squares of the scaled residuals, i.e.
protein determination. The functional relationship approach is similar to weighted linear regression in that it provides estimates (a and b) of the intercept and the slope coefficient together with their standard errors (se(a) and se(b), respectively). However, it differs from regression in an important way: It produces unbiased values of a and b when both of the variables are subject to errors, as in the comparison of analytical methods. A corollary of this advantage is that the interpretation of variables as “dependent” or “independent” is purely nominal, and the method gives the same relationship whichever way the two variables are designated. Methods of estimating the parameters of linear functional relationships have been suggested since the work of Adcock (6). A maximum likelihood treatment by York (7) obtained b by minimizing the expression Ci[(Xi- U i ) 2 / k i
+ (yi - a - P U i ) 2 / l i ]
where x i is the mean result of the first method on the ith material, yi is the corresponding result by the second method, ki and li are the respective variances of x i and y i , ui is the (unknown) population mean of x i , and a and @ are the parameters to be estimated by a and b. Note that kill2 and li1/2 are the standard errors of the means x i and y i . The minimization was carried out by an iterative method, after setting the derivatives to zero. Ripley and Thompson (8),concerned with accuracy comparisons in analytical science, showed that the problem was equivalent to minimizing
Ciwi(P)[yi-
- Pxi12
where the weight wi(@) is given by Wi(P)
= wi = l / ( l i
+ P2lZi)
and a(@)= Ciwi(yi - Pxi)/Ciwi
to find the value of b. The minimization was undertaken by a linear search with parabolic inverse interpolation, a method thought to have superior performance to setting the derivatives to zero (9). The iteration starts with a preliminary estimate of @ obtained by normal regression. Expressions such as a(@) indicate that the estimate a depends on the estimate of @ in the current iteration. The standard errors are given by se(a) =
[Ciwixi2/CiwiCiwi(xi-z,)~]’/~
se(b) =
[Ciwi(xi- f , ) 2 ] - 1 / 2
where the weighted mean is given by f , = CiWiXi/CiWi
For a perfect fit to the model S should have a chi-squared distribution with n - 2 degrees of freedom. Lack of fit is indicated by a value of S greater than that shown for the chi-squared distribution at p = 0.05. This circumstance might arise if the true relationship were curvilinear, or alternatively if cumulatively there were enough biases between individual pairs of results. In the present study this latter effect could conceivably arise from occasional failure of one or other of the two analytical procedures for particular trial materials. A significantly high value of S could also be obtained when no bias were present if the precisions of the analytical procedures were underestimated. This last feature is a common failing in within-laboratory analytical data. In this study, however, underestimated precision would not be expected, as the data originated independently from different laboratories.
RESULTS AND DISCUSSION Comparison of Means. The data from Table 2 of Kane’s paper, as outlined above, were used without any omission. The results for each blind duplicate were averaged. Then the robust means and standard deviations were estimated over all 22 laboratories, for each of the 28 materials and both analytical procedures. The “Youden pairs” of similar materials were treated as two independent materials. The reference materials were included in the data set and treated as were the test materials, except that there was no analytical duplication. The statistics are shown in Table I and 11. The functional relationship between the two sets of 28 means for each procedure was then investigated by the method of Ripley and Thompson, with the arbitrary selection of the results of the copper-catalyzed procedure as the “independent” variable. The results obtained were as follows: intercept, a = 0.005, se(a) = 0.040, t = 0.12; coefficient, b = 0.9997, se(b) = 0.0015, t = 0.20. The results indicate that overall there is no measurable bias between the two methods, as the respective t values are not significantly high. (In fact the t values are unusually low, perhaps indicating that the original analytical data from the two methods were not completely independent). The scaled residuals showed no trends or unusual features when plotted against concentration (Figure 2). The distribution showed no obvious excess of outlying points, showing that the general conclusion (of no bias between the results of the procedures) applied equally to the individual trial materials. The sum of the squared residuals was 18.5, less than the critical level of x2 (38.9 for p = 0.05 and 26 degrees of freedom), demonstrating the absence of significant lack of fit. The residuals from some of the “Youden pairs” seemed systematically close by visual inspection, suggesting small matrix effects for particular materials. However, an analysis
1944
ANALYTICAL CHEMISTRY, VOL. 61, NO. 17, SEPTEMBER 1, 1989
Table I. Robust and Classical Means ( % Protein) of Results from 22 Laboratories
material dehydrated alfalfa meal dehydrated alfalfa meal soy protein concentrate soy protein concentrate pullet grower layer meat and bone meal meat and bone meal custom mix cattle feed beef feed swine base mix swine base mix dry dog food dry puppy food hydrolyzed poultry feathers hydrolyzed poultry feathers cottonseed meal cottonseed meal blood meal blood meal swine developer swine grower milk replacer milk replacer soybean meal soybean meal NHdH2POd lysine HC1
mean, mean, Cu method Hg method classical robust classical robust 18.74 16.86 87.66 88.89 15.69 16.13 54.20 53.83 13.36 10.11 29.76 29.27 26.97 28.10 85.85 90.18 39.43 40.40 84.82 80.61 17.00 17.92 17.75 17.79 44.02 44.41
-
18.74 16.86 87.92 88.81 15.73 16.20 54.21 53.92 13.39 10.19 29.76 29.32 26.97 28.17 85.84 90.30 39.40 40.52 84.84 80.73 17.02 17.95 17.79 17.78 44.08 44.54 75.78 93.38
18.75 16.86 87.60 88.82 15.70 16.17 53.92 53.55 13.44 10.08 29.73 29.24 26.88 27.99 85.75 90.06 39.38 40.38 84.94 80.57 17.02 17.94 17.67 17.87 44.11 44.59
-
18.79 16.88 87.77 88.86 15.72 16.20 53.97 53.68 13.48 10.08 29.65 29.26 26.92 28.02 85.63 90.18 39.49 40.52 85.23 80.73 17.05 18.05 17.70 17.83 44.16 44.66 75.50 94.59
0.
a
(a)
0
Table 11. Robust and Classical Standard Deviations ( % Protein) of Results from 22 Laboratories
material dehydrated alfalfa meal dehydrated alfalfa meal soy protein concentrate soy protein concentrate pullet grower layer meat and bone meal meat and bone meal custom mix cattle feed beef feed swine base mix swine base mix dry dog food dry puppy food hydrolyzed poultry feathers hydrolyzed poultry feathers cottonseed meal cottonseed meal blood meal blood meal swine developer swine grower milk replacer milk replacer soybean meal soybean meal NHdH2P04 lysine HCl
std dev, std dev, Hg method Cu method classical robust classical :robust 0.30 0.25 0.94 0.98 0.20 0.33 0.50 0.55 0.25 0.20 0.46 0.36 0.38 0.35 0.93 1.08 0.50 0.82 0.91 0.85 0.20 0.20 0.24 0.20 0.42 0.52
-
0.35 0.28 0.99 1.26 0.22 0.23 0.56 0.68 0.19 0.23 0.47 0.33 0.44 0.30 1.30 0.99 0.56 0.67 1.01 1.05 0.26 0.26 0.26 0.29 0.48 0.59 0.96 2.26
0.40 0.27 0.88 1.13 0.22 0.27 0.77 1.00 0.27 0.20 0.48 0.32 0.35 0.44 0.89 1.05 0.71 0.62 1.14 0.99 0.26 0.28 0.29 0.26 0.34 0.51
-
0.32 0.31 0.94 1.33 0.20 0.30 0.75 0.94 0.28 0.27 0.60 0.34 0.42 0.55 1.28 1.10 0.71 0.90 0.95 1.10 0.34 0.34 0.34 0.29 0.40 0.66 0.77 1.61
of variance of the residuals between and within “Youden pairs” did not support this conclusion (F = 1.18 with 12 and 13 degrees of freedom; p = 0.384). Precision and Concentration. The relationship between precision and concentration for both analytical procedures was
Protein Concentration
- Copper
( O/*)
2.5 I
Catalyst 1
I
.A
2.01 1.5
1 I
I
20
40
60
BO
100
Protein Concentration ( o/o I - Mercury Catalyst Figure 3. Robust standard deviation vs concentration (a) for the copper-catalyzed procedure and (b) for the mercury-catalyzed procedure. The solid lines show the weighted regression lines. The broken curves show the Horwitz function. The circled point in a was omitted from the regression. investigated by weighted regression of the robust standard deviations against the robust means. The model fitted was ac = a. 8, where ac is the standard deviation at concentration c and uo and 0 are constants. The weights were estimated by 2n /sC2 where s, is the estimate of uc. The data are shown in Figure 3, together with the regression lines. For the copper-catalyzed results, the circled point was excluded from the regression. The regression statistics were as follows (%I: CuS04, a = 0.064, b = 0.0112, S = 23.6, dof = 24; HgO, a = 0.109, b = 0.0111, S = 52.4, dof = 26 where
+
ANALYTICAL CHEMISTRY, VOL. 61, NO. 17, SEPTEMBER 1, 1989
a estimated uo and b estimates 8. Overall, the levels of precision indicated by these values were virtually identical for the two procedures over the whole concentration range. Apart from the result for the reference material lysine (excluded from the regression), the copper-catalyzed procedure gave a good fit to the model, as shown by the x2 test. In contrast the mercury-catalyzedprocedure showed significant lack of fit (x2 = 38.9 for 26 degrees of freedom and p = 0.05). This demonstrates that the mercury-catalyzed method gave somewhat less predictable values of precision from material to material. In both instances the precision function is noticeably lower than that predicted by Horwitz’s conjecture (10) that reproducibility should follow the relationship Q,
=0.02~95
Comparison of Robust and Classical Estimates. Means. The means produced by the robust estimation were very close to those produced by classical statistics after outlier rejection (Table I). The average difference between the methods amounted to only 0.15%, well below the relative standard deviation of the procedures (about 1.1%). Standard Deviations. In contrast to the means, the standard deviations produced by robust estimation were noticeably greater on average than those produced by classical statistics afer outlier rejection as implemented by Kane (Table 11). The average increases amounted to 10% for the copper-catalyzed procedure and 13% for the mercury-catalyzed procedure. This difference probably reflects the unwarranted rejection of data by the outlier tests, i.e., a small proportion of “errors of the first kind”. While this has a relatively small effect on the mean, the standard deviation is more strongly affected. This difference may be of some importance if it is general in analytical data. Collaborative trial data are currently processed after outlier rejection, and values of repeatability and reproducibility may subsequently be used as standards for data quality control schemes. If these statistics are underestimated, then an unduly large proportion of data may be unnecessarily rejected by failing the precision tests. If the standard were set 13% lower than the true level, for example, about 9% of the observations would fall outside the putative 95% confidence limits.
1945
CONCLUSIONS Two nonclassical methods of statistical estimation have been used to examine an existing data set. The trial showed that the robust method used gave appropriate values of mean and standard deviation without the need for outlier tests. The maximum likelihood functional relationship provided a (statistically) unbiased comparison of the relative accuracy of the two methods over the whole range of materials analyzed, including the reference materials, and provided a validated fitting to a model with only two parameters. A comparison of the parameter estimates with their theoretical values showed that there was no significant (analytical) bias between the two procedures. Examination of the residuals showed that there were no individual materials that deviated from this overall conclusion. It seems that much greater consideration should be given to the possibilities in analytical science of statistical procedures such as robust estimation and functional relationship. Despite the apparent complexities of the iterative methods of estimation, both methods can be easily implemented on a microcomputer. ACKNOWLEDGMENT The author thanks B. D. Ripley for the programs and expert advice. Registry No. N2, 7727-37-9. LITERATURE CITED Kane, P. F. J. Assoc. Off. Anal. Chem. 1084, 67, 869. Youden, W. J.; Steiner, E. H. Statistical Manual of the AOAC; AOAC: Arlington, VA; pp 22, 23. Hampel, F. R.; Ronchetti, E. M.; Rousseeuw, P.J.; Stakel, W. A. Robust Statistics : The Approach Based on Influence Functbns ; Wiley: New York, 1986. Huber, P. J. Robust Statistlcs; Wlley: New York, 1981. Thompson, M. Analyst 1082, 107, 1169. Adcock, R. J. The Analyst (Des M i n e s , Iowa) 1878, 5 , 53. York, D. Can. J. Phys. 1068, 44. 1079. Ripley, 0. D.;Thompson, M. Analyst 1087, 112, 377. Nash, J. C. Compact Numerical Methods for Computers: Linear Algebra and Function Mnimization; Adam Hllger: Bristoi, U.K., 1979; pp 129, 130. Horwitz, W.; Kamps, L. R.; Boyer, K. W. J. Assoc. Off. Anal. Chem. 1080, 63, 1344.
RECEIVED for review August 26,1988. Accepted May 1,1989.