Anal. Chem. 1983, 55, 1134-1138


Comparison of Conventional and Robust Regression in Analysis of Chemical Data

Gregory R. Phillips and Edward M. Eyring*

Department of Chemistry, University of Utah, Salt Lake City, Utah 84112

Robust regression using iteratively reweighted least squares is described and compared with traditional least-squares regression. Application of both techniques to a collection of actual data sets demonstrates that the performance of robust regression equals, and often exceeds, that of conventional least-squares regression. The practical advantages of robust methods to the chemist are discussed.

The widespread availability of computers has led to the extensive use of mathematical techniques in the interpretation of chemical data. By far the most commonly used statistical technique is that of least squares (LS), which is based on the assumption of an independent and normal error distribution with constant variance. The term “normal” above refers to a specific probability distribution function, occasionally called Gaussian. In situations of nonconstant variance, LS techniques can still be used; however, weights must be employed

so that the weighted observations have constant variance. Garden et al. (1) and Schwartz (2) have considered the problem of nonuniform variance in the construction of calibration curves, while Franke et al. (3) have discussed the optimization of the standard addition method, assuming constant relative error. Detailed calculations for fitting a straight line under conditions of constant relative error have been given by Smith and Mathews (4). Compared with the problem of nonconstant variance, chemists have paid less attention to the normality assumption underlying LS methods. Ames and Szonyi (5) have warned of the possibility of drawing incorrect conclusions when this assumption is violated and have proposed the use of nonparametric techniques for testing error distributions. On the other hand, Filliben (6) has emphasized graphical methods. Several studies have verified normal error distributions in chemical data; however, Clancy (7) has examined 250 error distributions based on 50000 chemical analyses and found that less than 15% of the distributions can be considered normal for


the purpose of applying common statistical techniques. This may be due to inherent properties of the error distribution or to the presence of one or more wild observations. Wernimont (8) has suggested nonnormal distributions may be due to assignable causes, which if removed would allow the error distribution to become approximately normal. The argument is often made that since least squares is the optimal estimation procedure for normally distributed errors, it should be almost optimal when the errors are approximately normal. These ideas seem to be reflected in the attitude prevalent among chemists that rejection of erratic data points provides sufficient protection against nonnormal error distributions and justifies the automatic use of least-squares procedures.

The performance of LS estimators when the assumption of normality is violated has received much attention in the statistical literature recently (9). An estimator is called robust if it is insensitive to mild departures from the underlying assumptions and is only slightly inefficient relative to LS when these assumptions are true. In addition to providing protection against nonnormal error distributions, robust methods are also resistant to the presence of outliers in the data. Monte Carlo studies have shown the least-squares estimator of location, the mean, to be one of the poorest estimators over a variety of error distributions other than normal (10). Yet chemists doubt the practical advantages of robust techniques and question whether the simulation studies used to demonstrate these advantages are realistic. In response to this situation, Rocke et al. (11) have compared traditional and robust estimators of location by using a series of historical data sets from the physical sciences and a collection of modern analytical data. Their results strongly indicate that either severely trimmed (i.e., 15-25%) means or robust estimators are needed for efficient estimation. A 20% trimmed mean involves discarding the lowest and highest 20% of the data and then averaging the remaining measurements.

Beaton and Tukey (12) have extended the concept of robust estimation to regression problems, using the fitting of spectroscopic data to illustrate their ideas. They contend the analysis of the rotational spectrum of hydrogen fluoride requires robust methods of regression because errors of measurement may have a nonnormal distribution, variation in the intensity of spectral lines suggests variable rather than constant error, and the power series used to predict the positions of spectral lines may not faithfully reflect quantum-mechanical perturbations. The robust method suggested by Beaton and Tukey is an iterative application of weighted least squares, where the weights for each iteration are determined by the data themselves. Monte Carlo studies have supported these ideas (13, 14) but have not led to modifications in the use of traditional regression analysis by chemists.

The insensitivity of robust regression to erroneous observations can be particularly advantageous in the construction of calibration curves. For example, suppose the (unknown) calibration equation is signal (Y) = concentration (X) and that a calibration run resulted in the data in Table I. A conventional least-squares regression yields the equation Y = 1.26X - 0.48, while robust regression results in the equation Y = 0.92X + 0.19. The latter is much closer to the true relationship (Y = X) and will provide better estimates of unknown concentrations. The residuals listed in Table I show dramatic differences between the two fits, reflecting the excessive influence wild observations have on least-squares techniques.

The present article describes the technique of iteratively reweighted least squares (IRLS) proposed by Beaton and Tukey and compares it with traditional LS regression. Both procedures are applied to a series of chemical data sets and the estimated variances of the two procedures are compared.


Table I. Comparison of Conventional and Robust Regression by Using Hypothetical Data

                              residual(b)
concn    signal(a)    conventional    robust
  1        1.1            0.32        -0.01
  2        2.0           -0.04        -0.03
  3        3.1           -0.20         0.15
  4        3.8           -0.76        -0.07
  5        6.5            0.68         1.71

(a) True relationship, signal = concentration; computed relationship, (conventional regression) signal = -0.48 + (1.26 x concentration) and (robust regression) signal = 0.19 + (0.92 x concentration). (b) Observed signal - predicted signal.
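The conventional fit in Table I can be reproduced with a few lines of ordinary least squares. The sketch below is ours, not the authors' program; the robust line is taken from the text rather than recomputed, since the IRLS procedure is only described later in the article.

```python
# Table I data: the true relationship is signal = concentration, but the
# last observation (6.5 at concn 5) is a wild point.
x = [1, 2, 3, 4, 5]
y = [1.1, 2.0, 3.1, 3.8, 6.5]

# Ordinary least squares for a straight line y = b0 + b1*x.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx            # slope; comes out to 1.26 (the conventional fit)
b0 = ybar - b1 * xbar     # intercept; comes out to -0.48

# Residuals under the conventional fit and under the robust line
# Y = 0.92X + 0.19 quoted in the text (not recomputed here).
res_conv = [round(yi - (b0 + b1 * xi), 2) for xi, yi in zip(x, y)]
res_rob = [round(yi - (0.19 + 0.92 * xi), 2) for xi, yi in zip(x, y)]
```

The rounded residuals reproduce the two columns of Table I, including the large robust residual (1.71) that flags the last observation as wild.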

Since small variances allow regression parameters to be more precisely determined, comparison of variances should allow an evaluation of the practical benefits of IRLS to the chemist. The data sets used in this comparison require the estimation of two parameters, making this regression problem similar to that involved in the construction of calibration curves.

THEORY

This article concerns itself with linear models, which are adequate to characterize most chemical problems. The concept of IRLS regression can be extended to nonlinear models in a straightforward manner. The term “linear model” implies a linear relationship between the parameters of a model, not necessarily the independent variables. Thus the model

y = a + bx + c(10^x) + d(log x) + ex^2    (1)

is a linear model, while

y = a exp(-bx)    (2)

is nonlinear (16). Matrices provide a convenient and compact method for expressing linear models, with the additional advantage that once the regression problem is written and solved in matrix notation the solution can be applied to any problem, regardless of the number of terms in the model. In matrix notation a linear model can be expressed
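Because eq 1 is linear in the parameters a through e, the (possibly nonlinear) functions of x can be evaluated ahead of time to build the design matrix; the fit then proceeds exactly as for a straight line. A small illustrative sketch (the function name is ours, and log is taken as base 10 here, which the article does not specify):

```python
import math

def design_row(x):
    """One row of the X matrix for eq 1: y = a + b*x + c*10^x + d*log(x) + e*x^2.
    Each entry multiplies exactly one parameter, so the model stays linear
    in a..e even though the entries are nonlinear functions of x."""
    return [1.0, x, 10.0 ** x, math.log10(x), x ** 2]

# Design matrix for three x values; solving the normal equations with this
# X is the same computation as for any other linear model.
X = [design_row(x) for x in (1.0, 2.0, 3.0)]
```

No such rearrangement is possible for eq 2, because the parameter b appears inside the exponential.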

Y = Xβ + ε    (3)

where Y = (y1, ..., yn)′ is an n-dimensional vector of observations, β = (β1, ..., βp)′ a p-dimensional vector of unknown parameters, ε = (ε1, ..., εn)′ an n-dimensional vector of errors, and X is an n by p matrix of known constants. Here n is the number of observations, p is the number of parameters, and the prime denotes transpose. The LS estimator of the unknown parameter vector β, denoted by β̂, is

β̂ = (X′X)⁻¹X′Y    (4)

The covariance matrix

C = σ²(X′X)⁻¹    (5)

contains the variances (diagonal terms) and covariances (off-diagonal terms) of the estimates of the parameters. The quantity s_LS² = (Y′Y − β̂′X′Y)/(n − p) is used as an estimate of the variance, σ². Confidence intervals for the parameter estimates can be computed by

β̂j − t(α/2; n − p)·s_LS(β̂j) ≤ βj ≤ β̂j + t(α/2; n − p)·s_LS(β̂j)    (6)

In eq 6, βj is the parameter to be estimated, β̂j is its least-squares estimate, t(α/2; n − p) is the upper tail value of the Student's t distribution with n − p degrees of freedom, and s_LS(β̂j) is the estimated standard deviation of β̂j. The square root of the jth diagonal element of the covariance matrix equals s_LS(β̂j). The IRLS technique consists of an iterative application of weighted least squares.
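For the straight-line case (p = 2), eq 4 and 5 can be written out without any matrix library, since X′X is a 2 x 2 matrix. The helper below is our own sketch (the t value for eq 6 would be looked up separately):

```python
def ls_line(x, y):
    """Least-squares straight line y = b0 + b1*x (eq 4), returning the
    estimates and the diagonal of s2*(X'X)^-1, i.e., the estimated
    variances of b0 and b1 (eq 5)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx                  # determinant of the 2x2 X'X
    b0 = (sxx * sy - sx * sxy) / det         # intercept
    b1 = (n * sxy - sx * sy) / det           # slope
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)  # s_LS^2 with n - p = n - 2
    var_b0 = s2 * sxx / det                  # first diagonal of s2*(X'X)^-1
    var_b1 = s2 * n / det                    # second diagonal
    return b0, b1, var_b0, var_b1
```

A (1 − α) confidence interval for the slope, per eq 6, is then b1 ± t(α/2; n − 2)·sqrt(var_b1).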

The weights for each iteration are determined by the data themselves. At each iteration the parameter vector β is estimated by

β̃ = (X′WX)⁻¹X′WY    (7)

where W is an n by n matrix with weights wi on the main diagonal and zeros elsewhere; the tilde is used to distinguish IRLS and LS estimators. A frequently used weight function is the biweight of Tukey

wi = [1 − (ri/kS)²]²    |ri| < kS
wi = 0                  |ri| > kS    (8)

where ri is the residual, k is a constant which determines how harshly residuals are treated, and S is a measure of scale. A popular measure of scale is the median of the absolute values of the residuals (16)

S = median(|ri|)    (9)

Figure 1 shows a plot of wi vs. the normalized residual, ri/S, for the LS and biweight functions with k = 6 and 9. This figure illustrates the excessive influence wild observations have on LS estimates, and the sensitivity of the biweight to observations with small residuals and its insensitivity to those with large residuals. As can be seen from Figure 1, a smaller value of k gives observations with large residuals less weight. The estimate of scale, S, given by eq 9 equals 0.67σ; hence for k = 6 the biweight ignores those observations more than 4 standard deviations from the fit. As the tuning constant, k, becomes large, IRLS estimation using the biweight approaches ordinary LS estimation.

Figure 1. A graph of the weight given to observations by least squares (-) and by robust regression using the biweight with k = 6 (---) and 9 (- - -). The weight, w, is plotted along the ordinate and the normalized residual, ri/S, along the abscissa.

After each iteration, new weights are computed and eq 7 is used to obtain a new estimate of β. Iteration continues until the change in fit between iterations satisfies a specified convergence criterion. The covariance matrix will have the same form as eq 5, but with σ² for the biweight estimated by the expression given in ref 17 (eq 10).

Approximate confidence intervals for the IRLS parameters can be computed analogously to those for LS parameters

β̃j − t(α/2; n*)·s_B(β̃j) ≤ βj ≤ β̃j + t(α/2; n*)·s_B(β̃j)    (11)

Because the error distribution is not known, the number of degrees of freedom to be used in the t statistic is not completely resolved. In the calculation of confidence intervals for biweight regression estimates, Gross has suggested using n* = (3n − 19)/4 for k = 6 and n* = (8n/7) − 6 for k = 9 (14).

EXPERIMENTAL SECTION

For the comparison of iteratively reweighted and traditional regression, 38 data sets, each containing ten or more observations, were randomly chosen from dissertations in chemistry or chemical engineering completed at the University of Utah. Data sets from dissertations rather than “typical” data sets in the literature were used in the hope that fewer of the original measurements will have been deleted or adjusted to conform to the overall pattern of the data. To the extent this hope was not met, the advantages of robust procedures will have been underestimated. These data sets represent research in NMR (18), chromatography (19), electrochemistry (20, 21), energy transfer (22), and kinetics (23, 24).

Computations for this article were carried out on a DEC-20 computer with a TOPS-20 operating system. A Fortran program coded in single precision performed both LS and IRLS calculations using matrix subroutines from LINPACK (25). For each data set, LS estimates were calculated using the pseudoinverse of the X matrix obtained from a QR algorithm. The calculation of IRLS estimates involves an iterative process and hence requires good starting values for the parameter estimates. Due to their lack of robustness, LS estimates can lead to either divergence or convergence to incorrect results, while a simple correction to the LS estimates provided satisfactory starting values. This correction consisted of Winsorizing the residuals, ri, from the LS fit (using the scale estimate, S, given in eq 9)

ri^w = −1.5S    ri < −1.5S
ri^w = ri       |ri| ≤ 1.5S
ri^w = +1.5S    ri > 1.5S    (12)

followed by a least-squares fit of the equation R^w = XQ, where R^w = (r1^w, ..., rn^w)′ is the vector of Winsorized residuals, Q = (q1, ..., qp)′ is a p-dimensional vector of correction factors, and X is the matrix of known constants. Winsorizing limits the residuals to being no larger (in magnitude) than the quantity 1.5S, which allows outlying observations to be partially, but not entirely, used in the calculation of correction factors. The concept of Winsorization is discussed by Dixon and Tukey (26). The initial values for the parameter estimates used in the IRLS calculation are obtained by adding the corrections, qj, to the LS parameters, β̂j. Once these initial values were obtained, IRLS estimates were calculated by the following procedure: (1) estimate the scale, (2) compute new weights, (3) use a weighted least-squares procedure to calculate a new estimate of β, and (4) return to (1) for another iteration, until the relative change in each parameter was less than 0.001, at which point the asymptotic variance was calculated by using eq 10. Copies of the data sets and the regression program are available from G.R.P.
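The procedure above — an LS start, a Winsorized correction (eq 12), then repeated application of eq 7 with biweight weights (eq 8) and the median scale (eq 9) — can be sketched for the straight-line case. This is our own minimal Python rendering, not the authors' Fortran program; the helper names are ours, and the guard for exactly fitting data (S = 0, where the median scale degenerates) is an assumption the article does not discuss.

```python
import statistics

def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1*x (eq 7 with p = 2)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (sw * sxy - sx * sy) / det

def biweight(r, k, s):
    """Tukey's biweight weight function (eq 8)."""
    u = r / (k * s)
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def irls_line(x, y, k=6.0, tol=1e-3, max_iter=100):
    b0, b1 = wls_line(x, y, [1.0] * len(x))       # ordinary LS start
    r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = statistics.median(abs(ri) for ri in r)    # scale estimate, eq 9
    if s == 0.0:
        return b0, b1                             # fit is already exact
    # Winsorized correction to the LS start (eq 12): clip the residuals,
    # regress them on x, and add the corrections to b0 and b1.
    rw = [max(-1.5 * s, min(1.5 * s, ri)) for ri in r]
    q0, q1 = wls_line(x, rw, [1.0] * len(x))
    b0, b1 = b0 + q0, b1 + q1
    for _ in range(max_iter):
        r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        s = statistics.median(abs(ri) for ri in r)
        if s == 0.0:
            break
        w = [biweight(ri, k, s) for ri in r]      # new weights, eq 8
        nb0, nb1 = wls_line(x, y, w)              # new estimate, eq 7
        converged = max(abs(nb0 - b0), abs(nb1 - b1)) < tol * (abs(b0) + abs(b1) + tol)
        b0, b1 = nb0, nb1
        if converged:
            break
    return b0, b1
```

As in the article's Fortran program, the scale and weights are refreshed on every pass, so an observation's influence can shrink to zero as the fit tightens around the remaining points.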

RESULTS AND DISCUSSION

The performance of traditional and robust regression can be evaluated by the variance of parameter estimates, since smaller variances allow the parameters to be more precisely determined. For each data set, IRLS regression has been performed with k equal to 6 and 9 in Tukey's biweight, referred to as the B6 and B9 estimators. As discussed above, the estimated covariance matrix is proportional to the matrix (X′X)⁻¹, the proportionality constant being s_LS² for least squares and s_B6² or s_B9² for robust estimation. These constants will be used to evaluate the regression procedures. Specifically, let Vij be the proportionality constant obtained by applying estimator j to data set i, and let ni be the number of observations in data set i. For purposes of notation, j = 1, 2, 3 will refer to the LS, B6, and B9 estimators, respectively. In analogy to the work of Andrews et al. (10) and Rocke et al. (11), the quantity wij = log niVij will be used to compare LS and IRLS regression using the chemical data sets selected for this study. Inclusion of ni in wij allows comparisons to be made among data sets with different numbers of observations. The variation among wij values is due to the effects of the data sets, regression procedures, and random errors of measurement, while this article is concerned only with the effect

of the regression method on this variation. The variance within a set of data, such as this one, depending on several effects operating simultaneously can be partitioned among the contributing effects by the statistical technique of analysis of variance (ANOVA). ANOVA is a well-established technique, used by chemists in collaborative studies and quality control, the analytical applications of which have been described by Hirsch (27) and Massart et al. (28).

Table II. Two-Way Analysis of Variance of Linear Regression Procedures

source of      sum of     degrees of     mean
variation      squares    freedom        square       F
mean           146.130      1            146.130
data set       197.690     37              5.343     36.65
estimator        7.836      2              3.918     26.87
error           10.789     74              0.146

The individual response wij is modeled by

wij = μ + αi + τj + γij + εij    (13)

where i = 1, 2, ..., 38 and j = 1, 2, 3. The parameter μ is a general average, αi is the effect of the ith data set upon the response, τj is the effect of the jth estimator, γij is an interaction term, and εij is a random error. Interaction occurs when the effect of the data set depends on the particular estimator being used (and vice versa) and significantly complicates the interpretation of ANOVA. Since there is only one observation per combination of i and j, interactions will be assumed to be zero in this work. This assumption was supported by calculation of Tukey's one degree of freedom for nonadditivity. Additional support resulted from treating the data sets as coming from seven sources (i.e., dissertations) and calculating a (dissertation x estimator) interaction term, which proved to be insignificant.

Examination of the τj values from eq 13 allows conclusions to be drawn about the different regression procedures. The effects of the LS, B6, and B9 regression procedures are measured by the parameters τ1, τ2, and τ3, respectively. Table II gives the results of the analysis of variance, which shows F = 26.87 to be much larger than the critical F value at a confidence level of 99% (F(0.01; 2, 74) = 4.86). Hence the LS, B6, and B9 regression procedures are not all equally efficient on the average for the sort of data considered here. The smaller the value of a particular τj, the more precisely the corresponding regression technique determines the parameters. The difference between a pair of estimator effects, called a contrast, allows the relative performance of the two estimators to be evaluated; i.e., if τj − τj′ is less (or greater) than zero, estimator j is more (or less) efficient than estimator j′. By construction of a confidence interval for a contrast, both the value and reliability of the ANOVA estimates are taken into account in comparing estimators.

For differences between pairs of estimator effects, confidence intervals based on the differences wij were constructed at an individual confidence level of 99%

−0.89 ≤ τ2 − τ1 ≤ −0.31
−0.79 ≤ τ3 − τ1 ≤ −0.21
−0.15 ≤ τ2 − τ3 ≤ −0.04

These confidence intervals clearly indicate the superior performance of both the B6 and B9 robust estimators relative to traditional LS regression. The two robust estimators are very similar, with the B6 estimator slightly superior to the B9, the particular data being analyzed perhaps determining their relative performances. A less exact but more graphic illustration of the relative advantages of the robust B6 and B9 estimators compared to the conventional LS estimator can be obtained from plots of
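The decomposition in eq 13 (with γij = 0) can be illustrated on a small made-up response table: the effect estimates are simply the row and column means measured from the grand mean. The 3 x 3 matrix below is invented for illustration and is not data from the article; it is exactly additive, so the error sum of squares comes out to zero.

```python
# Rows = data sets (index i), columns = estimators (index j); entries are w_ij.
w = [[1.0, 3.0, 5.0],
     [2.0, 4.0, 6.0],
     [3.0, 5.0, 7.0]]

I, J = len(w), len(w[0])
grand = sum(sum(row) for row in w) / (I * J)
row_mean = [sum(row) / J for row in w]
col_mean = [sum(w[i][j] for i in range(I)) / I for j in range(J)]

tau = [cm - grand for cm in col_mean]            # estimator effects tau_j
ss_estimator = I * sum(t * t for t in tau)       # estimator sum of squares
# Residual (error) sum of squares after removing both sets of effects:
ss_error = sum((w[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
               for i in range(I) for j in range(J))
```

With real responses, the error mean square ss_error/((I − 1)(J − 1)) would be the denominator of the F ratio for the estimator effect reported in Table II.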

wi2 and wi3 vs. wi1, as in Figures 2 and 3. The diagonal lines wi2 = wi1 and wi3 = wi1 are drawn in these figures to facilitate comparisons of estimators; points lying below these lines correspond to data sets for which the robust estimator is more efficient than the LS estimator. Figures 2 and 3 demonstrate that for most data sets robust regression appears superior to LS regression, while in those few instances in which this situation appears reversed, the performance of robust regression appears only slightly poorer than that of LS regression. Were allowance for sampling variation to be made, all the results might well be compatible with better performance of the robust regression. Robust estimation will be inferior to LS regression when the data set being analyzed satisfies the criteria for LS estimation, described in the introduction to this article. This poorer performance under ideal conditions is the “premium” paid for the extra protection robust procedures supply against deviations from normality or erratic observations. Figure 4 is a similar graph of wi2 vs. wi3, indicating that B6 and B9 regression perform similarly for the data considered in this study.

Figure 2. A graph of wi2 vs. wi1. The quantity wi2 is plotted on the ordinate, while wi1 is plotted on the abscissa. The diagonal line is the locus of points wi2 = wi1. Depending on whether the point (wi1, wi2) lies below, on, or above this line, the efficiency of B6 is greater than, equal to, or less than that of LS.

Figure 3. A graph of wi3 vs. wi1. The quantity wi3 is plotted on the ordinate, while wi1 is plotted on the abscissa. The diagonal line is the locus of points wi3 = wi1. Depending upon whether the point (wi1, wi3) lies below, on, or above this line, the efficiency of B9 is greater than, equal to, or less than that of LS.

CONCLUSIONS

Techniques based on the principle of least squares are the optimal estimation procedure for the analysis of data possessing a normal error distribution but perform very poorly in situations involving a nonnormal error distribution. Robust

estimation is a complementary technique which is insensitive to large deviations in a small number of data points and is relatively efficient over a broad range of error distributions. The performance of robust regression has been shown to be equal to, and often better than, that of conventional least-squares regression for a collection of chemical data sets.

A major advantage of robust regression is the automatic editing of data in regression analysis. This has been highly successful in the analysis of trajectory data (29) and promises to be invaluable to industrial and clinical laboratories processing large numbers of chemical analyses on a routine basis. Measurements which have been assigned small biweights by the iteratively reweighted least-squares regression have been marked for special attention. This attention includes examination of the appropriateness of the model being fitted as well as the possibility of erroneous data points.

Least-squares regression is superior in the ideal case of normally distributed errors but can result in erroneous conclusions if used in a nonideal situation. Robust regression more closely reflects behavior in the real world, recognizing that even in careful work the distribution of errors is not always ideal. Robust methods are designed for use with the various error distributions which can arise in practice, as well as normal data contaminated with wild observations. In this latter sense, Tukey's biweight is particularly attractive. The amount of influence a point receives is greatest when that point conforms to the other observations, decreases as it becomes less characteristic, and eventually becomes zero.

Figure 4. A graph of wi2 vs. wi3. The quantity wi2 is plotted on the ordinate, while wi3 is plotted on the abscissa. The diagonal line is the locus of points wi2 = wi3. Depending upon whether the point (wi3, wi2) lies below, on, or above this line, the efficiency of B6 is greater than, equal to, or less than that of B9.

ACKNOWLEDGMENT

The authors are grateful to J. M. Harris for providing a sorting routine used in the calculation of scale estimates and for helpful discussions.

LITERATURE CITED

(1) Garden, J. S.; Mitchell, D. G.; Mills, W. N. Anal. Chem. 1980, 52, 2310-2315.
(2) Schwartz, L. M. Anal. Chem. 1979, 51, 723-727.
(3) Franke, J. P.; de Zeeuw, R. A.; Hakkert, R. Anal. Chem. 1978, 50, 1374-1380.
(4) Smith, E. D.; Mathews, D. M. J. Chem. Educ. 1967, 44, 757-759.
(5) Ames, A. E.; Szonyi, G. In "Chemometrics: Theory and Applications"; Kowalski, B. R., Ed.; American Chemical Society: Washington, DC, 1977; ACS Symposium Series, No. 52; pp 219-242.
(6) Filliben, J. J. In "Validation of the Measurement Process"; De Voe, J. R., Ed.; American Chemical Society: Washington, DC, 1977; ACS Symposium Series, No. 63; pp 30-113.
(7) Clancy, V. J. Nature (London) 1947, 159, 339-340.
(8) Wernimont, G. Anal. Chem. 1949, 21, 115-120.
(9) Huber, P. J. "Robust Statistics"; Wiley: New York, 1981.
(10) Andrews, D. F.; Bickel, P. J.; Hampel, F. R.; Huber, P. J.; Rogers, W. H.; Tukey, J. W. "Robust Estimates of Location: Survey and Advances"; Princeton University Press: Princeton, NJ, 1972.
(11) Rocke, D. M.; Downs, G. W.; Rocke, A. J. Technometrics 1982, 24, 95-101.
(12) Beaton, A. E.; Tukey, J. W. Technometrics 1974, 16, 147-185.
(13) Ramsay, J. O. J. Am. Stat. Assoc. 1977, 72, 606-615.
(14) Gross, A. M. J. Am. Stat. Assoc. 1977, 72, 341-354.
(15) Deming, S. N.; Morgan, S. L. Clin. Chem. (Winston-Salem, N.C.) 1979, 25, 840.
(16) Mosteller, F.; Tukey, J. W. "Data Analysis and Regression"; Addison-Wesley: Reading, MA, 1977; p 358.
(17) Mosteller, F.; Tukey, J. W. "Data Analysis and Regression"; Addison-Wesley: Reading, MA, 1977; p 208.
(18) Brown, T. D. Ph.D. Dissertation, The University of Utah, Salt Lake City, UT, 1966.
(19) Kesner, L. F. Ph.D. Dissertation, The University of Utah, Salt Lake City, UT, 1974.
(20) Pemberton, J. P. Ph.D. Dissertation, The University of Utah, Salt Lake City, UT, 1971.
(21) Wang, D. N. M.S. Dissertation, The University of Utah, Salt Lake City, UT, 1972.
(22) Nogar, N. S. Ph.D. Dissertation, The University of Utah, Salt Lake City, UT, 1976.
(23) Ramadorai, G. Ph.D. Dissertation, The University of Utah, Salt Lake City, UT, 1971.
(24) Smith, P. P. Ph.D. Dissertation, The University of Utah, Salt Lake City, UT, 1976.
(25) Dongarra, J. J.; Moler, C. B.; Bunch, J. R.; Stewart, G. W. "LINPACK Users' Guide"; Society for Industrial and Applied Mathematics: Philadelphia, PA, 1979.
(26) Dixon, W. J.; Tukey, J. W. Technometrics 1968, 10, 83-98.
(27) Hirsch, R. F. Anal. Chem. 1977, 49, 691A-700A.
(28) Massart, D. L.; Dijkstra, A.; Kaufman, L. "Evaluation and Optimization of Laboratory Methods and Analytical Procedures"; Elsevier: New York, 1978; Chapter 4.
(29) Agee, W. S.; Turner, R. H. In "Robustness in Statistics"; Launer, R. L., Wilkinson, G. N., Eds.; Academic Press: New York, 1979; pp 107-126.

RECEIVED for review September 17, 1982. Accepted February 16, 1983. Financial support of this work by a contract from the Department of Energy (Office of Basic Energy Sciences) is gratefully acknowledged.