
Anal. Chem. 2000, 72, 4762-4765

Calibration of Analytical Instruments. Impact of Nonconstant Variance in Calibration Data

S. N. Ketkar* and T. J. Bzik

Air Products and Chemicals, Inc., 7201 Hamilton Boulevard, Allentown, Pennsylvania 18195

Failure to appropriately account for the nonconstant variance in the calibration data often leads to unreasonable estimates of calibration parameters. In this paper, we contrast the normal regression-based approach (ordinary least squares) as applied to trace metals data collected via inductively coupled plasma mass spectrometry with a modified least-squares approach which explicitly uses the variance of the calibration data (weighted least squares). The use of ordinary least squares, in the case where the variance of the calibration data is not constant, results in inaccurate assessment of the intercept. The consequence is inaccurate results at low concentration values, which is precisely the region of greatest interest. In contrast, the use of a weighted least-squares approach results in the accurate determination of the intercept, thus providing accurate results at low concentration values. Method detection limits can also be calculated based on the two regression approaches. The weighted least-squares approach generally results in lower detection limits. Recovery experiments performed using a spiked blank indicate that use of the weighted approach is indeed representative of the performance of the instrument used to generate the calibration data.

In the field of semiconductor fabrication, the process needs are driving the specifications for ultrapure chemicals and gases to lower values. With the decreasing tolerance of the industry to contaminants, the specifications are increasingly moving closer to the detection limits of the analytical methods that are being employed. The detection limit is generally viewed as the lowest concentration that will produce a response statistically different from a blank. The detection limits are calculated based on the calibration model used to convert the measured instrument response to an inferred concentration.
Due to the need to quantify contamination levels at very low concentration, it is important to ensure that the correct calibration model, dictated by the nature of the data at hand, has been used. A regression-based approach is applied to trace metals data collected using inductively coupled plasma mass spectrometry (ICPMS). Two regression-based methods, ordinary least squares (OLS) and weighted least squares (WLS), are applied. OLS analysis implicitly assumes that signal variability is the same everywhere within the calibration window. WLS analysis allows signal variability to vary over the calibration window but requires an appropriate set of weights to complete the analysis. Most current calibration or regression software, whether on the instrument or used in a postinstrument analysis, uses OLS as the default, and OLS may be the only algorithm available. These two approaches to calibration are compared with each other in this work. Failure to use appropriate curve-fitting methodology in calibration, even when using the correct mathematical form of the calibration model, can lead to entire regions of substandard performance by the calibration model.

4762 Analytical Chemistry, Vol. 72, No. 19, October 1, 2000

EXPERIMENTAL METHODOLOGY

Calibration data were collected on 31 trace elements [Li 7, Be 9, Na 23, Mg 24, Al 27, K 39, Ca (43, 44), V 51, Cr (52, 53), Mn 55, Fe (56, 57), Ni (58, 60), Co 59, Cu (63, 65), Zn (64, 66), Ga 69, As 75, Se (77, 82), Rb 85, Sr 88, Mo (95, 98), Ag (107, 109), Cd (111, 114), In 115, Sb (121, 123), Cs 133, Ba 138, Tl (203, 205), Pb 208, Bi 209, U 238] on a VG Elemental PQ2 ICPMS. A dozen elements were measured at two ions each (Ca, Cr, Fe, Ni, Cu, Zn, Se, Mo, Ag, Cd, Sb, Tl). Standards of 1, 2, 5, 10, and 20 ppb were made for all elements investigated, in a 2% HNO3 matrix, from a 1 ppm stock solution. The data consisted of triplicate determinations at five levels of standard. Recoveries on a blank spiked with 0.1 ppb of the multielement standard were used to check how reliably the two approaches to calibration, WLS and OLS, represent the performance of the instrument at low concentration values.

REGRESSION ANALYSIS

Regression-based calculation of calibration is described elsewhere in detail (ref 1). The ICPMS data investigated obey a linear model, which is the only model considered herein. The least-squares approach is used to obtain the calibration curve. For the case of ordinary least squares, the sum of the squared prediction errors $\sum_{i=1}^{n} (Y_i - \mathrm{Pred}(Y_i))^2$ is minimized. For the case of weighted least squares, the sum of the weighted squared prediction errors $\sum_{i=1}^{n} w_i (Y_i - \mathrm{Pred}(Y_i))^2$, with $w_i$ as the weight, is minimized.
Typically $w_i = 1/s_i^2$, where $s_i$ is the standard deviation of the ith observation. These two least-squares methods do not provide identical calibration models even when the model form and data used in the analyses are kept identical. Major differences can result from the use of different error criteria in regression analysis. Consideration of which error criterion, if either, is appropriate is required.

(1) Draper, N. R.; Smith, H. Applied Regression Analysis, 3rd ed.; John Wiley and Sons: New York, 1998; pp 223-229.

10.1021/ac000018s CCC: $19.00

© 2000 American Chemical Society Published on Web 09/06/2000

Figure 1. Sr 88 calibration data with linear model by OLS analysis.

Table 1. Count Variability by Concentration for Sr 88

concentration (ppb)    average count    std dev count
0                      118              34
1                      17090            186
2                      33916            291
5                      87766            700
10                     171333           1155
20                     338667           1528

USE OF OLS OR WLS?

The determining factor in choosing whether the OLS or WLS analysis is more appropriate is the nature of the data itself. When the error in making a measurement of the signal is about the same magnitude (absolute size, not relative) over the entire calibration window, OLS estimation is appropriate. If the error in making a measurement is a function of the level of standard in the calibration window, WLS estimation is appropriate. In our experience with ICPMS data, even when calibration is over a relatively narrow range, WLS estimation is required.

Even when WLS is strongly needed, the simple graphical analysis typically performed on calibration data will fail to show that an OLS analysis is inappropriate, as shown in Figure 1. Everything looks reasonable at this level of magnification; the fitted line seems to track the points well. Yet, if a simple table of concentration versus count standard deviation is constructed, it is apparent that counting variation is strongly a function of concentration. This is shown in Table 1. As illustrated, very different reproducibility at different concentrations can hide within the scale of typical calibration graphs. The traditional graphically oriented method for detecting heterogeneity of variability in a regression analysis is a residual plot.

The need for WLS-based calibration can also be argued, in part, by noting that an ICPMS provides count data. If the process that generates counts distributes “counts” randomly in the sample, a reasonable model for the variability from this effect alone is a Poisson distribution. The variance of a Poisson distribution equals its average; i.e., the standard deviation is the square root of the average. Unless another source of variability dominates this one, or the Poisson assumption for this variability source is incorrect, this alone suggests the need for a WLS analysis.
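The Poisson argument can be checked roughly against the Sr 88 values in Table 1. The sketch below is illustrative only: the function name is an assumption, and the standard deviations in Table 1 come from triplicates, so each ratio is itself a noisy estimate.

```python
import math

# Sr 88 values transcribed from Table 1:
# (concentration in ppb, average count, standard deviation of counts)
TABLE_1 = [
    (0, 118, 34),
    (1, 17090, 186),
    (2, 33916, 291),
    (5, 87766, 700),
    (10, 171333, 1155),
    (20, 338667, 1528),
]


def poisson_ratios(rows):
    """Return (ppb, observed sd / sqrt(average count)) for each level.

    For pure Poisson counting noise the standard deviation equals the
    square root of the average, so the ratio would be close to 1.
    """
    return [(ppb, sd / math.sqrt(mean)) for ppb, mean, sd in rows]


ratios = poisson_ratios(TABLE_1)
for ppb, ratio in ratios:
    print(f"{ppb:>2} ppb: observed sd / Poisson sd = {ratio:.2f}")
```

A ratio near 1 would indicate Poisson-limited noise, and a ratio well above 1 indicates additional sources of variability. In either case the standard deviation grows with the count level, which is the property that argues against weighting all calibration points equally.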

Figure 2. Variability of triplicate determinations versus average count level.

Figure 2 shows summary statistics from each set of triplicate determinations for all levels of the standards and the blank for all trace metals investigated. Variability (expressed in terms of standard deviation) is a function of the average number of counts. The straight line is an approximate estimate of the observed pattern for the graphed data. This level of variability exceeds that which would be associated with the hypothesis of all the variation coming from a Poisson distribution (the curve labeled √MEAN) over almost the entire range, often by a substantial margin. However, when the observed count is approximately 10 000 or less, the Poisson model provides a fair approximation for the measurement uncertainty.

The count data in Figure 2 were adjusted proportionally from the observed raw counts through use of a rhodium internal standard. The same patterns and conclusions were found when the raw count data were subjected to analysis. Additionally, it was found that, on average, use of an internal standard in data quantification improved the reliability of the estimated concentrations for our system. The complexity of the calibration model required was checked for each trace metal by regression analysis of the data. It was found that a linear model was sufficient over the calibration window investigated.

While there is a theoretical basis to anticipate that the WLS approach is more appropriate than the OLS approach, it is instructive to contrast what happens with application of each approach (Figures 3-6). Figures 3 (estimated intercepts) and 4 (estimated slopes) illustrate that use of different fitting algorithms for a linear relationship can provide very different model parameters for the same data. Note the order of magnitude differences in estimated intercept for some trace metals (Figure 3). Hence, it is critical to select the appropriate least-squares algorithm.
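The effect of the fitting algorithm on the estimated parameters can be illustrated with the Sr 88 values in Table 1. The following is a minimal sketch, not the software used in this work: it fits the six average counts rather than the individual triplicate determinations (an assumption made because only the averages are tabulated), takes the weights as 1/s² from the tabulated standard deviations, and uses a hypothetical helper named `fit_line`.

```python
def fit_line(x, y, w=None):
    """Fit y = intercept + slope * x by least squares.

    With w=None every point gets weight 1 (OLS). Otherwise w holds the
    per-point weights (WLS), and the weighted sum of squared prediction
    errors is minimized. Returns (intercept, slope).
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Closed-form solution of the weighted normal equations.
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Sr 88 values transcribed from Table 1.
ppb = [0, 1, 2, 5, 10, 20]
mean_count = [118, 17090, 33916, 87766, 171333, 338667]
sd_count = [34, 186, 291, 700, 1155, 1528]

# Assumed weighting: w_i = 1 / s_i^2.
weights = [1.0 / s ** 2 for s in sd_count]

ols_b0, ols_b1 = fit_line(ppb, mean_count)
wls_b0, wls_b1 = fit_line(ppb, mean_count, weights)

print(f"OLS: intercept = {ols_b0:.1f}, slope = {ols_b1:.1f}")
print(f"WLS: intercept = {wls_b0:.1f}, slope = {wls_b1:.1f}")
```

On these six points the two slopes differ by less than 1%, whereas the OLS intercept lands several hundred counts above the measured blank average and the WLS intercept stays close to it. The high-count standards dominate the unweighted fit, so small misfits at 10 and 20 ppb are absorbed into the intercept.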
The WLS methodology typically provided estimated intercepts that were much closer to zero than did the OLS methodology, especially for the higher m/z results. For some of the low-m/z elements, both the OLS and the WLS analyses result in large intercept values. This should be expected since, for elements such as Na, K, Ca, Cr, Fe, and Se, there is a large instrument background present in the ICPMS. However, it should be noted (as shown in Figure 5) that even in these cases the WLS analysis results in a much lower standard error in the determination of the intercepts. Additionally, the OLS methodology provided a disturbingly large number of negative intercept estimates, once again especially with the higher m/z results. From Figure 4 it is apparent that there are typically only small differences in the estimated slopes of the OLS and WLS analyses. The large differences observed in the estimated intercepts imply that the greatest relative discrepancies between the OLS and WLS calibration methodologies will be found at the lowest levels of concentration.

Figure 3. Estimated intercept: OLS vs WLS.

Figure 4. Estimated slope: OLS vs WLS.

Figure 5. Standard error intercept: OLS vs WLS.

Figure 6. Standard error slope: OLS vs WLS.

Figures 5 (standard error intercept) and 6 (standard error slope) provide visual summaries of the standard error (se) with which each of the respective calibration model parameters in Figures 3 and 4 was obtained. These figures provide a direct statistical assessment as to the better estimation methodology; the methodology with the smaller standard errors will be preferred. The WLS analysis consistently provided much more reliable estimates of the intercept than did the OLS analysis. For the most part, the WLS analysis also provided slope estimates of similar to moderately better quality than did the OLS analysis. This again strongly reinforces the point that the main area of difference between the reliability of these two estimation methodologies for these data is at very low concentration levels. Our experience with trace metals data via ICPMS has led us to the general conclusion that, for such data, the WLS methodology is generally more appropriate than the OLS methodology.

An analyst who did not realize that an OLS analysis was inappropriate would tend to be led to two conclusions by the very large relative errors in the estimated intercepts. The first is simply that the intercepts are relatively poorly estimated. This conclusion is correct. The second is that most of the OLS intercept estimates were not statistically significantly different from 0; hence, one might be inclined to conclude that a no-intercept calibration would be acceptable. This conclusion cannot be made from the OLS analyses; use of an inappropriate error criterion corrupts not only the calibration model coefficient estimates but also the very judgment of statistical significance itself.

Figure 7 illustrates the average percentage bias resulting from the measurement of a 0.1 ppb spiked blank in triplicate, treated as a sample, using both the OLS and WLS calibration models.
Only trace metals with an estimated method detection limit, by WLS analysis, of 0.05 ppb or less are shown in Figure 7. The detection limit computation used a regression-based methodology using a 3 standard deviation probability level in the definition (refs 2, 3). The underlying triplicate measurements used in obtaining the averages shown in Figure 7 tended to be tightly clustered relative to their departure from the “true” trace level, especially for the OLS calibration. For the OLS analysis, negative estimates of concentration were frequently obtained and the average percentage bias (in either direction) often exceeded 100%. The WLS analysis provided a much smaller average percentage bias.

Figure 7. Percent bias in estimated concentration at 0.1 ppb: OLS vs WLS.

(2) SEMI International. C10-0299 Guide for Determination of Method Detection Limits. In SEMI International Standards 1998: Book of SEMI Standards, Process Chemicals Volume; San Jose, CA, 1999.
(3) Bzik, T. J.; Smudde, G. H., Jr.; Zatko, D. A.; Martinez de Piniellos, J. V. Limit of Detection. In Specialty Gas Analysis - A Practical Guidebook; Hogan, J. D., Ed.; John Wiley and Sons: New York, 1997; Chapter 8.
(4) The URL to download the MDL estimation software is as follows: http://www.airproducts.com/gases/mdl-estimator/.

Once again, the issue of the use of OLS versus WLS is a fundamental calibration issue. With typical ICPMS data, OLS calibration model estimation provides a corrupt calibration curve exactly where we typically want it to be most reliable, on the low end! This reinforces the need for use of a WLS-based analysis. A software package to obtain and rapidly compare both OLS and WLS analyses for linear calibration, as well as provide method detection limit estimation, is available for free download from the World Wide Web (ref 4).
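The percentage bias plotted in Figure 7 follows from inverting the calibration line for a sample of known concentration. The sketch below shows the arithmetic only. The spiked-blank count and both sets of calibration coefficients are hypothetical round numbers chosen for illustration, not values reported in this work, and the function names are assumptions.

```python
def estimate_concentration(count, intercept, slope):
    """Invert the linear calibration count = intercept + slope * conc."""
    return (count - intercept) / slope


def percent_bias(estimated, true_value):
    """Signed bias of an estimate relative to the known value, in percent."""
    return 100.0 * (estimated - true_value) / true_value


TRUE_PPB = 0.1          # spike level used for the recovery check
spike_count = 1820.0    # hypothetical measured count for the spiked blank

# Hypothetical calibration coefficients (intercept, slope in counts/ppb):
# similar slopes, but an OLS intercept far from the blank level.
ols_model = (870.0, 16900.0)
wls_model = (117.0, 17000.0)

ols_bias = percent_bias(estimate_concentration(spike_count, *ols_model), TRUE_PPB)
wls_bias = percent_bias(estimate_concentration(spike_count, *wls_model), TRUE_PPB)

print(f"OLS bias at {TRUE_PPB} ppb: {ols_bias:+.1f}%")
print(f"WLS bias at {TRUE_PPB} ppb: {wls_bias:+.1f}%")
```

Because the signal at 0.1 ppb is small compared with a poorly estimated intercept, an intercept error of a few hundred counts translates into a bias of tens of percent in the estimated concentration, while a nearly identical slope has almost no effect. If the intercept error exceeded the measured count, the estimated concentration would be negative.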

CONCLUSION

Regression analysis methods, when appropriately employed, provide a relatively robust and flexible methodology. For the ICPMS data analyzed herein, it was demonstrated that very different calibrations could be obtained from the same data depending on whether the calibration model was estimated via OLS or WLS. The OLS models were corrupt in their fits, especially at low concentrations. In our experience with ICPMS data, calibrations should be fit using WLS rather than the commonly employed OLS fitting algorithm. The use of biased calibrations from the forced and common use of OLS can serve to greatly mask or exaggerate the presence of low levels of trace contaminants.

AC000018S
