A primer on multivariate calibration

For centuries the practice of calibration has been widespread throughout science and engineering. The modern application of calibration procedures is very diverse. For example, calibration methods have been used in conjunction with ultrasonic measurements to predict the gestational age of human fetuses (1), and calibrated bicycle wheels have been used to measure marathon courses (2). Within analytical chemistry and related areas, the field of calibration has evolved into a discipline of its own.

In analytical chemistry, calibration is the procedure that relates instrumental measurements to an analyte of interest. In this context, calibration is one of the key steps associated with the analyses of many industrial, environmental, and biological materials. Increased capabilities resulting from advances in instrumentation and computing have stimulated the development of numerous calibration methods. These new methods have helped to broaden the use of analytical techniques (especially those that are spectroscopic in nature) for increasingly difficult problems.

Edward V. Thomas
Sandia National Laboratories


Calibration methods allow one to relate instrumental measurements to analytes of interest in industrial, environmental, and biological materials.

In the simplest situations, models such as y = a + bx have been used to express the relationship between a single measurement (y) from an instrument (e.g., absorbance of a dilute solution at a single wavelength) and the level (x) of the analyte of interest. Typically, instrumental measurements are obtained from specimens in which the amount (or level) of the analyte has been determined by some independent and inherently accurate assay (e.g., wet chemistry). Together, the instrumental measurements and results from the independent assays are used to construct a model (e.g., estimate a and b) that relates the analyte level to the instrumental measurements. This model is then used to predict the analyte levels associated with future samples based solely on the instrumental measurements.

In the past, data acquisition and analysis were often time-consuming, tedious activities in analytical laboratories. The advent of high-speed digital computers has greatly increased data acquisition and analysis capabilities and has provided the analytical chemist with opportunities to use many measurements (perhaps hundreds) for calibrating an instrument (e.g., absorbances at multiple wavelengths). To take advantage of this technology, however, new methods (i.e., multivariate calibration methods) were needed for analyzing and modeling the experimental data. The purpose of this Report is to introduce several evolving multivariate calibration methods and to present some important issues regarding their use.

Univariate calibration

To understand the evolution of multivariate calibration methods, it is useful to review univariate calibration methods and their limitations. In general, these methods involve the use of a single measurement from an instrument such as a spectrometer for the determination of an analyte. This indirect measurement can have significant advantages over gravimetric or other direct measurements. Foremost among these advantages is the reduction in sample preparation (e.g., chemical separation) that is often required with the use of direct methods. Thus, indirect methods, which can be rapid and inexpensive, have replaced a number of direct methods.

The role of calibration in these analyses is embodied in a two-step procedure: calibration and prediction. In the calibration step, indirect instrumental measurements are obtained from specimens in which the amount of the analyte of interest has been determined by an inherently accurate independent assay. The set of instrumental measurements and results from the independent assays, collectively referred to as the calibration set or training set, is used to construct a model that relates the amount of analyte to the instrumental measurements.

For example, in determining Sb concentration by atomic absorption spectroscopy (AAS), the absorbances of a number of solutions (with known concentrations of Sb) are measured at a strong absorbing line of elemental Sb (e.g., 217.6 nm). A model relating absorbance and Sb concentration is generated. In this case, model development is straightforward because Beer's law can be applied. In other situations, the model may be more complex and lack a straightforward theoretical basis. In general, this step is the most time-consuming and expensive part of the overall calibration procedure because it involves the preparation of reference samples and modeling.

Next, the indirect instrumental measurement of a new specimen (in combination with the model developed in the calibration step) is used to predict its associated analyte level. This prediction step is illustrated in Figure 1, which shows Sb determination by AAS. Usually, this step is repeated many times with new specimens using the model developed in the calibration step.

Figure 1. Prediction of the Sb concentration of a new specimen. The calibration model (solid line, derived from the calibration set [dots]) relating the absorbance at 217.6 nm to the Sb concentration and the absorbance of the new specimen are used for prediction.

Even in the simplest case of univariate calibration, when there is a linear relationship between the analyte level (x) and the instrumental measurement (y), modeling can be done in different ways. In one approach, often referred to as the classical method, the implied statistical model is

yi = b1*xi + ei    (1)

where xi and yi are the analyte level and instrument measurement associated with the ith of n specimens in the calibration set. The measurement error associated with yi is represented by ei. To simplify this discussion, an intercept is not included in Equation 1. In the calibration step, the model parameter, b1, is usually estimated by least-squares regression of the instrument measurements on the reference values associated with the specimens composing the calibration set. The estimate of b1 can be expressed as b̂1 = (xTx)^-1 xTy, where x = (x1, x2, ..., xn)T and y = (y1, y2, ..., yn)T. In this article, the "hat" symbol over a quantity is used to denote an estimate (or prediction) of that quantity. The predicted analyte level associated with a new specimen is x̂* = y*/b̂1, where y* is the observed measurement associated with the new specimen.

In another approach, often referred to as the inverse method, the implied statistical model is

xi = b2*yi + ei    (2)

where ei is assumed to be the measurement error associated with the reference value xi. In the calibration step, the model parameter, b2, is estimated by least-squares regression of the reference values on the instrument measurements (i.e., b̂2 = (yTy)^-1 yTx). In the prediction step, x̂* = b̂2*y*. In general, predictions obtained by the classical and inverse methods will be different. However, in many cases, these differences will not be important. In the literature, there has been an ongoing debate about which method is preferred (3, 4). When calibrating with a single measurement, the inverse method may be preferred if the instrumental measurements are precise (e.g., as in near-IR spectroscopy).

The point of this discussion isn't to recommend one method over another; it is to show that, even in this relatively simple situation, different approaches exist. However, the breadth of the applicability of these univariate methods is limited.
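For concreteness, here is a minimal numerical sketch of the two estimators (written in Python with NumPy; the concentrations, absorbances, and the new measurement are invented for illustration and are not taken from the article).

import numpy as np

# Hypothetical univariate calibration set: reference analyte levels x (e.g., ppm)
# and single-wavelength absorbances y. Values are illustrative only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.011, 0.019, 0.032, 0.041, 0.050])

# Classical model (Equation 1): y_i = b1*x_i + e_i, so b1_hat = (x'x)^-1 x'y
b1_hat = (x @ y) / (x @ x)
# Inverse model (Equation 2): x_i = b2*y_i + e_i, so b2_hat = (y'y)^-1 y'x
b2_hat = (y @ x) / (y @ y)

y_star = 0.027                      # measurement for a new specimen
x_classical = y_star / b1_hat       # prediction from the classical model
x_inverse = b2_hat * y_star         # prediction from the inverse model
print(x_classical, x_inverse)       # the two predictions generally differ slightly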

For example, let us reconsider the determination of Sb concentration by AAS. Suppose the specimens to be analyzed contain Pb. It is well known that Pb has a strongly absorbing spectral line at 217.0 nm, which is quite close to the primary Sb line at 217.6 nm (5). There are important ramifications of this fact. If an analyst fails to recognize the presence of Pb, the application of univariate calibration using the 217.6-nm line can result in inaccurate predictions for Sb because of the additional absorbance attributable to Pb (see Figure 2). If Pb is recognized as a possible interference, the usual approach is to move to a less intense spectral line for Sb (e.g., 231.2 nm); however, one can expect a poorer detection limit and degraded analytical precision.

Figure 2. Effect of the presence of Pb on predicting Sb concentration. [Plot of absorbance versus concentration of Sb (ppm).]

The preceding example exposes a fundamental weakness of univariate methods: In the absence of an effective method for separating the analyte from the interferences, there is a need for measurements that are highly selective for the analyte of interest. Without a selective measurement, univariate methods may produce unreliable predictions. Furthermore, on the basis of a single measurement it is impossible to detect the presence of unknown interferences or to know when predictions are unreliable. To apply univariate methods successfully, an analyst will often need a great deal of specific knowledge about the chemical system to be analyzed. Furthermore, unless the system is relatively simple (or unless the analyst has substantial knowledge of the subject matter), a selective measurement with an appropriate level of sensitivity will usually be hard to find.

On the positive side, univariate calibration can often be used with reasonable success in applications where selective measurements can be found (as in AAS) or when the analyte can be effectively separated from interferences. In such cases, the simplicity of a univariate method offers a significant advantage. Even in these ideal settings, however, care is needed to maintain the reliability of predictions based on univariate calibration.

Given the data-rich environment in modern laboratories, numerous selective measurements might be available for analysis. To use univariate methods, it is necessary to specify a single measurement or condense the multiple measurements to a single entity, such as peak area. For example, suppose IR absorption spectroscopy is chosen to determine trace levels of water vapor by using the spectral region displayed in Figure 3. Furthermore, suppose that the water vapor is the only absorbing species in the optical path in this spectral region. A common univariate approach would be to select the single wavelength that exhibits the strongest signal for the analyte, in this case at about 2595 nm. This procedure might form the basis for a usable prediction model. However, as we shall see in the next section, the use of measurements from many wavelengths (in conjunction with multivariate calibration methods) can provide more precise predictions. In the event that selective measurements are not available (which is frequently the case), univariate methods will not be reliable. For example, in the analysis of multicomponent biological materials by near-IR spectroscopy, the spectral responses of the components frequently overlap and selective measurements for the analyte of interest are unavailable.

Modern methods need to be reliable, rapid, and precise, even for difficult applications such as quality control, process monitoring, environmental monitoring, and medical diagnosis, which are important in the chemical, pharmaceutical, oil, microelectronics, and medical industries. The nature of these applications, which frequently requires in situ, noninvasive, or nondestructive analyses, precludes the use of tedious sample preparation to obtain highly selective measurements or separation. The result is that the materials to be analyzed by analytical instruments are often quite complex and involve a very large number of chemical components, some of which may be unknown. The combination of complex materials and the need for rapid, reliable, accurate, and precise determinations has motivated researchers to develop and use multivariate calibration methods.

Multivariate Calibration

In the agricultural and food industries, multivariate calibration methods are used with spectral data to determine protein in wheat (6), water in meat (7), and fat in fish (7). In the manufacturing industries, multivariate calibration methods are often used in process monitoring applications, including the fabrication of semiconductor devices (8). In medical applications, significant developments are being made in producing reagentless and noninvasive instruments for analyzing blood components (9-11). Additional examples have been reviewed by Brown et al. (12).

As is true with univariate calibration, multivariate calibration consists of the calibration step and the prediction step. In the calibration step, multiple instrumental measurements are obtained from numerous specimens. These measurements could be the absorbances of each specimen at each of a number of wavelengths. As with univariate calibration, the level of the analyte in each specimen is determined by independent assay. By using multiple measurements it is sometimes possible to determine multiple components of interest simultaneously. Often, however, there is only a single analyte of interest. The multivariate instrumental measurements and results from the independent assays form the calibration set and are used to model the level of the analyte. In the prediction step, the model and the multivariate instrumental measurements of new specimens are used to predict the analyte levels associated with the new specimens. Often, predicted analyte values for each new specimen (x̂) are obtained by evaluating a particular linear combination of the available instrumental measurements (y1, y2, ..., yq); that is,

x̂ = a0 + a1*y1 + a2*y2 + ... + aq*yq    (3)

Individual calibration methods differ in the values of the coefficients (a0, a1, ..., aq) used to form x̂. More fundamentally, the method used (in the calibration step) to obtain the coefficients in Equation 3 distinguishes the different calibration methods. In some methods (e.g., multiple linear regression, MLR), the number of instrumental measurements (q) that can be used on the right-hand side of Equation 3 is constrained to be no greater than the number of specimens in the calibration set (n). In other methods (e.g., principal components regression, PCR, and partial least squares, PLS), the number of instrumental measurements is unrestricted.


The evolution of multivariate calibration methods has been motivated by the continuing desire to solve increasingly difficult problems. Many advances have occurred in conjunction with the use of spectral methods. Therefore, although the methods can be applied to areas outside spectroscopy, it is convenient to describe them in this context. Methods such as PCR and PLS, in which an unlimited number of measurements may be used, are often referred to as full-spectrum methods. In the rest of this section, some of the more common calibration methods (all producing predictions from linear combinations of instrumental measurements, as in Equation 3) will be described and illustrated by spectral data analysis. Strengths and weaknesses of each method will be discussed, and the breadth of application for each will be emphasized.

Classical least-squares method. CLS is based on an explicit causal (or hard) model that relates the instrumental measurements to the level of the analyte of interest (and often the levels of interfering components) via a well-understood mechanism (e.g., Beer's law). In the simplest case, CLS uses a linear model (an extension of Equation 1) that relates a single analyte to q instrumental measurements that are selective for that analyte. For example, for the ith specimen,

yij = bj*xi + eij,  for j = 1, 2, ..., q    (4)

where eij is the measurement error associated with the jth measurement on the ith specimen, yij. As in Equation 1, note that an intercept term is not included in this model for the sake of simplicity. The appropriate method for estimating the model parameters, b = (b1, b2, ..., bq)T, in the calibration step depends somewhat on the nature of the measurement errors. In the prediction step, a linear combination of the measurements from a new specimen, in conjunction with the estimated model parameters, b̂ = (b̂1, b̂2, ..., b̂q)T, is used to predict the analyte level. If the errors across measurements are independent, it is appropriate to express the predicted analyte level as

x̂ = a1*y1 + a2*y2 + ... + aq*yq    (5)

where aj = b̂j/(b̂Tb̂). In essence, the predicted value provided by Equation 5 is the least-squares estimate of the slope of the relationship, through the origin, among the various (b̂j, yj) pairs. That is, the predicted value is obtained by regressing the yj's on the b̂j's.
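A small sketch of this calculation (Python/NumPy; the spectra, noise level, and concentrations are synthetic and purely illustrative) may help: the calibration step estimates the unit-concentration response b̂j at each wavelength, and the prediction step regresses a new specimen's measurements on b̂ through the origin.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-analyte calibration set: n specimens, q wavelengths
n, q = 20, 50
x = rng.uniform(0.5, 5.0, size=n)                         # reference analyte levels
b_true = np.exp(-0.5 * ((np.arange(q) - 25) / 6.0) ** 2)  # unit-concentration response
Y = np.outer(x, b_true) + rng.normal(0, 0.01, size=(n, q))

# Calibration (Equation 4): least-squares estimate of b_j, one wavelength at a time
b_hat = (x @ Y) / (x @ x)

# Prediction (Equation 5): slope of the (b_hat_j, y_j) relationship through the origin
y_new = 2.3 * b_true + rng.normal(0, 0.01, size=q)        # spectrum of a new specimen
x_pred = (b_hat @ y_new) / (b_hat @ b_hat)                # a_j = b_hat_j / (b_hat' b_hat)
print(round(float(x_pred), 3))                            # close to the true value of 2.3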

To illustrate this graphically, let's revisit the example associated with Figure 3a. The spectrum displayed in Figure 3a represents the estimated absorbance spectrum of water in the gas phase at 1 ppm concentration (b̂1, b̂2, ..., b̂q) that was obtained from the calibration step. Suppose we wish to predict the water concentration of a new specimen that exhibits the absorbance spectrum (y1, y2, ..., yq) illustrated in Figure 3b. The prediction of the water concentration of this new specimen can be visualized by plotting the (b̂j, yj) pairs (Figure 4). The predicted value of the water concentration of this new specimen is x̂ = 0.457 ppm, or the estimated slope of the (b̂j, yj) relationship. The reliability and precision of this prediction can be assessed by examining the nature and magnitude of the scatter in the relationship among the (b̂j, yj) pairs. In this case, a strong linear relationship among the (b̂j, yj) pairs and the constant level of scatter throughout the range of the b̂j's indicate a reliable prediction.

Figure 3. Absorbance spectra of (a) water vapor at 1 ppm concentration (estimated) and (b) a new specimen.

Figure 4. Absorbance of the new specimen (y axis) versus the estimated absorbance of water vapor at 1 ppm concentration by wavelength (x axis). Each point represents a (b̂j, yj) pair.

If many of the (b̂j, yj) pairs had deviated significantly from the typical relationship among the pairs, one would suspect that an unaccounted interfering species or some other unexpected phenomenon influenced the spectrum of the new specimen. Hence, the selectivity of the measurements and the reliability of the prediction would be questioned. In the case of univariate calibration, there would be only one (b̂j, yj) pair and hence no ability to discover unusual behavior. Thus, one important advantage of using multivariate versus univariate methods is the ability to assess the reliability of predictions and identify outliers.

For our purposes, an outlier is a specimen (a member of the calibration set or a new specimen) that exhibits some form of discordance with the bulk of the specimens in the calibration set. In the calibration set, an outlier specimen could result from an unusually large error in a reference determination. During prediction, a new specimen would be considered an outlier if it contained a chemical component (which affects the instrumental measurements) that was not present in the specimens composing the calibration set. The consequences of failing to detect an outlier differ, depending on whether the outlier is in the calibration or the prediction set. When outliers are present in the calibration set, the result will likely be a poor model. The performance of such models during prediction will often be adversely affected. When a prediction set specimen is an outlier, the predicted analyte value for that specimen may differ significantly from the true unknown analyte value. Thus, it is very important to identify outliers. For many multivariate methods, a number of diagnostics are available (13).

Furthermore, the use of multivariate, rather than univariate, methods offers the potential for significantly improving the precision of the predicted analyte values. The example depicted in Figure 3a demonstrates this potential. If, for example, the whole spectral region in Figure 3a is used to model the water concentration, the precision of the predictions generated by using Equations 4 and 5 can be expected to be about four times better than if measurements at a single wavelength (2595 nm) are used with the classical univariate method. That is, the standard deviation of repeated determinations will be about four times smaller for the multivariate method. However, this gain in efficiency will be realized in practice only if the precision of the reference method is sufficiently good.
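One crude way to implement the reliability check described above is to examine the residuals of the fitted (b̂j, yj) line for a new specimen and flag wavelengths that deviate strongly; large deviations suggest an unmodeled interference. The sketch below (Python/NumPy) is only an illustration of that idea, not one of the formal diagnostics of Reference 13, and all names and data in it are hypothetical.

import numpy as np

def cls_prediction_check(b_hat, y_new, z_cut=4.0):
    # Predict via Equation 5, then flag wavelengths whose residuals from the
    # fitted (b_hat_j, y_j) line are unusually large (possible interference).
    x_pred = (b_hat @ y_new) / (b_hat @ b_hat)
    resid = y_new - x_pred * b_hat              # departure from the fitted line
    med = np.median(resid)
    mad = np.median(np.abs(resid - med)) + 1e-12
    z = 0.6745 * (resid - med) / mad            # robust standardized residuals
    return x_pred, np.flatnonzero(np.abs(z) > z_cut)

# Hypothetical use: add an artificial interfering band to an otherwise clean spectrum
rng = np.random.default_rng(1)
q = 50
b_hat = np.exp(-0.5 * ((np.arange(q) - 25) / 6.0) ** 2)   # estimated analyte response
y_new = 1.8 * b_hat + rng.normal(0, 0.01, size=q)
y_new[5:9] += 0.3                                          # unmodeled interference
x_pred, suspect = cls_prediction_check(b_hat, y_new)
print(x_pred, suspect)        # flagged wavelengths fall in the contaminated region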

The CLS method can be further generalized to include multiple components, among them other analytes or interferences (14). In this case the underlying model given in Equation 4 is expanded to account for the effects of other analytes or interferences. The primary advantage of this more general approach is that it does not require selective measurements. However, the successful use of this or any other method based on an explicit model depends on complete knowledge of the chemical system to be analyzed. For example, all interferences must be known; furthermore, with regard to each specimen in the calibration set, the levels of all analytes and interferences must be known. These requirements greatly restrict the applicability of any method (such as CLS) that is based on an explicit causal model, especially for the analysis of complex materials. However, if an explicit causal model can be developed, a method such as CLS can be quite valuable; it may provide a reasonable basis for extrapolation and understanding of the uncertainty in the predicted values of the analyte (15).

Multiple linear regression. In many applications, analysts lack the knowledge required to use an explicit causal model and instead use methods involving empirically determined (or soft) models that are driven by correlation rather than causation. Empirical models often relate the characteristic of interest to simple functions (e.g., low-order polynomials) of the instrumental measurements. In calibration problems in analytical chemistry, particularly spectroscopy, the simple functions often consist of linear combinations of instrumental measurements. For example, calibration methods relating the concentration of an analyte to a linear combination of measurements (e.g., absorbance) from several wavelengths were introduced to develop models for a broader range of conditions. Specifically, such methods were designed to be used when the spectral features of the analyte overlap with features of other, perhaps unknown components in the material to be analyzed. These methods, which are based on a generalization of the univariate inverse model (see Equation 2), are of the form

xi = b0 + b1*yi1 + b2*yi2 + ... + bq*yiq + ei    (6)

where yij is the jth measurement associated with the ith specimen. Various methods exist for estimating the model parameters (b0, b1, ..., bq) using data from the calibration set. One method uses MLR (16). The resulting parameter estimates (b̂0, b̂1, ..., b̂q) are used to predict the analyte levels of new specimens using Equation 3, with aj = b̂j. Unlike the generalized CLS method, MLR and other soft modeling methods do not explicitly require knowledge of the levels of interferences and other analytes for specimens in the calibration set. A judicious choice of instrumental measurements can compensate for interferences.

For example, transmission (or reflectance) spectroscopy involving two wavelengths in the vis/near-IR region is used to noninvasively monitor the level of oxygen saturation of arterial blood in a surgical or critical care environment (17). One wavelength (in the red portion of the spectrum) is sensitive to the pulsatile blood volume and oxygen saturation of the blood. A second wavelength (in the near-IR region), which is insensitive to the level of oxygen saturation, provides a measure of the pulsatile blood volume. From the total signal associated with the first wavelength, the contribution of the interfering phenomenon (blood volume) can be effectively removed by using information obtained from the second wavelength. The resulting signal can provide a useful measure of oxygen saturation.

The number of measurements (q) that can be used with MLR is often severely restricted; it is usually in the range of 2-10, depending on the complexity of the materials being analyzed. The strong correlation among the instrumental measurements can introduce instability among the resulting model parameter estimates. Because MLR cannot use a large number of correlated measurements, selecting the appropriate set of instrumental measurements is important. If available, as in the case of the noninvasive oximeter, specific knowledge of the way in which the analyte and interfering components in the sample material affect the instrumental measurements can be used to select instrumental measurements. In the absence of specific information, it is necessary to use empirical selection methods.
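A minimal sketch of the inverse/MLR calculation in Equation 6 follows (Python/NumPy). The two "wavelengths", component responses, and concentrations are hypothetical stand-ins loosely patterned after the two-wavelength oximetry example; because the interfering component varies across the calibration set, the fitted coefficients compensate for it even though its level is never supplied.

import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration set: n specimens, q = 2 selected measurements
n = 30
x = rng.uniform(0.0, 10.0, size=n)                 # reference analyte levels
z = rng.uniform(0.0, 5.0, size=n)                  # unmodeled interfering component
s_analyte = np.array([0.8, 0.3])                   # analyte response at the 2 wavelengths
s_interf = np.array([0.1, 0.6])                    # interferent response
Y = np.outer(x, s_analyte) + np.outer(z, s_interf) + rng.normal(0, 0.01, size=(n, 2))

# MLR / inverse calibration (Equation 6): x_i = b0 + b1*y_i1 + b2*y_i2 + e_i
A = np.column_stack([np.ones(n), Y])
b_hat, *_ = np.linalg.lstsq(A, x, rcond=None)      # (b0_hat, b1_hat, b2_hat)

# Prediction (Equation 3) for a new specimen with analyte level 4.0 and interferent 2.0
y_new = 4.0 * s_analyte + 2.0 * s_interf
x_pred = b_hat[0] + b_hat[1:] @ y_new
print(round(float(x_pred), 2))                     # close to 4.0 despite the interferent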

PLS, PCR, and related methods. Recently, soft-model-based methods (including PLS and PCR), in which a very large number of instrumental measurements are used simultaneously, have been successfully applied to analytical chemistry, particularly spectroscopy. Stone and Brooks (18) showed that PLS and PCR belong to a general class of statistical methods referred to collectively as continuum regression. In general, the assumed model for all such methods is of the form

xi = b0 + b1*ti1 + b2*ti2 + ... + bh*tih + ei    (7)

where tik is the kth score associated with the ith sample. Each score consists of a linear combination of the original measurements; that is,

tik = γk1*yi1 + γk2*yi2 + ... + γkq*yiq    (8)

Data from the calibration set are used to obtain the set of coefficients {γkj}; the estimated model parameters {b̂k}; and the model size, which is given by the metaparameter, h, usually considerably smaller than the number of measurements, q. In general, the coefficients {γkj} are obtained in a manner such that the vectors (t1k, t2k, ..., tnk) and (t1m, t2m, ..., tnm), for k ≠ m, are orthogonal. Thus, the resulting parameter estimates {b̂0, b̂1, ..., b̂h} are stable. During the prediction step, the predicted analyte value for a new specimen is given by

x̂ = b̂0 + b̂1*t1 + b̂2*t2 + ... + b̂h*th    (9)

where tk = γk1*y1 + γk2*y2 + ... + γkq*yq. Thus, x̂ is simply a linear combination of the instrumental measurements associated with the new specimen (i.e., x̂ = a0 + a1*y1 + a2*y2 + ... + aq*yq). What differentiates these methods from one another is the set of coefficients {aj} that is used and the way in which the {aj} are obtained. One important difference between PLS and PCR is the manner in which the {γkj} coefficients are obtained. In PCR, the instrumental measurements of the calibration set are used exclusively to obtain the {γkj} coefficients; in PLS, the analyte values of the calibration set are used as well. Although it is beyond the scope of this article to explain precisely how the {γkj} and {aj} coefficients are obtained for these methods, detailed information is available (13, 14, 19, 20). In addition, References 21 and 22 provide comparisons of competing calibration methods. In these and other comparative studies, the prediction performances of PCR and PLS were generally found to be quite similar over a broad range of conditions. Rather than delve into how these methods differ, we will focus on common issues that are critical to the successful use of soft-model-based approaches such as PLS and PCR.
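As one concrete instance of Equations 7-9, the following PCR sketch (Python/NumPy; the three-component mixture data are synthetic and hypothetical) obtains the γ coefficients from the measurements alone via a singular value decomposition, regresses the analyte values on the resulting scores, and predicts a new specimen. A PLS implementation would differ only in how the score directions are chosen. The data are centered first, as described in the Data pretreatment section that follows.

import numpy as np

rng = np.random.default_rng(2)

# Synthetic calibration set: n specimens, q wavelengths, 3 underlying components
n, q, h = 40, 100, 3
C = rng.uniform(0.0, 1.0, size=(n, 3))                # component concentrations
S = np.abs(rng.normal(0.0, 1.0, size=(3, q)))         # component "spectra"
Y = C @ S + rng.normal(0, 0.005, size=(n, q))         # measured spectra
x = C[:, 0]                                           # analyte of interest

# Center the measurements and analyte values
y_bar, x_bar = Y.mean(axis=0), x.mean()
Yc, xc = Y - y_bar, x - x_bar

# PCR: gamma coefficients are the leading right singular vectors of the centered data
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
G = Vt[:h]                                            # gamma_kj, one row per factor (Equation 8)
T = Yc @ G.T                                          # scores t_ik
b = np.linalg.lstsq(T, xc, rcond=None)[0]             # b1_hat..bh_hat (Equation 7, centered form)

# Prediction (Equations 9-10) for a new specimen
y_new = np.array([0.7, 0.2, 0.4]) @ S + rng.normal(0, 0.005, size=q)
x_pred = x_bar + (y_new - y_bar) @ G.T @ b
print(round(float(x_pred), 3))                        # close to the true value of 0.7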

Data pretreatment

Before methods such as PLS and PCR are used, a certain amount of data pretreatment (or transformation) is often performed. The most common and simplest type is centering the data. This operation is usually performed on both the analyte values and the instrumental measurements, taken one at a time. That is, the centered analyte values are given by xi* = xi − x̄, where

x̄ = (1/n) Σ(i=1 to n) xi

and the centered values for the jth measurement are given by

yij* = yij − (1/n) Σ(i=1 to n) yij

This operation can help make subsequent computations less sensitive to round-off and overflow problems. In addition, this operation generally reduces the size of the resulting model by one factor. In terms of the centered quantities, the model in Equation 7 becomes

xi = x̄ + b1*ti1* + b2*ti2* + ... + bh*tih* + ei    (10)

where

tik* = γk1*yi1* + γk2*yi2* + ... + γkq*yiq*    (11)

During the prediction step, the measurements associated with a new specimen, {yj}, are similarly translated by the amounts (1/n) Σ(i=1 to n) yij.

Less frequently, the instrumental measurements are differentially weighted. Each centered instrumental measurement is multiplied by a measurement-dependent weight, wj (i.e., yij' = yij*·wj). The centered and weighted instrumental measurements {yij'} are then used as the basis for constructing the calibration model. The purpose of using nonuniform weighting is to modify the relative influence of each measurement on the resulting model. The influence of the jth measurement is raised by increasing the magnitude of its weight, wj.

Sometimes weighting is performed such that the standard deviations of the centered and weighted measurements are identical (autoscaling). That is, the standard deviation of (y1j', y2j', ..., ynj') = 1 for j = 1, 2, ..., q. Although centering promotes numerical stability during the model-building stage, differential weighting can drastically alter the form and performance of the resulting model. For example, if the weights of the least informative measurements were unilaterally increased, one would expect the model performance to suffer. On the other hand, if the weights of the most informative measurements were unilaterally increased, one would expect performance to improve. The key to using weighting successfully is the ability to identify informative measurements. Without knowledge of the relative information content of the various measurements, weighting is akin to "shooting in the dark."

To a certain extent, differential weighting reduces to variable selection. That is, for measurements that are not selected, the associated weights are set to zero. Because full-spectrum methods (such as PCR and PLS) can use many wavelengths, the prevailing belief among spectroscopists seems to be that measurement (wavelength) selection is unnecessary; thus, they often use all available wavelengths within some broad range. However, in many applications, measurements from many spectral wavelengths are noninformative or are difficult to incorporate in a model because of nonlinearities. Whereas to some degree full-spectrum methods are able to accommodate nonlinearities, the inclusion of noninformative (or difficult) spectral measurements in a model can seriously degrade performance. For many difficult problems, wavelength selection can greatly improve the performance of full-spectrum methods. Furthermore, in applications outside the laboratory (e.g., determination of components in an in situ setting), physical and economic considerations associated with the measurement apparatus may restrict the number of wavelengths (or measurements) that can be used. Thus, wavelength selection is very important, even when applying methods capable of using a very large number of measurements.

Currently few empirical procedures for wavelength selection are appropriate for use with full-spectrum methods such as PLS. Most procedures (e.g., stepwise regression) are associated with calibration methods (e.g., MLR) that are capable of using relatively few wavelengths. However, Frank and Friedman (22) showed that stepwise MLR does not seem to perform as well as PLS or PCR with all measurements. In general, wavelength selection procedures that can be used with full-spectrum methods (e.g., the correlation plot) search for individual wavelengths that empirically exhibit good selectivity, sensitivity, and linearity for the analyte of interest over the training set (23, 24). In order for these methods to be useful, wavelengths specific to the analyte of interest with good S/N are needed. However, the required wavelength specificity is not usually available in difficult applications (e.g., analysis of complex biological materials). Procedures such as the correlation plot, which consider only the relationships between individual wavelengths and the analyte of interest, are ill-equipped for such applications. This has provided the motivation to develop methods that select instrumental measurements on the basis of the collective relationship between candidate measurements and the analyte of interest (25).

A number of other procedures exist for data pretreatment, primarily to linearize the relationships between the analyte level and the various instrumental measurements. This is important because of the inherent linear nature of the commonly used multivariate calibration methods. For example, in spectroscopy, optical transmission data are usually converted to absorbance before analysis. In this setting, this is a natural transformation given the underlying linear relationship (through Beer's law) between analyte concentration and absorbance. Other pretreatment methods rely on the ordered nature of the instrumental measurements (e.g., a spectrum). In near-IR spectroscopy, instrumental measurements, which are first converted to reflectance, are often further transformed by procedures such as smoothing and the use of differencing (derivatives). Smoothing reduces the effects of high-frequency noise throughout an ordered set of instrumental measurements such as a spectrum. It can be effective if the signal present in the instrumental measurements has a smooth (or low-frequency) nature. Differencing the ordered measurements mitigates problems associated with baseline shifts and overlapping features. Another technique often used in near-IR reflectance spectroscopy is multiplicative signal correction (26), which handles problems introduced by strong scattering effects. The performance of the multivariate calibration methods described earlier can be strongly influenced by data pretreatment.
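The pretreatment operations described above reduce to simple array manipulations. The sketch below (Python/NumPy; the weighting shown is autoscaling, the "derivative" is a plain first difference, and the spectra are arbitrary synthetic arrays) also shows that the column means and weights computed from the calibration set must be reapplied, unchanged, to any new specimen.

import numpy as np

def pretreat(Y, x):
    # Center analyte values and measurements; autoscale each measurement column.
    y_bar = Y.mean(axis=0)
    w = 1.0 / Y.std(axis=0)            # weights w_j chosen so each column has unit SD
    return (Y - y_bar) * w, x - x.mean(), y_bar, w

def first_difference(Y):
    # Differencing (a crude derivative) along the wavelength axis; it suppresses
    # baseline offsets that are shared by neighboring wavelengths.
    return np.diff(Y, axis=1)

# Hypothetical use with a small synthetic calibration set
rng = np.random.default_rng(3)
Y = rng.normal(1.0, 0.2, size=(40, 100)) + rng.normal(0, 0.05, size=(40, 1))  # spectra with baseline shifts
x = rng.uniform(0.0, 5.0, size=40)
Yc, xc, y_bar, w = pretreat(first_difference(Y), x)
y_new = rng.normal(1.0, 0.2, size=(1, 100))
y_new_pretreated = (first_difference(y_new) - y_bar) * w   # identical treatment for a new specimen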

Cross-validation, model size, and model validation

Cross-validation is a general statistical method that can be used to obtain an objective assessment of the magnitude of prediction errors resulting from the use of an empirically based model or rule in complex situations (27). The objectivity of the assessment is obtained by comparing predictions with known analyte values for specimens that are not used in developing the prediction model. In complex situations, it is impossible or inappropriate to use traditional methods of model assessment. In the context of multivariate calibration, cross-validation is used to help identify the optimal size (hopt) for soft-model-based methods such as PLS and PCR. In addition, cross-validation can provide a preliminary assessment of the prediction errors that are to be expected when using the developed model (of optimal size) with instrumental measurements obtained from new specimens.

The cross-validated assessment of the performance of a specific calibration model (fixed method/model size) is based on a very simple concept. First, data from the calibration set are partitioned into a number of mutually exclusive subsets (S1, S2, ..., SV), with the ith subset (Si) containing the reference values and instrumental measurements associated with ni specimens. Next, V different models are constructed, each using the prescribed method/model size with all except one of the V available data subsets. The ith model, M-i, is constructed by using all data subsets except Si. In turn, each model is used to predict the analyte of interest for specimens whose data were not used in its construction (i.e., M-i is used to predict the specimens in Si). In a sense, this procedure, which can be computing-intensive, simulates the prediction of new specimens. A comparison of predictions obtained in this way with the known reference analyte values provides an objective assessment of the errors associated with predicting the analyte values of new specimens.

Partitioning the calibration set into the various data subsets should be done carefully. Typically, the calibration set is partitioned into subsets of size one (i.e., leave-one-out-at-a-time cross-validation). However, difficulty arises when replicate sets of instrumental measurements are obtained from individual specimens. Many practitioners use leave-one-out-at-a-time cross-validation in this situation. Unfortunately, what is left out one at a time is usually a single set of instrumental measurements. In this case, the cross-validated predictions associated with specimens with replicate instrumental measurements will be influenced by the replicate measurements (from the same specimen) used to construct M-i. Such use of cross-validation does not simulate the prediction of new samples. The likely result is an optimistic assessment of prediction errors. A more realistic assessment of prediction errors would be obtained if the calibration set were partitioned into subsets in which all replicate measurements from a single specimen are included in the same subset.

To select the optimal model size (hopt), the cross-validation procedure is performed using various values of the metaparameter, h. For each value of h, an appropriate measure of model performance is obtained. A commonly used measure of performance is the root mean squared prediction error based on cross-validation

RMSCV(h) = sqrt{ (1/n) Σ(i=1 to n) [x̂i(M-i(h)) − xi]² }    (12)

where x̂i(M-i(h)) represents the predicted value of the ith specimen using a model of size h, which was developed without using Si. Sometimes, to establish a baseline performance metric, RMSCV(0) is computed. For this purpose, x̂i(M-i(0)) is defined as the average analyte level in the set of all specimens with the ith specimen removed. Thus, RMSCV(0) provides a measure of how well we would predict on the basis of the average analyte level in the calibration set rather than instrumental measurements. Often, practitioners choose hopt as the value of h that yields the minimum value of the RMSCV. The shape associated with RMSCV(h) in Figure 5 is quite common. When h < hopt, the prediction errors are largely a consequence of systematic effects (e.g., interferences) that are unaccounted for. When h > hopt, the prediction errors are primarily attributable to modeling of noise artifacts (overfitting). Usually, if the model provides a high degree of predictability (as in the case illustrated by Figure 5), the errors caused by overfitting are relatively small compared with those associated with systematic effects not accounted for.

Figure 5. Determination of optimal PLS model size, hopt. The model relates near-IR spectroscopic measurements to urea concentration (mg/dL) in multicomponent aqueous solutions.
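A sketch of the model-size search by cross-validation is shown below (Python/NumPy). It uses the PCR construction from the earlier sketch as the prescribed method, leave-one-out partitioning, and synthetic data; with replicate measurements of the same specimen, the folds should instead be formed so that all replicates stay in the same subset, as cautioned above.

import numpy as np

def pcr_fit_predict(Y_train, x_train, Y_test, h):
    # Fit a centered PCR model of size h and predict the held-out specimens.
    y_bar, x_bar = Y_train.mean(axis=0), x_train.mean()
    Yc = Y_train - y_bar
    G = np.linalg.svd(Yc, full_matrices=False)[2][:h]
    b = np.linalg.lstsq(Yc @ G.T, x_train - x_bar, rcond=None)[0]
    return x_bar + (Y_test - y_bar) @ G.T @ b

def rmscv(Y, x, h):
    # Root mean squared prediction error from leave-one-out cross-validation (Equation 12).
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        pred = pcr_fit_predict(Y[keep], x[keep], Y[i:i+1], h)[0]
        errs.append(pred - x[i])
    return float(np.sqrt(np.mean(np.square(errs))))

# Hypothetical data (same three-component construction as the PCR sketch)
rng = np.random.default_rng(4)
C = rng.uniform(0.0, 1.0, size=(40, 3))
S = np.abs(rng.normal(0.0, 1.0, size=(3, 100)))
Y = C @ S + rng.normal(0, 0.005, size=(40, 100))
x = C[:, 0]

scores = {h: rmscv(Y, x, h) for h in range(1, 7)}
h_opt = min(scores, key=scores.get)          # model size with the smallest RMSCV
print(h_opt, scores[h_opt])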

At this point, an optimal (or near-optimal) model size has been selected. RMSCV(hopt) can be used as a rough estimate of the root mean squared prediction error associated with using the selected model with new specimens. This estimate may be somewhat optimistic, given the nature of the model selection process where many possible models were under consideration. A more realistic assessment of the magnitude of prediction errors can be obtained by using an external data set for model validation. An ideal strategy for model selection and validation would be to separate the original calibration set into two subsets: one for model selection (i.e., determination of hopt) and one strictly for model validation. Use of this strategy would guarantee that model validation is independent of model selection.

Pitfalls

The primary difficulty associated with using empirically determined models is that they are based on correlation rather than causation. Construction of these models involves finding measurements or combinations of measurements that simply correlate well with the analyte level throughout the calibration set. However, correlation does not imply causation (i.e., a cause-and-effect relationship between the analyte level and the instrumental measurements). Suppose we find, by empirical means, that a certain instrumental measurement correlates well with the analyte level throughout the calibration set. Does this mean that the analyte level affects that particular instrumental measurement? Not necessarily. Consider Figure 6, which displays the hypothetical relationship between the reference analyte level and the order of measurement (run order) for specimens in the calibration set. Because of the strong relationship between analyte level and run order, it is difficult to separate their effects on the instrumental measurements. Thus, the effects of analyte level and run order are said to be confounded. In this case, simple instrument instability could generate a strong misleading correlation between analyte level and an instrumental measurement. Fortunately, a useful countermeasure for this type of confounding exists: randomization of the run order with respect to analyte level.

Figure 6. Relationship between the reference analyte level and run order of the calibration experiment.

Often, however, more subtle confounding patterns exist. For instance, in a multicomponent system, the analyte level may be correlated with the levels of other components or a physical phenomenon such as temperature. In such situations it may be difficult to establish whether, in fact, the model is specific to the analyte of interest. In a tightly controlled laboratory study, where the sample specimens can be formulated by the experimenter, it is possible to design the calibration set (with respect to component concentrations) so that the levels of different components are uncorrelated. However, this countermeasure does not work when an empirical model is being used to predict analyte levels associated with, for example, complex industrial or environmental specimens. In such situations, one rarely has complete knowledge of the components involved, not to mention the physical and chemical interactions among components.

The validity of empirically based models depends heavily on how well the calibration set represents the new specimens in the prediction set. All phenomena (with a chemical, physical, or other basis) that vary in the prediction set and influence the instrumental measurements must also vary in the calibration set over ranges that span the levels of the phenomena occurring in the prediction set. Sometimes the complete prediction set is at hand before the calibration takes place. In such cases, the calibration set can be obtained directly by sampling the prediction set (28). Usually, however, the complete prediction set is not available at the time of calibration, and an unusual (or unaccounted for) phenomenon may be associated with some of the prediction specimens. Outlier detection methods represent only a limited countermeasure against such difficulties; valid predictions for these problematic specimens cannot be obtained.

Sources of prediction errors

Ultimately, the efficacy of an empirical calibration model depends on how well it predicts the analyte level of new specimens that are completely external to the development of the model. If the reference values associated with m new specimens (or specimens from an external data set) are available, a useful measure of model performance is given by the standard error of prediction

SEP = sqrt{ (1/m) Σ(i=1 to m) (x̂i − xi)² }    (13)

If these m new specimens comprise a random sample from the prediction set spanning the range of potential analyte values (and interferences), the SEP can provide a good measure of how well, on average, the calibration model performs. Often, however, the performance of the calibration model varies, depending on the analyte level. For example, consider Figure 7a, where the standard deviation of the prediction errors (ei = x̂i − xi) increases as the analyte value deviates from the average analyte value found in the training set, x̄. In this case, although the precision of predictions depends on the analyte value, the accuracy is maintained over the range of analyte values. That is, for a particular analyte level, the average prediction error is about zero. The behavior with respect to precision is neither unexpected nor abnormal; the model is often better described in the vicinity of x̄ rather than in the extremes. On the other hand, sometimes there is a systematic bias associated with prediction errors that is dependent on the analyte level (Figure 7b). When the analyte values are less than x̄, prediction errors are generally positive. Conversely, when analyte values are greater than x̄, the prediction errors are generally negative.

This pattern is indicative of a defective model in which the apparent good predictions in the vicinity of x̄ are attributable primarily to the centering operation that is usually performed during preprocessing in PLS and PCR. That is, predictions based on Equation 10 effectively reduce to x̂i = x̄ + noise, if the estimated model coefficients, {b̂j}, are spurious. Spurious model coefficients are obtained if noninformative instrumental measurements are used to construct a model. Thus, one should be wary of models that produce the systematic pattern of prediction errors shown in Figure 7b, regardless of whether the predictions are based on cross-validation or a true external validation set.

Several other factors affect the accuracy and precision of predictions, notably the inherent accuracy and precision of the reference method used. If the reference method produces erroneous analyte values that are consistently low or high, the resulting predictions will reflect that bias. Imprecise (but accurate) reference values will also inflate the magnitude of prediction errors, but in a nonsystematic way. Furthermore, errors in determining the reference values will affect the ability to assess the magnitude of prediction errors. The assessed magnitude of prediction errors can never be less than the magnitude of the reference errors. Thus, it is very important to minimize the errors in the reference analyte values that are used to construct an empirical model.

Other sources of prediction error are related to the repeatability, stability, and reproducibility of the instrumental measurements. Repeatability relates to the ability of the instrument to generate consistent measurements of a specimen using some fixed conditions (without removing the specimen from the instrument), over a relatively short period of time (perhaps seconds or minutes). Stability is similar to repeatability, but it involves a somewhat longer time period (perhaps hours or days). Reproducibility refers to the consistency of instrumental measurements during a small change in conditions, as might occur from multiple insertions of a specimen into an instrument. Further classification of instrumental measurement errors is possible for cases in which the multiple instrumental errors are ordered (e.g., by wavelength).

It is also possible to decompose instrumental variation into features that are slowly varying (low frequency) and quickly varying (high frequency). Often the focus is on only the high-frequency error component, and usually only in the context of repeatability. This is unfortunate because multivariate methods that are capable of using many measurements are somewhat adept at reducing the effects of high-frequency errors. Practitioners should work to identify and eliminate sources of slowly varying error features.

Other sources of prediction error may be unrelated to the reference method or the analytical instrument. Modeling nonlinear behavior with inherently linear methods can result in model inadequacy. Some researchers thus have adapted multivariate calibration methods in order to accommodate nonlinearities (29, 30).

Figure 7. Relationship between the predicted analyte and reference analyte levels. (a) In the normal relationship, the average reference analyte level, x̄, is 11 (arbitrary units). The precision of the predicted values depends on the reference analyte level and is best in the vicinity of x̄. (b) In the abnormal relationship, the precision and accuracy of the predicted values depend on the reference analyte level.



The ability to adequately sample and measure specimens in difficult environments can significantly affect the performance of calibration methods. In a laboratory it might be possible to control some of the factors that adversely affect model performance. However, many emerging analytical methods (e.g., noninvasive medical analyses and in situ analyses of industrial and environmental specimens) are intended for use outside the traditional laboratory environment, where it might not be possible to control such factors. These conditions will largely influence the success or failure of a calibration method in a given application. Thus, practitioners must strive to identify and eliminate the dominant sources of prediction error.

Summary

This article has provided a basic introduction to multivariate calibration methods with an emphasis on identifying issues that are critical to their effective use. In the future, as increasingly difficult problems arise, these methods will continue to evolve.

I thank Steven Brown, Bob Easterling, Ries Robinson, and Brian Stallard for their advice on this manuscript. Brian Stallard provided the water vapor spectra.

References
(1) Oman, S. D.; Wax, Y. Biometrics 1984, 40, 947-60.
(2) Smith, R. L.; Corbett, M. Applied Statistics 1987, 36, 283-95.
(3) Krutchkoff, R. G. Technometrics 1967, 9, 425-39.
(4) Williams, E. J. Technometrics 1969, 11, 189-92.
(5) Occupational Safety and Health Administration, Salt Lake Technical Center: Metal and Metalloid Particulate in Workplace Atmospheres (Atomic Absorption) (US DOL/OSHA Method No. ID-121). In OSHA Analytical Methods Manual, Part 2, 2nd ed.
(6) Fearn, T. Applied Statistics 1983, 32, 73-79.
(7) Oman, S. D.; Naes, T.; Zube, A. J. Chemom. 1993, 7, 195-212.
(8) Haaland, D. M. Anal. Chem. 1988, 60, 1208-17.
(9) Small, G. W.; Arnold, M. A.; Marquardt, L. A. Anal. Chem. 1993, 65, 3279-89.
(10) Bhandare, P.; Mendelson, Y.; Peura, R. A.; Janatsch, G.; Kruse-Jarres, J. D.; Marbach, R.; Heise, H. M. Appl. Spectrosc. 1993, 47, 1214-21.
(11) Robinson, M. R.; Eaton, R. P.; Haaland, D. M.; Koepp, G. W.; Thomas, E. V.; Stallard, B. R.; Robinson, P. L. Clin. Chem. 1992, 38, 1618-22.
(12) Brown, S. D.; Bear, R. S.; Blank, T. B. Anal. Chem. 1992, 64, 22R-49R.
(13) Martens, H.; Naes, T. Multivariate Calibration; Wiley: Chichester, England, 1989.
(14) Haaland, D. M.; Thomas, E. V. Anal. Chem. 1988, 60, 1193-1202.
(15) Thomas, E. V. Technometrics 1991, 33, 405-14.
(16) Brown, C. W.; Lynch, P. F.; Obremski,
(17) Western Reserve University, Cleveland, OH, 1983.
(18) Stone, M.; Brooks, R. J. J. Royal Statistical Soc., Series B 1990, 52, 237-69.
(19) Hoskuldsson, A. J. Chemom. 1988, 2, 211-28.
(20) Helland, I. S. Scandinavian J. Statistics 1990, 17, 97-114.
(21) Thomas, E. V.; Haaland, D. M. Anal. Chem. 1990, 62, 1091-99.
(22) Frank, I. E.; Friedman, J. H. Technometrics 1993, 35, 109-48.
(23) Hruschka, W. R. In Near-Infrared Technology in the Agricultural and Food Industries; Williams, P.; Norris, K., Eds.; American Association of Cereal Chemists: St. Paul, MN, 1987; pp. 35-55.
1989, 43, 328-35.
(29) Hoskuldsson, A. J. Chemom. 1992, 6, 307-34.
(30) Sekulic, S.; Seasholtz, M. B.; Kowalski, B. R.; Lee, S. E.; Holt, B. R. Anal. Chem. 1993, 65, 835A-845A.


Edward V. Thomas is a statistician at the Statistics and Human Factors Department of Sandia National Laboratories, Albuquerque, NM 87185-0829. His B.A. degree in chemistry and M.A. degree and Ph.D. in statistics were all awarded by the University of New Mexico. His research interests include calibration methods with a focus on their application to problems in analytical chemistry.