Anal. Chem. 2004, 76, 4364-4373
Discordance between Net Analyte Signal Theory and Practical Multivariate Calibration Christopher D. Brown*
InLight Solutions, Inc., 800 Bradbury SE, Albuquerque, New Mexico 87106
Lorber’s concept of net analyte signal is reviewed in the context of classical and inverse least-squares approaches to multivariate calibration. It is shown that, in the presence of device measurement error, the classical and inverse calibration procedures have radically different theoretical prediction objectives, and the assertion that the popular inverse least-squares procedures (including partial least squares, principal components regression) approximate Lorber’s net analyte signal vector in the limit is disproved. Exact theoretical expressions for the prediction error bias, variance, and mean-squared error are given under general measurement error conditions, which reinforce the very discrepant behavior between these two predictive approaches, and Lorber’s net analyte signal theory. Implications for multivariate figures of merit and numerous recently proposed preprocessing treatments involving orthogonal projections are also discussed. The fundamental problem of quantitative analytical chemistry is that of determining within some acceptable approximation the desired chemical or physical property for a sample of interest. The majority of such approximations are made using theoretically or empirically derived instrument calibration functions. Although these functions are easily interpreted for univariate analytical measurements such as pH, or single-wavelength absorbance, highly multivariate calibration functions, like those used in multiwavelength spectroscopy, can be much less transparent. In his germinal work of 1986, Lorber1 provided an elegant interpretation to these multivariate calibration functions with his concept of the net analyte signal (NAS) vector in linear additive systems. (Although rarely cited, Morgan2 had actually discussed a similar concept some years earlier, giving expressions similar to those provided by Lorber, although readers should note that Morgan’s work contains some errors.) Geometrically, the NAS vector represents the portion of the pure analyte signal vector that resides in a space orthogonal to the pure-component signals of all interfering species in a linear additive system (Figure 1).3 As the NAS vector indicates a direction only affected by changes in the analyte concentration, it can also serve as a fully selective property predictor in the IUPAC sensesthe extent to which a * E-mail: [email protected]
(1) Lorber, A. Anal. Chem. 1986, 58, 1167-1172. (2) Morgan, D. R. Appl. Spectrosc. 1977, 51, 404-415. (3) The term “net analyte signal” is often used interchangeably in the literature to describe Lorber’s vector quantity as well as sample-signal projections onto this vector (in which case each sample would have a net analyte signal). We at no point use the latter interpretation in this work.
4364 Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
Figure 1. Illustration of Lorber’s net analyte signal vectors (NASi) in relation to the pure-component signals (ki).
method can be used to determine particular analytes in mixtures or matrices without interferences from other components.4 Sanchez and Kowalski,5 and later Booksh and Kowalski,6 employed the net analyte signal concept as theoretical scaffolding for a larger discussion of analytical calibration theory and figures of merit in linear multivariate calibration. Lorber’s original work gave expressions for calculating the NAS vector(s) based on the pure-component signals of the chemical components in the system as well as the so-called classical and inverse calibration models, and in subsequent years, the nuances of such calculations have been expanded upon.7,8-11 The relation of the NAS concept to analytical figures of merit was well recognized by both Morgan and Lorber, and it has become the basis of a considerable body of literature on multivariate measures of selectivity, sensitivity, limit of detection, and multivariate signal-to-noise ratios.6,12,13 By Lorber’s definition, the orientation and length of the NAS vector is independent of the properties of the instrumental (measurement) error or uncertainty corrupting a measurement system. Measurement error is, however, omnipresent in instrumental measurements, so even the true NAS vector will propagate measurement error variance if it is used as a predictor for the properties of new samples based on instrumental measurements. This leads to several paradoxes in prediction, a few of which we illustrate here. Paradox A. An interfering species A exists at such a low concentration that it is barely observable above the detector noise of a spectrometer. The NAS will be oriented orthogonally to the (4) Vessman, J.; Stefan, R. I.; Van Staden, J. F.; Danzer, K.; Lindner, W.; Burns, D. T.; Fajgel, A.; Muller, H. Pure Appl. Chem. 2001, 73, 1381-1386. (5) Sanchez, E.; Kowalski, B. R. J. Chemom. 1988, 2, 247-264. (6) Booksh, K. S.; Kowalski, B. R. Anal. Chem. 1994, 66, A782-A791. (7) Lorber, A.; Faber, N. M.; Kowalski, B. R. Anal. Chem. 1997, 69, 16201626. (8) Xu, L.; Schechter, I. Anal. Chem. 1997, 69, 3722-3730. (9) Faber, N. M. Anal. Chem. 1998, 70, 5108-5110. (10) Ferre, J.; Faber, N. M. Chemom. Intell. Lab. Syst. 2003, 69, 123-136. (11) Ferre, J.; Brown, S. D.; Rius, F. X. J. Chemom. 2001, 15, 537-553. (12) Kalivas, J. H.; Lang, P. M. Chemom. Intell. Lab. Syst, 1996, 32, 135-149. (13) Faber, K.; Lorber, A.; Kowalski, B. R. J. Chemom. 1997, 11, 419-461. 10.1021/ac049953w CCC: $27.50
© 2004 American Chemical Society Published on Web 06/12/2004
Figure 2. (a) Scatterplot of noisy x and y measurements (both corrupted with measurement error with σ ) 50), where the true values are actually related by a slope β ) 1, (b) Plot of RMSE of predictions of y as a function of the slope used to predict the true y from the noisy x. Also depicted are the true slope β ) 1 (vertical dashed line), and the theoretically optimal RMSE prediction slope of 0.9 (vertical dotted line).
pure-component spectrum of interference A; however, the detector noise contributes just as much spectral variance in the same direction and all others. The same NAS holds in theory even if the detector noise doubles in magnitude, swamping the variance of component A. Paradox B. The concentration variance of an interfering agent can freely range between zero, i.e., nonexistence relative to the instrument noise, and some very large value, but interferences only exist or do not exist in the definition (and equations) of the net analyte signal. Paradox C. Correlated measurement error often has a highly preferred, but never strictly one-dimensional, measurement error signal. Lorber’s net analyte signal vector does not account for such measurement error, even though, like chemical interferences, it has a highly preferred orientation in the system. Paradox D. Lorber’s net analyte signal does not depend on concentrations, yet if two interfering species are almost perfectly correlated in sample concentrations, they are effectively a single interference. If two species are highly, but not perfectly correlated in concentration, they are still two interferences in the NAS theory, regardless of whether the correlation makes them indistinguishable within the level of measurement error. The paradoxes above are reductio ad absurdum arguments for an incompatibility between the notions of the NAS as a meta parameter independent of measurement error and an optimal predictor for real applications. Our examination of this discrepancy begins with a single univariate calibration example. Figure 2 shows two related experimental variables, x and y. By design these variables are truly related by a slope of 1, but they are both corrupted with measurement error of σ ) 50 units, making the observed agreement less than perfect in Figure 2a. Consider first the situation in which the true slope, β, between these variables represents a physical parameter of interest, for example, activation energy in the log-linear Arrhenius relation between inverse temperature and reaction rate. Given the experimentally measured x and y values, the objective is to obtain an unbiased estimate of the true slope parameter that translates to a loss function (criteria to be minimized) of L ) 〈β - βˆ 〉, where βˆ denotes the estimated slope. Suppose instead that our objective is not to estimate the true slope β, but rather to find a “predictor slope”, b, that enables the prediction of y from future measured x values with minimum mean-squared error of prediction, MSEP
(or its square root, RMSEP). This “predictive” loss function is L ) 〈(y - yˆ)2〉, where yˆ is the prediction for the true y. Unfortunately, if both x and y variables are measured with error, a single estimate will never satisfy the parameter estimation and prediction loss functions simultaneously. Theory14,15 dictates that the true slope β and the optimal “predictor slope” b differ in this univariate case by a factor
b ) β(σx2/(σx2 + σex2))
where σx2 is the variance of x, and σex2 is the variance of the measurement error corrupting x. Figure 2b shows the predictive loss function, RMSEP, for a variety of different slopes. Even though the true relationship between x and y is given by β ) 1, Figure 2b shows that the minimum MSEP for y occurs with b ) 0.89 (using the experimental data shown in the figure), nearly identical to the theoretically ideal prediction slope of 0.9 (substitution into eq 1, σx2 ) 1502, σex2 ) 502). The best predictor slope depends on the amount of measurement error corrupting the measurements of x. Experiment design also differs markedly for parameter estimation versus prediction objectives. Designed experiments for parameter estimation problems, well-entrenched in introductory statistics courses, strive to provide optimal confidence for the estimated parameters; hence, designs that minimize the confidence (hyper)ellipsoids around parameters or parameter vectors, such as fractional-factorial,16 or D-optimal designs,17 are the experimental designs of choice. These designs are often inappropriate if the desired objective is prediction, since the confidence intervals on the parameter estimates are not involved in predictive loss functions. (In prediction problems involving measurement error, the population of unobserved but statistically similar predictors is of limited utility.) Experimental designs specifically optimized for prediction, such as I-optimal and A-optimal designs,18,19 ensure that the prediction models derived from an m-sample “calibration” experiment yield minimum expected prediction error over defined operating bounds. (14) Cheng, C.-L.; Van Ness, J. W., Statistical Regression with Measurement Error (Kendall’s Library of Statistics 6); Oxford University Press: New York, 1999. (15) Fuller, W. A. Measurement Error Models; J. Wiley & Sons: New York, 1987. (16) Box, G. E. P.; Hunter, W. G.; Hunter, J. S. Statistics for Experimenters; J. Wiley & Sons: New York, 1978. (17) St. John, R. C.; Draper, N. R. Technometrics 1975, 17, 15-23.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
Countless works outside the field of chemistry deal with the effects of error propagation and experimental uncertainty in the multivariate regression problem, and the more eclectic aspects of problems in analytical chemistry have also been treated with considerable rigor.1,13,20,21 These works are very focused on the uncertainty associated with the parameter estimation problem, which relates to tactical matters such as calibration sample size, or model averaging and selection, for example. While not to be ignored, parameter estimation objectives are often discrepant with prediction objectives as discussed above. The effect of measurement error in prediction has been discussed in the chemistry literature,22,23 but the treatments are somewhat convoluted, in that the measurement error is linearly propagated through a fixed parameter vector, which itself is not dependent on the character of the measurement error. The singular focus of this work is the prediction problem, specifically, predicting sample physicochemical properties from imperfect measurements (e.g., spectroscopy). In the discussion that follows, we strive to clarify the role that X-block measurement error24 plays in the form of both the classical least-squares and inverse least-squares calibration models as predictors and discuss implications for the concept of net analyte signal as it is widely used today. There are countless tactical approaches to generating a multivariate linear predictor, but the following work is broadly applicable to classical and inverse linear calibration in the most general sense, provided the myriad implementations strive for a common objectivesprediction. Therefore, methods such as partial least-squares, principal components regression, and ridge regression are considered special cases of the general inverse calibration problem treated herein. Standard notation from linear algebra will be used throughout the work. Upper case boldface letters (e.g., X) will denote matrices, lowercase boldface letters denote vectors (e.g., x) and will be column vectors unless otherwise indicated, and lowercase italicized letters will denote scalars (e.g., x). Superscripts are assigned as follows: T, matrix/vector transpose; -1, matrix inverse; +, matrix/vector pseudoinverse. Vector Euclidean norms (i.e., 2-norms) are indicated by bracing the quantities as |q|, and the expected value by 〈q〉. The overstrike “∼” is used to differentiate between variables observed with error, X ˜ , and their error-free counterparts, X. An overstrike “∧” distinguishes estimated parameters from their theoretical values. THEORY To avoid unnecessary vagaries, we employ the terminology of absorption spectroscopy through the remainder of the work, although the approach below applies to any system that is under prediction with a linear model. (While a linear model may not be the best predictive choice in such a circumstance, the data themselves need not be linear for the discussion to apply.) The (18) Box, G. E. P.; Draper, N. R. Biometrika 1963, 50, 335-352. (19) Box, G. E. P.; Draper, N. R. J. Am. Stat. Assoc. 1959, 54, 622-654. (20) Faber, K.; Kowalski, B. R. J. Chemom. 1997, 11, 181-238. (21) DiFoggio, R. Appl. Spectrosc. 2000, 54, 94A-113A. (22) Lorber, A.; Kowaski, B. R. J. Chemom. 1988, 2, 93-109. (23) Faber, N. M. Chemom. Intell. Lab. Syst. 2000, 52, 123-134. (24) Our reason for focusing on X-block measurement error is 3-fold: (1) economy, (2) the effects of y-block measurement error have been thoroughly addressed in the literature, and (3) y-block measurement error has no impact on the theoretical concept of net analyte signal. The y-block measurement error effects are, however, given in the Supporting Information.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
multiwavelength, multisample embodiment of Beer’s law is expressed as
X ) YK
where the p columns of the concentration matrix, Y (m × p), contain concentrations for each of the p components in each of the m samples. The rows of K (p × n) correspond to the purecomponent spectra (molar absorptivities times path length) for each of the p components in the system at unit concentration. The rows of X (m × n) contain the mixture spectra resulting from the sum of products of the concentrations and pure-component spectra. Lorber’s NAS vector for a given component was originally described as the projection of the target analyte’s pure-component spectrum, ki, into the space perpendicular to the interfering component signals K-i.1 The NAS vector for component i, nnet,i is often mathematically expressed as T T -1 nnet,i ) (In - K-i (K-iK-i ) K-i)ki
where In is the n × n identity matrix. Equation 3 uses the wellknown orthogonal projector from linear algebra, and straightforward algebra also shows that the projector in eq 3 can be constructed from any matrix that shares the row space with K-i, the foundation for other approaches to calculating or approximating the NAS.7,8 Lorber’s net analyte signal vectors can also be simply expressed as scaled versions of the columns of B, the pseudoinverse of K
B ) KT(KKT)-1 nnet,i )
where the bi in eq 5 is the ith column vector of B from eq 4. This alternative expression is also geometrically motivated. Since KB ≡ Ip by definition, it follows that the ith column of B must be orthogonal to all rows of K except i; in other words, bi is orthogonal to all interfering component signals. The scaling factor in the denominator of eq 5 is a consequence of simple trigonometry and is derived in section A of the Supporting Information. The classical least-squares approach to multivariate calibration and concentration prediction25-27 is detailed in Figure 3 and uses a regression of the observed mixture spectra, X ˜ , onto the space of the pure- component spectra K.28 In CLS, the concentrations are treated as parameters providing the least-squares estimate of X from X ˆ ) Yˆ K, so the CLS loss function associated with Y pertains to parameter estimation, penalizing concentration bias, Ly ) 〈y - yˆ〉. With the X-block measurement error independent and identically distributed across wavelength channels (iid) the CLS concentration predictor is usually given simply as a pseudo(25) Antoon, M. K.; Koenig, J. H.; Koenig, J. L. Appl. Spectrosc. 1977, 31, 518524. (26) Haaland, D. M.; Easterling, R. G. Appl. Spectrosc. 1980, 34, 539-548. (27) Haaland, D. M.; Easterling, R. G. Appl. Spectrosc. 1982, 36, 665-673. (28) In ‘indirect’ classical calibration, the pure-component spectra are first estimated from known concentrations, but the classical concentration predictor remains precisely the same.
Figure 3. Derivation of the classical and inverse least-squares concentration predictors BCLS and BILS under their native measurement error assumptions.
inverse of K
BCLS ) KT(KKT)-1
although under more complex measurement error scenarios, such as error heteroscedasticity or measurement error covariance, the more general version of eq 6 applies
X ˆ )X ˜ Σ-1KT(KΣ-1KT)-1K
BCLS ) Σ-1KT(KΣ-1KT)-1
where Σ is the covariance matrix of the measurement error (or a scalar multiple thereof). Equations 6 and 8 are both MoorePenrose pseudoinverses of K, and the loss functions in both cases penalize bias in the estimates of Y. A comparison of eqs 4 and 6 makes it clear that the classical least-squares concentration predictors are theoretically equivalent to Lorber’s net analyte signal vectors (with the appropriate scaling factor of eq 5), provided the measurement error is iid. When the measurement error is not iid (and eq 8 applies) then the proper classical least-squares predictors compensate for error covariance and no longer relate to Lorber’s net analyte signal vectors through eqs 3 and 4. Instead, a simple extension of eq 3 is required (a generalized NAS expression), as given below T T -1 nnet,i ) (In - Σ-1K-i (K-iΣ-1K-i ) K-i)ki
Regardless of the measurement error conditions (provided the appropriate expression is used), the theoretical CLS predictors of eq 8 are always orthogonal to the interfering component signals as per Lorber’s original definition (i.e., KBCLS ) Ip). The CLS predictors are also independent of the magnitude of measurement error in Xsmultiplication of Σ by a scalar does not change the predictorssand are not affected by the component concentrations. The CLS predictors are dependent on the covariance structure of the measurement error, but they remain orthogonal to the interfering component signals under all circumstances. The inverse least-squares calibration procedure29-31 was originally differentiated from the classical calibration procedure in its use of the “inverted” Beer’s law equation. As detailed in Figure 3, inverse calibration involves a regression of the observed concen(29) Brown, C. W.; Lynch, P. F.; Obremski, R. J.; Lavery, D. S. Anal. Chem. 1982, 54, 1472-1479. (30) Kisner, H. J.; Brown, C. W.; Kavernos, G. J. Anal. Chem. 1983, 55, 17031707. (31) Maris, M. A.; Brown, C. W.; Lavery, D. S. Anal. Chem. 1983, 55, 16941703.
trations, Y ˜ , on the mixture spectra X, implicitly assuming the measurement error in X is negligible. (Some practitioners now reserve the term “inverse least squares” for calibration on only a few selected wavelengths, but in this work, the term is used to generally refer to any calibration procedure in which concentrations are regressed onto instrumentally derived variables such as spectral wavelengths.) The estimated concentration predictors B ˆ ILS are the parameter vectors providing the least-squares approximation for Y, Yˆ ; unsurprisingly, the loss function for Y in inverse least-squares penalizes mean-squared error, Ly ) 〈(y yˆ)2〉. Although measurement error in Y clearly impacts the certainty with which B ˆ ILS is estimated, it does not actually alter the theoretical predictor under most circumstances.32 If X is corrupted by measurement error, a very different theoretical result is achieved, which we term the “inverse measurement-error” (ILSME) predictor
˜ TX ˜ )-1X ˜ TY ˜ B ˆ ILS-ME ) (X
(The pseudoinverse (X ˜ TX ˜ )-1X ˜ T in eq 10 can be replaced by near countless other pseudoinverses, e.g., the rank reduced pseudoinverses of principal components regression of partial least squares, but to maintain generality, we continue the discussion with a simple pseudoinverse.) Despite the X-block measurement error corrupting eq 10, the ILS-ME predictor minimizes the exact same loss function on Y: Ly ) 〈(y - yˆ)2〉. Although we leave the details as Supporting Information, the underlying forms of eq 10 are
B ˆ ILS-ME ) (XTX + ETXEX)-1XTY
and the theoretical expression
BILS-ME ) (KTΨK + Σ)-1KTΨ
where Σ is the expected measurement error covariance, Ψ is the expected concentration covariance, and it has been assumed that the X-block measurement error is independent of the component concentrations (〈eXyT〉 ≈ 0). Equation 12 partitions the denominator into KTΨK and Σ, but there is no theoretical necessity for this separation other than that it splits the system variance into terms of a chemical and nonchemical (e.g., instrumental, environmental noise) origin. Equation 12 applies under any general measurement error conditions (e.g., heteroscedastic, correlated error), provided the measurement error variance in X is not correlated with the concentrations in Y. If such a correlation exists, the more flexible expression in the Supporting Information is required, but the discussion below equally applies. The theoretical ILS-ME predictor of eq 12 is clearly different in form from the BCLS predictor of eq 8, since the BILS-ME predictor depends on pure-component spectra K, the concentration covariance structure Ψ, and the structure and magnitude of the measurement error covariance, Σ. The measurement error biases the ILS-ME predictor just as the prediction slope was biased in the univariate example given in the introduction, so the exact form (32) If the reference method is itself inaccurate, it will limit the accuracy of the inverse predictor, but the imprecision of the reference method makes no contribution to the form of the optimal predictor.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
of the distortion depends on the theoretical character of the measurement error in magnitude, as well as structure (error covariance, heteroscedasticity). As the noise corrupting the system grows larger in magnitude, the ILS-ME predictor preferentially shrinks the predictor vectors and changes their orientation to avoid propagating an excessive amount of X-block noise through to predictions. Although rarely discussed in the chemistry literature, this general form of eq 11 is not novel. The univariate case was discussed (and referenced) in the introduction, and eq 11 is well recognized in the statistics literature as the precursor to an errorsin-variables regression solution.14 It is a solution that minimizes the quadratic loss function for optimal control,33 the rms error in linear signal processing applications,34 and in that sense is akin to a spectrometric “matched” filter.35,36 Equation 11 is also a naturally occurring form of Tihkonov regularization,37 widely used in regression and numerical analysis, of which ridge regression (Σ ) kI) is a zero-order case. Similar results can be found across such diverse fields as neural network optimization and econometric and time-series forecasting.38 In the context of this work, the importance of eqs 8 and 12 is their relation to the concept of net analyte signal. It is asserted in the literature that, since both BCLS and BILS-ME predict concentrations from spectra, they are at least theoretically exchangeable, and Lorber’s net analyte signal vectors can equivalently be extracted from either BCLS or BILS-ME. Equations 8 and 12 reinforce that this equality does not exist in either theory or practice so long as measurement error corrupts the spectra, as there is quite simply no requirement that BILS-ME be even theoretically orthogonal to the pure-component signals of the interfering component signals. The orientation and length of the predictors BILS-ME will change depending on the magnitude and character of the measurement error, pure-component spectra, and the component concentration variances and covariances in the system. The discordance between Lorber’s net analyte signal theory and CLS and ILS-ME as concentration predictors can be additionally examined in terms of their respective theoretical expressions for prediction error bias, variance, and mean-squared error of prediction. The prediction equation, where B* can be either BCLS or BILS-ME is
yˆ Tpred ) (yTpredK + eTX)B*
population space can also be evaluated. In practice, the analyst is often only interested in the concentrations associated with one particular component, in which case only the relevant column of B* is applied in prediction, b*; however, for generality we provide the multicomponent expressions. Prediction Bias. Classical Least Squares. Whether generated from eq 6 or 8, the classical least-squares predictor theoretically satisfies KBCLS ) Ip and eq 13 reduces to
yˆ Tpred ) yTpred + eTXBCLS
Subtracting yTpred from both sides of eq 13 and taking the expectation over the population of samples, the individual component prediction biases are
〈yˆ Tpred - yTpred〉 ) 〈eTXBCLS〉 ) 〈eTX〉BCLS
Therefore, the theoretical prediction bias is zero for all components in CLS, provided the X-block measurement errors eX have an expected value of zero. Inverse Least Squares. BILS-ME is not a pseudoinverse of K (unless there is zero measurement error, and Ψ is diagonal). Therefore, the ILS-ME predictor distributes into the prediction eq 13 as
yˆ Tpred ) yTpredKBILS-ME + eTXBILS-ME
Subtracting yTpred from both sides of eq 16 and grouping results in
yˆ Tpred - yTpred ) (yTpredKBILS-ME - yTpred) + eTXBILS-ME ) yTpred(KBILS-ME - Ip) + eTXBILS-ME (17) The biases resulting from the application of BILS-ME in prediction are given by
〈yˆ Tpred - yTpred〉 ) 〈yTpred〉W + 〈eTX〉BILS-ME
W ) (KBILS-ME - Ip)
For a single-mixture sample under prediction, yTpred are the true p component concentrations and eTX is the true instantiation of the error in the measured spectrum. yˆ Tpred are the concentrations as predicted by applying B*. The theoretical expressions that follow are obtained by examining the expected behavior of eq 13 over a relevant population of samples of varying constitutions and measurement-error observations. In most circumstances, the relevant population pertains to the domain of samples to be observed in true validation, although specific portions of the (33) Stengel, R. F. Optimal Control and Estimation; Dover: New York, 1994. (34) Gelb, A. Applied Optimal Estimation; MIT Press: Cambridge, 1974. (35) Erickson, C. L.; Lysaght, M. J.; Callis, J. B. Anal. Chem. 1992, 64, 1155A1163A. (36) Marbach, R. J. Biomed. Opt. 2002, 7, 130-147. (37) Tikhonov, A. N.; Arsenin. V. Y. Solutions of Ill-Posed Problems; Winston: Washington, DC, 1977. (38) Johnson, J. Econometric Methods; McGraw-Hill Book Co.: New York, 1963.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
W is a p × p nonsymmetric matrix whose columns mix the mean concentrations of all components into the prediction bias for the analyte of interest; for example, the dot product of the ith column of W with 〈yTpred〉 contributes to the bias of component i. One can conceive of W as a matrix measure of “how far” BILS-ME is from being a pseudoinverse of K; as KBILS-ME tends to Ip, W tends to 0 and there is no intermixing of component concentrations in prediction. It can also be rewritten as
W ) K(BILS-ME - BCLS)
since KBCLS ) Ip, which indicates that W is also a measure of how far BILS-ME is from the CLS predictor.
For the ILS-ME predictions for component i to be unbiased, the mean concentrations of all components in the system must be zero (or equivalent to their mean calibration values if the data were centered). This implies that, in prediction, if the average concentration of a chemical interference shifts to a position significantly different from that represented in the calibration data, then the predictions for the analyte of interest will become sympathetically biased. Given their time-tested utility, admonitions against calibration extrapolation certainly do not require justification, but it is interesting that eq 18 provides the exact theoretical consequence of such extrapolations in the presence of X-block measurement error, even for the optimally derived predictor. The prediction error bias monotonically increases (absolute value) for samples increasingly removed from the center of the calibration data. Bias Summary. BCLS always renders theoretically unbiased predictions provided the measurement error has zero mean. The inverse least-squares bias for a given component is conditional on each of the following: (1) calibration X-block measurement error variance/covariance; (2) bias of the X-block measurement error relative to the calibration state; (3) average concentrations of all components relative to their calibration states; (4) concentration covariance of all components in calibration; (5) the degree of pure-component spectral correlation that exists between all species. Prediction Error Variance. Applying the predictors as per eq 13, and determining the variance of the expression gives the prediction error variance-covariance matrices for CLS and ILSME. Classical least-squares prediction error covariance:
cov(yˆ Tpred - yTpred)CLS ) BTCLSΣpredBCLS
(A further reduced form of eq 21 is derived in section C of the Supporting Information for situations in which Σ ) Σpred.) Inverse least-squares prediction error covariance:
cov(yˆ Tpred - yTpred)ILS-ME ) WTΨpredW + T ΣpredBILS-ME (22) ΒILS-ME
Σpred is the expected covariance of the X-block measurement error in prediction, Ψpred is the expected concentration covariance in prediction, and W is defined as above.39 Both the CLS and ILSME expressions depend on the propagation of the X-block measurement error, Σpred, although the ILS-ME prediction error covariance expression alone is conditionally dependent on W and Ψpred. The nature of W is such that it will propagate a portion of the concentration variance of the interferences (in Ψpred) into the prediction error variance of another species. Variance Summary. The BILS-ME prediction error variance expression is very complex, and the prediction error variance is conditional on each of the following: (1) calibration X-block measurement error variance/covariance; (2) validation X-block measurement error variance/covariance; (3) degree of purecomponent spectral correlation that exists between all species; (39) Like the derivation of eq 12, we have assumed that the X-block measurement error and the sample concentrations are independent. A similar assumption is made below for the mean-squared error of prediction.
(4) concentration covariance of all components in calibration; (5) concentration covariance of all components in validation. The prediction error variance from BCLS, in contrast, is only dependent on effects 1 (insofar as the calibration error covariance affects the CLS predictor) and 2, being a matter of pure propagation of the measurement error covariance, Σpred. Mean-Squared Error of Prediction. As the MSEP is a function of squared bias and variance, the expressions in eqs 23 and 24 are potentially affected by all of the individual conditions noted for the bias and variance in the preceding paragraphs. Single-component expressions for the ith component are provided for simplicity. Classical Least Squares. The expected MSEP of the ith CLS predictor is given by the expression T MSEPCLS,i ) [bCLS,i Σ predbCLS,i] + [biasCLS,i2]
Inverse Least Squares. The MSEP of the ith ILS-ME predictor is T MSEPILS-ME,i ) [wTi Ψpredwi + bILS-ME,i ΣpredbILS-ME,i] +
[biasILS-ME,i2] (24) where Ψpred and Σpred are defined as above and the bias terms for CLS and ILS-ME are given in eqs 15 and 18. wi represents the ith column of the W matrix from above. Net Analyte Signal Vectors in the Presence of Measurement Error. The classical least-squares predictors, like Lorber’s net analyte signals, are fully selective (for calibrated components) in the IUPAC sense, as their prediction error variance is theoretically unaffected by the mean or variance of other calibrated species in the system. In accordance with their loss function, they are theoretically optimal in terms of prediction accuracy (unbiased predictions unless a disturbance is introduced into the system) but they are in no way optimal predictors of concentration in the mean-squared error sense since the propagation of X-block measurement error is unconstrained. BILS-ME is an optimal concentration predictor in the meansquared error sense, but the expense of this predictive optimality is that the inverse approaches are not at all fully selective in the IUPAC sense. The prediction error bias and variance individually depend on the concentrations of all species in the system, not just the analyte of interest. Consequently, in any circumstance with X-block measurement error, the theoretical concentration predictors BILS-ME will not be consistent with Lorber’s NAS vectors, as their predictive performance characteristics depend on the concentration covariance of all species in the system as well as the magnitude and form of the measurement error covariance. EXPERIMENTAL SECTION To demonstrate the validity of the theoretical expressions for bias, variance, and mean-squared error of prediction, it was necessary to manufacture data with precisely known parameters. We used the method previously characterized by Brown et al.,40 which generates each pure-component spectrum as being the sum (40) Brown, C. D.; Vega-Montoto, L.; Wentzell, P. D. Appl. Spectrosc. 2000, 54, 1055-1068.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
Figure 4. Three examples of pure-component spectra generated by the discussed simulation method.
of four Gaussian bands centered at randomly determined positions throughout a wavelength range. Spectra were generated with 100 wavelength channels, with Gaussian bands of width σband ) 15 channels. Such a simulation generates a wide distribution of degrees of spectral overlapsthe degree of correlation between pure-component spectrasbut three examples of pure-component spectra are shown in Figure 4. Concentration and measurement error properties are discussed further in the text where applicable. Simulations of correlated measurement error are further described in section D of the Supporting Information. All calculations were performed in Matlab 6 (The Mathworks, Natick, MA) in a Microsoft Windows environment; simulation code is freely available from the author. RESULTS AND DISCUSSION The major novelty of this work involves the form and function of the inverse least-squares concentration predictor in the presence of measurement error. As equations do not always provide the most transparent understanding of highly multivariate behavior, Figure 5 illustrates the effect of changing only the magnitude of the iid X-block measurement error on the ILS-ME predictor for a two-component system. Although these simulated instrumental measurements have 100 wavelength channels, the pure-component signals and BILS-ME are represented in a simple 2D plane. The subspace representations of the two pure-component signal vectors k1 and k2 are shown in Figure 5a, separated by ∼16°, and both of unit length for simplicity. In a spectroscopic system, these might represent two pure-component IR spectra of moderate
overlap with similar integrated molar absorption. The concentration covariance matrix, Ψ, contains no covariance, but component 1 has a concentration variance of 100 and component 2 has a concentration variance of 1. The theoretical ILS-ME predictor for component 1, b1, was calculated from “first principles” using eq 12, under iid measurement error conditions: Σ ) σ2I, with σ2 running from 0 to 5. The angles between the component 1 concentration predictor and the two pure-component signal vectors are plotted as a function of measurement error variance in Figure 5b. What is asserted to be the NAS vector for component 1, n1 ) b1/|b1|2 (via eq 5), is shown at various levels of measurement error in Figure 5a. With X-block measurement very near zero, n1 aligns closely to where Lorber’s net analyte signal vector and the CLS predictor would besorthogonal to the interfering signal of component 2sbut as the measurement error grows larger in the system, n1 smoothly reorients (positions ii-iv) to a position more collinear with k1 while growing progressively longer. This corresponds to a progressive shortening of the concentration predictor b1. Again, we emphasize that both the length and orientation of the optimal predictor change as a function of the magnitude of the iid measurement error; no error covariance exists in the example. This simple example also serves to illustrate a fallacy that inverse least-squares predictors (i.e., regression vectors) can be interpreted based on the chemical signals alone. Without knowledge of the measurement error properties and component concentration covariance, it is impossible to determine what is “correct” and what is “incorrect” for an inverse predictor, as there are an infinite number of chemically consistent predictors. As the Nobel laureate economist Milton Friedman has said, “the only relevant test of the validity of a [prediction model] is comparison of prediction with experience.”41 Figure 6 compares the measured MSEPCLS and MSEPILS-ME for a controlled set of simulation experiments under variable X-block measurement error conditions. Pure-component spectra were generated by the methods described in the Experimental Section for a six-component system. Sample concentrations were drawn from a multivariate normal distribution with mean [ 500 50 5 0.5 0.05 0.005 ] and Ψ ) diag([ 100 10 1 0.1 0.01 0.001 ]). X-block measurement error variance, σ2, ranged between 0 and 502 units with a simple iid structure Σ ) σ2I. We used the appropriate theoretical expressions (eqs 6 and 12) to generate
Figure 5. Geometric illustration of the orientation of a BILS-ME predictor as a function of the variance of X-block measurement error. (a) Orientation of the two pure-component spectra k1 and k2 separated by an angle θ (16°). The orientation of the scaled predictor, n1, for component 1 (dotted arrows) is shown at increasing levels of measurement error in (i-iv). (b) The angles between n1 and the pure-component spectra k1 and k2 at geometric positions corresponding to those in panel a. 4370
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
Figure 6. Comparison of the observed mean-squared error of prediction (MSEP) on a common 500-sample validation set for the CLS theoretical predictor versus the ILS-ME theoretical predictor over a wide range of measurement error conditions. The line of identity (equivalent CLS and ILS-ME performance) appears near the left side of the pane.
each of the predictors, and we directly compared their predictive performance based on their MSEPs for component 1 (the largest
component) on a common 500-sample prediction set. The prediction set is a randomly drawn sample from population characteristics identical to the calibration parameters (i.e., Ψ and Σ). The line of identity is plotted in Figure 6 as a visual guide, showing that the MSEP for the ILS-ME predictor is universally better than the MSEP for CLS. The measured biases and MSEPs are shown individually against the appropriate theoretical expressions (eqs 14-24) in Figure 7. Panels a and b compare the observed bias of the ILSME and CLS predictors. Since the CLS predictor has zero expected bias (optimal accuracy), the observed bias is plotted as a function of increasing measurement error. The ILS-ME predictor has an expected nonzero bias from eq 18, so this theoretical value is used on the x-axis in (b). The ILS-ME and CLS predictors show excellent agreement with the theoretical expressions, aside from the sampling error associated with estimating the bias on the y-axis. Panels c and d of Figure 7 show the observed MSEP against the theoretically expected values for MSEP from eqs 23 and 24. There is an almost 2 orders of magnitude difference in the y-axis scale for the ILS-ME and CLS MSEPs, but both plots demonstrate agreement with the theoretical expressions within sampling error. Additional verification of the theoretical expressions under noniid measurement error (Σ * kI) conditions are provided in section D of the Supporting Information.
Figure 7. (a, b) Comparisons of the CLS and ILS-ME bias in terms of their theoretically expected values, and observed values in simulation (CLS theoretical bias is zero, so observed is shown versus X-block error variance). (c, d) Comparisons of the mean-squared error of prediction for the CLS and ILS-ME predictors in terms of their observed MSE prediction versus the theoretical expressions. Dashed lines represent expected 95% coverage intervals for sampling variability.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
Figure 8. Norms of the theoretical ILS-ME predictors for a sixcomponent system as a function of the X-block measurement error variance. (A) Component 6 extinction, (B) component 5 extinction, and (C) only the largest component, component 1, remains in the system with a nonzero regression vector.
Also included in Figure 7 are intervals representing the expected 95th percentile for the measured bias and MSEP (biaˆs, and MSˆ EP) assuming normally distributed concentrations and X-block measurement errors:
m˘ MSEILS-ME m˘ MSEILS-ME
bias - tm˜ ,R/2
se2 bias + tm˜ ,R/2 m
x ] se2 m
(m˘ ) m - 1, se2 is the variance of the prediction errors). These interval estimates for MSE will not provide exact 95% coverage, because the MSE is a linear combination of squared bias and variance estimates which are both χ2 random variables, but eq 25 is quite reasonable when the MSEP is dominated by the variance term (as with this simulation). More accurate interval expressions could be calculated using Satterthwaite’s procedure42,43 but this is beyond the scope of this work. As with many central tendency estimators, the expressions for theoretically expected bias and MSEP are not particularly sensitive to the distribution of the component concentrations or measurement errors, that is, whether they are distributed normally, uniformly, etc. We have compared the theoretical MSEP results of eq 24 for both uniformly and lognormally distributed concentrations with Monte Carlo simulation and have found that the MSEP expression remains accurate. (Inferences, or interval estimates relying on the normal distribution assumption would clearly not be applicable in such circumstances, but bootstrap or Monte Carlo methods could easily be used in such circumstances.) The principle advantage of the ILS-ME predictor, its flexibility in balancing sources of variance and bias, is also a source of highly complex behavior not immediately evident from Figure 6 or 7. Figure 8 conveys more of this complexity as it follows the changes (41) Friedman, M., Essays in Positive Economics; University of Chicago Press: Chicago, 1966. (42) Satterthwaite, F. E. Biometrics Bull. 1946, 2, 110-114. (43) Gaylor, D. W.; Hopper, F. N. Technometrics 1969, 11, 691-706.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004
in a family of optimal concentration predictors purely as a function of X-block measurement error variance. In this illustration, purecomponent spectra and concentration covariance for a sixcomponent system has been generated as it was in Figures 5 and 6. The lengths of the theoretical BILS-ME concentration predictors (|bi|) are shown simply with varying amounts of measurement error. When the measurement error variance is minimal, all components are prominently active in the system relative to the measurement noise. As error variance increasingly corrupts the system, component 6, the smallest component, begins to effectively disappear and its concentration predictor shrinks toward the zero vector (Figure 8A). The zero vector becomes the preferred concentration predictor as the measurement noise swamps the chemical signal, since from the MSEP perspective it is best to propagate no measurement error and predict the mean (or zero). This “species extinction” of component 6 induces a cascade of breakdowns in the predictors for all other species in the system as evidenced by the rapid change in the lengths of their concentration predictors as component 6 becomes extinct. There is a phase of predictor reorientation as the spectral variance of component 6 becomes no more deleterious to MSEP than the measurement errorsthere are diminishing returns to preferentially avoiding the direction of the pure-component 6 signal, so the other predictors reorient to maintain minimum MSEP. This process occurs again as component 5 begins to disappear into the measurement noise (Figure 8B), and once again the predictors for the other species remaining in the system go through breakdowns as component 5 becomes effectively extinct. Moving forward to such a point that only component 1 (the largest component) remains (Figure 8C), we arrive at the expected result that the predictor for component 1 settles nears a length of 1 as it aligns with its pure-component spectrum (which by design in the simulation has a length of 1). Beyond this point even the predictor for component 1 breaks down as the optimal MSE concentration predictor becomes the zero vector. In recent years, a broad class of preprocessing methods has appeared in the literature which employ orthogonal projections to annihilate dimensions in the data associated with interfering signals or combinations of interfering signals, a procedure that appears justifiable from Lorber’s net analyte signal theory. Recent examples of this approach can be found in Xu and Schechter,8 Berger et al.,44 Andersson,45 DiFoggio,21 Ferre and Brown,46 Hansen,47 Roger et al.,48 and Goicoechea and Olivieri49 to name a few. These projection methods theoretically ensure that such orthogonalized concentration predictors are completely impervious to any variances in the annihilated dimensions. The theoretical form of the ILS-ME concentration predictor, however, suggests that orthogonalizations are the most radical corrections that can possibly be made for an interfering signal and are always suboptimal in the MSEP sense. The appropriate adjustment is oblique with respect to the interfering pure-component spectra and requires an assessment of the relative variability that the (44) Berger, A. J.; Koo, T.; Itzkan, I.; Feld, M. S. Anal. Chem. 1998, 70, 623627. (45) Andersson, C. A., Chemom. Intell. Lab. Syst. 1999, 47, 51-63. (46) Ferre, J.; Brown, S. D. Appl. Spectrosc. 2001, 55, 708-714. (47) Hansen, P. W. J. Chemom. 2001, 15, 123-131. (48) Roger, J.-M.; Chauchard, F.; Bellon-Maurel, V. Chemom. Intell. Lab. Syst. 2003, 66, 191-204. (49) Goicoechea, H. C.; Olivieri, A. C. Chemom. Intell. Lab. Syst. 2001, 56, 7381.
interferences contribute to the data. One option is to simply adjust the predictor using the regularization that would naturally occur in the ILS-ME predictor. For example, a predictor developed from a series of instrumental measurements X ˜ and component concentrations y using the standard ILS-ME formula of eq 10 could be modified to accommodate a covariance estimate of the interfering spectral effects, S ) XTnewXnew/m
bˆ adj ) (X ˜ TX ˜ + kS)-1X ˜ Ty
k is included as a tuning parameter, adjusted to appropriately reflect the relative variation of the components in S compared to X ˜ TX ˜ . This expression enables an oblique adjustment to the concentration predictor, although any natural concentration correlations between the effects in Xnew and X will be lost in eq 27. This general procedure is also obviously appropriate for calibration transfer and maintenance adjustments. Another already published option is to generate synthetic calibration data by adding linear combinations of the new interfering signals onto existing calibration data.50 This approach, like the one below in eq 28, cannot account for any concentration covariance effects, but it does permit an oblique adjustment to the predictor if it is beneficial in the MSEP sense. Prediction error bias, variance, and MSE are important practical measures in prediction problems, but the topic of analytical figures of merit has yet to be addressed. If the ILS-ME concentration predictor is (erroneously) used to approximate the net analyte signal vector by eq 5, a number of absurdities result from existing multivariate figure of merit definitions.6 Multivariate sensitivity, defined as the length of the NAS vector ()1/|bILS-ME|), increases as the signal-to-noise ratio of the instrument degrades, and most proposed expressions for multivariate selectivity can provide values much greater than 100% (more than perfectly selective). If Lorber’s original definition of NAS is adhered to for figures of merit calculation, analysts must use either a classical least-squares predictor (which is an impossibility in most applications) or an appropriately unbiased inverse least-squares errors-in-variables estimator, such as
B ˆ EIV ) (X ˜ TX ˜ - Σ)-1X ˜ TY ˜
components regression have demonstrated exceptional utility in constraining and reducing the uncertainty associated with estimating the inverse predictor (the parameter estimation problem). However, since measurement error is ubiquitous and invades any selected subspace of the data, the NAS theory and inverse latent variable regression methods are still discordant in the many ways discussed above. Regardless of their advantages in parameter estimation, the best possible theoretical predictor simply remains that of eq 12, with all of its dependencies. CONCLUSIONS Net analyte signal theory, while a highly intuitive representation of the multivariate calibration process, does not accurately represent the theoretical behavior of either the classical or inverse least-squares predictors in the presence of general X-block measurement error conditions. Lorber’s original NAS concept has an exact theoretical relation to classical least-squares calibration when measurement error is iid; however, a generalization is required under non-iid measurement error conditions (eqs 8 and 9) to maintain theoretical consistency. NAS theory and CLS calibration share a common objective: optimally accurate (unbiased) predictors of concentration. In contrast to literature assertions, the NAS theory is in no way tied to the inverse least-squares calibration approaches. In stark contrast to “optimal accuracy”, the inverse class of methods strives for optimal mean-squared error in prediction and handles measurement error completely differently from the classical leastsquares approach, or even a generalized NAS theory. Despite these significantly deviant outcomes, we eschew attempts to expand Lorber’s NAS theory to encompass the complex behavior of the inverse methods. The complications that are required undermine the elegance and intuitiveness that have made the NAS such a powerful representation in the first place. However, practitioners and educators must be cognizant of this discordance and the limitations it imposes in net analyte signalbased method development, existing multivariate figures of merit, and their intuition about the interplay of measurement error and optimal predictors. ACKNOWLEDGMENT The author gratefully acknowledges Drs. Stephen Vanslyke and Cliona Fleming for suggestions on the manuscript and fruitful discussions on matters related to this work.
Unfortunately neither the BCLS or the BEIV predictors are optimal in the mean-squared error sense for predicting sample constitution, so while this approach leads to figures of merit free of some rather uncomfortable properties, they are only marginally related to the potential performance of the multivariate analytical method. The resolution of these issues is the focus of work currently in preparation. Latent variable regression techniques, while not explicitly addressed in the discussions above, are a special case of the discussed general inverse calibration problem and deserve some mention. Methods such as partial least squares and principal
SUPPORTING INFORMATION AVAILABLE S-A, derivation of trigonometric scaling factor for eq 5; S-B, derivation of the ILS-ME predictor for most general measurement error conditions; S-C, derivation of CLS prediction error variance shortcut under non-iid measurement error; S-D, expanded simulation results validating the theoretical expressions for correlated concentrations and measurement error. This material is available free of charge via the Internet at http://pubs.acs.org.
(50) Haaland, D. M. Appl. Spectrosc. 2000, 54, 246-254.
Received for review January 8, 2004. Accepted May 10, 2004.
Analytical Chemistry, Vol. 76, No. 15, August 1, 2004