Computing Sensitivity and Selectivity in Parallel Factor Analysis and

Jun 29, 2005 - Computing Sensitivity and Selectivity in Parallel Factor Analysis and Related Multiway Techniques: The Need for Further Developments in...
0 downloads 9 Views 122KB Size
Anal. Chem. 2005, 77, 4936-4946

Computing Sensitivity and Selectivity in Parallel Factor Analysis and Related Multiway Techniques: The Need for Further Developments in Net Analyte Signal Theory Alejandro C. Olivieri*

Departamento de Quı´mica Analı´tica, Facultad de Ciencias Bioquı´micas y Farmace´ uticas, Universidad Nacional de Rosario, Suipacha 531, Rosario (S2002LRK), Argentina

Sensitivity and selectivity are important figures of merit in multiway analysis, regularly employed for comparison of the analytical performance of methods and for experimental design and planning. They are especially interesting in the second-order advantage scenario, where the latter property allows for the analysis of samples with a complex background, permitting analyte determination even in the presence of unsuspected interferences. Since no general theory exists for estimating the multiway sensitivity, Monte Carlo numerical calculations have been developed for estimating variance inflation factors, as a convenient way of assessing both sensitivity and selectivity parameters for the popular parallel factor (PARAFAC) analysis and also for related multiway techniques. When the second-order advantage is achieved, the existing expressions derived from net analyte signal theory are only able to adequately cover cases where a single analyte is calibrated using second-order instrumental data. However, they fail for certain multianalyte cases, or when thirdorder data are employed, calling for an extension of net analyte theory. The results have strong implications in the planning of multiway analytical experiments. The recent analytical literature has witnessed interesting developments in three-way chemometric modeling techniques.1 Popular applications are the processing of second-order data such as fluorescence excitation-emission matrices (EEM),2-18 pH, or * To whom correspondence should be addressed. E-mail: aolivier@ fbioyf.unr.edu.ar. (1) Faber, N. M.; Bro, R.; Hopke, P. K. Chemom. Intell. Lab. Syst. 2003, 65, 119-137. (2) Arancibia, J. A.; Escandar, G. M. Talanta 2003, 60, 1113-1121. (3) Damiani, P. C.; Nepote, J. A.; Bearzotti, M.; Olivieri, A. C. Anal. Chem. 2004, 76, 2798-2806. (4) Arancibia J. A.; Olivieri, A. C.; Escandar, G. M. Anal. Bioanal. Chem. 2002, 374, 451-459. (5) Mun ˜oz de la Pen ˜a, A.; Espinosa Mansilla, A.; Gonza´lez Go´mez, D.; Olivieri, A. C.; Goicoechea, H. C. Anal. Chem. 2003, 75, 2640-2646. (6) JiJi, R. D.; Booksh, K. S. Anal. Chem. 2000, 72, 718-725. (7) Escandar, G. M.; Gonza´lez Go´mez, D.; Espinosa Mansilla, A.; Mun ˜oz de la Pen ˜a, A.; Goicoechea, H. C. Anal. Chim. Acta 2004, 506, 161-170. (8) Hergert, L. A.; Escandar, G. M. Talanta 2003, 60, 235-246. (9) Cao, Y. Z.; Mo, C. Y.; Long, J. G.; Chen, H.; Wu, H. L.; Yu, R. Q. Anal. Sci. 2002, 18, 333-336.

4936 Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

kinetically modulated UV-visible spectral information,19,20 or data from hyphenated techniques such as the tandem chromatography-mass spectrometry.21 Third-order data generated by time variations of EEM22-26 or two-dimensional NMR spectra27 due to a chemical reaction have also been recently described. The parallel factor (PARAFAC) method has become the standard for high-order data analysis,28,29 often achieving decomposition of high-dimensional arrays in a unique manner, though requiring the data to be trilinear. This allows concentrations and spectral profiles of sample components to be extracted directly, forming the basis of the so-called second-order advantage,30 an interesting property holding a great potentiality in the analysis of complex samples. Other useful methods for obtaining the secondorder advantage are generalized rank annihilation (GRAM)31 and multivariate curve resolution coupled to alternating least squares (10) Esteves da Silva, J. C. G.; Litao, J. M. M.; Costa, F. S.; Ribeiro, J. L. A. Anal. Chim. Acta 2002, 453, 105-115. (11) Saurina, J.; Leal, C.; Compan ˜o´, R.; Granados, M.; Tauler, R.; Prat, M. D. Anal. Chim. Acta 2000, 409, 237-245. (12) Saurina, J.; Tauler, R. Analyst 2000, 125, 2038-2043. (13) Cao, Y. Z.; Chen, Z. P.; Mo, C. Y.; Wu, H. L.; Yu, R. Q. Analyst 2000, 125, 2303-2310. (14) Beltran, J. L.; Guiteras, J.; Ferrer, R. Anal. Chem. 1998, 70, 1949-1955. (15) Trevisan, M. G.; Poppi, R. J. Anal. Chim. Acta 2003, 493, 69-81. (16) Rodrı´guez-Cuesta, M. J.; Boque´, R.; Rius, F. X.; Pico´n Zamora, D.; Martı´nez Galera, M.; Garrido Frenich, A. Anal. Chim. Acta 2003, 491, 47-56. (17) Moberg, L.; Robertsson, G.; Karlberg, B. Talanta 2001, 54, 161-170. (18) Mahedero, M. C.; Mora Dı´az, N.; Mun ˜oz de la Pen ˜a, A.; Espinosa Mansilla, A.; Gonza´lez Go´mez, D.; Bohoyo Gil, D. Talanta 2004, 65, 806-813. (19) Espinosa-Mansilla, A.; Mun ˜oz de la Pen ˜a, A.; Goicoechea, H. C.; Olivieri, A. C. Appl. Spectrosc. 2004, 58, 83-90. (20) Levi, M. A. B.; Scarminio, I. S.; Poppi, R. J.; Trevisan, M. G. Talanta 2004, 62, 299-305. (21) Wilson, I. D.; Brinkman, U. A. J. Chromatogr., A 2003, 1000, 325-356. (22) Olivieri, A. C.; Arancibia, J. A.; Mun ˜oz de la Pen ˜a, A.; Dura´n-Mera´s, I.; Espinosa Mansilla, A. Anal. Chem. 2004, 76, 5657-5666. (23) Nikolajsen, R. P. H.; Booksh, K. S.; Hansen, A. M.; Bro, R. Anal. Chim. Acta 2003, 475, 137-150. (24) Tan, Y.; Jiang, J. H.; Wu, H. L.; Cui, H.; Yu, R. Q. Anal. Chim. Acta 2000, 412, 195-202. (25) Gui, M.; Rutan, S. C.; Agbodjan, A. Anal. Chem. 1995, 67, 3293-3299. (26) Nahorniak, M. L.; Cooper, G. A.; Kim, Y.-C.; Booksh, K. S. Analyst 2005, 130, 85-93. (27) Jaumont, J.; Marcha´n, V. Gargallo, R.; Grandas, A.; Tauler, R. Anal. Chem. 2004, 76, 7094-7101. (28) Bro, R. Chemom. Intell. Lab. Syst. 1997, 38, 149-171. (29) Andersen, C. M.; Bro R. J. Chemom. 2003, 17, 200-215. (30) Booksh, K. S.; Kowalski, B. R. Anal. Chem. 1994, 66, 782A-791A. (31) Sanchez, E.; Kowalski, B. R. Anal. Chem. 1986, 58, 496-499. 10.1021/ac050146m CCC: $30.25

© 2005 American Chemical Society Published on Web 06/29/2005

(MCR-ALS),32 both valid for second-order data, and the recently introduced combinations bilinear least-squares/residual bilinearization (BLLS/RBL)3,33,34 and trilinear least-squares/residual trilinearization (TLLS/RTL),22 which are applicable to second- and third-order data, respectively. Multiway partial least-squares (nPLS) is a methodology extending the well-known PLS method to higher dimensions, but unfortunately it does not exhibit the second-order advantage.35 Likewise, the unfolded variants of PLS and principal component regression (PCR),36 both working after unfolding the various data arrays, i.e., converting them into vectors, are also available for analyzing high-order data, but they do not share the second-order advantage. Research on figures of merit is a challenging subject of modern chemometrics,37-51 both in the multiway domain and also in the traditional first-order multivariate calibration scenario (i.e., when instruments deliver vector data for each studied sample instead of two- or higher-dimensional arrays). Figures of merit are parameters that analysts employ for characterizing, comparing, or developing new analytical methods. A highly useful concept in estimating and interpreting figures of merit in multivariate calibration is the net analyte signal (NAS), first introduced by Lorber.52 It involves decomposition of the total signal in two parts, one that could be uniquely assigned to the analyte of interest (the NAS) and the remaining one arising from other components. The sensitivity can then be defined as the NAS generated by unit analyte concentration and the selectivity as the ratio between the NAS and the total analyte signal.52 Knowledge of the sensitivity provides access, by error propagation, to the standard prediction error in estimated concentrations. If unknown samples are not high-leverage ones, the latter furnish realistic estimations of the uncertainty expected in real applications. The usefulness of mathematical expressions providing the sensitivity has been thoroughly discussed in the first-order domain.44,46 This analytical approach, however, does not work in general for higher-order data, as supported by experimental53,54 as well as theoretical45 evidence. Sensitivity expressions based on (32) Tauler, R. Chemom. Intell. Lab. Syst. 1995, 30, 133-146. (33) Linder, M.; Sundberg, R. Chemom. Intell. Lab. Syst. 1998, 42, 159-178. (34) Linder, M.; Sundberg, R. J. Chemom. 2002, 16, 12-27. (35) Bro, R. J. Chemom. 1996, 10, 47-61. (36) Wold, S.; Geladi, P.; Esbensen, K.; Øhman, J. J. Chemom. 1987, 1, 41-56. (37) Messick, N. J.; Kalivas, J. H.; Lang, P. M. Anal. Chem. 1996, 68, 15721579. (38) Ho, C-. N.; Christian, G. D.; Davidson, E. R. Anal. Chem. 1980, 52, 10711079. (39) Wang, Y.; Borgen, O. S.; Kowalski, B. R.; Gu, M.; Turecek, F. J. Chemom. 1993, 7, 117-130. (40) Appellof, C. J.; Davidson, E. R. Anal. Chim. Acta 1983, 146, 9-14. (41) Faber, N. M.; Buydens, L. M. C.; Kateman, G. J. Chemom. 1994, 8, 147154. (42) Faber, N. M. J. Chemom. 2001, 15, 743-748. (43) Faber, K.; Lorber, A.; Kowalski, B. R. J. Chemom. 1997, 11, 419-461. (44) Faber K.; Kowalski, B. R. J. Chemom. 1997, 11, 181-238. (45) Liu, X.; Sidiropoulos, N. IEEE Trans. Signal Process. 2001, 49, 2074-2086. (46) Olivieri, A. C. J. Chemom. 2002, 16, 207-217. (47) Faber, N. M.; Boque´, R.; Ferre´, J.Chemom. Intell. Lab. Syst. 2001, 55, 6790. (48) Faber, N. M.; Boque´, R.; Ferre´, J.Chemom. Intell. Lab. Syst. 2001, 55, 91100. (49) Boque´, R.; Ferre´, J.; Faber, N. M.; Rius, F. X. Anal. Chim. Acta 2002, 451, 313-321. (50) Faber, N. M.; Bro R. Chemom. Intell. Lab. Syst. 2002, 61, 133-149. (51) Riu, J.; Bro R. Chemom. Intell. Lab. Syst. 2003, 65, 35-49. (52) Lorber, A. Anal. Chem. 1986, 58, 1167-1172. (53) Olivieri, A. C.; Faber, N. M. Chemom. Intell. Lab. Syst. 2004, 70, 75-82.

Table 1. Second-Order NAS Definitions and Sensitivity for Different Algorithms method unfold PLS and PCR nPLS GRAM GRAM PARAFAC PARAFAC BLLS/naı¨ve predictor BLLS/LS predictor BLLS/RBL nPLS PARAFAC PARAFAC TLLS/LS predictor TLLS/RTL a

refa

second-order advantage

Second-Order Data 36 no 35 no 31 no 31 yes 28 no 28 yes 33,34 no 33,34 no 3,34 yes Third-Order Data 35 no 28 no 28 yes 22 no 22 yes

NAS definition MKL ? HCD HCD MKL HCD HCD MKL ?

refb 61 42 42 53 54 34 34

? ? ? ? ?

Reference to method. b Reference to NAS definition.

the NAS concept have only been developed for GRAM,49 for BLLS not exploiting the second-order advantage,33,34 and also for PARAFAC,53,54 in the latter case by only probing simple examples, where a single analyte is calibrated and a single interference occurs in unknown samples. Little is known in the third-order scenario. The known multiway sensitivity equations are based on either of two second-order NAS versions, independently developed by Messick, Kalivas, and Lang (MKL)37 and by Ho, Christian, and Davidson (HCD).38 Although both are straightforward extensions of the first-order NAS, they are seemingly different, with the HCD sensitivity lower than the MKL one. A third alternative should be mentioned: Wang et al. have defined the NAS by only considering the effect of the analyte of interest, regardless of the profile overlap with other (calibrated or uncalibrated) components.39 This definition is of limited use within the present discussion. Table 1 collects the current knowledge on the connection between the MKL and HCD sensitivity definitions and several multiway methodologies, illustrated with the corresponding literature references. It is remarkable that the HCD value, rigorously proved to be operative for GRAM, is also applicable to a radically different least-squares procedure such as PARAFAC exploiting the second-order advantage (Table 1). It should be stressed, however, that the latter conclusion has been reached on the consideration of rather simple analytical systems.54 The presently discussed results imply that the latter is by no means a general phenomenon. It is also important to note that there are still gaps to be filled in Table 1, notably for algorithms processing thirdorder data with the second-order advantage. An educated guess is that the MKL approach would be connected to PARAFAC and TLLS for cases where the second-order advantage does not operate, as with the second-order counterparts. However, important situations handling unsuspected interferences are still undefined. Furthermore, as will be shown below, even for second-order data where the connection has already been established, the NAS definitions need to be revisited. (54) Olivieri, A. C. J. Chemom. 2004, 18, 363-371.

Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

4937

One way to assess the sensitivity in multiway analysis is by means of Monte Carlo simulations, which constitute a useful link between theory and experiment. They have been previously employed in the analytical chemistry context for tackling a wide variety of problems, such as performing principal component analysis,55-57 estimating the degrees of freedom for complex modeling procedures,58 or assessing the quality of latent variable calibrations.59 In this report, extensive Monte Carlo numerical simulations have been carried out by adding random noise to second- and third-order data for several binary and ternary examples, in the absence and presence of the second-order advantage. They allow one to estimate the so-called variance inflation factor for a particular component in a sample (VIFn, with units of concentration2 × signal-2 and n specifying the component number). The value of VIFn quantifies the inflation of the instrumental noise variance when transmitted to a specific estimated concentration. Therefore, the sensitivity for a particular component (SENn) can be derived from VIFn by the expression

SENn ) (VIFn)-1/2

(1)

and then employed to estimate SENn in cases where no formal equation is currently available. In principle, it is not evident that the operational definition provided by eq 1 should be identical to a closed expression derived from NAS theory. Some of the present Monte Carlo results do confirm the validity of MKL and HCD sensitivity expressions in certain scenarios. However, previously overlooked cases, particularly multianalyte systems, seem to require further developments in NAS theory to be fully understood. Furthermore, when the order of the instrumental data is higher than two, and the second-order advantage is exploited, none of the existing equations takes proper account of the presently reported results. In a context where inconsistencies in calibration based on net analyte theory in the first-order domain have been noted,60 the present results are pointing to a revalorization of the NAS theory in the second-order scenario. A full multiway NAS approach will allow one to assess important figures of merit with all the advantages displayed by analytical expressions in terms of physical interpretability. Until such developments are realized, however, the present class of Monte Carlo calculations may provide multiway users with realistic measures of sensitivity. The results herein discussed should be highly useful in experimental planning. It is important to notice that the secondorder advantage is attained using data of any order higher than one, with only an apparent gain in sensitivity when the data order is increased. Increasing the order might require a considerable experimental effort. Thus, it is important to be able to estimate the sensitivity gain stemming from a particular combination of instrumental orders, for properly planning and optimizing a multiway analytical protocol. (55) Keller, H. R.; Massart, D. L. Anal. Chim. Acta 1991, 246, 379-390. (56) Thomas, E. V.; Haaland, D. M. Anal. Chem. 1990, 62, 1091-1099. (57) Gemperline, P. J.; Long, J. R.; Gregoriou, V. G. Anal. Chem. 1991, 63, 23132323. (58) Ye, J. J. Am. Stat. Assoc. 1998, 93, 120-131. (59) Keller, H. R.; Roettele, J.; Bartels, H. Anal. Chem. 1994, 66, 937-943. (60) Brown, C. D. Anal. Chem. 2004, 76, 4364-4373. (61) Faber, N. M.; Ferre´, J.; Boque, R.; Kalivas, J. H. Chemom. Intell. Lab. Syst. 2002, 63, 107-116.

4938

Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

THEORY Terminology. A given instrument may deliver first-order (vector) data for a single sample. Joining these types of data for several samples into a matrix produces a two-way array, and methodologies for its analysis are called first-order multivariate calibration methods, of which PCR and PLS are prime examples. When the sample data become second-order (a matrix), a data set for several samples can be conveniently arranged into a threeway array, and methods of analysis are called second-order (or three-way) multivariate calibration methods (PARAFAC, BLLS, GRAM, nPLS). The latter terminology is easily generalized to higher orders: for example, if a sample produces third-order data, a data set can be arranged into a four-way array, and third-order (also called four-way) calibration methods can be employed for analysis. Further possibilities are as follows: (1) to vectorize the higher-order sample data and then employ first-order multivariate methods (the unfolded variants of PCR and PLS) or (2) to arrange the second-order data set into an augmented matrix instead of producing a three-way array, as is regularly done in MCR-ALS. Among the above-mentioned multivariate methods, the most interesting one for the analytical chemist is PARAFAC (with BLLS/TLLS rapidly becoming good competitors) and hence they are the main focus of the present work. Multiway Algorithms. The theory of the second- and thirdorder chemometric methods employed in the present report is known, and thus, a brief description is given below. PARAFAC decomposes a three-way array obtained by joining second-order data for the calibration and test samples according to28 N

Xijk )

∑a

inbjnckn

+ Eijk

(2)

i)1

where Xijk is the intensity recorded for the ith. sample at j and k channels, N is the total number of responsive components, ain is the relative concentration (or score) of component n in the ith sample, bjn and ckn are the profile intensities in each dimension, and Eijk are the elements of the array E, which is a residual error term of the same dimensions as X [(I + 1)J × K, I is the number of calibration samples, and hence, the first dimension of X is (I + 1), since the test sample is also included in X]. The profiles are collected, normalized to unit length, into matrices B and C, of size J × N and K × N respectively, where J and K are the number of data points in each dimension. Equation 2 is usually fitted by alternating least squares. A pseudounivariate regression of the calibration scores against reference values, and subsequent interpolation of the test score, allows one to estimate the component concentration in the test sample. A similar scheme is applied to four-way arrays, requiring the introduction of an extra profile intensity dln for component n at channel l in eq 2, collected into an L × N matrix D. Notice that components not considered within the calibration samples are accounted for during the least-squares decomposition, leading to the obtainment of the second-order advantage. In the case of GRAM, which is only applicable to second-order data, component profiles and scores are found by resorting to an eigenvalue problem involving matrix data for two samples.31

The BLLS method does not include the test sample in the calibration step, in which approximations to pure analyte matrices Sn at unit concentration are found by direct least squares. To estimate the latter ones, the calibration data matrices are first vectorized and grouped into a JK × I matrix VX:61

VX ) [vec(Xc,1) | vec(Xc,2) |...| vec(Xc,I)]

(3)

where the profiles bint and cint are obtained by a procedure that involves the minimization of the residual matrix Eu, computed while fitting the sample data to the sum of the various component contributions: Nc

Xu )

∑g

n′

bn′ (cn′)Tyn′,u + gintbint(cint)T + Eu

(10)

n′)1

In the latter equation, Xc,i is the matrix of second-order signals for calibration sample i, and “vec” implies the unfolding operation. Then a procedure analogous to classical least-squares is performed:61

During the minimization, the profiles are estimated by SVD of a residual matrix which is obtained by subtracting the analyte contributions from the total signal: Nc

VS ) VX YT+

(4)

(gint, bint, cint) ) SVD1[Xu -

∑g

n′

bn′ (cn′)Tyn′,u] (11)

n′)1

where Y is an I × Nc matrix collecting the nominal calibration concentrations, Nc is the number of calibrated analytes, + implies the pseudoinverse operation, and VS (size JK × Nc) contains the vectorized Sn matrices:

VS ) [vec(S1)| vec(S2) |...| vec(SNc)]

(5)

To obtain the profiles in both dimensions, which are present in Sn, two procedures have been discussed: the least-squares (LS) profile estimator and the singular value decomposition (SVD) profile estimator;33,34 the most reliable and simple seems to be the latter one. In this case, the component profiles are obtained by single-component singular value decomposition (SVD1) of each of the J × KSn matrices:33,34

(gn, bn, cn) ) SVD1(Sn)

(6)

where gn is the first singular value and bn and cn are the J × 1 and K × 1 left and right eigenvectors of Sn, respectively. This completes the calibration process. After calibration, there are two procedures for concentration estimation, a so-called naı¨ve predictor and a least-squares predictor.33,34 The latter one has been shown to be more reliable and leads to the following prediction equation, provided no interferences occur in the unknown sample:61

yu ) Scal+ vec(Xu)

(7)

where yu is an Nc × 1 vector holding the concentrations of the Nc analytes in the sample and Scal is a JK × Nc matrix given by

Scal ) [g1 (c1 X b1) | g2 (c2 X b2) |...| gNc (cNc X bNc)] (8)

The analyte concentrations are still provided by an expression similar to eq 7, except that Scal is replaced by Sexp. The TLLS method operates in a similar manner, except that matrices are replaced by third-order arrays, and the profiles are found by a Tucker3 operation instead of by SVD.22 Error Propagation. Relevant to the present study is the fact that, for a system composed of the same chemical constituents, different sensitivities are expected, depending on whether one resorts to the second-order advantage or not.54 The discussion usually starts with the expression for estimating concentrations in a test sample:

yn,u ) βnT vec(Xu)

(12)

where yn,u is the estimated concentration of component n in the unknown sample, βn is the vector of regression coefficients for this particular sample component, and Xu the data array for the test sample. Particularly simple expressions are obtained when the secondorder advantage is not exploited and the calibration data are precise, i.e., βn does not carry significant uncertainties because the relative uncertainty in calibration concentrations is significantly smaller than in instrumental signals. In this case, simple error propagation can be applied to eq 12, showing that the variance in the estimated concentrations is directly proportional to the variance in the instrumental signals (assumed to be identically and independently distributed). The proportionality constant is the variance inflation factor VIFn, equal to the inverse squared sensitivity (SENn), and also to the squared length of the regression coefficients:

var(yn,u) ) VIFn,without sX2 ) SENn,without-2sX2 ) ||βn||2 sX2 (13)

where X implies the well-known Kronecker product. The occurrence of uncalibrated compounds in a test sample is investigated after the calibration step is completed. If an interference occurs, the situation is handled by a separate residual bilinearization (RBL), in which profiles for the interference are found and incorporated into an expanded version of Scal:34

Sexp ) [Scal | gint(cint X bint)]

(9)

where sX2 is the variance in the instrumental signals, and “without” implies that the second-order advantage is not employed. On the other hand, if the second-order advantage is applied, the uncertainties in the test sample signals affect the regression coefficients in such a way that the variance is higher and the sensitivity is lower than that expected on the basis of simple error propagation analysis of eq 12, i.e.:54 Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

4939

var(yn,u) ) VIFn,with sX2 ) SENn,with-2sX2 > ||βn||2 sX2 (14)

On the other hand, the exact form of the HCD sensitivity is

SENn,HCD ) kn{[(BTB)-1]nn[(CTC)-1]nn}-1/2 where “with” implies usage of the second-order advantage. In both GRAM and PARAFAC, this phenomenon is due to the fact that the test sample is processed together with the calibration set in order to obtain the regression coefficients for a particular analyte. In the combinations BLLS/RBL and TLLS/RTL, on the other hand, the inequality (14) arises from the RBL and RTL methodologies, where the analyte concentrations are estimated by a procedure involving data for the test sample. For the success of the approach taken in the present work, an important fact is that all test samples processed with the secondorder advantage show a common sensitivity parameter SENn,with, independent of the sample component concentrations. In a real case, this implies that the chemical components and spectral overlapping in the studied samples are similar. This allows one to estimate the sensitivity for each particular calibration situation, by computing the variance inflation factors accompanying the estimated concentration of a given analyte at various noise levels and subsequently regressing the Monte Carlo statistical variances var(yn,u) against the values of sX2. The selectivity for analyte n (SELn), in turn, can be calculated by dividing the sensitivity by the total signal ascribed to a particular analyte. The latter one is available to the various algorithms, because they efficiently separate the analyte signals from the overall data. Sensitivity and Selectivity Expressions. Two important sensitivity equations have been discussed in the literature, both based on extensions of Lorber’s NAS theory to the second-order domain. The expressions are usually given in terms of a matrix projecting the instrumental data for a particular sample onto the NAS space. More useful equations are obtained by considering that the various high-order chemometric methods provide estimated component profiles in the different dimensions. The mathematical form of the MKL sensitivity for constituent n, as a function of the component profiles, is

SENn,MKL ) kn{[(BTB)‚(CTC)]-1}nn-1/2

(15)

for second-order data, while its straightforward extension to thirdorder data is

SENn,MKL ) k′n{[(BTB)‚(CTC)‚(DTD)]-1}nn-1/2 (16)

In eqs 15 and 16, nn designates the (n,n) element of a matrix, kn and k′n are the total signals (second- or third-order, respectively) for component n at unit concentration, and the symbol “‚” is the element-wise Hadamard product matrix. Notice that the full sensitivities (kn or k′n) decrease in the presence of other sample constituents, by a degree that depends on the profile overlapping governed by the square root factor in eqs 15 and 16. This MKL approach to NAS has already been rigorously proved to be connected to cases where the second-order advantage is not exploited, for example, when employing PARAFAC53 or BLLS with the so-called least-squares predictor.34,61 4940

Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

(17)

for second-order data and extended one further dimension

SENn,HCD ) k′n{[(BTB)-1]nn[(CTC)-1]nn[(DTD)-1]nn}-1/2 (18) for third-order data. Notice that the degree by which the full sensitivity is affected by profile overlapping in the HCD expressions differs from that for the MKL counterparts. The HCD approach to NAS has been proved to be connected to the GRAM method42 and also to BLLS with the so-called naı¨ve predictor.34,61 In the case of PARAFAC achieving the second-order advantage, only simple cases have been studied involving secondorder data, with the connection to the HCD approach recently established on the basis of Monte Carlo numerical simulations.54 Less is known of either BLLS/RBL or TLLS/RTL, both showing the second-order advantage, and, in general, on the use of thirdorder data. Since the selectivity is defined as the ratio between the NAS and the total signal produced by a given sample component, the corresponding HCD and MKL selectivities for second- or thirdorder data can be simply obtained by dividing eqs 15-18 by either kn or k′n, as required. Numerical Simulations. To perform Monte Carlo noise addition simulations, noiseless profiles for three different components in three dimensions (at unit concentration) are first defined (see Figure 1). They are similar to those recently described in the four-way analysis of complex biological samples using excitation-emission fluorescence matrices modulated by the kinetics of a chemical reaction.22 The three dimensions contain 32, 11, and 10 data points respectively, and mimic the emission, excitation, and time profiles for this particular experiment. From these theoretical profiles, data arrays were created for different binary and ternary systems, as summarized in Table 2. These arrays were built for sets of calibration samples using specific calibration designs (Table 2). Five different test samples having component concentrations taken at random from the corresponding calibration ranges were also produced in each case. Three different binary systems are formed by components 1 and 2. In system B1, both components are present in the calibration samples, while the test samples contain no spectral interference. In systems B2 and B3, each of these components is calibrated in turn, with the remaining one acting as an interference that is only present in the test samples, requiring strict adherence to the second-order advantage for successful concentration estimation. In the ternary systems, composed of 1, 2 and 3, three different situations occur: T1, where components 1 and 2 are both calibrated and 3 is the interference, and T2/T3, where each of the analytes 1 or 2 is calibrated, and two interfering agents appear in the test samples (see Table 2). All systems described in Table 2 were studied using different algorithms and also with different combinations of profiles in the three dimensions. Table 2 collects the nomenclature selected for distinguishing the combinations of multivariate calibration methods and dimensions, which were employed in analyzing each of the systems described above. For example, when a given system,

Table 2. Binary and Ternary Systems Analyzed with Monte Carlo Numerical Simulations, and Nomenclature Concerning Methods and Dimensions Employed for Multivariate Calibration

system

Description of Systems number of calibrated calibration analyte(s) samplesa interference

second-order advantage required

B1 B2 B3

1 and 2 1 2

Binary Systems 9 none 3 2 3 1

no yes yes

T1 T2 T3

1 and 2 1 2

Ternary Systems 9 3 3 2 and 3 3 1 and 3

yes yes yes

Nomenclature dimensions

method PARAFAC

1 and 2 1 and 3 2 and 3 1, 2, and 3 1 and 2 1 and 3 2 and 3 1 and 2 1 and 3 2 and 3 1, 2, and 3

BLLSb

GRAM TLLSc

nomenclature P-12 P-13 P-23 P-123 B-12 B-13 B-23 G-12 G-13 G-23 T-123

a When two calibrated analytes occur, the calibration design is a full factorial at three levels for each analyte (0, 0.5, 1). When a single analyte occurs, three equally spaced levels (0, 0.5, 1) are employed. b The BLLS version applied uses the SVD profile estimator and the least-squares predictor. When the second-order advantage is required, BLLS is coupled to RBL. c The TLLS version is employed as described in ref 22. When the second-order advantage is required, TLLS is coupled to RTL.22

Figure 1. Component profiles in the three dimensions, normalized to unit length. (A) First dimension, (B) second dimension, and (C) third dimension. Components are indicated by their corresponding numbers. The profiles intend to mimic a kinetic fluorescence excitation-emission experiment, where the first dimension is emission, the second one is excitation, and the third one is time.

B2, is analyzed using BLLS data generated with second-order data involving the second and third dimensions (see Figure 1), the nomenclature is B2-B-23 (Table 2). In mathematical terms, second-order calibration data for one of the most complex ternary system T1 (Xc,i matrices) were created from 2

Xc,i )

∑y

n,c,iSn

(19)

n)1

where yn,c,i is the nominal concentration of analyte n in the calibration standard i and Sn is the corresponding matrix of signals

at unit concentration, i.e., Sn ) knbncnT. The specific profiles used to build the Sn matrices could in turn be any of the three possible combinations “12”, “23”, or “13”. Unknown sample data matrices Xu for system T1 were created in a manner similar to that described by eq 19, but adding the signal for the interference and random noise: 2

Xu ) [

∑y

n,uSn]

+ y3,uS3 + RsX

(20)

n)1

where yn,u is the nominal concentration of each component in the unknown, R is a matrix of Gaussian random numbers with unit standard deviation, of the same size as Xu, and sX is the standard deviation of the noise added to signals. Similar considerations were applied in creating data for the binary systems described in Table 2 and also when data were of third order. It is important to consider that when second-order data are employed, the total signal for a given component is taken at a certain value for the corresponding profile in the remaining dimension. For the combination of dimensions “12”, for example, the component signals were multiplied by the value of the corresponding profile at sensor 10, at which both analytes display maximum intensity (Figure 1). For the “13” combination, the Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

4941

values were taken at the compromise sensor 4. On the other hand, for the combination “23”, it is difficult to find a sensor for dimension 1 at which both analytes show a comparable intensity. Therefore, the scaling factor depended on the system: for B2 and T2, where analyte 1 is calibrated, the maximum for this component was considered (i.e., at sensor 28), for B3 and T3, the maximum for analyte 2 was taken (at sensor 10), and for B1 and T1, data were simulated as if two separate calibrations were performed, one for each analyte, at their corresponding maximums. In all cases, the analytes of interest were those calibrated, i.e., both components 1 and 2 in systems B1 and T1 or a single one (component 1 in B2 and T2, and component 2 in B3 and T3; see Table 2). PARAFAC was applied in the usual way in which these steps are followed: (1) data for each unknown sample in turn are joined with calibration data to create a three- or four-way array, (2) the latter array is decomposed, (3) relative concentrations or scores are regressed against nominal concentrations, and (4) the score for the test sample is interpolated in the pseudounivariate graph to estimate its concentration.29 The number of components (N in eq 2) was set to two for binary systems and to three for ternary ones. In the case of BLLS, the so-called SVD estimator was employed for component profile estimation, and the leastsquares predictor was used for concentration estimation.33,34 When the second-order advantage was required, BLLS and TLLS were coupled to RBL and RTL, respectively. Both for BLLS and TLLS, Nc was set at the number of calibrated analytes. It should be noticed that GRAM works with data for two samples, and thus when this method was applied, the test sample was joined to a virtual sample corresponding to the average of the calibration standards. The number of components for GRAM were identical to those for PARAFAC. Six different values of instrumental uncertainty (sX ) 0.0005, 0.001, 0.0025, 0.005, 0.0075 and 0.01) were considered to be only present in the unknown sample data arrays, while keeping the calibration precise, as shown by eqs 19 and 20. The calibration/ prediction procedure was repeated for these six values using different random seeds in each case (usually 10 000 times until proper convergence was achieved) and a statistic was registered of the estimated concentrations for each test sample. The Monte Carlo calculated variances were averaged and regressed against the values of sX2, to estimate the variance inflation factor for each situation. The sensitivity for each calibrated analyte was taken as the inverse square root of the variance inflation factor, and the selectivity was computed as the sensitivity divided by the corresponding total signal for the analyte. In each case, the differences between nominal concentrations in the test set and those estimated by the various methods were also computed, allowing to assess the root-mean-square error (RMSE) in the test set. Software. All calculations were implemented with suitable MATLAB 6.0 routines.62 The algorithms for multivariate calibration have been obtained from the Internet page maintained by Bro63 (GRAM and PARAFAC) or developed in our laboratory (BLLS or TLLS) as already described.3,22 RESULTS AND DISCUSSION Variance Inflation Factor and Sensitivity. As shown by eqs 13 and 14, variance inflation factors quantify the inflation of the (62) MATLAB 6.0, The MathWorks Inc., Natick, MA, 2000. (63) http://www.models.kvl.dk/source/

4942 Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

Figure 2. (A) log-log plot of Monte Carlo standard deviations versus noise level for component 1 in the case of the two combinations system/method/dimensions B1-P-12 (circles) and B2-P-12 (triangles). The straight lines indicate the results expected on the basis of the MKL (solid) and HCD (dotted) NAS theories. (B) Same as (A) for component 2 in the case of the combination T1-P-13 (circles). The straight lines have the same meaning as in (A).

instrumental noise when transmitted to a specific estimated concentration. The main assumption in the present work is that the inverse square roots of the variance inflation factor [(VIFn)-1/2] can be taken as a measure of the sensitivity toward a particular analyte (see eq 1). As discussed above, it is important that Monte Carlo simulations support the independency of the VIFn parameter on component concentrations, especially in cases where the test sample takes part into the calibration process, as required by the second-order advantage. As an example, Figure 2 shows the Monte Carlo standard deviations in estimated concentrations for a given component in five different test samples, as a function of the noise level introduced in the test sample signals, for three different combinations of system, method, and dimensions (see Table 2). The results clearly show that (1) the VIFn factor does not depend on analyte concentrations and (2) there is a linear relationship between [var(yn,u)]1/2 and sX, as required by eqs 13 and 14. Further details on the (VIFn)-1/2 values properly representing the sensitivity will be given below, in terms of the comparison of estimated and Monte Carlo RMSE for all systems studied. Sensitivities and NAS Theory. Typical Monte Carlo results are pictorially shown in Figure 2A for systems B1 and B2, which are seen to follow the existing alternative NAS theories, because

Table 3. Monte Carlo Results on Variance Inflation Factors for the Binary Systems, and Sensitivities and Selectivities Computed with MKL and HCD Net Analyte Signal Approachesa sensitivity (selectivity) systemmethoddimensions

(SELn)

analyte 1

analyte 2

B1-P-12 B1-B-12 B1-G-12

0.50 (0.99) 0.51 (0.99) 0.24 (0.47)

B1-P-13 B1-B-13 B1-G-13

MKL

HCD

analyte 1

analyte 2

analyte 1

analyte 2

0.39 (0.99) 0.39 (0.99) 0.20 (0.51)

0.51 (0.99)

0.39 (0.99)

0.26 (0.51)

0.20 (0.51)

0.45 (0.99) 0.46 (0.99) 0.10 (0.22)

0.40 (0.99) 0.42 (0.99) 0.10 (0.24)

0.45 (0.99)

0.42 (0.99)

0.11 (0.24)

0.10 (0.24)

B1-P-23 B1-B-23 B1-G-23

0.19 (0.53) 0.19 (0.53) 0.05 (0.14)

0.19 (0.53) 0.19 (0.53) 0.04 (0.14)

0.20 (0.55)

0.20 (0.55)

0.04 (0.12)

0.04 (0.12)

B1-P-123 B1-T-123

0.99 (0.99) 0.99 (0.99)

0.99 (0.99) 0.99 (0.99)

0.99 (0.99)

0.99 (0.99)

0.12 (0.12)

0.12 (0.12)

B2-P-12 B2-B-12 B2-G-12

0.26 (0.51) 0.24 (0.47) 0.27 (0.53)

0.51 (0.99)

0.26 (0.51)

B2-P-13 B2-B-13 B2-G-13

0.11 (0.24) 0.11 (0.24) 0.10 (0.22)

0.45 (0.99)

0.11 (0.24)

B2-P-23 B2-B-23 B2-G-23

0.04 (0.11) 0.04 (0.11) 0.04 (0.11)

0.20 (0.55)

0.04 (0.12)

B3-P-12 B3-B-12 B3-G-12

0.19 (0.49) 0.19 (0.49) 0.20 (0.51)

0.39 (0.99)

0.20 (0.51)

B3-P-13 B3-B-13 B3-G-13

0.11 (0.26) 0.10 (0.24) 0.11 (0.26)

0.42 (0.99)

0.10 (0.24)

B3-P-23 B3-B-23 B3-G-23

0.04 (0.11) 0.05 (0.14) 0.04 (0.11)

0.20 (0.55)

0.04 (0.12)

B2-P-123 B2-T-123 B3-P-123 B3-T-123 a

(VIFn)-1/2

0.99 (0.99)

0.73 (0.73) 0.73 (0.73) 0.78 (0.78) 0.78 (0.78)

0.12 (0.12) 0.99 (0.99)

0.12 (0.12)

Values in boldface type are considered anomalous, in the sense that they do not agree with the expectations based on either NAS theory.

the slope of the straight line connecting the Monte Carlo simulated values is close to that expected on the basis of eq 15 for system B1 and of eq 16 for system B2. However, Figure 2B shows a deviation from the expectations of either NAS theory for component 2 in the ternary system T1, although a linear relationship between standard errors and noise is still preserved. Specific Monte Carlo results are shown in Table 3 for the binary systems, analyzed by various second- and third-order chemometric methods, and using different combinations of dimensions. In these binary systems, the Monte Carlo (VIFn)-1/2 values generally agree with the sensitivity parameters computed from NAS theory for all three methods PARAFAC, BLLS, and GRAM. When the second-order advantage does not operate (i.e., system B1), PARAFAC and BLLS display the MKL sensitivity, as expected.34,53 Notice that the smallest figures of merit are obtained for second-order data using dimensions 2 and 3, for which the degree of profile overlap is apparently more intense (Figure 1). On the other hand, when the second-order advantage is exploited (systems B2 and B3), both PARAFAC and BLLS (now combined with RBL) show the lower HCD sensitivity, which is the expected

result for the former method,54 but new for the BLLS/RBL combination. The GRAM method, on the other hand, achieves the HCD sensitivity in all cases, as expected.37 These results mean that PARAFAC and BLLS are more efficient than GRAM when the second-order advantage is not strictly required. If attention is paid to the second-order advantage, however, GRAM is as efficient as PARAFAC and BLLS for the binary systems. Notice also that the smallest HCD sensitivities are reached for the high-overlapping combination of dimensions (2 and 3). The results for third-order data are compatible with MKL results when the second-order advantage is not achieved (system B1), as previously anticipated. However, “anomalous” values are found in Table 3 corresponding to systems B2 and B3 requiring the second-order advantage, where (VIFn)-1/2 are significantly higher than those expected from HCD considerations. This important result can be interpreted as implying sensitivity and selectivity for the second-order advantage using third-order data, which are higher than the expectations based on a simple extension of HCD theory to the third dimension. In fact, eqs 1518 show that the HCD selectivity associated with third-order data Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

4943

Table 4. Monte Carlo Results on Variance Inflation Factors for the Ternary Systems, and Sensitivities and Selectivities Computed with MKL and HCD Net Analyte Signal Approachesa sensitivity (selectivity) systemmethoddimensions

a

(VIFn)-1/2

(SELn)

MKL

HCD

analyte 1

analyte 2

analyte 1

analyte 2

analyte 1

analyte 2

T1-P-12 T1-B-12 T1-G-12

0.04 (0.08) 0.04 (0.08) 0.03 (0.06)

0.12 (0.31) 0.11 (0.28) 0.08 (0.21)

0.37 (0.74)

0.26 (0.67)

0.03 (0.06)

0.08 (0.21)

T1-P-13 T1-B-13 T1-G-13

0.27 (0.60) 0.26 (0.58) 0.06 (0.13)

0.20 (0.48) 0.19 (0.45) 0.04 (0.10)

0.44 (0.97)

0.39 (0.92)

0.06 (0.13)

0.04 (0.10)

T1-P-23 T1-B-23 T1-G-23

0.03 (0.08) 0.03 (0.08) 0.01 (0.03)

0.12 (0.33) 0.12 (0.33) 0.03 (0.08)

0.20 (0.55)

0.19 (0.53)

0.01 (0.02)

0.03 (0.07)

T1-P-123 T1-T-123

0.71 (0.71) 0.70 (0.70)

0.56 (0.56) 0.57 (0.57)

0.97 (0.97)

0.93 (0.93)

0.01 (0.01)

0.04 (0.04)

T2-P-12 T2-B-12 T2-G-12

0.03 (0.06) 0.02(0.04) 0.03(0.06)

0.37 (0.74)

0.03 (0.06)

T2-P-13 T2-B-13 T2-G-13

0.06 (0.13) 0.05(0.10) 0.06(0.13)

0.44 (0.97)

0.06 (0.13)

T2-P-23 T2-B-23 T2-G-23

0.01(0.02) 0.01(0.02) 0.01(0.02)

0.20 (0.55)

0.01 (0.02)

T2-P-123 T2-T-123

0.39(0.39) 0.38(0.38)

0.97 (0.97)

0.01 (0.01)

T3-P-12 T3-B-12 T3-G-12

0.08 (0.21) 0.07 (0.19) 0.08 (0.21)

0.26 (0.67)

0.08 (0.21)

T3-P-13 T3-B-13 T3-G-13

0.04 (0.10) 0.03 (0.08) 0.04 (0.10)

0.39 (0.92)

0.04 (0.10)

T3-P-23 T3-B-23 T3-G-23

0.03 (0.07) 0.02 (0.04) 0.03 (0.07)

0.19 (0.53)

0.03 (0.07)

T3-P-123 T3-T-123

0.36(0.36) 0.35(0.35)

0.93 (0.93)

0.04 (0.04)

Values in boldface type are considered anomalous, in the sense that they do not agree with the expectations based on either NAS theory.

will always be lower than the second-order selectivity, because it is obtained as the product of the “partial selectivities” for each mode, each of which is lower than 1. In the specific case of Table 3, there is a 3-fold increase in sensitivity for component 1 in system B2 when third-order data are employed, in comparison with the most sensitive combination of dimensions (“12”, i.e., first and second, which provide the less overlapped profiles). Likewise, the sensitivity increase for component 2 in system B3 is ∼4-fold. It may be noticed that third-order data can also be processed by reducing the dimensionality to second order by a suitable unfolding process. This still preserves the second-order advantage, and make the data amenable to processing by algorithms such as GRAM, three-way PARAFAC, or MCR-ALS instead of by fourway PARAFAC or TLLS. The HCD sensitivities for the calibrated component in systems B2 and B3 in Table 3 when the data are compressed can be computed through eq 17. If dimensions 1 and 2 are previously unfolded into a single mode, and are combined with dimension 3 as the second mode in order to produce secondorder data, the sensitivity is calculated as 0.24. Likewise, previously unfolding dimensions 1 and 3 gives a sensitivity of 0.54, while 4944

Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

unfolding dimensions 2 and 3 provides a sensitivity of 0.50. Hence, reduction of the dimensionality of third-order data for application of second-order algorithms27 implies a sensitivity loss in comparison with genuine four-way chemometric tools (see Table 3, where the Monte Carlo SENn are 0.73 and 0.78 in systems B2 and B3). This result is in line with previous observations based on an approach different from that employed in the present report.45 Table 4 collects the corresponding results for the ternary systems, where at first sight the agreement with NAS theory appears to be less pleasant than for binary ones. It is important to note that, as far as second-order data are concerned, GRAM shows a sensitivity value compatible with the HCD approach, as expected from previous studies,37 as well as PARAFAC and BLLS for a single calibrated analyte (systems T2 and T3).54 Somewhat surprisingly, significantly higher sensitivities are computed for PARAFAC and BLLS when both analytes 1 and 2 are calibrated (system T1). This indicates that the HCD values constitute a lower bound for sensitivity and that in certain multicomponent cases, where more than one analyte is present in the calibration samples, least-squares techniques such as PARAFAC and BLLS are able

Table 5. Monte Carlo RMSE Results for Both Calibrated Analytes in the Different Systems Studieda systemmethoddimens B1-P-12 B1-B-12 B1-G-12 B1-P-23 B1-B-23 B1-G-23 B1-P-13 B1-B-13 B1-G-13 B1-P-123 B1-T-123 B2-P-12 B2-B-12 B2-G-12 B2-P-23 B2-B-23 B2-G-23 B2-P-13 B2-B-13 B2-G-13 B2-P-123 B2-T-123 B3-P-12 B3-B-12 B3-G-12 B3-P-23 B3-B-23 B3-G-23 B3-P-13 B3-B-13 B3-G-13 B3-P-123 B3-T-123 a

RMSE analyte 1 analyte 2 0.010 0.010 0.021 0.011 0.011 0.050 0.026 0.025 0.110 0.005 0.005 0.020 0.020 0.018 0.050 0.048 0.049 0.115 0.112 0.110 0.009 0.009

0.013 0.013 0.025 0.013 0.013 0.048 0.026 0.025 0.115 0.005 0.005

0.025 0.024 0.025 0.045 0.052 0.051 0.130 0.110 0.125 0.008 0.008

systemmethoddimens T1-P-12 T1-B-12 T1-G-12 T1-P-13 T1-B-13 T1-G-13 T1-P-23 T1-B-23 T1-G-23 T1-P-123 T1-T-123 T2-P-12 T2-B-12 T2-G-12 T2-P-13 T2-B-13 T2-G-13 T2-P-23 T2-B-23 T2-G-23 T2-P-123 T2-T-123 T3-P-12 T3-B-12 T3-G-12 T3-P-13 T3-B-13 T3-G-13 T3-P-23 T3-B-23 T3-G-23 T3-P-123 T3-T-123

RMSE analyte 1 analyte 2 0.120 0.123 0.170 0.019 0.020 0.090 0.176 0.180 >0.5 0.008 0.008 0.155 >0.5 0.160 0.085 0.090 0.090 >0.5 >0.5 >0.5

0.042 0.044 0.065 0.026 0.025 0.135 0.035 0.038 0.185 0.010 0.010

0.062 0.065 0.063 0.130 >0.5 0.135 >0.5 >0.5 >0.5 0.014 0.015

When the noise level sX is 0.005.

to achieve a higher sensitivity than this lower bound. Hence, the precision exhibited by GRAM, PARAFAC, and BLLS, which was comparable for binary systems, favors the latter two for ternary systems, and possibly for other multicomponent samples (see Supporting Information for further details). Also noticeable is the fact that the “anomalous” Monte Carlo (VIFn)-1/2 values in the case of second-order data for system T1 do not seem to have a relationship to MKL or HCD values (Table 4). These results point to the need of a deepening of NAS theory to account for this currently anomalous cases. When referring to third-order data for the ternary systems, all of which require the second-order advantage, (VIFn)-1/2 values are higher than expectations based on the HCD approach, in line with results for binary systems. Moreover, there is a net increase between 2- and 7-fold in sensitivity when going to third-order data, in comparison with that achieved for the best combination of second-order dimensions. The sensitivity gain is more pronounced in cases where a single analyte is calibrated, in comparison with cases where two analytes are calibrated. Having discussed the variance inflation values, Table 5 and Figure 3 present perhaps the most convincing arguments in favor of (VIFn)-1/2 as a good estimator of the sensitivity. They show how the ratios [sX/(VIFn)-1/2] are close to the RMSE for most studied cases (when sX ) 0.005 unit), across values differing by ∼2 orders of magnitude. In the absence of a substantial bias for

Figure 3. log-log plot of the ratio between instrumental noise and (VIFn)-1/2, as a function of the average RMSE for the test set of five samples, as obtained by numerical Monte Carlo simulations (quoted in Table 5) when the instrumental noise sX is set at ) 0.005 unit. Circles, component 1; triangles, component 2. The solid line indicates the perfect agreement.

the estimated concentrations,47 the RMSE for a group of samples should be similar to the uncertainty in concentration propagated by the instrumental noise. According to eqs 13 and 14, this propagated error is [sX/(VIFn)-1/2], and hence, a correlation between the latter and the RMSE is expected, and corroborated in Figure 3. Experimental Planning. Although the results provided by Tables 3 and 4 are limited to the systems analyzed in the present report, they provide an idea of the relevant information that can be gathered from Monte Carlo calculations, given that the latter appear to be the only method of estimating sensitivities and selectivities for some of these complex systems. Relevant issues to be taken into account in the planning of multiway analytical protocols are as follows: (1) the need of the second-order advantage, (2) the selection of the chemometric algorithm, (3) the order of the instrumental data, and (4) the observance of the trilinearity condition. An important lesson to be learned from the presently discussed results is that the achievement of the second-order advantage significantly affects the sensitivity, even for identical system components. In this context, the use third-order data leads to an increase in sensitivity ranging from 2- to 7-fold, implying a reduction of the limit of detection by the same factors. In comparison, analytical applications not requiring the second-order advantage will also benefit when the instrumental data order is increased, although to a lesser extent: according to Tables 3 and 4, the maximum gain is 2-fold. Another relevant issue in connection with the sensitivity is the proper algorithmic selection: Tables 3 and 4 reveal up to 5-fold increments when using least-squares methodologies in comparison with GRAM for multianalyte samples, but also show cases where the sensitivity of the latter technique is comparable to other methodologies. It is also important to be able to estimate the sensitivity stemming from a particular combination of dimensions, which is possible provided the component profiles in each of these dimensions can be separately obtained. According to the present results, the figures of merit always improve on increasing the instrumental order, but the specific improvement should be quantified. Once the sensitivities for all possible experiments are Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

4945

assessed, pertinent questions regarding the experimental setup needed for the analysis of a component in a complex sample are as follows: (1) if the sensitivity increment is really significant in light of the analytical problem at hand and (2) if the experimental effort required by the setup is worth of the sensitivity gain. Finally, the compliance to the trilinearity condition required by certain algorithms (notably PARAFAC) is an important aspect that should also be taken into account. As an example, the present simulations are mimicking an experimental setup where fluorescence intensity developed in time by a chemical reaction. Measuring the third-order excitation-emission matrices as a function of time for each sample came at no extra cost than recording either fluorescent mode, because a fast-scanning spectrofluorometer was available.22 However, the time needed to measure each EEM was significantly larger than that required for a single emission spectrum at a fixed excitation wavelength. Since the reaction proceeded to some extent during EEM acquisition, the higher sensitivity provided by the third-order data came at the price of losing the desirable property of trilinearity. Other possible sources of trilinearity loss are the lack of reproducibility in chromatographic retention times involved in multiway tandem measurements, or in sample-to-sample pH or time profiles in spectral-pH or spectral-kinetic experiments.

4946

Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

The above discussion illustrates the types of trade-off faced by the analyst in multiway protocols, especially when requiring the second-order advantage. To be able to answer them by using appropriate mathematical expressions would require further work on the extension of NAS theory to the second and higher dimensions. In the meantime, Monte Carlo numerical calculations arise as a convenient way of assessing the sensitivity of these experiments. ACKNOWLEDGMENT Financial support from the University or Rosario and CONICET (Consejo Nacional de Investigaciones Cientı´ficas y Te´cnicas) is gratefully acknowledged. Thanks are extended to N. M. Faber (Chemometry Consultancy, The Netherlands) for helpful discussions. SUPPORTING INFORMATION AVAILABLE Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org. Received for review January 25, 2005. Accepted May 31, 2005. AC050146M