Anal. Chem. 2008, 80, 5713–5720

Perspective

Analytical Advantages of Multivariate Data Processing. One, Two, Three, Infinity?

Alejandro C. Olivieri*

Departamento de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, and Instituto de Química Rosario (IQUIR), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Suipacha 531, Rosario S2002LRK, Argentina

Multidimensional data are being abundantly produced by modern analytical instrumentation, calling for new and powerful data-processing techniques. Research in the last two decades has resulted in the development of a multitude of different processing algorithms, each equipped with its own sophisticated artillery. Analysts have slowly discovered that this body of knowledge can be appropriately classified, and that common aspects pervade all these seemingly different ways of analyzing data. As a result, going from univariate data (a single datum per sample, employed in the well-known classical univariate calibration) to multivariate data (data arrays per sample of increasingly complex structure and number of dimensions) is known to provide a gain in sensitivity and selectivity, combined with analytical advantages which cannot be overestimated. The first-order advantage, achieved using vector sample data, allows analysts to flag new samples which cannot be adequately modeled with the current calibration set. The second-order advantage, achieved with second- (or higher-) order sample data, allows one not only to mark new samples containing components which do not occur in the calibration phase but also to model their contribution to the overall signal and, most importantly, to accurately quantitate the calibrated analyte(s). No additional analytical advantages appear to be known for third-order data processing. Future research may make it possible, among other interesting issues, to assess whether this “1, 2, 3, infinity” situation of multivariate calibration really holds.

MULTIVARIATE DATA AND THEIR ADVANTAGES

From the analytical chemistry standpoint, multivariate calibration can be defined as the development of mathematical models relating unselective multiple instrumental signals with analyte concentrations.1,2 In contrast to univariate calibration, measuring multivariate signals enables one to compensate for varying contributions of nonanalytes in an unknown sample, allowing one to perform quantitative analysis in intrinsically unselective multicomponent systems.3 The success of multivariate calibration can be gauged by the large number of applications in near-infrared (NIR) spectroscopy, which date back to the 1970s.4 Many other instrumental signals (spectroscopic, electrochemical, chromatographic, etc.) have been incorporated into this fascinating multivariate calibration world in the last decades. Multidimensional instrumental signals, which are made available by modern instrumentation, have recently led to a blossoming of quantitative determinations in increasingly complex samples.5–7 A recent literature survey has been conducted concerning experimental applications of multiway data to different areas of analytical chemistry, allowing one to appreciate the rapid growth of this research area (the term “multiway” refers to data with two or more dimensions per sample, see below).5 Figure 1 suggests an explosion of papers in the regular literature in the last 7 years.

Figure 1. Number of publications concerning analytical applications of multiway data in the last 30 years. Information up to 2006 has been taken from ref 5.

* Contact information. E-mail: [email protected].
(1) Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; de Jong, S.; Lewi, P. J.; Smeyers-Verbeke, J. Handbook of Chemometrics and Qualimetrics; Elsevier: Amsterdam, The Netherlands, 1997; Parts A and B.
(2) Martens, H.; Naes, T. Multivariate Calibration; Wiley: Chichester, U.K., 1980.
(3) Bro, R. Anal. Chim. Acta 2003, 500, 185–194.
(4) Martens, H.; Martens, M. Multivariate Analysis of Quality: An Introduction; Wiley: Chichester, U.K., 2000.
(5) Escandar, G. M.; Faber, N. M.; Goicoechea, H. C.; Muñoz de la Peña, A.; Olivieri, A. C.; Poppi, R. J. Trends Anal. Chem. 2007, 26, 752–765.
(6) Bro, R. Crit. Rev. Anal. Chem. 2006, 36, 279–293.
(7) Smilde, A.; Bro, R.; Geladi, P. Multi-Way Analysis with Applications in the Chemical Sciences; Wiley: Chichester, U.K., 2004.
10.1021/ac800692c CCC: $40.75 © 2008 American Chemical Society. Published on Web 07/09/2008

Analytical Chemistry, Vol. 80, No. 15, August 1, 2008


Figure 2. Representation of data of increasing complexity, with order and number of ways indicated in parentheses, both for a single sample and for a sample data set. As can be seen, the term multiway is a subdivision of multivariate.
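The order/ways bookkeeping pictured in Figure 2 can be made concrete with array shapes. The following is a minimal sketch, assuming NumPy arrays as containers; the helper name `order_and_ways`, the example names, and the channel counts are hypothetical and serve only as illustration.

```python
import numpy as np

def order_and_ways(sample, n_samples):
    """Return (order, ways): dimensions of one sample and of the stacked sample set."""
    # Joining the samples along a new leading axis adds exactly one dimension.
    data_set = np.stack([np.asarray(sample)] * n_samples)
    return np.ndim(sample), data_set.ndim

# Hypothetical channel counts, chosen only for illustration.
n_em, n_ex, n_time = 40, 30, 10
examples = {
    "single absorbance reading": 0.7,
    "emission spectrum": np.zeros(n_em),
    "excitation-emission matrix": np.zeros((n_ex, n_em)),
    "excitation-emission matrix vs reaction time": np.zeros((n_time, n_ex, n_em)),
}

for name, sample in examples.items():
    order, ways = order_and_ways(sample, n_samples=5)
    kind = "univariate" if order == 0 else "multivariate"
    if order >= 2:
        kind += ", multiway"
    print(f"{name}: order {order}, {ways}-way data set ({kind})")
```

In this picture the number of ways is always the order plus one, because the sample index contributes the additional dimension.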

This is so even though the theoretical foundations of the multiway advantages date back to the 1980s.8 For a proper discussion of the full potential of multivariate data, an introduction to nomenclature is required, particularly regarding the concepts of “data order” and “data ways”. This nomenclature has been borrowed from tensor algebra: “order” is usually employed to denote the number of data dimensions for a single sample (a scalar is a zeroth-order tensor, a vector a first-order tensor, etc.), while “ways” is reserved for the number of dimensions of a number of joined data arrays, measured for a group of samples. Classical univariate calibration, which operates using a single datum per sample, and hence with a data vector for a sample set, is both a zeroth-order method and a one-way method. Correspondingly, first-order sample data (vectors) lead to two-way data sets, second-order sample data (matrices) to three-way data sets, third-order sample data (three-dimensional arrays) to four-way data sets, etc. While all data other than univariate are known as multivariate, second-order data and beyond are known as multiway data. Figure 2 pictorially illustrates these concepts.

Instrumentation for measuring first-order data is fairly simple: virtually all spectroscopic, chromatographic, and voltammetric instruments provide this kind of vectorial information. On the other hand, second- and third-order data can be measured either in a single instrument or by resorting to instrument hyphenation,9 as summarized in Table 1. The progress in analytical instrumentation is leading to multiply hyphenated techniques (hyperhyphenation or hypernation),10 providing data of increasing complexity, a challenge from both the experimental and the theoretical standpoint.

An appropriate nomenclature for sample components should also be mentioned.
Components present in both calibration and validation samples are regularly denoted as “expected”, because the analyst expects them to be present in most test samples and employs them to build a sufficiently representative training sample set. The expected components can be further divided into “calibrated” and “uncalibrated”, referring to whether or not calibration concentrations are available for each of them. On the other hand, truly unknown samples may contain additional, “unexpected” components. Although these potential interferences may generate a signal that overlaps with that of the analyte of interest, they will not always produce an interference, in the sense of generating a systematic error in the analyte determination.11 Whether the interference will be actual or will remain only potential depends on the type of instrumental signals and on the calibration methodology.

(8) Booksh, K. S.; Kowalski, B. R. Anal. Chem. 1994, 66, 782A–791A.
(9) Wilson, I. D.; Brinkman, U. A. J. Chromatogr., A 2003, 1000, 325–356.
(10) Wilson, I. D.; Brinkman, U. A. Trends Anal. Chem. 2007, 26, 847–854.

Univariate calibration, for example, cannot detect sample components producing an interfering signal. However, first-order calibration may compensate for potential interferences, provided they are included in the calibration set. It is this possibility of extracting useful analytical information from intrinsically unselective data which has made first-order calibration so popular. Although unexpected components most likely constitute an interference in the analysis of a test sample, first-order calibration is able to flag the latter as an outlier, because it cannot be modeled with a given calibration data set, warning that analyte prediction is not recommended. This property is known as the first-order advantage,8 a kind of “better than nothing” advantage.12

Second- and higher-order calibration can compensate for potential interferences which are not included in the calibration set, and this is universally recognized as the second-order advantage.8 This property allows one not only to mark a sample carrying unexpected components as an outlier but also to model the presence of the potential interferences and to accurately quantitate the analyte(s). As a byproduct, analytes can be calibrated with a few samples instead of with a large set of samples, as would be required if first-order calibration were performed.
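The first-order advantage described above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses classical least squares with known pure-component spectra as a simple stand-in for a first-order calibration model (not PLS), and the Gaussian band positions, the noise level, and the 3-sigma flagging rule are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.arange(300.0, 501.0)  # hypothetical wavelength axis, 1 nm steps

def gauss(center, fwhh=50.0):
    """Gaussian band with unit height and the given full width at half-height."""
    return np.exp(-4.0 * np.log(2.0) * (wl - center) ** 2 / fwhh ** 2)

# Expected components: the analyte (column 0) and one calibrated nonanalyte.
S = np.column_stack([gauss(380.0), gauss(410.0)])
noise = 0.005  # assumed instrumental noise (standard deviation)

def predict(spectrum):
    """Least-squares concentrations and RMS spectral residual for one test spectrum."""
    c, *_ = np.linalg.lstsq(S, spectrum, rcond=None)
    resid = spectrum - S @ c
    return c, np.sqrt(np.mean(resid ** 2))

c_true = np.array([1.0, 0.5])
ok_sample = S @ c_true + rng.normal(0.0, noise, wl.size)
bad_sample = ok_sample + 0.4 * gauss(440.0)  # carries an unexpected component

c_ok, r_ok = predict(ok_sample)
c_bad, r_bad = predict(bad_sample)

# Flag a sample when its residual is well above the noise level (assumed rule).
flag_ok = r_ok > 3 * noise
flag_bad = r_bad > 3 * noise
print(f"expected components only: analyte = {c_ok[0]:.3f}, flagged = {flag_ok}")
print(f"unexpected component:     analyte = {c_bad[0]:.3f}, flagged = {flag_bad}")
```

In this toy case the second sample is flagged as an outlier, but its analyte estimate is still biased: the model can warn about the unexpected component without being able to correct for it.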
The first experimental demonstration of the second-order advantage saw the light in 1978, when perylene was determined in mixtures with anthracene, calibrating only with perylene solutions, by suitable processing of fluorescence excitation-emission matrix data.13 However, the expression “second-order advantage” was not coined until 1994.8 Since these first theoretical and experimental approaches, a large body of work has accumulated on the application of a variety of multiway algorithms to the analysis of complex samples from multiple sources, using data with an increasing number of dimensions. Today, the limits of applicability and the immense potential of the second-order advantage in analytical chemistry have become an active area of theoretical interest and of intensive experimental research.

Table 2 shows a summary of the analytical properties of data of increasing dimensions, with emphasis on the advantages which can be achieved. This table highlights the fact that current knowledge permits a clear distinction among three complexity levels: univariate, first-order, and multiway. Although a theoretical Nth-order advantage has been suggested to accompany N-dimensional data,8 additional advantages to those described above remain to be uncovered.

(11) Van der Linden, W. E. Pure Appl. Chem. 1989, 61, 91–95.
(12) It could also be argued that the first-order advantage is related to the possibility of carrying out multicomponent analysis, in contrast to univariate calibration. In this sense, all data orders higher than zero would show this advantage.
(13) Ho, C.-N.; Christian, G. D.; Davidson, E. R. Anal. Chem. 1978, 50, 1108–1113.

Table 1. Different Types of Second- and Third-Order Data (a)

Single instrument, second-order data:
  luminescence EEM
  absorption (UV-visible, IR, NIR) spectra as a function of pH or reaction time
  luminescence (excitation or emission) spectra as a function of pH, reaction time, or decay time
  NIR spectra as a function of overtone number
  two-dimensional NMR
Single instrument, third-order data:
  luminescence EEM as a function of pH, reaction time, or decay time
  two-dimensional NMR as a function of pH or reaction time
  absorption or emission spectra as a function of pH and reaction time
Hyphenated instruments, second-order data:
  chromatography or CE-spectrometry (b)
  voltammetry/spectrometry (b)
  MS/MS spectrometry
  two-dimensional chromatography or CE
Hyphenated instruments, third-order data:
  chromatography or CE-MS/MS
  two-dimensional chromatography or CE-spectrometry (b)

(a) EEM, excitation-emission matrix; UV, ultraviolet; IR, infrared; NIR, near-infrared; NMR, nuclear magnetic resonance; MS, mass spectrometry; CE, capillary electrophoresis.
(b) Spectrometry includes diode-array, fluorimetry, IR, NIR, or MS spectrometry.

Table 2. Number of Dimensions of Different Data and Their Analytical Properties

order (a)   ways (b)   no. of analytes which   detection of unexpected   analysis in the presence of   advantage
                       can be quantitated      components (c)            unexpected components (c)
0           1          one                     no                        no
1           2          several                 yes                       no                            first-order advantage
2, 3, 4     3, 4, 5    several                 yes                       yes                           second-order advantage

(a) Order = number of dimensions of data for a single sample.
(b) Ways = number of dimensions of data for a set of samples.
(c) The property in bold defines the advantage shown in the last column.

After briefly reviewing the relative advantages of measuring and processing multivariate data, the present report attempts to
clarify some misconceptions about multivariate analysis and to explore future perspectives and new directions in multiway analytical research.

MISCONCEPTIONS IN MULTIVARIATE ANALYSIS

Univariate vs Multivariate Calibration. It is sometimes believed that multivariate data of the highest available number of dimensions are the best option for analysis, even if a single analyte is determined in the absence of interferences. This may not be the case. A general observation is that sensitivity increases on increasing the number of data dimensions, simply because more sensors are employed for signal detection, and this reduces the impact of noise on redundantly measured data for the same phenomenon.3 An example may be useful in this regard: assume a single analyte is studied, whose spectrum shows a Gaussian shape having a full width at half-height (fwhh) of 50 nm, with measurements taken every 1 nm. If univariate analysis using the signal at the peak maximum is arbitrarily assigned a relative sensitivity of 1, then first-order analysis using the whole spectrum will show a sensitivity of about 6 (this number is given by the length of the spectrum vector, mathematically defined as the square root of the sum of its squared values). Therefore, a 6-fold sensitivity increase is expected in going from univariate to first-order analysis. If a two-dimensional signal is measured having Gaussian profiles with fwhhs of 50 nm in each dimension and measurements are taken every 1 nm in both data dimensions, the overall sensitivity will increase by a factor of about 36 relative to the univariate case. This will translate into correspondingly decreasing detection limits, for example, from 10 to 2 to 0.3 ppm when going from zeroth- to first- to second-order analysis. These improvements may not be significant enough to progress from microanalysis to trace analysis, and thus
the experimental effort in measuring multivariate signals may not be worthwhile. Furthermore, some multidimensional experiments, e.g., luminescence excitation-emission matrix spectroscopy, carry the risk of detecting scattering signals which are difficult to model, such as Rayleigh and Raman scattering bands or diffraction harmonics.14 If the increase in sensitivity is not vital and interferences are not foreseen in test samples, then univariate analysis may actually be more appropriate than single-component multivariate analysis.

First-Order vs Higher-Order Calibration. Similar concepts apply to the simultaneous analysis of several analytes producing overlapping multivariate responses. If all analytes are available (either in pure form or in mixtures) in order to build a suitably representative training sample set and interferences do not occur in the test samples, then the existing first-order algorithms, particularly the most popular regression technique known as partial least-squares or PLS,15 will allow one to quantitate the analytes in mixtures of unknown composition. Measuring second- or higher-order signals for this kind of system will provide not only increased sensitivity but also higher selectivity and, in general, useful qualitative information concerning the analyzed system, e.g., spectral, kinetic, or chromatographic profiles for the various sample components. This latter output may be valuable for physicochemical, biological, or process analytical studies; analysts should judge whether these benefits are really worthwhile. In any case, the study of the sensitivity and selectivity parameters in first- and second-order multivariate analysis is now firmly established and reliable closed-form expressions are available for estimating these figures of merit, even before complex multidimensional experiments are actually performed.16–20 Analysts should judiciously compare the pros and cons of experimentally demanding methods with those of simpler, cheaper, and faster methods of analysis.

A conclusion to be drawn from this and the previous section is that multivariate analysis should be left for cases where (1) unselective signals demand suitable data processing methods to extract analyte information and (2) the complexity of the samples requires the full potential of the first- or second-order advantages. This rather conservative view complies with the celebrated Occam’s razor.21

(14) Jiji, R. D.; Booksh, K. S. Anal. Chem. 2000, 72, 718–725.
(15) Wold, S.; Trygg, J.; Berglund, A.; Antti, H. Chemom. Intell. Lab. Syst. 2001, 58, 131–150.

Second-Order Algorithms and the Second-Order Advantage. Another common misconception is the notion that multiway data always lead to the achievement of the second-order advantage, regardless of the algorithm employed. This is not so: the extension of the well-known partial least-squares regression analysis to N-dimensional data, known as N-PLS,22 constitutes a genuine multiway technique, yet it does not achieve the second-order advantage [the same is true for the classical variants bi- and trilinear least-squares regressions (BLLS and TLLS)].23,24 These methods do not obtain the second-order advantage because they build the calibration models using the set of training data together with the nominal analyte concentrations (which are unavailable for a test sample). If any of these models is then applied to a test sample having unexpected components, the analyte quantitation will not be accurate because the test sample signals will give a poor fit to the calibration model.
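The failure mode just described can be reproduced with synthetic second-order data. The following is a minimal sketch, not an implementation of N-PLS or BLLS: it assumes noise-free Gaussian excitation and emission profiles (all positions and widths hypothetical) and fits the unfolded test matrix to the calibrated pure-analyte matrix by ordinary least squares, which is the essence of a model built from training data alone.

```python
import numpy as np

ex = np.arange(250.0, 351.0, 2.0)  # hypothetical excitation axis, nm
em = np.arange(350.0, 501.0, 2.0)  # hypothetical emission axis, nm

def gauss(axis, center, fwhh):
    """Gaussian profile with unit height."""
    return np.exp(-4.0 * np.log(2.0) * (axis - center) ** 2 / fwhh ** 2)

# Pure-component matrices at unit concentration (outer products, i.e. bilinear signals).
S_analyte = np.outer(gauss(ex, 290.0, 40.0), gauss(em, 410.0, 50.0))
S_unexpected = np.outer(gauss(ex, 310.0, 40.0), gauss(em, 440.0, 50.0))

def naive_fit(M):
    """Fit the unfolded test matrix to the calibrated analyte matrix only."""
    s = S_analyte.ravel()
    c = s @ M.ravel() / (s @ s)
    rel_resid = np.linalg.norm(M - c * S_analyte) / np.linalg.norm(M)
    return c, rel_resid

clean = 1.0 * S_analyte                   # test sample with the analyte only
contaminated = clean + 0.8 * S_unexpected  # test sample with an unexpected component

c_clean, r_clean = naive_fit(clean)
c_cont, r_cont = naive_fit(contaminated)
print(f"analyte only:         c = {c_clean:.3f}, relative residual = {r_clean:.3f}")
print(f"unexpected component: c = {c_cont:.3f}, relative residual = {r_cont:.3f}")
```

The data here are genuinely second-order, yet the prediction for the contaminated sample is biased and the fit is poor, because nothing in the model accounts for the unexpected component.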
The second-order advantage is acquired only when the above algorithms are coupled to adequate postcalibration procedures which are able to model the contribution of the potential interferences.25,26 This implies that multiway data carry the second-order advantage only potentially; whether it becomes real or not depends on the type of data processing.

Univariate vs Multiway Standard Addition. Two general univariate calibration modes exist: external calibration and standard addition calibration. The latter mode is sometimes considered as a solution to the problem of an interfering background signal. This is not the case, unless the background signal arises from the chemical treatment of the sample rather than from the sample itself and can be adequately subtracted (for example, by carrying out two standard additions on different sample amounts or by combining standard addition with Youden calibration).27 This is not a common situation, however, and does not include the analysis of natural or biological samples containing a variety of responsive nonanalytes. Standard addition is rather designed to circumvent the effect of a background on the analyte response leading to a change in sensitivity, i.e., a change in the slope of the univariate signal-concentration relationship.28

The generalized standard addition method (GSAM)29,30 is the first-order multivariate counterpart of univariate standard addition and is realized by measuring first-order data for various overlapping analytes embedded in a sample background. Generalized standard addition not only demands knowledge of the number and identity of the analytes but also that standards of each of them are available, in order to be added in perfectly known amounts to each sample. In any case, the limitations of this method regarding the background effects are analogous to those for the univariate standard addition mode.

A background signal arising from responsive nonanalytes constitutes an interference in univariate analysis and cannot be corrected by means of standard addition. This is typical of most biological samples, where the second-order advantage is required for successful quantitation. The presence of a responsive background which also affects the analyte response in a sample (for example, through analyte-background interactions such as complex formation or protein binding) requires at least second-order standard addition for analyte quantitation.31–34 One of the subtleties of the multiway world is that this ubiquitous analytical problem can also be solved by external calibration in the presence of background (provided the latter is available to be spiked with the analyte), which is experimentally simpler.35 Thus, in the three-way world and beyond, both types of classical calibrations appear to merge. Only a few references exist in the literature on this interesting multiway research field. It certainly deserves to be further investigated.

(16) Olivieri, A. C.; Faber, N. M. Chemom. Intell. Lab. Syst. 2004, 70, 75–82.
(17) Olivieri, A. C. J. Chemom. 2004, 18, 363–371.
(18) Olivieri, A. C.; Faber, N. M. J. Chemom. 2005, 19, 583–592.
(19) Olivieri, A. C. Anal. Chem. 2005, 77, 4936–4946.
(20) Olivieri, A. C.; Faber, N. M.; Ferré, J.; Boqué, R.; Kalivas, J. H.; Mark, H. Pure Appl. Chem. 2006, 78, 633–661.
(21) Hoffmann, R.; Minkin, V. I.; Carpenter, B. K. HYLE 1997, 3, 3–28.
(22) Bro, R. J. Chemom. 1996, 10, 47–61.
(23) Linder, M.; Sundberg, R. Chemom. Intell. Lab. Syst. 1998, 42, 159–178.
(24) Arancibia, J. A.; Olivieri, A. C.; Bohoyo Gil, D.; Muñoz de la Peña, A.; Durán-Merás, I.; Espinosa Mansilla, A. Chemom. Intell. Lab. Syst. 2006, 80, 77–86.
(25) Öhman, J.; Geladi, P.; Wold, S. J. Chemom. 1990, 4, 79–88.
(26) Olivieri, A. C. J. Chemom. 2005, 19, 253–265.
(27) Youden, W. J. Anal. Chem. 1947, 19, 946–950.

FUTURE PERSPECTIVES

Figures of Merit.
In the first-order multivariate calibration field, figures of merit can now be conveniently estimated, including sensitivity, concentration standard errors, and limit of detection.36–41 This has been possible thanks to the introduction of the concept of net analyte signal, or NAS, which is the portion of the overall signal uniquely ascribed to the analyte of interest.20,42–45 The available proposals generalizing the well-established univariate methodology to the first-order multivariate domain have been discussed in a recent IUPAC Technical Report.20

(28) Castells, R. C.; Castillo, M. A. Anal. Chim. Acta 2000, 423, 179–185.
(29) Saxberg, B. E. H.; Kowalski, B. R. Anal. Chem. 1979, 51, 1031–1038.
(30) Kalivas, J. H.; Kowalski, B. R. Anal. Chem. 1981, 53, 2207–2212.
(31) Sena, M. M.; Trevisan, M. G.; Poppi, R. J. Quim. Nova 2005, 28, 910–920.
(32) Arancibia, J. A.; Olivieri, A. C.; Escandar, G. M. Anal. Bioanal. Chem. 2002, 374, 451–459.
(33) Bahram, M.; Bro, R. Anal. Chim. Acta 2007, 584, 397–402.
(34) Gómez, V.; Cuadros, R.; Ruisánchez, I.; Callao, M. P. Anal. Chim. Acta 2007, 600, 233–239.
(35) Culzoni, M. J.; Goicoechea, H. C.; Pagani, A. P.; Cabezón, M. A.; Olivieri, A. C. Analyst 2006, 131, 718–723.
(36) Faber, K.; Kowalski, B. R. J. Chemom. 1997, 11, 181–238.
(37) Faber, N. M.; Song, X.-H.; Hopke, P. K. Trends Anal. Chem. 2003, 22, 330–334.
(38) Boqué, R.; Faber, N. M.; Rius, F. X. Anal. Chim. Acta 2000, 423, 41–49.
(39) Boqué, R.; Ferré, J.; Faber, N. M.; Rius, F. X. Anal. Chim. Acta 2002, 451, 313–321.
(40) Faber, N. M.; Bro, R. Chemom. Intell. Lab. Syst. 2002, 61, 133–149.
(41) Linder, M.; Sundberg, R. J. Chemom. 2002, 16, 12–27.
(42) Faber, K.; Lorber, A.; Kowalski, B. R. J. Chemom. 1997, 11, 419–461.
(43) Lorber, A. Anal. Chem. 1986, 58, 1167–1172.
(44) Bergmann, G.; von Oepen, B.; Zinn, P. Anal. Chem. 1987, 59, 2522–2526.
(45) Ferré, J.; Faber, N. M. Chemom. Intell. Lab. Syst. 2003, 69, 123–136.
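The net analyte signal concept for first-order data can be sketched in a few lines. The example below is an illustration under assumptions, not one of the cited estimators: it takes the NAS as the part of the pure analyte spectrum orthogonal to the spectra of the other components, the sensitivity as its norm at unit concentration, and the selectivity as the ratio of that norm to the norm of the full spectrum. The Gaussian profiles reuse the fwhh of 50 nm and 1 nm sampling from the example in the section on univariate vs multivariate calibration; the band positions and the function name `nas_sensitivity` are hypothetical.

```python
import numpy as np

wl = np.arange(300.0, 501.0)  # hypothetical wavelength axis, 1 nm steps

def gauss(center, fwhh=50.0):
    """Gaussian band with unit height, so the univariate sensitivity at the maximum is 1."""
    return np.exp(-4.0 * np.log(2.0) * (wl - center) ** 2 / fwhh ** 2)

def nas_sensitivity(s_analyte, S_others=None):
    """Norm of the part of s_analyte orthogonal to the columns of S_others."""
    if S_others is None:
        return np.linalg.norm(s_analyte)
    P = S_others @ np.linalg.pinv(S_others)  # projector onto the space of the other components
    nas = s_analyte - P @ s_analyte          # net analyte signal at unit concentration
    return np.linalg.norm(nas)

s = gauss(400.0)
sen_alone = nas_sensitivity(s)                           # whole spectrum, no other component
sen_2d = np.linalg.norm(np.outer(s, s))                  # two Gaussian dimensions, no other component
sen_overlap = nas_sensitivity(s, gauss(430.0)[:, None])  # one overlapping component
selectivity = sen_overlap / sen_alone

print(f"first-order sensitivity, analyte alone:  {sen_alone:.2f}")
print(f"second-order sensitivity, analyte alone: {sen_2d:.1f}")
print(f"first-order sensitivity with overlap:    {sen_overlap:.2f} (selectivity {selectivity:.2f})")
```

Under these assumptions the first two values come out at about 6.1 and 37.6, in line with the rounded factors of 6 and 36 quoted for the Gaussian example, while spectral overlap reduces the sensitivity below the single-component value.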

Considerable effort has been devoted to extending this useful information to second-order calibration.46–50 As a result, a significant improvement in knowledge has been gained in recent years concerning the estimation of the three-way sensitivity and selectivity parameters.16–19 The latter are key figures of merit, which allow one to compare the performance of different analytical methods. It is interesting to note that several different definitions exist for the net analyte signal in the second-order scenario.18,49,50 This directly translates into potentially conflicting equations for the sensitivity because the latter is defined as the NAS at unit analyte concentration. Recently, each of the available expressions has been shown to correspond to specific calibration situations, depending on whether a given component belongs to the training data set or is only present in the test samples.18 Although mathematical expressions for estimating other important three-way figures of merit have not been fully developed, the available ones do provide useful approximations to confidence intervals and detection capabilities.16,47–51 The work is far from being complete: the subject is somewhat obscure for data orders higher than two, where the existing approaches do not provide adequate insight into the figures of merit in all possible calibration situations.19 This may indicate that the extension of the well-known concept of net analyte signal to any number of dimensions is not as straightforward as could be anticipated. Chemometricians will not be surprised to hear that multiway figures of merit are algorithm-specific: the same set of data measured under the same conditions and in the same equipment for the same set of samples may lead to different figures of merit, depending on the algorithm employed for data processing.8 This undoubtedly calls for an integrated view of the analytical process.
What may perhaps be a surprise is the fact that the figures of merit are not only algorithm-specific but also calibration-specific: with the same set of sample components, the analyte sensitivity varies if components other than the analyte are present in the calibration set or belong to the category of potential interferences.8 A still open discussion concerns the fact that some of these parameters do also appear to be sample-specific: can they be called real “figures of merit” if they vary from sample to sample? Is it right to report population-averaged figures of merit? The future will definitely see a number of nomenclature-oriented debates on this issue.

(46) Faber, K.; Lorber, A.; Kowalski, B. R. J. Chemom. 1997, 11, 419–461.
(47) Bro, R.; Faber, N. M.; Rinnan, Å. Chemom. Intell. Lab. Syst. 2005, 75, 69–76.
(48) Faber, N. M.; Bro, R. Chemom. Intell. Lab. Syst. 2002, 61, 133–149.
(49) Ho, C.-N.; Christian, G. D.; Davidson, E. R. Anal. Chem. 1980, 52, 1071–1079.
(50) Messick, N. J.; Kalivas, J. H.; Lang, P. M. Anal. Chem. 1996, 68, 1572–1579.
(51) Boqué, R.; Ferré, J.; Faber, N. M.; Rius, F. X. Anal. Chim. Acta 2002, 451, 313–321.

Figure 3. Scheme of one possible strategy for achieving the second-order advantage. Information from calibration (both signals and concentrations) and from the test sample (only signals) is introduced into the algorithm (pictorially represented by gears), producing sample-specific calibration parameters. The latter are then applied to the test sample signals to predict the analyte concentration. The process is repeated from the start for each newly analyzed test sample.

Achievement of the Second-Order Advantage. At present two different philosophies exist as regards the achievement of the second-order advantage. In one of them, the test sample data are joined with the calibration data, and this grand set is decomposed into contributions from the analyte(s) and potential interferences before prediction is made in a pseudounivariate manner. The process can be schematically viewed as indicated in Figure 3. Available algorithms for accomplishing this task with three-way data are based on highly constrained least-squares
multivariate curve resolution-alternating least-squares52 or on a variety of the so-called trilinear decomposition techniques.53–59 Table 3 provides a brief description of these techniques, and Table 4 lists some of the available algorithms along with pertinent references.

In the alternative mode of achieving the second-order advantage, calibration is first performed using only the training data, and the obtained calibration parameters (the multivariate counterparts of the intercept and slope) are then modified and made sample-specific by a postcalibration procedure recognizing the presence of unexpected sample components (Figure 4).25,26 Table 3 gives a brief description of the latter procedure, called residual bilinearization, while Table 4 collects the available algorithms for performing this task. Interestingly, Figures 3 and 4 highlight the fact that the test sample signal takes part in the calibration process even when employing the standard calibration mode, an entirely new concept in analytical calibration.

(52) DeJuan, A.; Casassas, E.; Tauler, R. In Encyclopedia of Analytical Chemistry; Myers, R. A., Ed.; Wiley: Chichester, U.K., 2002; Vol. 11, pp 9800–9837.
(53) Sanchez, E.; Kowalski, B. R. Anal. Chem. 1986, 58, 496–499.
(54) Sanchez, E.; Kowalski, B. R. J. Chemom. 1990, 1, 29–45.
(55) Bro, R. Chemom. Intell. Lab. Syst. 1997, 38, 149–171.
(56) Wu, H. L.; Shibukawa, M.; Oguma, K. J. Chemom. 1998, 12, 1–26.
(57) Chen, Z. P.; Wu, H. L.; Jiang, J. H.; Li, Y.; Yu, R. Q. Chemom. Intell. Lab. Syst. 2000, 52, 75–86.
(58) Hu, L. Q.; Wu, H. L.; Ding, Y. J.; Fang, D. M.; Xia, A. L.; Yu, R. Q. Chemom. Intell. Lab. Syst. 2006, 82, 145–153.
(59) Xia, A. L.; Wu, H. L.; Fang, D. M.; Ding, Y. J.; Hu, L. Q.; Yu, R. Q. J. Chemom. 2005, 19, 65–76.

Table 3. Description of the Three Techniques Employed for Achieving the Second-Order Advantage from Three-Way Data

Trilinear decomposition: assumes that the three-way array of signals follows the so-called trilinear model, meaning that an array element (i,j,k) is modeled as the sum of contributions of the form (x_i × y_j × z_k), where x_i is the relative concentration of a given sample component in the ith sample and y_j and z_k are the values of the instrumental profiles at the jth and kth channel in each data dimension. Values of x_i are then employed for analyte quantitation in a pseudounivariate fashion.

Multivariate curve resolution: places the second-order signals for a group of samples adjacent to each other along one of the data dimensions and assumes that the resulting (augmented) matrix follows a bilinear model, meaning that a matrix element (m,j) is modeled as the sum of contributions of the form (x_m × y_j), where x_m is proportional to the concentration of a given sample component in the ith sample and y_j is as defined above. For analyte quantitation, areas under the x_m profiles are computed for each sample, as the sum of elements ranging from m = (i − 1)K + 1 to m = iK (K is the total number of channels in the dimension of matrix augmentation). Values of the areas are then employed to build a pseudounivariate calibration graph.

Residual bilinearization: assumes that the second-order signals from the potential interferences follow a bilinear model, meaning that an element (j,k) of the interference matrix is the sum of contributions of the form (y_j × z_k), where y_j and z_k are as defined above. New calibration parameters (see Figure 4) are produced by modeling the residuals of the fit of the test sample signals to this bilinear model, hence the name residual bilinearization.

Both of the above approaches to the second-order advantage have pros and cons. Trilinear decomposition provides physically recognizable component profiles (spectra, time, or pH profiles, etc.), while this information is partially lost in the latent structure of residual bilinearization. However, the latter method provides increased flexibility toward complex multiway data not fulfilling
the model required for trilinear decomposition. Further details such as limitations, ranges of applicability, extensions to data orders higher than two, and other algorithmic characteristics have been discussed in a recent review.5 Additional work is required to seek alternative philosophies for the achievement of the second-order advantage. In this regard, it will be interesting to develop techniques which are able to relax some of the restrictive conditions posed by the present models while still achieving the second-order advantage.

Nonlinear Multiway Systems. Very recently, emphasis has been directed toward multiway data showing nonlinearities in the signal-concentration relationship. New techniques are being developed to process such data in order to achieve the second-order advantage.60 A particularly useful one, enjoying the abilities of latent-structured approaches, follows these steps: (1) models the calibration data by unfolding (or concatenating) them and processing them with the well-known principal component analysis, (2) applies residual bilinearization, which provides the second-order advantage, and (3) employs an artificial neural network (multilayer perceptron, radial basis functions, support vector machines) to model the relation between principal components and analyte concentrations.60–63 Efforts are required in this new research arena because neural networks capable of processing multiway data without losing the data structure are unavailable. Moreover, detection limits and other figures of merit need to be estimated for these models, in order to allow for adequate method comparison and planning.

Software Development. Programs for first-order multivariate and multiway analysis are freely available on the Internet (see Table 5 for an account of Web sites for specific programs). Inevitably, MATLAB64 is becoming the preferred programming environment for research activities concerning multiway calibration. Even with all its undeniable advantages, many analysts will complain that implementing multiway analysis in MATLAB requires some level of programming skill. Although MATLAB graphical user interfaces (GUI) are useful in bridging the gap between pure chemometricians and end users,65–67 considerable work is still required in developing easy-to-use software for routine applications in most analytical laboratories.

Commercial software is also available for implementing first-order calibration, as well as some multiway calibration methods. Table 5 collects this information, along with company names and Web sites. Inspection of this table shows that commercial software has not yet incorporated all of the existing offers from the multiway calibration market.
However, since a fluid communication exists between chemometricians and commercial software developers, it is hoped that this gap will be bridged in the next few years.

Further Multivariate Advantages. First-order data allows one to mark a sample having unexpected components as an outlier (though not permitting quantitation), leading to the first-order advantage. Second-order data permits accurate quantitation even in the presence of unexpected sample components, providing the second-order advantage. These two advantages are universally recognized. There is no general consensus, however, on the existence of additional advantages when working with data orders higher than two. There have been some speculations on this matter in the literature, proposing the following properties as candidates for the third-order advantage: (1) the possibility of decomposing a third-order data array into the three contributing profiles and hence separating the interference contributions from

(60) Olivieri, A. C. J. Chemom. 2005, 19, 615–624.
(61) García-Reiriz, A.; Damiani, P. C.; Olivieri, A. C. Anal. Chim. Acta 2007, 588, 192–199.
(62) Culzoni, M. J.; Damiani, P. C.; García-Reiriz, A.; Goicoechea, H. C.; Olivieri, A. C. Analyst 2007, 132, 654–663.
(63) García-Reiriz, A.; Damiani, P. C.; Culzoni, M. J.; Goicoechea, H. C.; Olivieri, A. C. Chemom. Intell. Lab. Syst. 2008, 92, 61–70.
(64) MATLAB; The Math Works Inc.: Natick, MA.
(65) Olivieri, A. C.; Goicoechea, H. C.; Iñón, F. A. Chemom. Intell. Lab. Syst. 2004, 73, 189–197.

Table 4. Algorithms for Processing Second-Order Data and Achieving the Second-Order Advantage

algorithms employing the scheme of Figure 3

algorithm                                                acronym  ref
generalized rank annihilation                            GRAM     53
direct trilinear decomposition                           DTLD     54
parallel factor analysis                                 PARAFAC  55
alternating trilinear decomposition                      ATLD     56
self-weighted alternating trilinear decomposition        SWALTD   57
alternating penalty trilinear decomposition              APTLD    58, 59
multivariate curve resolution-alternating least-squares  MCR-ALS  66, 67

algorithms employing the scheme of Figure 4

algorithm                                                                                  acronym        ref
bilinear least-squares/residual bilinearization                                            BLLS/RBL       23, 24
unfolded partial least-squares/residual bilinearization                                    U-PLS/RBL      25, 26
N-way partial least-squares/residual bilinearization                                       N-PLS/RBL      25, 26
unfolded principal component analysis/residual bilinearization/artificial neural networks  U-PCA/RBL/ANN  60
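For the last entry of Table 4 (U-PCA/RBL/ANN), the three steps described in the text (unfolding plus principal component analysis, residual bilinearization, and a final model relating scores to concentrations) can be sketched as follows. This is a hedged illustration under several assumptions, not the published algorithm of ref 60: the Gaussian profiles and all function names are hypothetical, the principal component analysis is uncentered, residual bilinearization is written as a simple alternating least-squares loop, and a straight-line fit stands in for the artificial neural network because the simulated signal is linear in concentration.

```python
import numpy as np

def gaussian(n, center, width):
    """Hypothetical instrumental profile: a Gaussian band sampled at n channels."""
    x = np.arange(n)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def pca_loadings(unfolded, n_components):
    """Loadings (as columns) of an uncentered PCA, computed through the SVD."""
    _, _, vt = np.linalg.svd(unfolded, full_matrices=False)
    return vt[:n_components].T

def rbl_scores(test, loadings, n_unexpected, n_iter=200):
    """Simplified alternating residual bilinearization for one test matrix."""
    x = test.ravel()
    interferent = np.zeros_like(test)
    for _ in range(n_iter):
        # Scores of the signal left after removing the current interferent model
        scores = loadings.T @ (x - interferent.ravel())
        # Residual of the calibration model, reshaped back to a J x K matrix
        residual = test - (loadings @ scores).reshape(test.shape)
        # Bilinear (low-rank) model of the residual through a truncated SVD
        u, s, vt = np.linalg.svd(residual, full_matrices=False)
        interferent = (u[:, :n_unexpected] * s[:n_unexpected]) @ vt[:n_unexpected]
    return scores

J, K = 30, 25
analyte = np.outer(gaussian(J, 10, 3), gaussian(K, 8, 3))
interferent_signal = np.outer(gaussian(J, 16, 3), gaussian(K, 14, 3))

# Step 1: unfold each J x K calibration matrix into a row and run PCA
c_cal = np.array([1.0, 2.0, 3.0, 4.0])
cal = c_cal[:, None, None] * analyte
unfolded = cal.reshape(len(c_cal), -1)
P = pca_loadings(unfolded, 1)
t_cal = unfolded @ P

# Step 3 stand-in: straight-line fit of concentration against the score
slope, intercept = np.polyfit(t_cal[:, 0], c_cal, 1)

# Test sample with an unexpected component absent from calibration
test = 2.5 * analyte + 1.5 * interferent_signal
t_plain = P.T @ test.ravel()                  # scores without RBL
t_rbl = rbl_scores(test, P, n_unexpected=1)   # step 2: scores with RBL

c_plain = slope * t_plain[0] + intercept      # biased by the interferent
c_rbl = slope * t_rbl[0] + intercept          # interferent modeled and removed
print(c_plain, c_rbl)
```

In this noise-free simulation the prediction without residual bilinearization is biased upward by the overlapping interferent, whereas the prediction after residual bilinearization should approach the nominal value of 2.5. For genuinely nonlinear data, the straight-line fit would have to be replaced by a nonlinear model such as the neural networks mentioned in the text.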

Table 5. Free and Commercial Software for Multivariate Calibration

free software

algorithm: parallel factor analysis, N-way partial least-squares and other multiway methods
Web site: www.models.kvl.dk/source/
author/contact (e-mail): R. Bro ([email protected])
ref: 55

algorithm: multivariate curve resolution
Web site: http://www.ub.es/gesq/mcr/mcr.htm
author/contact (e-mail): R. Tauler ([email protected])
ref: 66

algorithm: multivariate curve resolution
Web site: http://personal.ecu.edu/gemperlinep
author/contact (e-mail): P. Gemperline ([email protected])
ref: 67

algorithm: several first-order and multiway methods
Web site: www.chemometry.com
author/contact (e-mail): A. C. Olivieri ([email protected])
ref: 65

algorithm: generalized rank annihilation and direct trilinear decomposition
Web site: http://www.cpac.washington.edu
author/contact (e-mail): Melvin Koch ([email protected])

commercial software

algorithm: first-order partial least-squares
software: PLS Toolbox
company: Eigenvector.com
Web site: www.eigenvector.com

algorithm: first-order partial least-squares
software: PLS Toolbox
company: The MathWorks
Web site: www.mathworks.com

algorithm: first-order and N-way partial least-squares, and multivariate curve resolution
software: UNSCRAMBLER
company: Camo
Web site: www.camo.com

algorithm: first-order partial least-squares and multivariate curve resolution
software: PIROUETTE
company: Infometrix Software
Web site: www.infometrix.com

algorithm: first-order partial least-squares
software: EZINFO
company: Umetrics
Web site: www.umetrics.com

algorithm: PARAFAC and other multiway methods
software: 3WayPack
company: The Three-Mode Company
Web site: http://three-mode.leidenuniv.nl/

those from the analytes when studying a single sample,8,68,69 (2) the improved algorithmic resolution of highly collinear third-order data,70 or (3) the sensitivity and selectivity increase which is known to accompany multiway data of increasing dimensions.71 If this third-order advantage (or in general, the Nth-order advantage) indeed exists, it should be defined in terms of what is really gained, in strictly analytical terms, in going to higher-order data, apart from increased sensitivity and selectivity. In this sense, these additional advantages are unknown, and some authors cast doubts as to whether they will ever be found.

The present situation regarding the advantages which can be achieved from the various data orders (Table 2) seems analogous to that vividly described in George Gamow’s book One, Two, Three...Infinity in connection with infinite sets.72 The mathematician Georg Cantor (1845–1918) proved that the number of numbers constitutes an infinite set called ℵ0, the number of points in a line, a surface, or a volume makes a larger infinite set called ℵ1, and the number of curved shapes makes

Figure 4. Scheme of an alternative strategy for achieving the second-order advantage. The calibration information leads to calibration parameters, which are then modified and made sample-specific with the help of the test sample signals. Prediction proceeds as in Figure 3.

(66) Jaumot, J.; Gargallo, R.; de Juan, A.; Tauler, R. Chemom. Intell. Lab. Syst. 2005, 76, 101–110.
(67) Gemperline, P. J.; Cash, E. Anal. Chem. 2003, 75, 4236–4243.
(68) Sinha, A. E.; Hope, J. L.; Prazen, B. J.; Fraga, C. G.; Nilsson, E. J.; Synovec, R. E. J. Chromatogr., A 2004, 1056, 145–154.
(69) Sinha, A. E.; Prazen, B. J.; Synovec, R. E. Anal. Bioanal. Chem. 2004, 378, 1948–1951.


an even larger set called ℵ2.72 No one has been able to conceive a set requiring a fourth type of infinity, or ℵ3. A variation of the “1, 2, 3, ∞” theme also arises in the so-called beauty-contest experiments, a kind of numerical game where the behavior of subjects can be interpreted as iterative reasoning, with only reasoning levels 1, 2, and 3 allowed.73 The power of concurrent-write models of parallel computation,74 the types of path execution for certain logic systems,75 and even the kinds of global financial city centers76 can all be similarly described by resorting to “1, 2, 3, ∞”. Interestingly, this scenario is not exclusive to academic environments: certain tribes (such as the Amazonian Piraha tribe) have words reserved for “one”, “two”, and “many” but do not distinguish among quantities of three or more.77

(70) Xia, A. L.; Wu, H. L.; Li, S. F.; Zhu, S. H.; Hu, L. Q.; Yu, R. Q. J. Chemom. 2007, 21, 133–144.
(71) van Mispelaar, V. G. Chromametrics. Doctoral Thesis, University of Amsterdam, The Netherlands, 2005.
(72) Gamow, G. One, Two, Three...Infinity; The Viking Press: New York, 1961.
(73) Nagel, R.; Satorra, A.; García-Montalvo, J. Am. Econ. Rev. 2002, 92, 1687–1701.
(74) Fich, F. E.; Heide, F. M.; Ragde, P.; Wigderson, A. One, Two, Three, ..., Infinity: Lower Bounds for Parallel Computation. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing, Providence, RI, May 6–8, 1985; pp 48–58.


Similarly, we can count increasing analytical advantages from one-way to three-way data, but we cannot distinguish advantages among three-way data and beyond, at least with the present body of knowledge. Future research may shed light on whether the common expression “1, 2, 3, ∞” will continue to describe the scene of the analytical advantages of multivariate data.

ACKNOWLEDGMENT

Universidad Nacional de Rosario, CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas, Project No. PIP 5303) and ANPCyT (Agencia Nacional de Promoción Científica y Tecnológica, Project No. PICT-25825) are gratefully acknowledged for financial support. Stimulating discussions with Prof. Graciela Escandar (University of Rosario) are also gratefully acknowledged.

Received for review April 7, 2008. Accepted June 10, 2008.
AC800692C

(75) Pistore, M.; Vardi, M. Y. J. Artif. Intel. Res. 2007, 30, 101–132.
(76) Mainelli, M. J. Risk Finance 2006, 7, 219–227.
(77) Gordon, P. Science 2004, 306, 496–499.