Anal. Chem. 2006, 78, 8051-8058

Second-Order Advantage Achieved by Unfolded-Partial Least-Squares/Residual Bilinearization Modeling of Excitation-Emission Fluorescence Data Presenting Inner Filter Effects

Diego Bohoyo Gil,† Arsenio Muñoz de la Peña,† Juan A. Arancibia,‡ Graciela M. Escandar,‡ and Alejandro C. Olivieri*,‡

Departamento de Química Analítica, Facultad de Ciencias, Universidad de Extremadura (06071), Badajoz, Spain, and Departamento de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario (S2002LRK), Argentina

A second-order multivariate calibration approach, based on a combination of unfolded-partial least-squares with residual bilinearization (U-PLS/RBL), has been applied to fluorescence excitation-emission matrix data for multicomponent mixtures showing inner filter effects. Among the methods compared, this algorithm gives the best predictions of analyte concentrations when significant inner filter effects occur. This holds even in the presence of unexpected sample components, which make the second-order advantage indispensable. Results for simulated fluorescence excitation-emission data are described and compared with the classical approach based on parallel factor analysis and with other second-order algorithms, including generalized rank annihilation, bilinear least squares combined with residual bilinearization, and multivariate curve resolution-alternating least-squares. A set of experimental data was also studied. Calibration was performed with fluorescence excitation-emission matrices for samples containing mixtures of chrysene (the analyte of interest) and benzopyrene, which produced a strong inner filter effect across the useful wavelength range. Predictions were made on validation samples with a qualitative composition similar to the calibration set, and also on test samples containing an unexpected component (pyrene). In this latter case, U-PLS/RBL was the only method that succeeded in analyzing the calibrated component chrysene, thereby achieving the useful second-order advantage.

* To whom correspondence should be addressed. E-mail: [email protected].
† Universidad de Extremadura. ‡ Universidad Nacional de Rosario.
(1) Ichinose, N.; Schwedt, G.; Schnepel, F. M.; Adachi, K. Fluorometric analysis in biomedical chemistry; John Wiley & Sons: New York, 1987.
10.1021/ac061369v CCC: $33.50 Published on Web 11/02/2006
© 2006 American Chemical Society

Fluorescence spectroscopy is ubiquitous in modern analytical chemistry, owing to its intrinsic sensitivity and to the easy availability of instruments.1 However, two problems may arise during the analysis of multianalyte samples: spectral overlap and inner filter effects. Spectral overlap can be circumvented by resorting to a myriad of multivariate chemometric procedures designed to provide selectivity to potentially unselective signals. Inner filter effects, on the other hand, appear when a component (fluorescent or not) absorbs the excitation or the emission radiation corresponding to another sample component, or both.2 Provided the absorbing component does not fluoresce at the analyte emission wavelength, the standard way of coping with this problem is to perform the calibration in the standard addition mode. However, the sample may carry unexpected fluorescent constituents that were not taken into account in the calibration step. In that case, a convenient way of performing the calibration is to resort to second-order data coupled to the second-order advantage.3 This advantage is inherent to excitation-emission fluorescence data, as well as to other types of matrix data. Second-order multivariate algorithms such as parallel factor analysis (PARAFAC),4 generalized rank annihilation (GRAM),5 and multivariate curve resolution-alternating least squares (MCR-ALS)6 can be employed in the standard addition mode and are able to achieve the second-order advantage. In any case, the experimental procedure for standard addition is time-consuming. It would therefore be desirable to have a technique able to account for inner filter effects using an external set of calibration standards, i.e., without requiring standard addition.

Reflection on the inner filter effect leads to the conclusion that, in a given set of samples, the emission and excitation profiles of the affected analyte vary from sample to sample, not only in intensity but also in shape. This is usually because the spectral overlap between the absorption of the component producing the inner filter effect and the emission/excitation analyte peaks varies across the analyte band.
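The change in shape can be illustrated numerically. The following is a minimal sketch, assuming the exponential attenuation model used later for the simulations (eq 7). The Gaussian bands are purely hypothetical and not taken from the paper:

- an analyte emission band centered at 350 nm;
- an interferent absorption band centered at 330 nm, with a peak absorptivity-concentration product of 0.6, so that the signal is never attenuated by more than about 45%.

Profiles are normalized to unit length so that only their shapes are compared.

```python
import numpy as np

# Hypothetical common wavelength axis (nm) and Gaussian bands, for illustration only
wl = np.linspace(300.0, 400.0, 101)


def gaussian(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


analyte_emission = gaussian(350.0, 15.0)        # emission band of the affected analyte
interferent_abs = 0.6 * gaussian(330.0, 20.0)   # absorptivity x unit concentration of the interferent


def attenuated_profile(y2):
    """Analyte emission after inner filter attenuation exp(-eps * y2), normalized to unit length."""
    profile = analyte_emission * np.exp(-interferent_abs * y2)
    return profile / np.linalg.norm(profile)


p_free = attenuated_profile(0.0)   # no interferent
p_if = attenuated_profile(1.0)     # interferent at unit concentration
similarity = float(p_free @ p_if)  # 1.0 would mean identical shapes
peak_shift = wl[np.argmax(p_if)] - wl[np.argmax(p_free)]
print(f"cosine similarity: {similarity:.4f}; peak shift: {peak_shift:+.0f} nm")
```

Because the hypothetical interferent absorbs more strongly on the short-wavelength side of the band, the attenuated profile is not a scaled copy of the original. Its maximum moves to longer wavelengths, and its cosine similarity with the unattenuated profile drops below 1.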
(2) Lakowicz, J. R. Principles of fluorescence spectroscopy; Plenum Press: New York, 1983; p 44.
(3) Booksh, K. S.; Kowalski, B. R. Anal. Chem. 1994, 66, 782A-791A.
(4) Bro, R. Chemom. Intell. Lab. Syst. 1997, 38, 149-171.
(5) Sanchez, E.; Kowalski, B. R. Anal. Chem. 1986, 58, 496-499.
(6) Tauler, R. Chemom. Intell. Lab. Syst. 1995, 30, 133-146.

Analytical Chemistry, Vol. 78, No. 23, December 1, 2006 8051

It is this property that makes conventional second-order multivariate methodologies inapplicable to the presently studied problem, because most of them are designed to recover unique component profiles (excitation and emission). In this list, we include the above-mentioned PARAFAC, GRAM, and MCR-ALS methods, as well as recently described tools such as bilinear least squares (BLLS).8 Notice that MCR-ALS may alleviate the problem of having either the excitation or the emission profile varying from sample to sample, but not both at the same time. The same rationale applies to PARAFAC2, a variant of PARAFAC in which the strict trilinearity condition is relaxed to accommodate changes in the profiles of one of the data dimensions.9,10 When employing second-order data, a variant of PLS named unfolded-PLS (U-PLS) can be applied. It deals with matrix data by concatenating (or unfolding) the original bidimensional information into unidimensional arrays (vectors). However, in the presence of unexpected compounds of unknown origin, U-PLS should be combined with an additional procedure to achieve the second-order advantage. Fortunately, this latter technique, called residual bilinearization (RBL), has existed for more than 15 years.11 Its full potential has been recognized only recently, both for second-order12 and for third-order data, in the latter case upon suitable extension to the third dimension.13 U-PLS combined with RBL has recently been applied to systems where component interactions modify the analyte spectra14 and to kinetic-spectrophotometric data with linear dependency in the time mode.15 Very recently, it has been suggested that U-PLS/RBL performs adequately when processing solid-phase fluorescence signals that suffer inner filter effects in the presence of both a strong background signal and unexpected components.16 The present report aims at a theoretical and experimental study of the ability of U-PLS/RBL to analyze complex samples of fluorescent species showing inner filter effects, with a focus on the important second-order advantage.
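The U-PLS/RBL combination described above (formalized as eqs 1-5 in the Theory section) can be sketched in Python/NumPy. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation:

- PLS1 is fitted with the NIPALS algorithm on data without mean centering, consistent with eq 1 having no intercept.
- The RBL step uses a simple alternating least-squares loop instead of the Gauss-Newton procedure described in the text.
- The function names `upls_fit`, `upls_predict`, and `rbl_predict`, as well as the Gaussian test profiles, are hypothetical.

```python
import numpy as np


def upls_fit(X_cal, y, n_lv):
    """PLS1 by NIPALS (no mean centering) on I x J x K calibration data unfolded to I x JK.
    Returns weight loadings W and loadings P (both JK x A) and regression coefficients v (A)."""
    X = X_cal.reshape(X_cal.shape[0], -1).astype(float)  # unfold every J x K matrix
    y = np.asarray(y, dtype=float).copy()
    W = np.zeros((X.shape[1], n_lv))
    P = np.zeros((X.shape[1], n_lv))
    v = np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        W[:, a], P[:, a], v[a] = w, X.T @ t / tt, (y @ t) / tt
        X = X - np.outer(t, P[:, a])  # deflate the unfolded data
        y = y - v[a] * t              # deflate the concentrations
    return W, P, v


def upls_predict(W, P, v, X_u):
    """Plain U-PLS prediction: scores (eq 2), concentration (eq 1), residual s_p (eq 3)."""
    x = X_u.reshape(-1)
    t_u = np.linalg.solve(W.T @ P, W.T @ x)
    s_p = np.linalg.norm(x - P @ t_u) / np.sqrt(x.size - P.shape[1])
    return float(t_u @ v), s_p


def rbl_predict(W, P, v, X_u, max_iter=1000, tol=1e-12):
    """RBL for one unexpected component (eqs 4 and 5) with P fixed. Alternating least
    squares is used here instead of the Gauss-Newton procedure described in the text."""
    J, K = X_u.shape
    x = X_u.reshape(-1)
    t_u = np.linalg.solve(W.T @ P, W.T @ x)  # start from the U-PLS scores
    for _ in range(max_iter):
        E_p = (x - P @ t_u).reshape(J, K)
        U, s, Vt = np.linalg.svd(E_p)
        unx = s[0] * np.outer(U[:, 0], Vt[0])  # g_unx b_unx c_unx^T from SVD_1
        t_new = np.linalg.lstsq(P, x - unx.ravel(), rcond=None)[0]
        done = np.linalg.norm(t_new - t_u) < tol
        t_u = t_new
        if done:
            break
    e_u = x - P @ t_u - unx.ravel()
    return float(t_u @ v), float(np.linalg.norm(e_u))


# Hypothetical noiseless test case: two calibrated components, one unexpected component
ch = np.arange(20.0)


def band(center):
    g = np.exp(-0.5 * ((ch - center) / 2.5) ** 2)
    return g / np.linalg.norm(g)


S1 = np.outer(band(5), band(5))
S2 = np.outer(band(10), band(12))
S3 = np.outer(band(13), band(8))  # absent from the calibration set
y1 = np.array([0, 0.5, 1, 0, 0.5, 1, 0, 0.5, 1])
y2 = np.array([0, 0, 0, 0.5, 0.5, 0.5, 1, 1, 1])
X_cal = y1[:, None, None] * S1 + y2[:, None, None] * S2

W, P, v = upls_fit(X_cal, y1, n_lv=2)
X_u = 0.6 * S1 + 0.3 * S2 + 0.8 * S3
y_plain, s_p = upls_predict(W, P, v, X_u)
y_rbl, e_norm = rbl_predict(W, P, v, X_u)
print(f"U-PLS: {y_plain:.3f} (s_p = {s_p:.1e}); U-PLS/RBL: {y_rbl:.4f}")
```

In this noiseless toy case, the unexpected component biases the plain U-PLS estimate and leaves a clearly non-zero residual. In contrast, the RBL estimate recovers the nominal value of 0.6 to within round-off. Each alternating step (rank-1 SVD of the residual, then least squares on the fixed loadings) can only decrease the norm of the fit residual, which is the quantity RBL minimizes.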
The simulations and experimental results discussed in the present report show excellent performance of U-PLS/RBL in comparison with standard methods such as PARAFAC, GRAM, BLLS/RBL, and MCR-ALS. All of the latter algorithms provide poorer predictive results than U-PLS/RBL, as expected from the above considerations.

(7) Linder, M.; Sundberg, R. Chemom. Intell. Lab. Syst. 1998, 42, 159-178.
(8) Linder, M.; Sundberg, R. J. Chemom. 2002, 16, 12-27.
(9) Kiers, H. A. L.; Ten Berge, J. M. F.; Bro, R. J. Chemom. 1999, 13, 275-294.
(10) Bro, R.; Andersson, C. A.; Kiers, H. A. L. J. Chemom. 1999, 13, 295-309.
(11) Öhman, J.; Geladi, P.; Wold, S. J. Chemom. 1990, 4, 79-90.
(12) Olivieri, A. C. J. Chemom. 2005, 19, 253-265.
(13) Arancibia, J. A.; Olivieri, A. C.; Bohoyo Gil, D.; Muñoz de la Peña, A.; Durán-Merás, I.; Espinosa Mansilla, A. Chemom. Intell. Lab. Syst. 2006, 80, 77-86.
(14) Culzoni, M. J.; Goicoechea, H. C.; Pagani, A. P.; Cabezón, M. A.; Olivieri, A. C. Analyst 2006, 131, 718-732.
(15) García-Reiriz, A.; Damiani, P. C.; Olivieri, A. C. Talanta, in press (DOI: 10.1016/j.talanta.2006.05.050).
(16) Piccirilli, C. N.; Escandar, G. M. Analyst 2006, 131, 1012-1020.
(17) Wold, S.; Geladi, P.; Esbensen, K.; Öhman, J. J. Chemom. 1987, 1, 41-56.

THEORY

U-PLS Model. In the U-PLS method, the original second-order data are unfolded into vectors before PLS is applied, as described by Wold et al.17 In this algorithm, concentration information is employed in the calibration step, without including data for the unknown sample. The I calibration data matrices are first vectorized into JK × 1 vectors. A usual PLS model is then built using these data together with the vector of calibration concentrations $\mathbf{y}$ (size I × 1). This provides a set of loadings $\mathbf{P}$ and weight loadings $\mathbf{W}$ (both of size JK × A, where A is the number of latent factors), as well as regression coefficients $\mathbf{v}$ (size A × 1). The parameter A can be selected by techniques such as leave-one-out cross-validation.18 If no unexpected components occurred in the test sample, $\mathbf{v}$ could be employed to estimate the analyte concentration according to

$$y_u = \mathbf{t}_u^{\mathrm{T}}\,\mathbf{v} \tag{1}$$

where $\mathbf{t}_u$ is the test sample score vector, obtained by projecting the vectorized data for the test sample, $\mathrm{vec}(\mathbf{X}_u)$, onto the space of the A latent factors:

$$\mathbf{t}_u = (\mathbf{W}^{\mathrm{T}}\mathbf{P})^{-1}\,\mathbf{W}^{\mathrm{T}}\,\mathrm{vec}(\mathbf{X}_u) \tag{2}$$

where $\mathrm{vec}(\cdot)$ denotes the vectorization operator. When unexpected constituents occur in $\mathbf{X}_u$, the sample scores given by eq 2 are unsuitable for analyte prediction through eq 1. In this case, the residuals of the U-PLS prediction step ($s_p$, see eq 3 below) will be abnormally large in comparison with the typical instrumental noise level:

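$$s_p = \frac{\|\mathbf{e}_p\|}{(JK - A)^{1/2}} = \frac{\|\mathrm{vec}(\mathbf{X}_u) - \mathbf{P}(\mathbf{W}^{\mathrm{T}}\mathbf{P})^{-1}\mathbf{W}^{\mathrm{T}}\,\mathrm{vec}(\mathbf{X}_u)\|}{(JK - A)^{1/2}} = \frac{\|\mathrm{vec}(\mathbf{X}_u) - \mathbf{P}\,\mathbf{t}_u\|}{(JK - A)^{1/2}} \tag{3}$$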

where $\|\cdot\|$ indicates the Euclidean norm. This situation can be handled by a separate procedure called residual bilinearization, which has already been described in the literature. It is based on the singular value decomposition (SVD), which is used to model the unexpected effects.11 RBL aims at minimizing the norm of the residual vector $\mathbf{e}_u$, computed while fitting the sample data to the sum of the relevant contributions. For a single unexpected component, the expression is

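$$\mathrm{vec}(\mathbf{X}_u) = \mathbf{P}\,\mathbf{t}_u + \mathrm{vec}\left[g_{\mathrm{unx}}\,\mathbf{b}_{\mathrm{unx}}(\mathbf{c}_{\mathrm{unx}})^{\mathrm{T}}\right] + \mathbf{e}_u \tag{4}$$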

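where $\mathbf{b}_{\mathrm{unx}}$ and $\mathbf{c}_{\mathrm{unx}}$ are the first left and right singular vectors of $\mathbf{E}_p$, and $g_{\mathrm{unx}}$ is the corresponding singular value, which acts as a scaling factor: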

$$(g_{\mathrm{unx}}, \mathbf{b}_{\mathrm{unx}}, \mathbf{c}_{\mathrm{unx}}) = \mathrm{SVD}_1(\mathbf{E}_p) \tag{5}$$

where $\mathbf{E}_p$ is the J × K matrix obtained by reshaping the JK × 1 vector $\mathbf{e}_p$ of eq 3, and $\mathrm{SVD}_1$ indicates taking the first principal component. During this RBL procedure, $\mathbf{P}$ is kept constant at the calibration values, and $\mathbf{t}_u$ is varied with a Gauss-Newton procedure until $\|\mathbf{e}_u\|$ in eq 4 is minimized. Once $\|\mathbf{e}_u\|$ is minimized, the analyte concentration is provided by eq 1, using the final $\mathbf{t}_u$ vector found by the RBL procedure.

Software. All routines employed for the calculations described in this paper were written in MATLAB.19 Those for applying PARAFAC and GRAM are available on the Internet thanks to Bro,20 and those for MCR-ALS are maintained by Tauler et al.21 U-PLS/RBL and BLLS/RBL are available from the authors on request. They include a useful graphical user interface for data input and parameter setting, of the type already described for first-order multivariate calibration.22

(18) Haaland, D. M.; Thomas, E. V. Anal. Chem. 1988, 60, 1193-1202.
(19) MATLAB 6.0, The MathWorks Inc., Natick, MA, 2000.

Simulated Data Sets. In all cases, second-order data for the calibration set of samples were created starting from noiseless profiles for both analytes (see below). All matrices were of size 17 × 19 data points. The 17 points correspond to the first dimension and the 19 points to the second; these dimensions are intended to mimic emission and excitation wavelengths, respectively. Two different calibration sets were built, with concentrations of both analytes given as follows: (i) set C1 is a central composite design (9 samples), with concentration ranges for both components from 0 to 1; (ii) set C2 includes the latter central composite design plus 6 additional samples containing only component 1, at equally spaced concentrations in the range 0-1. These calibration sets were built in order to study how increasing the calibration information on component 1 affects the predictive ability of the second-order algorithms. A test set of 50 samples was also built, with random concentrations of the analytes in the range 0-1. Second-order signals corresponding to a single unexpected component were added, at random concentrations, to each of these test samples. The average contribution of the unexpected component to the overall signal was of the same order as those of the calibrated analytes. Finally, random numbers taken from a Gaussian distribution were added to all signals. The standard deviation of the Gaussian noise was set to 2% of the mean calibration signal. In the absence of inner filter effects, the calibration signals would be computed as

$$\mathbf{X}_{c,i} = y_{1,c,i}\,\mathbf{S}_1 + y_{2,c,i}\,\mathbf{S}_2 + \mathbf{R}\,s_X \tag{6}$$

where:

- $\mathbf{X}_{c,i}$ is the J × K matrix of second-order signals for the ith calibration sample;
- $y_{n,c,i}$ is the nominal concentration of analyte n;
- $\mathbf{S}_n = k_n\,\mathbf{b}_n\mathbf{c}_n^{\mathrm{T}}$ is the corresponding matrix signal at unit concentration for analyte n. Here $\mathbf{b}_n$ and $\mathbf{c}_n$ are the profiles in the first and second dimensions, both normalized to unit length, and $k_n$ is a scaling factor; all $k_n$ were set to 1;
- $\mathbf{R}$ is a matrix of Gaussian random numbers with unit standard deviation, of appropriate size;
- $s_X$ is the standard deviation of the noise added to the signals.

When an inner filter effect from component 2 on component 1 is present, $\mathbf{S}_1$ is replaced in eq 6 by a matrix $\mathbf{S}_{1,\mathrm{if}}$ whose generic (j,k) element is given by the following expression:

$$S_{1,\mathrm{if}}(j,k) = S_1(j,k)\times\exp\left[-(\varepsilon_{2j} + \varepsilon_{2k})\,y_{2,c,i}\right] \tag{7}$$

where $\varepsilon_{2j}$ and $\varepsilon_{2k}$ are the absorptivities of component 2 at channels j and k, respectively, in each of the data dimensions. Index j runs over the first (emission) dimension and index k over the second (excitation) dimension. Accordingly, the product $\varepsilon_{2j}\,y_{2,c,i}$ represents the inner filter effect produced when component 2 absorbs the emission intensity at channel j. Likewise, $\varepsilon_{2k}\,y_{2,c,i}$ corresponds to absorption of the excitation intensity at channel k.

(20) http://www.models.kvl.dk/source/.
(21) http://www.ub.es/gesq/mcr/mcr.htm.
(22) Olivieri, A. C.; Goicoechea, H. C.; Iñón, F. A. Chemom. Intell. Lab. Syst. 2004, 73, 189-197.

The absorptivities are given by the product of an adjustable scaling factor and the values of the excitation profile $\mathbf{c}_2$ at each of these two channels. The scaling factor was adjusted so that the fluorescence intensity decreased by at most 50% of the original signal at any channel. The test signals were built using the following expression:

$$\mathbf{X}_u = y_{1,u}\,\mathbf{S}_{1,\mathrm{if}} + y_{2,u}\,\mathbf{S}_2 + y_{3,u}\,\mathbf{S}_3 + \mathbf{R}\,s_X \tag{8}$$

where $\mathbf{X}_u$ is the J × K matrix for the unknown sample, $y_{n,u}$ is the nominal concentration of each component, and $\mathbf{S}_3 = k_3\,\mathbf{b}_3\mathbf{c}_3^{\mathrm{T}}$ is the matrix signal of the unexpected component. The presence of component 3 in the test samples makes the second-order advantage mandatory for resolving the presently simulated mixtures. All profiles for this system are shown in Figure 1A on a single wavelength axis. This shows how the excitation spectrum of component 2 overlaps with the emission and excitation spectra of component 1. Figure 1A also shows the excitation and emission ranges for the simulated experiments. In principle, Figure 1A reveals other sources of inner filter effect, because the excitation spectrum of component 2 also partially overlaps with the emission and excitation spectra of component 3. These small effects were not considered in the simulations, but they may be present in real situations (see below). Figure 1B shows the type of effects expected in this simulated system, illustrating how the emission profiles of analyte 1 vary across the calibration set of samples. Similar changes occur in the excitation spectra. This is the main cause of the failure of most second-order multivariate methods, with the exception of U-PLS/RBL.

EXPERIMENTAL SECTION

Apparatus. Fluorescence spectral measurements were performed on a fast-scanning Varian Cary Eclipse fluorescence spectrophotometer, equipped with two Czerny-Turner monochromators and a xenon flash lamp. The instrument was connected to a PC through an IEEE 488 (GPIB) interface. Excitation-emission matrices were recorded in a 10-mm quartz cell.

Reagents. All experiments were performed with analytical reagent grade chemicals. Stock solutions of benzopyrene (BPY), pyrene (PYR), and chrysene (CHR) were prepared by dissolving 0.0100 g of each reagent (Sigma) in 50 mL of acetonitrile. These solutions were stored in dark bottles at 4 °C.
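The stock concentration is not stated explicitly, but it follows from the quantities above. Taking the final volume as 50 mL, 0.0100 g (10.0 mg) gives 200 mg L⁻¹. The hypothetical helpers below make this explicit, together with the C1V1 = C2V2 dilution into the 5.00-mL flasks used for the calibration samples. The sketch assumes a direct dilution from the stock solution; the intermediate working solutions mentioned next are not quantified in the text.

```python
def stock_concentration_mg_per_L(mass_g, volume_mL):
    """Mass concentration (mg/L) of mass_g grams of solid dissolved to volume_mL milliliters."""
    return (mass_g * 1000.0) / (volume_mL / 1000.0)


def aliquot_uL(target_mg_per_L, flask_mL, source_mg_per_L):
    """Aliquot (microliters) of a source solution that gives target_mg_per_L in a flask of
    flask_mL milliliters, from C1 V1 = C2 V2."""
    return target_mg_per_L * flask_mL / source_mg_per_L * 1000.0


c_stock = stock_concentration_mg_per_L(0.0100, 50.0)
v_chr = aliquot_uL(0.41, 5.00, c_stock)   # 0.41 mg/L CHR, one level of the calibration design
print(f"stock: {c_stock:.0f} mg/L; aliquot for 0.41 mg/L CHR: {v_chr:.2f} uL")
```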
Appropriate working solutions of BPY, PYR, and CHR at different concentrations were prepared by dilution in acetonitrile.

Calibration and Validation Sets. A calibration set of 11 samples was constructed for the determination of CHR in the presence of BPY (also modeled in the calibration phase) and PYR (present only in test samples). In the calibration set (see Table 1), nine standards were prepared according to a central composite design, in the ranges 0.00-0.80 mg L⁻¹ (CHR) and 0.00-2.00 mg L⁻¹ (BPY). Two additional samples contained only the analyte CHR, at concentrations above and below the mean CHR concentration (the last two entries in Table 1). Concentration values for the analyte CHR were selected considering the linear fluorescence-concentration range. Those for BPY were increased beyond its own linear limit in order to appreciate the inner filter effect on the fluorescence spectra of CHR. To prepare a given calibration sample, a solution containing CHR and BPY (prepared from adequate volumes of the corresponding solutions) was placed in a 5.00-mL flask and made up to the mark with acetonitrile. The excitation-emission fluorescence matrices of these solutions were then recorded, and the data were subjected to three-way analysis, as described below. Suitable wavelength ranges for the determination were as follows:

- emission from 340 to 410 nm at 1-nm intervals (J = 71 wavelengths);
- excitation from 200 to 332 nm at 4-nm intervals (K = 34 wavelengths).

This makes a total of 71 × 34 = 2414 spectral points per sample matrix. A validation set of seven samples was also prepared in duplicate. The samples were prepared in the same way as those for calibration, but with a random design, i.e., the target concentrations of both analytes were selected at random within each calibration range (see Table 2 for the composition of these samples). Finally, a test set of 11 samples, also prepared in duplicate, contained all three compounds. In this case, PYR is included as an unexpected component in the unknown samples (Table 3).

Figure 1. (A) Excitation (solid lines) and emission (dashed lines) profiles for the three components employed in the simulation study. Numbers indicate the specific component. The channel ranges employed for the simulations (excitation and emission) are shown at the top of the plot. The excitation spectrum of component 2 (blue shade) overlaps with the emission spectrum of component 1 (red shade) and also with its excitation spectrum (yellow shade). (B) Simulated emission profiles for component 1 in the nine calibration samples of set C1, after computing the inner filter effect produced by component 2.

Table 1. Composition of the Calibration Set of Samples for the Experimental System under Study

CHR/mg L⁻¹   BPY/mg L⁻¹
0.41         0.00
0.41         1.98
0.00         1.00
0.80         1.00
0.12         0.29
0.12         1.73
0.69         0.29
0.69         1.73
0.41         1.00
0.12         0.00
0.69         0.00

Table 2. Composition of the Validation Set of Samples and Prediction Results Using PARAFAC, MCR-ALS, and U-PLS/RBL in the Experimental Validation Set ᵃ

                           predicted CHR/mg L⁻¹
CHR/mg L⁻¹   BPY/mg L⁻¹    PARAFAC   MCR-ALS ᵇ   U-PLS/RBL
0.50         0.00          0.78      0.62        0.49
0.00         1.49          0.06      0.04        0.01
0.30         0.60          0.24      0.38        0.32
0.76         1.24          0.85      0.63        0.80
0.18         0.80          0.23      0.28        0.17
0.63         0.74          0.87      0.66        0.62
0.60         0.00          0.73      0.72        0.59
RMSE/mg L⁻¹                0.12      0.10        0.01
REP/%                      29        24          3.6

ᵃ Predicted concentrations are averages of duplicate analyses. RMSE, root-mean-square error; REP%, relative error of prediction. ᵇ The direction of augmentation was the excitation mode.
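The simulated system of eqs 6 and 7 can be reproduced on a small scale to see why an unfolded model needs more factors than analytes when inner filter effects are present. The following sketch builds a noiseless calibration set resembling C1 and compares the singular values of the unfolded I × JK data with and without the inner filter term. The specific values are assumptions, not taken from the paper:

- the Gaussian profile positions and widths;
- the absorptivity shapes (scaled so that the attenuation stays below 50%);
- the exact central composite design points.

```python
import numpy as np

J, K = 17, 19  # emission x excitation channels, as in the simulated data sets


def band(n, center, width):
    """Unit-length Gaussian profile on n channels (hypothetical band shape)."""
    g = np.exp(-0.5 * ((np.arange(n) - center) / width) ** 2)
    return g / np.linalg.norm(g)


def peak(n, center, width):
    """Gaussian with unit maximum, used here as a hypothetical absorptivity shape."""
    return np.exp(-0.5 * ((np.arange(n) - center) / width) ** 2)


S1 = np.outer(band(J, 6, 3), band(K, 9, 3))   # S_n = b_n c_n^T with k_n = 1 (eq 6)
S2 = np.outer(band(J, 11, 3), band(K, 5, 3))
eps_em = 0.3 * peak(J, 2, 3)    # absorptivity of component 2 at emission channels j
eps_ex = 0.3 * peak(K, 13, 3)   # absorptivity of component 2 at excitation channels k

# Hypothetical central composite design in the 0-1 range (9 samples, like set C1)
design = [(0.5, 0.5), (0.15, 0.15), (0.15, 0.85), (0.85, 0.15), (0.85, 0.85),
          (0.0, 0.5), (1.0, 0.5), (0.5, 0.0), (0.5, 1.0)]


def calibration_matrix(y1, y2, inner_filter=True):
    """Noiseless J x K signal of one calibration sample (eq 6 without R s_X)."""
    S1_eff = S1
    if inner_filter:
        # eq 7: attenuation of component 1 by component 2 along both dimensions;
        # with y2 <= 1 the factor never drops below exp(-0.6), i.e. less than a 50% decrease
        S1_eff = S1 * np.exp(-(eps_em[:, None] + eps_ex[None, :]) * y2)
    return y1 * S1_eff + y2 * S2


for inner_filter in (False, True):
    X = np.array([calibration_matrix(y1, y2, inner_filter).ravel() for y1, y2 in design])
    sv = np.linalg.svd(X, compute_uv=False)
    print(f"inner filter {inner_filter}: sv/sv[0] = {np.round(sv[:4] / sv[0], 4)}")
```

Without the inner filter term, the third relative singular value is at round-off level, so two factors describe all calibration signals. With eq 7 switched on, it becomes clearly non-zero. This is in line with the observation, discussed in the Results section, that U-PLS needs one latent variable beyond the two analytes in the simulated study.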

RESULTS AND DISCUSSION

Simulated Data. The first step in processing the simulated data sets with the different second-order algorithms is to assess the correct number of sample constituents. With PARAFAC, the standard procedure is the so-called core consistency diagnostic (CORCONDIA).23 It involves observing the changes in the core consistency parameter as the number of trial fluorescent components is increased. The number of components is taken as the largest number for which this parameter is larger than ∼50. Figure 2A shows a typical progression of core consistency values when processing a simulated test sample together with the calibration set of samples C1, as is usual in PARAFAC modeling studies. The suggested number of components is three, even though a multitude of emission profiles for component 1 occur in this system (see Figure 1B). When more components are employed, the profiles recovered by PARAFAC for the additional components are similar to those of the calibrated analytes. Analogous results were obtained when the calibration set was C2. Therefore, three components were used for both the PARAFAC and the GRAM calculations. Within the BLLS/RBL methodology, the number of components is established as the number of known analytes (two in the present case) plus the number of unexpected components. The latter is in turn assessed by a suitable study of the prediction residuals when analyzing a given test sample (for details, see refs 24 and 25). This methodology showed that three total components (two analytes and a single unexpected component) were a sensible choice for this method, using either calibration set C1 or C2 during the study of a typical test sample. Finally, when employing MCR-ALS, principal component analysis is the technique recommended for finding the correct number of components, which was also three. This latter method also requires approximations to the pure component spectra, which were assessed using the known spectral profiles of the simulated components.

When using U-PLS, on the other hand, valid procedures for establishing the number of latent variables A include leave-one-out cross-validation, according to the Haaland and Thomas criterion described in the literature.18 In the presently studied case, Figure 2B shows how the PRESS varies as the number of latent variables increases when calibration set C1 is employed, with A = 3 as the optimum value. The same result was obtained for calibration set C2. That three latent variables are required to explain the calibration data indicates that U-PLS models the inner filter effect by including factors beyond the known number of analytes (two in the present case). Another relevant problem is to assess the number of unexpected components ($N_{\mathrm{unx}}$). This can be done by analyzing the residuals $s_u$ as a function of a trial number of unexpected components, as already described for BLLS/RBL.24-26 Figure 2B clearly indicates that the size of the residuals stabilizes at $N_{\mathrm{unx}} = 1$.

(23) Bro, R.; Kiers, H. A. L. J. Chemom. 2003, 17, 274-286.

Figure 2. (A) PARAFAC core consistency diagnostics for a typical simulated test sample, showing the changes in the core consistency parameter as a function of the trial number of components. (B) Logarithm of the leave-one-out cross-validation prediction error sum of squares [log(PRESS)] as a function of the number of U-PLS latent variables A, computed using only simulated calibration data (solid line and filled circles). Also shown is the logarithm of the U-PLS prediction residuals [log($s_u$)] as a function of the number of unexpected components ($N_{\mathrm{unx}}$) (dashed line and open circles).

Table 3. Composition of the Test Set of Samples and Prediction Results Using PARAFAC, MCR-ALS, and U-PLS/RBL in the Experimental Test Set ᵃ

                                         predicted CHR/mg L⁻¹
CHR/mg L⁻¹   BPY/mg L⁻¹   PYR/mg L⁻¹    PARAFAC   MCR-ALS ᵇ   U-PLS/RBL
0.55         1.49         0.49          0.60      0.43        0.56
0.79         1.00         0.29          0.70      0.68        0.81
0.28         1.72         0.76          0.38      0.25        0.31
0.34         0.29         0.40          0.47      0.24        0.38
0.62         0.59         0.34          0.56      0.60        0.58
0.73         1.98         0.69          0.65      0.57        0.70
0.00         1.49         0.49          0.06      0.05        0.01
0.00         0.74         0.29          0.03      0.02        0.01
0.67         0.00         0.75          0.88      0.48        0.65
0.15         0.00         0.24          0.27      0.22        0.16
0.30         0.00         0.49          0.52      0.28        0.31
RMSE/mg L⁻¹                             0.12      0.10        0.02
REP/%                                   30        24          5.9

ᵃ Predicted concentrations are averages of duplicate analyses. RMSE, root-mean-square error; REP%, relative error of prediction. ᵇ The direction of augmentation was the excitation mode.

Once the number of components for each second-order methodology is established, the next step is the analysis of the instrumental signals. For PARAFAC, GRAM, and MCR-ALS, the calibration and test samples are processed jointly; for BLLS/RBL and U-PLS/RBL, the calibration step is separated from the prediction step (with second-order advantage). Typical results for PARAFAC are shown in Figure 3A and B as recovered emission and excitation profiles, respectively, when processing a test sample together with the calibration set C1. These figures also compare the retrieved profiles with those employed for the simulations.
As can be seen, differences arise, in particular in the profiles involved in the inner filter process, i.e., the emission and excitation profiles of component 1 (Figure 3A) and the excitation profile of component 2 (Figure 3B). On the other hand, calibration set C2 contains several samples composed only of analyte 1. When it was used, the recovered profiles (both excitation and emission) for component 1 were very similar to the theoretical ones. This indicates that including sufficient information for analyte 1 in the calibration set leads PARAFAC to obtain profiles that closely resemble those in the absence of inner filter effects.

When MCR-ALS is applied, different matrix augmentation modes are possible. The data matrices can be joined in the excitation direction, in the emission direction, or in both at the same time. However, any conceivable augmentation mode requires an emission profile (or an excitation profile) common to the data matrices of several different samples. The emission (or excitation) profiles of the component suffering the inner filter effect are, in principle, different in each sample, so none of the augmentation modes can model them. In fluorescence spectroscopy, there is no reason to prefer a specific dimension as the augmentation direction. We therefore illustrate the MCR-ALS results with simulations using excitation augmentation. In all cases, all calibration matrices were joined with each test sample matrix in turn, in order to achieve the second-order advantage. The usual non-negativity constraints on both spectra and concentrations were imposed. Panels C and D in Figure 3 show the optimum emission and excitation profiles, respectively, when processing a typical test sample together with the calibration set C1. As Figure 3C shows, the emission profiles differ from the theoretical expectations. This is most marked for component 1, as was observed for PARAFAC, but it also affects component 2. On the other hand, the sample-specific excitation profiles (Figure 3D) differ from one another. MCR-ALS has optimized them as separate component profiles in each sample, where they experience distinct inner filter effects.

(24) Goicoechea, H. C.; Olivieri, A. C. Appl. Spectrosc. 2005, 59, 67-74.
(25) Marsili, N. R.; Lista, A.; Fernandez Band, B. S.; Goicoechea, H. C.; Olivieri, A. C. Analyst 2005, 130, 1291-1298.
(26) Haimovich, A.; Orselli, R.; Escandar, G. M.; Olivieri, A. C. Chemom. Intell. Lab. Syst. 2006, 80, 99-108.

Figure 3. PARAFAC recovered profiles for emission (A) and excitation (B), and MCR-ALS recovered profiles for emission (C) and excitation (D) (dashed lines in all cases), when processing a typical simulated test sample together with the calibration set C1. Component numbers are indicated in each graph. The true spectral profiles are shown as solid lines in all graphs. Plot D displays all excitation profiles recovered by MCR-ALS that have nonzero elements. All spectra have been normalized to unit length.

Figure 4. Predicted vs nominal concentration values for analyte 1 in simulated data sets of 50 samples with random component concentrations, using the following models: (A) PARAFAC, (B) MCR-ALS, and (C) U-PLS/RBL. The narrow solid lines in (A)-(C) indicate the perfect fit.

Similar results were obtained when processing the samples using calibration set C2. Results similar to those shown in Figure 3 were obtained with the remaining second-order algorithms, with the notable exception of U-PLS. In the case of the latter method, the model yields a set of latent factors, contained in matrix P, that explain most of the variability in the training set of samples, regardless of which calibration set is employed. Presumably, including additional information on analyte 1 in the absence of inner filter effects would increase the ability of U-PLS to recognize samples of the latter type. In any case, notice that P carries no directly interpretable physical information. Applying the calibration methodology appropriate for each algorithm produced the prediction results for analyte 1 in the set of 50 test samples with random concentrations of the three components. Figure 4A displays the results for PARAFAC using calibration set C1, which lead to a relative error of prediction (REP) of 24%. Similarly discouraging results (not shown) were obtained using calibration set C2. There seems to be a correlation between the quality of the PARAFAC predictions and the concentration of component 2 in the test samples. With calibration set C1, the smallest errors are obtained when component 2 has concentrations close to the center of the calibration range, indicating that PARAFAC is modeling an average inner filter effect. With calibration set C2, the smallest errors correspond to test samples having low concentrations of component 2; in this case, PARAFAC models situations where inner filter effects are minimal. In all cases, however, the overall performance of these multivariate algorithms is very poor.

The prediction results for MCR-ALS using calibration set C1 are displayed in Figure 4B, leading to a REP of 16%, lower than that of PARAFAC because of the more flexible structure of MCR-ALS. Similar figures were obtained using calibration set C2. BLLS combined with RBL (REP = 14%) gave results similar to those of MCR-ALS, whereas GRAM (REP = 38%) gave even poorer results than PARAFAC. Figure 4C plots the corresponding results for U-PLS/RBL, which achieves a root-mean-square error of prediction (RMSEP) of 0.01 concentration unit, i.e., ∼2% of the mean calibration concentration. This is acceptable in view of the noise level introduced in the simulated data. In all cases, calibrating with latent factors in addition to those required by the number of calibrated analytes, coupled to the successful RBL procedure, provides U-PLS/RBL with both modeling and predictive ability and achieves the second-order advantage without serious loss of accuracy or precision. In conclusion, the simulation study provides strong evidence that U-PLS/RBL is currently the most successful second-order multivariate method for dealing with fluorescence inner filter effects while preserving the second-order advantage, a property of paramount importance in the analysis of complex samples.

Table 4. U-PLS Leave-One-Out Cross-Validation Results for the Experimental Calibration Set of Samples(a)

latent variables | PRESS | RMSECV | F    | p
1                | 0.87  | 0.28   | 50.7 | 0.999
2                | 0.060 | 0.07   | 3.49 | 0.975
3                | 0.078 | 0.08   | 4.54 | 0.990
4                | 0.098 | 0.09   | 5.68 | 0.995
5*               | 0.024 | 0.05   | 1.37 | 0.696
6                | 0.017 | 0.04   | 1.00 | 0.500
7                | 0.017 | 0.04   |      |

(a) PRESS, prediction error sum of squares; RMSECV, root-mean-square error in cross-validation; F, statistical parameter for establishing the optimum number of latent variables; p, probability associated with F with (11,11) degrees of freedom (there are 11 samples in the calibration set). The optimum (boldface in the original) is marked with an asterisk.

Figure 5. Fluorescence excitation spectra (blue lines) and emission spectra (red lines) of the experimental system components, plotted on a common wavelength axis: CHR (solid line), BPY (dashed line), and PYR (dotted line). Excitation spectra were recorded at the following emission wavelengths: CHR, 382 nm; BPY, 412 nm; and PYR, 372 nm. The area under the excitation spectrum of BPY has been shaded to highlight the inner filter effect produced by this component. Emission spectra were recorded at the following excitation wavelengths: CHR, 265 nm; BPY, 250 nm; and PYR, 240 nm. All spectra are normalized to unit length for comparison. The working excitation and emission ranges are indicated at the top of the plot.

Experimental Data. Fluorescence spectra for standard solutions of CHR, BPY, and PYR are shown over a rather wide spectral range in Figure 5. Not only is severe spectral overlap noticeable in the useful wavelength range, but fluorescence inner filter effects are also likely, since the excitation spectrum of BPY covers the entire wavelength range where both CHR and PYR are excited and emit fluorescence. This suggests that quantitation of the analyte CHR requires multivariate calibration methods able to account for inner filter effects.

Validation Samples. A set of seven validation samples (Table 2), containing only CHR, BPY, or both, was analyzed to monitor the predictive ability of the different models. When PARAFAC, the standard second-order method for fluorescence excitation-emission data, was applied, core consistency analysis indicated that two fluorophores should be considered, as expected from the composition of the calibration and validation samples, even in the presence of inner filter effects. Profile recovery was qualitatively similar to that discussed above for the simulated data, i.e., slight differences were observed when the retrieved profiles were compared with those of the pure components. This is explained by the tendency of PARAFAC to retrieve an average component profile from the set of analyzed samples, which, in the presence of inner filter effects, may differ from that of the pure component. In any case, the statistical analysis of the prediction results indicated rather poor performance of PARAFAC for these validation samples (Table 2). MCR-ALS was also applied to this set of validation samples, considering two responsive components. Selecting the excitation dimension as the direction of matrix augmentation led to the prediction results shown in Table 2, which are slightly better than those furnished by PARAFAC. Furthermore, the quality of the recovered profiles was similar to that already commented on for the simulation study.
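Why a single profile per component can only capture an "average" inner filter effect can be illustrated with a minimal Python sketch. It assumes a simplified primary inner-filter model in which the interferent attenuates the analyte excitation by a Beer-Lambert factor 10^(-k·c·ε(λ)); the Gaussian band shapes, the wavelength grid, the constant k, and all function names are hypothetical and are not the simulation settings used in this work. The sketch shows that the apparent excitation profile of the analyte changes shape with interferent concentration, which violates the assumption of one fixed profile per component underlying PARAFAC.

```python
import math

# Hypothetical excitation grid (nm) and Gaussian band shapes; not the paper's simulation values.
WAVELENGTHS = list(range(230, 341, 5))


def gaussian(x, center, width):
    return math.exp(-0.5 * ((x - center) / width) ** 2)


ANALYTE_EXCITATION = [gaussian(w, 270, 15) for w in WAVELENGTHS]
INTERFERENT_ABSORPTIVITY = [gaussian(w, 290, 20) for w in WAVELENGTHS]


def apparent_excitation(c_interferent, k=1.0):
    """Analyte excitation profile seen through an assumed primary inner filter:
    each point is scaled by 10**(-k * c * absorptivity) (Beer-Lambert attenuation)."""
    return [a * 10 ** (-k * c_interferent * e)
            for a, e in zip(ANALYTE_EXCITATION, INTERFERENT_ABSORPTIVITY)]


def unit_length(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity(u, v):
    """Cosine of the angle between two profiles (1.0 means identical shape)."""
    return sum(a * b for a, b in zip(unit_length(u), unit_length(v)))


def peak_wavelength(profile):
    return WAVELENGTHS[max(range(len(profile)), key=profile.__getitem__)]


pure = apparent_excitation(0.0)
for c in (0.0, 0.5, 1.0):
    profile = apparent_excitation(c)
    print(f"c = {c:.1f}: peak at {peak_wavelength(profile)} nm, "
          f"similarity to pure profile = {similarity(pure, profile):.3f}")
```

With the attenuation switched on (c = 1.0), the apparent maximum moves from 270 nm to a shorter wavelength and the shape similarity to the pure profile drops well below 1, so the profile a trilinear model extracts from a set of such samples is necessarily a compromise among differently distorted shapes.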
In the case of U-PLS/RBL, five latent variables were required to describe the variability in the calibration set of samples, as dictated by leave-one-out cross-validation according to the Haaland and Thomas criterion18 (Table 4). Error analysis suggested that Nunx = 0, as expected from the known composition of the validation samples, so that it was unnecessary to resort to RBL and the second-order advantage in this case. The prediction results of U-PLS/RBL for these validation samples are also shown in Table 2. As can be seen, considerably better recoveries are obtained than with PARAFAC, with an average prediction error of 3.6%. These results for the simple validation set do not, however, convey the importance of those obtained in the presence of unexpected components (see below).

Analytical Chemistry, Vol. 78, No. 23, December 1, 2006
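The latent-variable selection summarized in Table 4 can be reproduced with a short Python sketch. It follows the usual reading of the Haaland and Thomas criterion (an assumption here, since the procedure is not spelled out in this section): F(A) = PRESS(A)/PRESS(A*), where A* gives the minimum PRESS, and the optimum is the smallest A ≤ A* whose F value has a cumulative probability below 0.75 for (n, n) degrees of freedom. RMSECV is computed as sqrt(PRESS/n), which matches the tabulated values. To stay dependency-free, the F-distribution CDF is evaluated by Simpson integration of the incomplete beta function; `scipy.stats.f.cdf` would be the usual choice. The function names are hypothetical.

```python
import math


def f_cdf(x, d1, d2, steps=2000):
    """CDF of the F distribution, I_t(d1/2, d2/2) with t = d1*x/(d1*x + d2),
    evaluated by Simpson integration of the beta density (assumes d1, d2 >= 2)."""
    if x <= 0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    t = d1 * x / (d1 * x + d2)
    beta_fn = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def density(u):
        return u ** (a - 1) * (1 - u) ** (b - 1)

    h = t / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3 / beta_fn


def haaland_thomas(press, n_samples, threshold=0.75):
    """Optimum number of latent variables (1-based) from a list of PRESS values."""
    a_min = min(range(len(press)), key=press.__getitem__)  # first minimum PRESS
    for a in range(a_min + 1):
        f_ratio = press[a] / press[a_min]
        if f_cdf(f_ratio, n_samples, n_samples) < threshold:
            return a + 1
    return a_min + 1  # fallback; F = 1 at a_min gives p = 0.5, so the loop returns first


# PRESS values from Table 4 (11 calibration samples, leave-one-out)
press = [0.87, 0.060, 0.078, 0.098, 0.024, 0.017, 0.017]
n_cal = 11
rmsecv = [math.sqrt(p / n_cal) for p in press]
best = haaland_thomas(press, n_cal)
print("RMSECV:", [round(r, 2) for r in rmsecv])
print("optimum number of latent variables:", best)
```

Because Table 4 lists PRESS to only two significant figures, the recomputed F ratios (e.g., 51.2 for one latent variable) differ slightly from the tabulated ones, but the selected model, five latent variables, is the same.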


Figure 6. (A) Excitation profile recovered by U-PLS/RBL for the unexpected component present in a typical test sample (solid line), in comparison with the real excitation spectrum of PYR (dashed line). (B) The corresponding emission profiles, with line types as in (A). All spectra have been normalized to unit length for comparison.
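Residual bilinearization, which U-PLS/RBL relies on whenever Nunx > 0 (as for the PYR-containing test samples), can be sketched in plain Python. This minimal version assumes Nunx = 1 and takes the unfolded calibration loadings P as given; it alternates between (i) least-squares scores t = (PᵀP)⁻¹Pᵀ[vec(X) − vec(E)] for the test data after removing the current interferent estimate E and (ii) a rank-1 (power-iteration) approximation of the residual matrix, which is taken as the new E. The 4 × 4 toy "profiles" and all function names are hypothetical stand-ins, not the paper's data or code; in a real U-PLS model the analyte concentration would then be computed from the scores with the PLS regression coefficients, a step not shown. For Nunx > 1, a truncated singular value decomposition would replace the power iteration.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def scores(p_cols, y):
    """t = (P^T P)^-1 P^T y, with P given as a list of unfolded loading columns."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in p_cols] for ci in p_cols]
    rhs = [sum(p * v for p, v in zip(ci, y)) for ci in p_cols]
    return solve(gram, rhs)


def rank_one(mat, iters=100):
    """Leading left singular vector u (unit length) and w = mat^T u, so that
    u w^T is the best rank-1 approximation of mat (power iteration on mat mat^T)."""
    rows, cols = len(mat), len(mat[0])
    u = [1.0] * rows
    for _ in range(iters):
        w = [sum(mat[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        u = [sum(mat[i][j] * w[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return [0.0] * rows, [0.0] * cols
        u = [x / norm for x in u]
    w = [sum(mat[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return u, w


def rbl(x, p_cols, max_iter=1000, tol=1e-12):
    """Residual bilinearization with one unexpected component (Nunx = 1)."""
    rows, cols = len(x), len(x[0])
    x_vec = [v for row in x for v in row]  # row-wise unfolding
    e_vec = [0.0] * len(x_vec)             # current unexpected-component estimate
    for _ in range(max_iter):
        t = scores(p_cols, [a - b for a, b in zip(x_vec, e_vec)])
        fit = [sum(tk * col[i] for tk, col in zip(t, p_cols)) for i in range(len(x_vec))]
        resid = [[x_vec[i * cols + j] - fit[i * cols + j] for j in range(cols)]
                 for i in range(rows)]
        u, w = rank_one(resid)
        new_e = [u[i] * w[j] for i in range(rows) for j in range(cols)]
        converged = max(abs(a - b) for a, b in zip(new_e, e_vec)) < tol
        e_vec = new_e
        if converged:
            break
    t = scores(p_cols, [a - b for a, b in zip(x_vec, e_vec)])
    return t, u, w


def outer(b, c):
    return [[bi * cj for cj in c] for bi in b]


# Hypothetical toy profiles: calibrated analyte (1), calibrated interferent (2),
# and an unexpected component (3) overlapping the analyte in both modes.
b1, c1 = [1, 0, 0, 0], [1, 0, 0, 0]
b2, c2 = [0, 1, 0, 0], [0, 1, 0, 0]
b3, c3 = [1, 0, 1, 0], [1, 0, 1, 1]
p_cols = [[v for row in outer(b1, c1) for v in row],
          [v for row in outer(b2, c2) for v in row]]  # stand-in for U-PLS loadings P
x = [[0.7 * p + 0.3 * q + 0.5 * r for p, q, r in zip(rp, rq, rr)]
     for rp, rq, rr in zip(outer(b1, c1), outer(b2, c2), outer(b3, c3))]

naive = scores(p_cols, [v for row in x for v in row])
t, u, w = rbl(x, p_cols)
print("plain least-squares scores:", [round(v, 4) for v in naive])
print("RBL scores:", [round(v, 4) for v in t])
print("unexpected profile, first mode:", [round(v, 3) for v in u])
```

Plain least squares absorbs part of the unexpected signal into the analyte score (1.2 instead of 0.7), whereas the RBL iteration returns scores close to 0.7 and 0.3 and a first-mode profile proportional to the unexpected component's, the toy analogue of the profile comparison in Figure 6.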

Test Samples Containing PYR. The presence of PYR in the test samples produces a strong fluorescence signal that overlaps with those of CHR and BPY over the useful wavelength range (Figure 5). The analysis of these samples with both PARAFAC and MCR-ALS therefore required introducing one spectral component more than those employed for the validation samples. Again, the profile recovery was of the quality expected for systems with inner filter effects. When the U-PLS/RBL model was used, the number of latent variables was the same as that employed for calibration and for the validation samples (i.e., A = 5), while Nunx had to be increased to one to accommodate the signal of the unexpected component PYR. The profiles of the unexpected component for Nunx = 1 are given by the RBL analysis, and the principal component in each data dimension can be compared with the spectra of pure PYR. Overall, the profiles provided by this technique appear to be in good agreement with the pure spectra, as is apparent from Figure 6A and B. Nevertheless, some subtle effects are visible on closer inspection of the retrieved profiles, particularly the excitation profile shown in Figure 6A: the intensities estimated by RBL for the unexpected component in the regions 230-250 and 300-335 nm seem to be relatively higher than those of the excitation spectrum of pure PYR. This can be explained by inspecting the excitation spectrum of BPY, which shows maximum excitation in the range 250-300 nm, causing the largest decrease in the PYR emission in this spectral region, so that the normalized profile appears relatively enhanced outside it. Therefore, the RBL procedure applied here succeeds in estimating the profiles of unexpected components, even including subtle inner filter effects.

Specific prediction results from PARAFAC, MCR-ALS, and U-PLS/RBL for the test set of samples are provided in Table 3, together with the corresponding statistical information. As can be seen, U-PLS/RBL not only leads to reasonably low RMSEP values but also provides significantly better recoveries than the other multivariate techniques. MCR-ALS seems to give better results than PARAFAC, although they do not approach the quality of those from U-PLS/RBL. These encouraging results imply that, for excitation-emission fluorescence matrices with inner filter effects, the second-order advantage is fully exploited by the new U-PLS/RBL algorithm, and suggest that it is the most successful alternative for processing the second-order data analyzed here. Finally, notice that combining the RBL procedure with the multidimensional version of PLS (N-PLS) has already been suggested,11 although this model was not developed or tested. Work in this direction is in progress in our laboratory.

CONCLUSIONS

Fluorescence excitation-emission matrices showing inner filter effects can be successfully processed by unfolded-partial least-squares regression analysis, because the flexible structure of this technique allows it to account for these effects by including additional latent variables in the model.
In the presence of unexpected sample components, the calibration model can be combined with residual bilinearization, a procedure capable of modeling the contribution of potential interferences, allowing one to achieve the second-order advantage. The appropriate combination of these techniques appears to be uniquely successful for processing the type of data described in the present report.

ACKNOWLEDGMENT

Universidad Nacional de Rosario, CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas, Project PIP 5303), ANPCyT (Agencia Nacional de Promoción Científica y Tecnológica, Project PICT05-25825), Ministerio de Educación y Ciencia of Spain (Project CTQ2005-02389), and AECI (Programa Intercampus Iberoamérica/PCI Project A/2394/05) are gratefully acknowledged for financial support.

Received for review July 26, 2006. Accepted September 25, 2006.

AC061369V