Anal. Chem. 2007, 79, 6949-6958
Multiway Partial Least-Squares Coupled to Residual Trilinearization: A Genuine Multidimensional Tool for the Study of Third-Order Data. Simultaneous Analysis of Procaine and Its Metabolite p-Aminobenzoic Acid in Equine Serum Patricia C. Damiani,† Isabel Dura´n-Mera´s,‡ Alejandro Garcı´a-Reiriz,† Ana Jime´nez-Giro´n,‡ Arsenio Mun˜oz de la Pen˜a,‡ and Alejandro C. Olivieri*,†
Departamento de Quı´mica Analı´tica, Facultad de Ciencias, Universidad de Extremadura (06071) Badajoz, Spain, and Departamento de Quı´mica Analı´tica, Facultad de Ciencias Bioquı´micas y Farmace´ uticas, Universidad Nacional de Rosario, and Instituto de Quı´mica Rosario (IQUIR), Consejo Nacional de Investigaciones Cientı´ficas y Te´ cnicas (CONICET), Suipacha 531, Rosario (S2002LRK), Argentina
A new third-order multivariate calibration approach, based on the combination of multiway-partial leastsquares with a separate procedure called residual trilinearization (N-PLS/RTL), is presented and applied to multicomponent analysis using third-order data. The proposed chemometric algorithm is able to predict analyte concentrations in the presence of unexpected sample components, which require strict adherence to the secondorder advantage. Results for the determination of procaine and its metabolite p-aminobenzoic acid in equine serum are discussed, based on kinetic fluorescence excitationemission four-way measurements and application of the newly developed multiway methodology. Since the analytes are also the reagent and product of the hydrolysis reaction followed by fast-scanning fluorescence spectroscopy, the classical approach based on parallel factor analysis is challenged by strong linear dependencies and multilinearity losses. In comparison, N-PLS/RTL appears an appealing genuine multiway alternative that avoids the latter complications, yielding analytical results that are statistically comparable to those rendered by related unfolded algorithms, which are also able to process fourway data. Prediction was made on validation samples with a qualitative composition similar to the calibration set and also on test samples containing unexpected equine serum components. Owing to its intrinsic sensitivity, easiness of use, and availability of instruments, fluorescence spectroscopy provides a valuable tool to analytical chemists.1 When a given sample carries unexpected fluorescent constituents that have not been modeled during the calibration phase, a convenient way of quantifying a given analyte * To whom correspondence should be addressed. E-mail: aolivier@fbioyf. unr.edu.ar. † Universidad Nacional de Rosario and Instituto de Quı´mica Rosario. ‡ Universidad de Extremadura. (1) Ichinose, N.; Schwedt, G.; Schnepel, F. M.; Adachi, K. Fluorometric analysis in biomedical chemistry; John Wiley & Sons: New York, 1987. 10.1021/ac070596+ CCC: $37.00 Published on Web 08/10/2007
© 2007 American Chemical Society
is by resorting to higher-order data coupled to the second-order advantage.2 Examples on the use of second-order spectrofluorometric data to analyze complex samples can be found in recent reports.3,4 However, very few works have been published based on third-order data. Specifically, they describe the processing of time-modulated luminescence excitation-emission matrices, either by following the kinetics of a reaction using spectrofluorometry5-11 or by the time-resolved decay of phosphorescence signals.12 A recent publication deals with two-dimensional liquid chromatography coupled to diode array detection for metabolomic studies.13 The classical multiway multivariate algorithm parallel factor analysis (PARAFAC)14 is capable of processing second- and higherorder data and obtaining the second-order advantage. The success of PARAFAC in this regard is directly related to the trilinearity in the processed data: fluorescence excitation-emission matrices are not merely a collection of two-way data sets, but there is actually an internal relationship between each of the matrices, and as a consequence, the decomposition of the data array built with (2) Booksh, K. S.; Kowalski, B. R. Anal. Chem. 1994, 66, 782A-791A. (3) Smilde, A.; Bro, R.; Geladi, P. Multi-way analysis. Applications in the chemical sciences; Wiley: Chichester, 2004. (4) Bro, R. Crit. Rev. Anal. Chem. 2006, 36, 279-293. (5) Nikolajsen, R. P. H.; Booksh, K. S.; Hansen, A. M.; Bro, R. Anal. Chim. Acta 2003, 475, 137-150. (6) Tan, Y.; Jiang, J. H.; Wu, H. L.; Cui, H.; Yu, R. Q. Anal. Chim Acta 2000, 412, 195-202. (7) Nahorniak, M. L.; Cooper, G. A.; Kim, Y. C.; Booksh, K. S. Analyst 2005, 130, 85-93. (8) Olivieri, A. C.; Arancibia, J. A.; Mun ˜oz de la Pen ˜a, A.; Dura´n Mera´s, I.; Espinosa Mansilla, A. Anal. Chem. 2004, 76, 5657-5666. (9) Mun ˜oz de la Pen ˜a, A.; Dura´n Mera´s, I.; Jime´nez Giro´n, A. Anal. Bioanal. Chem. 2006, 385, 1289-1297. (10) Mun ˜oz de la Pen ˜a, A.; Dura´n Mera´s, I.; Jime´nez Giro´n, A.; Goicochea, H. C. Talanta 2007, 71, 806-815. (11) Kim, Y. C.; Jordan, J. A.; Nahorniak, M. L.; Booksh, K. S. Anal. Chem. 2005, 77, 7679-7686. (12) Goicoechea, H. C.; Yu, S.; Olivieri, A. C.; Campiglia, A. D. Anal. Chem. 2005, 77, 2608-2616. (13) Porter, S. E. G.; Stoll, D. R.; Rutan, S. C.; Carr, P. W.; Cohen, J. D. Anal. Chem. 2006, 78, 5559-5569. (14) Bro, R. Chemom. Intell. Lab. Syst. 1997, 38, 149-171.
Analytical Chemistry, Vol. 79, No. 18, September 15, 2007 6949
the response matrices measured for a number of samples is often unique. If a unique decomposition is achieved, the second-order advantage can be obtained, allowing a robust estimation of analyte concentrations in mixtures, even in the presence of unknown interferences. Recently, alternatives based on data unfolding, such as trilinear least-squares (TLLS)15 and unfolded-partial least-squares (U-PLS),16 have been introduced, which deal with this information by concatenating (or unfolding) the original data into unidimensional arrays (vectors). In the presence of unexpected compounds of unknown origin, both of these methods should be complemented with an additional procedure to achieve the second-order advantage. A technique that is useful in this regard, called residual trilinearization (RTL), has been developed by our group as an extension of residual bilinearization (RBL),17,18 which was originally proposed for second-order data.15 Latent structured methods such as U-PLS/RBL are particularly interesting, especially because recent applications in the second-order domain have highlighted its abilities in modeling nontrilinear data sets, when compared to the more classical PARAFAC and bilinear least-squares/RBL (BLLS/RBL) methods.19-21 However, it appears that a genuine multiway latent variable method capable of achieving the second-order advantage is still needed. We have recently combined multiway PLS (N-PLS)22 with RBL for three-way data, producing a technique that is still under testing in our laboratories, using both simulated and experimental second-order data sets. In the present report, where four-way kinetic fluorescence excitation-emission data are analyzed, we introduce the required combination of N-PLS with RTL for thirdorder data sets, extending the approach one further dimension. The aim is the simultaneous quantitation of the anaesthetic procaine and its metabolite p-aminobenzoic acid in equine serum samples, where the background interference makes the use of the second-order advantage mandatory. Procaine (PRO, 2-diethylaminoethyl p-aminobenzoate hydrochloride) is used as a local anaesthetic. In human plasma, it is almost completely hydrolyzed by the enzyme pseudocholinesterase to p-aminobenzoic acid (PAB) and diethylaminoethanol.23 The latter compound has both circulatory and neurological effects, producing blood vessel constriction and euphoria.24 Procaine is usually used in horses as an anaesthetic and a stimulating drug, and also accompanying penicillin to treat infections, with plasma concentrations on the order of 10 mg mL-1.25 Horse plasma does not promote the complete hydrolysis of procaine into p-aminoben(15) Arancibia, J. A.; Olivieri, A. C.; Bohoyo Gil, D.; Mun ˜oz de la Pen ˜a, A.; Dura´nMera´s, I.; Espinosa Mansilla, A. Chemom. Intell. Lab. Syst. 2006, 80, 7786. (16) Wold, S.; Geladi, P.; Esbensen, K.; O ¨ hman, J. J. Chemom. 1987, 1, 41-56. (17) O ¨ hman, J.; Geladi, P.; Wold, S. J. Chemometrics 1990, 4, 79-90. (18) Olivieri, A. C. J. Chemom. 2005, 19, 253-265. (19) Bohoyo Gil, D.; Mun ˜oz de la Pen ˜a, A.; Arancibia, J. A.; Escandar, G. M.; Olivieri, A. C. Anal. Chem. 2006, 78, 8051-8058. (20) Culzoni, M. J.; Goicoechea, H. C.; Pagani, A. P.; Cabezo´n, M. A.; Olivieri, A. C. Analyst 2006, 131, 718-732. (21) Garcı´a-Reiriz, A.; Damiani, P. C.; Olivieri, A. C. Talanta 2007, 71, 806-815. (22) Bro, R. J. Chemom. 1996, 10, 47-61. (23) Reynolds, J. E. F., Ed. W. Martindale, Extra Pharmacopoeia, Pharmaceutical Press: London, 1996; Vol. XXXI, p 76. (24) Turner, P.; Volans, G.; Wiserman, H. Drugs Handbook; McMillan: London, 1993; p 76. (25) Olse´n, L.; Ingvast-Larsson, C.; Brostro ¨m, H.; Larsson, P.; Tja¨lve, H. J. Vet. Pharmacol. Therap. 2007, 30, 201-207.
6950
Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
zoic acid,26 and therefore, the simultaneous determination of both compounds is important in monitoring fair horse sports. Several analytical methods for the determination of either procaine or p-aminobenzoic acid in a variety of samples are known;27-35 however, only few methods were developed for the simultaneous determination of both compounds, mainly based on highperformance liquid chromatographic techniques.36-38 In this report, we follow the hydrolysis of procaine to give strongly fluorescent p-aminobenzoic acid in aqueous alkaline media,39 by recording fluorescence excitation-emission matrices as a function of reaction time, using a fast scanning detector, and processing the obtained four-way information with the classical PARAFAC model, as well as with the presently introduced multiway latent structured method N-PLS/RTL and some of its unfolded predecessors. We also compare the advantages and disadvantages of the above-mentioned chemometric techniques as regards the resolution of the presently studied multicomponent mixtures. THEORY General Considerations. Before discussing specific aspects of the employed algorithms, it is useful to consider the different manners in which they process four-way data arrays. For a given test or problem sample, PARAFAC builds a joined model, which includes the third-order signals for all the calibration samples and the analyzed test sample data. This must be done for each of the test samples. In the first step of PARAFAC processing, the fourway signal array is decomposed, while the calibration concentrations are employed, in a subsequent step, to estimate a given component concentration in the test sample. This means that a new PARAFAC model is calculated for each new test or problem sample to be predicted. On the other hand, in the N-PLS/RTL, U-PLS/RTL, and TLLS/ RTL methodologies, the first step consists of establishing a relationship between the measured calibration signals and the known calibration concentrations, without considering the test sample data. Once this is done, a postcalibration RTL procedure applied to the test sample signals is introduced, allowing one to model the presence of unexpected components and to accurately estimate the analyte concentration. In this way, PARAFAC resolves each component information separately, and this is the reason why this model inherently (26) Committee for Veterinary Medicinal Products, Procaine. Summary Report EMEA/MRL/217/97-FINAL of the Veterinary Medicines Evaluation Unit, European Agency for the Evaluation of Medicine Products, January 1998. (27) Storms, M. L.; Stewart, J. T. J. Pharm. Biomed. Anal. 2002, 30, 49-58. (28) Berzas Nevado, J. J.; Murillo Pulgarı´n, J. A.; Reillo Escudero, O. I. Appl. Spectrosc. 2000, 54, 1678-1683. (29) Einosuke, T.; Yuji, N.; Xuan, Z. S.; Shogo, M.; Yukio, K. Jpn. J. Forensic Toxicol. 1995, 13, 11-16. (30) Carretero, A. S.; Cruces-Blanco, C.; Peinado, S. F.; Bergmi, R. E. I.; Gutie´rrez, A. F. J. Pharm. Biomed. Anal. 1999, 21, 969-974. (31) Badea, I.; Moja, D.; Vladescu, L. Anal. Biochem. 2002, 374, 51-53. (32) Vanquerp, V.; Rodriguez, C.; Coiffard, C.; Coiffard, L. J. M.; De RoeckHoltzhauer, Y. J. Chromatogr., A 1999, 832, 273-277. (33) Kastel, R.; Rosival, I.; Blahovec, J. Biomed. Chromatogr. 1994, 8, 294-296. (34) Stokes, D. L.; Vo-Dinh, T. Sens. Actuators, B 2000, 69, 28-36. (35) Chen, L. C.; Hu, M. L. J. Food Drug Anal. 1996, 4, 75-87. (36) Yang, H.; Thyrion, F. C. J. Liq. Chromatogr. Rel. Technol. 1998, 21, 13471357. (37) Rop, P. P.; Grimaldi, F.; Bresson, M.; Fornaris, M.; Viala, A. J. Liq. Chromatogr. 1993, 16, 2797-2811. (38) Dhananjeyan, M. R.; Bykowski, C.; Trendel, J. A.; Sarver, J. G.; Andob, H.; Erhardt, P. W. J. Chromatogr., B 2007, 847, 224-230. (39) Kondritzer, A. A.; Zvirblis, P. J. Am. Pharm. Assoc. 1957, 46, 531-535.
presents the second-order advantage (the analyte signal is separated from that of the potential interferences). On the other hand, the remaining methods intend to find the relationship among the measured calibration responses and particular concentration information (used to build the model). This is why the potential interferents should be present in the calibration samples; otherwise a separate procedure, such as residual trilinearization, should be combined with the calibration model to obtain the second-order advantage. PARAFAC Model. After measuring third-order data for a set of samples, each of them as a J × K × L array (J, K, and L are the number of data points in each of the three dimensions), the I training arrays Xi,cal are joined with the unknown sample array Xu into a four-way data array X, whose dimensions are [(I + 1) × J × K × L]. Provided X follows a quadrilinear PARAFAC model, it can be written in terms of four vectors for each responsive component, designated as an, bn, cn, and dn, and collecting the relative concentrations [(I + 1) × 1] for component n and the profiles in the three modes (J × 1), (K × 1), and (L × 1), respectively. The specific expression for a given element of X is40 N
Xijkl )
∑a
inbjnckndln
+ Eijkl
(1)
n)1
where N is the total number of responsive components, ain is the relative concentration of component n in the ith. sample, and bjn, ckn, and dln are the fluorescence intensities at the emission wavelength j, excitation wavelength k, and time l, respectively. The values of Eijkl are the elements of the array E, which is a residual error term of the same dimensions as X. The column vectors an, bn, cn, and dn are collected into the corresponding loading matrices A, B, C, and D (bn, cn, and dn are usually normalized to unit length). The model described by eq 1 defines a decomposition of X, which provides access to emission (B) and excitation spectral profiles (C), time profiles (D), and relative concentrations (A) of individual components in the (I + 1) mixtures, whether they are chemically known or not. This constitutes the basis of the secondorder advantage. The decomposition is usually accomplished through an alternating least-squares minimization scheme.14,41 Issues relevant to the application of the PARAFAC model for the calibration of four-way data are as follows: (1) initializing the algorithm, (2) establishing the number of responsive components, (3) identifying specific components from the information provided by the model, and (4) calibrating the model in order to obtain absolute concentrations for a particular component in an unknown sample. Initializing PARAFAC for the study of four-way arrays can be done using singular value decomposition vectors, spectral and kinetic data that are known in advance for pure components, or by the loadings giving the best fit after small PARAFAC runs involving both singular value decomposition vectors and several sets of orthogonal random loadings. These options are implemented in the PARAFAC package.42 (40) Leurgans, S.; Ross, R. T. Stat. Sci. 1992, 7, 289-319. (41) Paatero, P. Chemom. Intell. Lab. Syst. 1997, 38, 223-242. (42) http://www.models.kvl.dk/source/.
The number of responsive components (N) can be estimated by several methods. A useful technique is CORCONDIA, a diagnostic tool considering the PARAFAC internal parameter known as core consistency.43,44 Another useful technique is the consideration of the PARAFAC residual error, i.e., the standard deviation of the elements of the array E in eq 1.14 Usually this parameter decreases with increasing N, until it stabilizes at a value compatible with the instrumental noise (the latter can be assessed by blank replicate measurements). A reasonable choice for N is thus the smallest number of components for which the residual error is not statistically different than the instrumental noise. Identification of the chemical constituents under investigation is done with the aid of the three estimated mode profiles, emission spectrum, excitation spectrum, and kinetic profile, and comparing them with those for a standard solution of the analyte of interest. This is required since the components obtained by decomposition of X are sorted according to their contribution to the overall spectral variance, and this order is not necessarily maintained when the unknown sample is changed. Absolute analyte concentrations are obtained after calibration, because the four-way array decomposition only provides relative values (A). Calibration is done by means of the set of standards with known analyte concentrations (contained in an I × 1 vector y) and regression of the first I elements of column an against y:
k ) y+ × [a1n|...|aIn]
(2)
where “+” implies taking the pseudoinverse. Conversion of relative to absolute concentration of n in the unknown proceeds by division of the last element of column an [a(I+1)n] by the slope of the calibration graph k:
yu ) a(I+1)n/k
(3)
The above procedure is repeated for each new test sample analyzed. N-PLS Model. In the N-PLS method applied to third-order data, concentration information is employed in the calibration step, without including data for the unknown sample. The I calibration data arrays, together with the vector of calibration concentrations y (size I × 1) are employed to obtain sets of loadings Wj, Wk, and Wl (of sizes J × A, K × A, and L × A, where A is the number of latent factors), as well as regression coefficients b (size A × 1).43 The parameter A can be selected by techniques such as leaveone-out cross-validation.45 If no unexpected components occurred in the test sample, b could be employed to estimate the analyte concentration according to
yu ) tuT b
(4)
where tu is the test sample score vector, obtained by appropriate projection of the test data onto the calibration loading matrices. When unexpected constituents occur in the unknown sample, the (43) Bro, R. Multi-way analysis in the food industry. Doctoral Thesis, University of Amsterdam, The Netherlands, 1998. (44) Bro, R.; Kiers, H. A. L. J. Chemom. 2003, 17, 274-286. (45) Haaland, D. M.; Thomas, E. V. Anal. Chem. 1988, 60, 1193-1202.
Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
6951
latter scores are unsuitable for analyte prediction through eq 4. In this case, it is useful to consider the residuals of the N-PLS modeling of the test sample signal (sp; see eq 5 below) before prediction is made. These residuals will be abnormally large in comparison with the typical instrumental noise level:
sp ) || ep ||/(JKL-A)1/2 ) || vec(Xu) - vec(X ˆ u) ||/ (JKL-A)1/2 (5) where X ˆ u is the sample three-way data array (Xu) reconstructed by the N-PLS model and || ‚ || indicates the Euclidean norm. This situation can be handled by a separate procedure called residual trilinearization, based on the Tucker3 model of the unexpected effects.8 RTL aims at minimizing the residuals computed while fitting the sample data to the sum of the relevant contributions:
Xu ) reshape{tu[(Wj| X |Wk)| X |Wl]} + Tucker3(X ˆ u - Xu) + Eu (6) where “reshape” indicates transforming a JKL × 1 vector into a J × K × L three-way array, and | X | indicates the Kathri-Rao operator.43 During this RTL procedure, the weight loadings Wj, Wk, and Wl are kept constant at the calibration values, and tu is varied until the final RTL residual error su is minimized using a Gauss-Newton procedure, with su given by
su ) || Eu ||/(JKL)1/2
(7)
where Eu is from eq 6. Once this is done, the analyte concentrations are provided by eq 4, by introducing the final tu vector found by the RTL procedure. To analyze the presently discussed data, the Tucker3 model in eq 6 is constructed by restricting the loadings to be orthogonal and with no special constraints on the core elements. For a single unexpected component, this analysis is straightforward and provides the corresponding interferent profiles in the three dimensions. For additional unexpected constituents, however, the retrieved profiles no longer resemble true spectra (or time profiles). Moreover, in this latter case, several different Tucker3 models could in principle be constructed, because the number of loadings may be different in each dimension. We notice that the aim which guides the RTL procedure is the minimization of the residual error term su of eq 7 to a level compatible with the degree of noise present in the measured signals. Therefore, if two unexpected components are considered, for example, one should explore the possible Tucker3 models having one or two loadings in each dimension and select the simplest model giving a residual value of su that is not statistically different from the minimum one. For more unexpected components, a similar procedure is recommended. The final Tucker3 model selected to model the unexpected effects is the simplest one, which provides a value of su, which is not statistically different from the noise level. We note that two different residual parameters appear in the above discussion, which should not be confused: sp (eq 5) corresponds to the difference between the test sample signal and 6952
Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
that model by N-PLS before the RTL procedure, while su (eq 7) arises from the difference after the RTL modeling of the interferent effects. Hence, it is the latter one that should be comparable to the instrumental noise level if RTL is successful. Finally, we note that N-PLS/RTL can easily been extended to even higher data orders, giving rise to N-PLS coupled to residual multilinearization, or N-PLS/RML, which will certainly be more and more applied in the near future, as data of higher complexity are produced by modern instruments. Software. All routines employed to carry out the calculations described in this paper were written in MATLAB.46 Those for applying PARAFAC and N-PLS (without the second-order advantage provided by RTL) are available in the Internet thanks to Bro.42 N-PLS/RTL, U-PLS/RTL, and TLLS/RTL are available from the authors on request, including a useful graphical user interface for data input and parameter setting, of the type already described for first-order multivariate calibration.47 EXPERIMENTAL SECTION Apparatus. Fluorescence spectral measurements were performed on a fast-scanning Varian Cary Eclipse fluorescence spectrophotometer, equipped with two Czerny-Turner monochromators and a xenon flash lamp, and connected to a PC microcomputer via an IEEE 488 (GPIB) serial interface. Excitation-emission-time data arrays were recorded in a 10 mm quartz cell, in the following ranges: excitation, 240-300 nm each 6 nm; emission, 310-390 nm each 4 nm; and time, 0-18 min each 2 min. Thus, a third-order data array for a given sample was of size 11 × 21 × 10, consisting of a total of 2310 data points. The rapidscanning instrument allows the acquisition of a complete excitation-emission matrix in 0.2 min at a wavelength scanning speed of 24 000 nm/min, considerably smaller than the time between successive measurements. This is important, as it should be noticed that, if reaction significantly evolves during spectral acquisition, the sample data would not be strictly quadrilinear. The cell was thermostated at 40 °C. Reagents. All experiments were performed with analytical reagent grade chemicals: p-aminobenzoic acid and trichloroacetic acid from Sigma-Aldrich (Steinheim, Germany); and sodium hydroxyde from Merck (Darmstadt, Germany). Procaine was obtained from Laboratorio Scott Cassara (Rosario, Argentina), as an aqueous solution containing 452 mg mL-1, as determined by a titrimetric method described in the Pharmacopoeia.48 Equine sera were obtained from the laboratory of the Rosario Racetrack. Calibration, Validation, and Spiked Serum Samples. A calibration set was constructed composed of 11 samples, using a central composite design with the central point replicated three times. The concentrations were in the ranges 0-900 ng mL-1 for procaine and 0-600 ng mL-1 for p-aminobenzoic acid. The analyte concentrations were selected by considering the linear fluorescence concentration range, the drug therapeutic levels, and the molecular weight relation, since procaine is converted into p-aminobenzoic acid by alkaline hydrolysis. For preparing a given calibration sample, the analytes were mixed in the measuring cell, (46) MATLAB 6.0, The MathWorks Inc., Natick, MA 2000. (47) Olivieri, A. C.; Goicoechea, H. C.; In ˜o´n, F. A. Chemom. Intell. Lab. Syst. 2004, 73, 189-197. (48) British Pharmacopoeia; Her Majesty’s Stationary Office: London, 2003, pp 7, 94, and 258.
Table 1. Validation Results Using the Different Algorithmsa predicted/ng mL-1 nominal/ng
mL-1
218 607 587 251 657 390 700 668 263 RMSE/ng mL-1 REP% 456 307 484 233 290 167 383 394 170 RMSE/ng mL-1 REP%
PARAFAC Procaine 221 602 583 284 683 414 722 672 273 18 4.0
N-PLS
TLLS
U-PLS
223 603 595 275 684 401 724 674 259
221 607 598 275 687 402 728 678 258
225 603 595 275 681 401 722 672 260
15 3.3
p-Aminobenzoic Acid 473 469 299 307 475 489 243 233 287 300 170 165 367 385 392 410 185 173 11 3.7
8 2.7
17 3.8 475 304 487 237 295 166 380 406 177 8 2.7
14 3.1 469 307 489 233 300 165 386 410 173 8 2.7
a RMSE = root mean square error; REP% = relative error of prediction.
by taking appropriate volumes of PRO and PAB solutions and diluting with 0.1 M NaOH to the desired concentration. The excitation-emission-time fluorescence arrays of these solutions were then recorded, and the data were subjected to four-way analysis. A validation set was also prepared, composed of nine samples, in the same form as those for calibration, but using a random design, i.e., selecting the target concentrations of both analytes at random from each calibration range (see Table 1 for details of the composition of these samples). Spiked equine serum samples were prepared as follows. Typically, 0.50 mL of a horse serum was spiked with procaine and p-aminobenzoic acid, treated with 1.50 mL of 2% (v/v) trichloroacetic acid and centrifuged at 3000 rpm during 5 min for deproteinization. Finally, 0.40 mL of the supernatant was diluted to a final volume of 2.00 mL with 0.1 M NaOH, and thus, the overall degree of dilution of the equine serum was 1:20. Several deproteinization agents were tried, but tricholoroacetic acid seemed to provide better results in terms of absence of inner filter effects and level of analyte recovery from protein binding. As will be clear below, a first set of equine sera containing five duplicate concentration levels of procaine was employed to estimate the degree of recovery for procaine after deproteinization. This set (T1) did not contain p-aminobenzoic acid. The second set of sera, (set T2) spiked with both analytes, was prepared for specific prediction purposes, and contained five duplicate samples, with final concentrations of procaine and p-aminobenzoic acid selected at random from the corresponding calibration ranges. In both of these sets, each sample contained a different equine serum, with different concentrations of the analyte or analytes of interest.
Figure 1. (A) Contour plot of the excitation-emission matrix of a solution of PRO (900 ng mL-1) at pH 5.0 to avoid hydrolysis. The intensities have been multiplied by a factor of 10. The color bar on the right shows the fluorescence intensities. (B) Contour plot of the excitation-emission matrix for a 500 ng mL-1 solution of PAB at pH 13. (C) Contour plot of the excitation-emission matrix for a typical equine serum, after deproteinization and 1:20 dilution at pH 13. Notice the higher fluorescence intensities of p-aminobenzoic acid and serum in comparison with procaine.
RESULTS AND DISCUSSION Kinetic and Spectral Characteristics. Procaine is weakly fluorescent (compare the excitation-emission matrices contours of the analytes procaine and p-aminobenzoic acid in Figure 1A and B), but its hydrolysis in basic medium generates p-aminobenzoic acid, which is a strongly fluorescent compound (Figure 2). In the working wavelength regions, the signal of the latter is also overlapped with that for a typical horse serum sample (Figure 1C), making necessary the use of advanced multiway modeling techniques for quantifying the analytes in the presence of the serum background. Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
6953
Figure 2. Contour plots of the excitation-emission matrices as a function of time (as indicated) for a solution of procaine (700 ng mL-1). The color bar on the right shows the fluorescence intensities.
The experimental parameters for the hydrolysis reaction were studied in order to obtain appropriate kinetics and fluorescence properties of the product originated. Increasing the pH and the temperature led to increasing reaction rates and to decreasing fluorescence intensity for p-aminobenzoic acid. Several pH values were tested in the range between 9 and 13, and temperatures in the range from 13 to 46 °C. This study suggested that pH 13 and a temperature of 40 °C were a sensible choice, mainly because the reaction rate was compatible with the time required to record a single excitation-emission matrix. At this latter temperature, the fluorescence intensity of p-aminobenzoic acid was readily measured. Validation Data. The first step in processing the data sets with the different third-order algorithms is the assessment of the correct number of sample constituents. With PARAFAC, the standard procedure involves the so-called core consistency diagnostic test (CORCONDIA),44 which involves observing the changes in the core consistency parameter as the number of trial fluorescent components is increased. The number of components is taken as the largest number for which the latter parameter is larger than ∼50. However, in the presently studied system, a high 6954 Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
correlation is observed between the concentration of procaine and that for the p-aminobenzoic acid which forms during hydrolysis of the former. In this case, CORCONDIA analysis could not be applied. One alternative is to judge the correct number of components mainly on the basis of the knowledge of the chemical composition of samples and kinetic behavior of analytes.49,50 Another technique selects N by considering the residuals of the PARAFAC least-squares fit for models including an increasing trial number of components.14 The appropriate value of N corresponds to the residuals stabilizing at a value compatible with the instrumental noise. This latter value, estimated through several blank replicates, was on the order of 5.0 arbitrary fluorescence units. Satisfactory PARAFAC fit was achieved by initializing the algorithm with loadings in the spectral and time dimensions carrying spectral and kinetic information recorded for pure p-aminobenzoic acid. However, this procedure lacks the generality (49) Espinosa-Mansilla, A.; Mun ˜oz de la Pen ˜a, A.; Goicoechea, H. C.; Olivieri, A. C. Appl. Spectrosc. 2004, 58, 83-90. (50) Marsili, N. R.; Lista, A.; Fernandez Band, B. S.; Goicoechea, H. C.; Olivieri, A. C. J. Agric. Food. Chem. 2004, 52, 2479-2484.
required for checking different trial values of N. A good alternative was to run several PARAFAC models started with singular value decomposition vectors and random orthogonal loadings using a few iterations and selecting the best-fitting model. During PARAFAC least-squares minimization, all four modes were constrained to be non-negative, a common practice when facing linear dependency problems.49,50 When applying this procedure to a typical validation sample, the standard deviation of the PARAFAC residuals were, in arbitrary fluorescence units, 15.2, 5.5, and 5.0 for N ) 1, 2, and 3, respectively. Similar results were obtained for all validation samples, suggesting that two components is a sensible choice in all cases. The need of two PARAFAC components can be explained on the following basis. Assuming that only p-aminobenzoic acid is fluorescent, and that hydrolysis of procaine originates p-aminobenzoic acid, the excitation-emission matrix for a given sample will vary as a function of time as
Xi(t) ) SPAB,i cPAB + SPAB,i cPRO f (t)
(8)
where Xi(t) is the excitation-emission matrix data matrix at time t for sample i, SPAB,i is the excitation-emission data matrix for pure p-aminobenzoic acid at unit concentration, cPAB and cPRO are the nominal concentrations of p-aminobenzoic acid and procaine in this particular sample, and f (t) is a function of time describing the kinetics of the conversion of procaine into p-aminobenzoic acid. Since SPAB,i is the same in both terms of the right-hand side of eq 8, we may write
Xi(t) ) SPAB,i [cPAB + cPRO f (t)]
(9)
If the factor [cPAB + cPRO f (t)] were independent of the nominal concentrations cPAB and cPRO, due to a closure relationship present in the system, then one may consider that a single PARAFAC component is enough to describe the time evolution of the data. However, in our case, procaine and p-aminobenzoic acid are independent analytes, and hence, we have employed designed calibration data, where cPAB and cPRO are different for each sample. Consequently, there is no single function of time that can describe this evolution, and hence two PARAFAC components are required. This implies the need of a B loading matrix having two columns, both equal to the (normalized) emission spectrum of p-aminobenzoic acid. Likewise, the C matrix will contain two columns with the excitation p-aminobenzoic acid spectra. The loading matrix D, in turn, would be given by
D ) [N1 cPAB 1 | N2 cPRO f (t)]
(10)
where 1 is a K × 1 vector having all elements equal to 1, t is a vector having the K time values, f (t) is the function of time of eq 9, and N1 and N2 are normalization constants. However, a D matrix composed of linear combinations of the two columns of eq 10 will also describe the time variation of the instrumental data. This means that the PARAFAC least-squares fit may give linear combinations of the expected ideal kinetic profiles, which may depend on the following: (1) the starting values for the leastsquares PARAFAC minimization and (2) possible perturbations
Figure 3. Spectral profiles recovered by PARAFAC after processing a typical validation sample, containing procaine (218 ng mL-1) and p-aminobenzoic acid (456 ng mL-1): solid line, first PARAFAC component; dashed line, second PARAFAC component. (A) Spectral emission profiles. (B) Spectral excitation profiles. (C) Time profiles. All profiles are normalized to unit length.
produced by the small fluorescence signal of procaine (Figure 1), which is not only very low but does also decrease significantly with time. Analysis of a typical validation sample rendered the profiles in the three dimensions which are shown in Figure 3. The emission and excitation profiles for both calibrated components were almost identical and match the expected spectral properties of p-aminobenzoic acid, but the time profiles differed. The retrieved kinetic profiles are composed of the following: (1) a time-decreasing profile for one of the spectrally active components and (2) a time-increasing profile for the other component. These profiles are not the expected ones on the basis of the monitored phenomenon, i.e., a constant signal due to the p-aminobenzoic acid originally present in the solution, and a growing signal due to procaine being hydrolyzed to p-aminobenzoic acid. These unexpected profiles suggest the presence of linear combinations in the time mode, which are expressed into corresponding correlations in the PARAFAC A or score matrix (the decomposiAnalytical Chemistry, Vol. 79, No. 18, September 15, 2007
6955
Figure 4. PARAFAC A scores as a function of nominal analyte concentrations in the calibration set of samples, after processing a typical validation sample. (A) First column of matrix A as a function of p-aminobenzoic acid calibration concentrations. (B) Second column of A as a function of p-aminobenzoic acid . (C) First column of A as a function of procaine. (D) Second column of A as a function of procaine .
tion of the four-way data renders the profiles in the three dimensions, which are usually collected into the B, C, and D matrices). These correlations translate into pseudounivariate calibration graphs that do not present the familiar properties with respect to nominal concentrations of either analyte. Figure 4 clearly shows that none of the A scores are linearly related to procaine, although one of them provides a distorted central composite design (typical of the presence of correlations between analytes), whereas the correlation with p-aminobenzoic acid calibration concentrations, although seemingly linear, indicates the presence of a significant intercept. One alternative to cope with the above problem is to correlate all columns of A with concentrations for each analyte using multilinear regression (MLR) analysis instead of classical pseudounivariate methodology. This approach has been previously applied in the modeling of a kinetics using four-way fluorescence data.5 When PARAFAC results were processed in this latter way, analyte concentrations could be reasonably predicted in the validation set of samples. The statistical summary of this analysis is presented in Table 1. As can be seen, good figures of merit are obtained, in terms of root-mean-square error (RMSE) and relative error of prediction (REP%), computed with respect to the mean calibration concentration of each analyte. It should be noticed that these results came at the expense of a somewhat complicated program operation, in terms of initialization, fitting restrictions and postdecomposition calibration procedures. On the other hand, when using N-PLS (and also U-PLS), a valid procedure for establishing the number of latent variables A is leave-one-out cross-validation, according to the Haaland and Thomas criterion, as described in the literature.45 Within the TLLS methodology, the number of components is established as the number of calibrated analytes (two in the present case). None of these approaches required the use of the separate RTL procedure for analyzing the validation samples, since they contain no 6956 Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
unexpected interferents. In this sense, the application of N-PLS to the presently studied samples was considerably simpler than PARAFAC, requiring no special initialization conditions or restrictions. The N-PLS results are collected in Table 1 and are similar to those obtained with its unfolded companions U-PLS and TLLS, but better than those from PARAFAC as regards the content of procaine. The better predictive ability of N-PLS can be related to its more flexible internal structure in comparison with PARAFAC, which makes the former methodology better fitted for processing four-way information, which is not strictly quadrilinear. It is interesting to note that the presently applied method is based on a kinetic phenomenon, and therefore, the precise timing of the instrumental measurements may be crucial for the success of a given algorithm. The following causes may lead to multilinearity losses: (1) the time needed to record the complete excitationemission matrix of a sample in comparison with the reaction progress and (2) the fact that the analytes are the reagent and product of the hydrolysis reaction. These two sources of multilinearity losses may seriously affect the less robust algorithm PARAFAC as compared with a latent structured methodology such as N-PLS/RTL. Spiked Serum Samples. In the case of the set of spiked serum samples, application of PARAFAC followed guidelines similar to those commented on above in connection with the study of the validation samples, in the sense of the need of setting initialization conditions (the best-fitting model of several models run with a few iterations proved to be a useful procedure) and also of the requirement of non-negative constraints during the least-squares fit. The assessment of the number of components could not be done using core consistency diagnostics, and was made by increasing a trial number of components, and inspecting the PARAFAC least-squares residues until they matched the instrumental noise level. Using this approach, a typical spiked serum samples rendered the following values for the standard deviation of PARAFAC residuals: 18.5, 14.5, 5.2, and 4.9, in arbitrary fluorescence units, for N ) 1, 2, 3, and 4, respectively. Similar results were obtained for all spiked samples. These values, together with the instrumental noise level of ∼5.0 arbitrary fluorescence units, suggest that three responsive components are a sensible selection, with the additional component with respect to the validation samples being ascribed to the serum background. It is noteworthy that PARAFAC decomposition retrieved correctly the spectral (both excitation and emission) and time profiles for serum, which are shown in Figure 5. Indeed, Panels A and B in Figure 5 favorably compare the recovered spectral profiles with those recorded for a typical unspiked blank serum sample. Figure 5C, in turn, compares the time evolution with that expected, i.e., a constant value at all times (normalized to unit length). Though the agreement is reasonable, it is however noticed that the PARAFAC time profile suffers an initial decrease until it stabilizes after the first minutes. We attribute this observation, which was repeated consistently for all samples, to temperature instabilities upon placement of the measuring cell in the spectrofluorometer, rather than to any reaction of serum under the applied conditions. Since the fluorescence intensity usually decreases with increasing temperature, the initial decrease in Figure 5C for the serum background is ascribed to changes upon equilibrating the tem-
Table 2. Spiked Sera Results Using the Different Algorithms on Set T2a predictedb/ng mL-1 nominal/ng
mL-1
0 0 540 747 567 RMSE/ng mL-1 REP% 300 0 110 250 400 RMSE/ng mL-1 REP%
PARAFAC
N-PLS
TLLS
U-PLS
-35 -88 529 759 494
-20 20 580 1003 700
55 12
54 12
109 24
p-Aminobenzoic Acid 320 297 70 54 170 147 310 317 445 416
320 70 150 325 445
297 55 149 310 405
54 18
40 13
Procaine 57 38 0 -38 651 614 919 834 612 580 97 22
54 18
42 14
a RMSE = root mean square error; REP% = relative error of prediction. Predictions are averages of duplicate samples. b Predicted values were corrected using the correction factor for deproteinization (see text).
Figure 5. (A) Emission profiles recovered for the component not present in the calibration set after processing a typical spiked equine serum sample. In the case of PARAFAC (solid line), it is one of the three components used to model the data, which is only present in the spiked test serum sample. In the case of N-PLS/RTL (dashed line), it is the corresponding Tucker3 loading in the emission dimension, as obtained through the RTL procedure. The dasheddotted line shows the spectrum for a typical unspiked serum sample The emission spectrum of the serum has been registered at a constant excitation wavelength of 280 nm. (B) Spectral excitation profiles, similar to (A). The excitation spectrum of the serum has been registered at a constant emission wavelength of 360 nm. (C) Time profiles, similar to (A). In the time profiles, solid and dashed lines are almost coincident. The dashed-dotted line is a constant at all times. All profiles have been normalized to unit length.
perature to 40 °C, which was selected for obtaining a sensible reaction rate for procaine hydrolysis. As was the case with the validation data, prediction of analyte concentrations in spiked sera required MLR calibrated with the relevant A-scores (i.e., the columns corresponding to the analytes) against nominal calibration concentrations, because of obtaining linear combination of analyte scores. Specific prediction results showed that there was a consistent degree of loss of procaine during the deproteinization procedure. A series of spiked sera with several different concentrations of procaine and p-aminobenzoic
acid showed a good correlation of predicted versus nominal values, but in the case of procaine, the linear correlation exhibited a slope that was smaller than the ideal value of 1. In particular, linear regression of predicted versus nominal procaine concentrations in the set T1 of serum samples (see Experimental Section) rendered the following results: slope ) 0.47(2) (standard deviation in the last significant figure in parenthesis) with an r2 ) 0.9819. The value of the slope was subsequently employed to correct for the deproteinization loss of procaine when predicting its concentration in the independent set T2. Interestingly, the degree of binding or procaine to serum proteins has been estimated to be ∼45%,51 and thus, we attribute the lower value of the slope in the case of procaine to this binding phenomenon. To demonstrate how consistent the results are on analyte recovery after serum deproteinization, a set of sera spiked with both analytes (T2) was studied, and the predicted procaine concentrations by PARAFAC were adjusted according to the slope value. The results on the set T2 are shown in Table 2. It is apparent that the RMSE and REP% values are reasonable in view of the complexity of the presently studied biological samples. It may be noticed that even when the test sample composition is more complex than the validation samples, a successful PARAFAC model could be built, based on the one employed to model the calibration data, expanded by adding an interfering component. With a similar starting strategy, non-negativity restrictions and MLR calibration, PARAFAC achieves prediction results that are comparable to those for the validation samples. When applying N-PLS to these spiked sera, it was necessary to assess the number of unexpected components (Nunx) to be employed in the RTL procedure. This can be done by analyzing the residuals su as a function of a trial number of unexpected components, as has already been described. A typical result for one of the analyzed samples was as follows: su ) 45 and 6.2 for (51) Tobin, T.; Blake, J. W. Br. J. Sport Med. 1976, 10, 109-116.
Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
6957
Nunx ) 0 and 1, respectively. For Nunx ) 2, all possibilities having a different number of loadings in each dimension were tried in order to construct the Tucker3 model for the potential interferents, giving su ) 4.8 [2 2 2], 6.1 [1 1 2], 6.1 [1 2 1], 6.2 [2 1 1], 4.8 [2 2 1], 6.0 [2 1 2], and 6.1 [1 2 2] (the values are in arbitrary fluorescence units, with the number of loadings used in the emission, excitation, and time dimension, respectively, between brackets). Since blank replication allowed us to estimate an instrumental noise level of ∼5.0 arbitrary fluorescence units, for this particular sample Nunx ) 1 is a reasonable choice. This is because the value of (su)2, measuring the residual variance, is statistically indistinguishable from the squared noise level (25 arbitrary fluorescence units2) using an appropriate Fischer F test for the comparison of variances. Similar results were obtained for the remaining samples. The choice of Nunx ) 1 can also be confirmed to be reasonable because, using this value, the predicted analyte concentrations did not significantly change with respect to the use of higher number of unexpected components. Application of the N-PLS/RTL methodology to this set led to the retrieval of unexpected serum spectral and time profiles, which can be compared with those expected for background serum. Panels A-C in Figure 5 show the profiles obtained by the RTL procedure, i.e., the three Tucker3 loadings, which are employed to model the effect of the background serum. Figure 5 shows that the ability of RTL in recovering profiles is of a quality comparable to that of PARAFAC, implying that the former approach is valid for achieving the second-order advantage. Moreover, the subtle effects discussed above concerning the slight decrease of fluorescence intensity for serum in the first reaction minutes are also observable in the RTL time profile of Figure 5C. Concerning the specific prediction results, better figures of merit were obtained with N-PLS/RTL in comparison with those furnished by PARAFAC. First, the set T1 of serum samples spiked only with procaine was studied, in order to assess the level of recovery after deproteinization. This study yielded, for N-PLS/ RTL, slope ) 0.45(2) (r2 ) 0.9930), i.e., similar to PARAFAC (see above). Subsequent prediction on the second independent set T2 of spiked equine sera gave the results shown in Table 2. As can be seen, the results are better than those for PARAFAC, again pointing to an improved predicting ability of the latent structured methodology N-PLS/RTL. Application of the unfolded methodologies TLLS/RTL and U-PLS/RTL gave results that are similar to those obtained with N-PLS/RTL. Recovery of procaine on the set T1 gave the following
6958
Analytical Chemistry, Vol. 79, No. 18, September 15, 2007
results: U-PLS/RTL, slope ) 0.45(2) (r2 ) 0.9890); and TLLS/ RTL, slope ) 0.51(2) (r2 ) 0.9874). Prediction on set T2, in turn, provided comparable results to N-PLS/RTL, except in the case of U-PLS/RTL for procaine (Table 2), which are comparable to PARAFAC. CONCLUSIONS The use of kinetics in combination with excitation-emission fluorescence matrices measured with a fast-scanning fluorometric detector allows the recording of four-way data sets. Several unfolded algorithms are available for processing four-way information, although they do not take full advantage of the higher-order data structure. In comparison, multiway partial least-squares is a genuine higher-order latent variable method, which, once combined with the successful residual trilinearization technique, achieves the second-order advantage needed to successfully quantify analytes in complex biological samples. The latter combination of techniques, which is proposed in this report, i.e., N-PLS/RTL, is currently the only competitor available to parallel factor analysis in the realm of truly higher-order analysis achieving the important second-order advantage. It certainly deserves a place among the few available multiway chemometric methods for advanced modeling studies and is demonstrated to perform better than PARAFAC in the presently studied samples. This is because of the more flexible internal structure of N-PLS/RTL in comparison with PARAFAC, which makes the former a more appropriate approach for processing four-way data, which are not strictly quadrilinear. ACKNOWLEDGMENT Universidad Nacional de Rosario, CONICET (Consejo Nacional de Investigaciones Cientı´ficas y Te´cnicas, Project No. PIP 5303), ANPCyT (Agencia Nacional de Promocio´n Cientı´fica y Tecnolo´gica, Project No. PICT-25825), Ministerio de Educacio´n y Ciencia of Spain (Project CTQ2005-02389), and AECI (Programa Intercampus Iberoame´rica/PCI Project A/6576/06) are gratefully acknowledged for financial support. A.J.-G. thanks Ministerio de Eudacio´n y Ciencia of Spain and A.G.-R. thanks CONICET for their fellowships.
Received for review March 26, 2007. Accepted June 15, 2007. AC070596+