ARTICLE pubs.acs.org/ac
New Developments for the Sensitivity Estimation in Four-Way Calibration with the Quadrilinear Parallel Factor Model
Alejandro C. Olivieri*,† and Klaas Faber‡
† Departamento de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Instituto de Química de Rosario (IQUIR-CONICET), Suipacha 531, Rosario S2002LRK, Argentina
‡ Chemometry Consultancy, Goudenregenstraat 6, 6573 XN Beek-Ubbergen, The Netherlands
Supporting Information
ABSTRACT: Appropriate closed-form expressions are known for estimating analyte sensitivities when calibrating with one-, two-, and three-way data (vectors, matrices, and three-dimensional arrays, respectively, built with data for a group of samples). In this report, sensitivities are estimated for calibration with four-way data using the quadrilinear parallel factor (PARAFAC) model, making it possible to assess important figures of merit for method comparison or optimization. The strategy is based on the computation of the uncertainty in the fitted PARAFAC parameters through the Jacobian matrix. Extensive Monte Carlo noise addition simulations in four-way data systems having widely different overlapping situations support the present approach, which was also applied to two experimental analytical systems. With this proposal, the estimation of the PARAFAC sensitivity for calibration scenarios involving three- and four-way data may be considered complete.
The estimation of analytical figures of merit for multivariate calibration models has been the object of important theoretical and experimental efforts in the past years.1–5 Specifically, sensitivity, selectivity, and limits of detection and quantitation have been calculated and employed for the comparison of the analytical performance of different methods, for the optimization of analytical methodologies, or for assessing detection capabilities.6,7

Table 1 summarizes the various types of data which can be recorded, the resulting number of data ways for a set of samples, and the current knowledge on sensitivity. In univariate calibration using linear regression, the sensitivity is numerically equal to the slope of the calibration graph.8 In multivariate calibration with two-way data, usually performed by partial least-squares (PLS) regression analysis, the sensitivity is the inverse of the length of the vector of the model regression coefficients.9 When calibrating with three-way data, a closed-form sensitivity expression is firmly established for data processing using the trilinear parallel factor (PARAFAC) model,10 and is defined in terms of the profiles of the various sample constituents in the different data modes.11 The three-way PARAFAC sensitivity depends on the calibration scenario, i.e., on the specific number of calibrated analytes and unexpected interferents.

It is apparent in Table 1 that no general expression for estimating sensitivities is available in the case of PARAFAC calibration of four-way data. The situation is particularly disturbing when analytes are determined in the presence of unexpected constituents in complex samples, because this is a field of intense experimental activity.12 It is likely that in the near future more four-way data will be produced, as the analytical instrumentation continues to progress toward more complex arrangements, and computer data processing by multiway algorithms is popularized. It is imperative, thus, to be able to estimate appropriate figures of merit for the processing of these complex data structures.

In this report, four-way PARAFAC sensitivities are shown to be accounted for by a new approach, based on the computation of the Jacobian matrix associated with the fitted parameters. For a variety of binary, ternary, and quaternary four-way analytical systems, all having unexpected sample components, the proposal is supported by Monte Carlo simulations concerning the uncertainty in predicted analyte concentrations. Although other methods have been employed to assess prediction uncertainties, such as resampling, jack-knife, etc.,13 the present strategy provides a clear link between concentration uncertainties and analyte sensitivities, as has been discussed previously.4,5 The new approach also allows comparison of the four-way PARAFAC sensitivity with those achieved after unfolding the data to three-way arrays and processing them with three-way PARAFAC, and supports previous results concerning the higher efficiency of maintaining the multiway structure for data decomposition.14

The new approach has been applied to two experimental four-way data sets analyzed with PARAFAC, providing reasonable figures of merit. The results may also have important implications as to the analytical advantages which can be expected in processing multiway instrumental information.7

Received: September 3, 2011. Accepted: November 16, 2011. Published: November 16, 2011
© 2011 American Chemical Society

It may be noticed that the present approach is applicable to quadrilinear four-way data amenable to PARAFAC processing,
dx.doi.org/10.1021/ac202268k | Anal. Chem. 2012, 84, 186–193
Analytical Chemistry
Table 1. Current Knowledge on Sensitivity for Various Data Structures
data ways (a) | data structure for each sample | available expression | description
1 | scalar | intuitive definition | slope of the univariate calibration graph
2 | vector | established generalization of univariate definition | inverse of the length of the vector of regression coefficients
3 | matrix | consistent generalizations to three-way PARAFAC calibration | general PARAFAC sensitivity equation, defined in terms of the various component profiles11
4 | three-dimensional array | various conjectures | unknown when unexpected constituents occur in the test samples
(a) The number of ways refers to the mathematical object which can in principle be built with data for a set of samples.
in a similar manner to previous approaches for trilinear three-way PARAFAC sensitivity.11 It would also be applicable to other quadrilinear decomposition (QLD) algorithms, such as alternating penalty QLD (APQLD)15 and alternating weighted residual constraint QLD (AWRCQLD).16 When data are not multilinear, such as when component profiles change from sample to sample (chromatographic data with retention time shifts, kinetic/pH data with misfits in temperature or pH between samples, etc.), other multivariate techniques are necessary for data processing.17 Further work will be required in the near future to assess the sensitivity in these cases.

■ THEORY

Terminology and Calibration Scenarios. Various types of data structures can be recorded, as shown in Table 1. One possibility to describe them is to employ the expressions “one-way”, “two-way”, etc., referring to the dimensions of an array composed of data recorded for a group of samples. An alternative is to use the concept of “order”, focusing on the dimensions of the data array collected for a single sample. This is linked to the popular expression “second-order advantage”, which is the property of quantitating analytes even when unexpected constituents occur in test samples.18 Thus, one-way (univariate) calibration is equivalent to zeroth-order calibration, and the same is true for the pairs two-way/first-order calibration, three-way/second-order calibration, four-way/third-order calibration, etc.19

Different calibration models are available to process these data structures. Univariate calibration (either linear or nonlinear) is employed for one-way data, while two-way calibration (e.g., PLS) applies to two-way data. For processing three- and four-way data, it is important to know whether they are tri- or quadrilinear, respectively.17 For multilinear data, PARAFAC and some of its variants can be confidently applied.17 When trilinearity is lost due to profile changes along one of the data modes, but the matrix signal for a given sample is still bilinear, techniques such as multivariate curve resolution-alternating least-squares (MCR-ALS)20 or PARAFAC2 (ref 21) can be used. For additional details on the available methods for coping with multilinearity losses, see a recent review.17

Monte Carlo Sensitivity Estimates. In the Monte Carlo approach to sensitivity estimation, the propagation of signal uncertainty to concentration uncertainty is studied. It is important to define the so-called variance inflation factor (VIFMC, MC stands for Monte Carlo), which relates the output variance of the system (Vout, corresponding to the uncertainty in predicted concentration) to the input variance (Vin, represented by the uncertainty in the instrumental signal):

$$\mathrm{VIF}_{\mathrm{MC}} = \frac{V_{\mathrm{out}}}{V_{\mathrm{in}}} \tag{1}$$

It is natural to expect a sensitivity (SENMC) given by the ratio of the standard deviations σin and σout, because a large input for producing a given output implies a larger sensitivity, i.e.

$$\mathrm{SEN}_{\mathrm{MC}} = \frac{\sigma_{\mathrm{in}}}{\sigma_{\mathrm{out}}} = (\mathrm{VIF}_{\mathrm{MC}})^{-1/2} \tag{2}$$

In the present context, σin and σout are the standard errors in the test sample signal (sx) and in the predicted concentration (sy), respectively:12

$$\mathrm{SEN}_{\mathrm{MC}} = \frac{s_x}{s_y} \tag{3}$$

Equation 3 allows one to assess the analyte sensitivity by statistical analysis of the concentrations predicted by a given model when noise is added to signals.

Three-Way PARAFAC Sensitivity Estimates. In three-way calibration using the trilinear PARAFAC model, matrix data are measured for a set of samples, and the training matrices are joined with the unknown sample matrix into a three-way data array X. The latter can be written in terms of three vectors for each responsive component, designated as an, bn, and cn, which collect the relative concentrations for constituent n and its profiles in both data modes. The specific expression for a given element of X is

$$x_{ijk} \approx \sum_{n=1}^{N} a_{in} b_{jn} c_{kn} \tag{4}$$

where N is the total number of responsive components, ain is the relative concentration of component n in the ith sample, and bjn and ckn are the intensities at channels j and k, respectively. The values of ain are collected into the score matrix A, while bjn and ckn are stored in the loading matrices B and C (the columns of these latter two matrices are normalized to unit length). In eq 4 and throughout, the symbol ≈ implies that the model is not exact and carries an error; the various noise contributions are dropped to simplify the presentation. The variance in analyte concentrations predicted with the PARAFAC model can be estimated using the following approximate expression11

$$V(\hat{y}) \approx \mathrm{SEN}_n^{-2} (1 + h_n) V_x + h_n V_y \tag{5}$$

where ŷ is the estimated concentration, SENn is the analyte-specific sensitivity, Vx and Vy are the variances in instrumental signals and calibration concentrations, respectively, and hn is the analyte-specific sample leverage, a dimensionless parameter which places the sample relative to the calibration space.22 Equation 5 shows that, for low leverage samples, the main ingredient affecting the prediction error is the sensitivity. For estimating the latter in various calibration situations, the most general equation is the following (the subscript FO3 is after Faber and Olivieri,11 with 3 standing for three-way, and the subscript n dropped for simplicity)

$$\mathrm{SEN}_{\mathrm{FO3}} = s_n \left\{ \left[ (\mathbf{B}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{B,\mathrm{unx}} \mathbf{B}_{\mathrm{cal}}) * (\mathbf{C}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{C,\mathrm{unx}} \mathbf{C}_{\mathrm{cal}}) \right]^{-1} \right\}_{nn}^{-1/2} \tag{6}$$
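As an illustration of the general three-way expression (eq 6), the following is a minimal NumPy sketch, not the authors' MATLAB code. It assumes that the projection matrices take the form I − UU⁺ built from the loadings U of the unexpected constituents, that profiles are Gaussian with unit length, and that sn = 1; the helper names `gauss`, `projector`, and `sen_fo3`, the peak positions, and the peak width are all hypothetical choices made for the example. For one calibrated analyte and one interferent the result should coincide with the HCD3 special case listed in Table 2.

```python
import numpy as np

def gauss(center, n=10, width=1.5):
    """Unit-length Gaussian profile on channels 1..n (width is an assumed value)."""
    x = np.arange(1, n + 1)
    g = np.exp(-0.5 * ((x - center) / width) ** 2)
    return g / np.linalg.norm(g)

def projector(unx):
    """P = I - U U^+, with U the loadings of the unexpected constituents."""
    return np.eye(unx.shape[0]) - unx @ np.linalg.pinv(unx)

def sen_fo3(B_cal, C_cal, B_unx, C_unx, n=0, s_n=1.0):
    """Eq 6: s_n * {[(Bcal' PB Bcal) * (Ccal' PC Ccal)]^-1}_nn^(-1/2)."""
    # '*' between NumPy arrays is the element-wise (Hadamard) product.
    M = (B_cal.T @ projector(B_unx) @ B_cal) * (C_cal.T @ projector(C_unx) @ C_cal)
    return s_n * np.linalg.inv(M)[n, n] ** -0.5

# Hypothetical system: analyte (1) centered at sensor 5, interferent (2) elsewhere.
b1, c1 = gauss(5), gauss(5)
b2, c2 = gauss(7), gauss(3)
sen = sen_fo3(b1[:, None], c1[:, None], b2[:, None], c2[:, None])

# HCD3 special case (single analyte plus interferents), from Table 2.
B = np.column_stack([b1, b2])
C = np.column_stack([c1, c2])
sen_hcd3 = (np.linalg.inv(B.T @ B)[0, 0] * np.linalg.inv(C.T @ C)[0, 0]) ** -0.5
```

With unit-length profiles and sn = 1 the sensitivity is bounded above by 1, and it decreases as the analyte and interferent profiles overlap more strongly.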
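Table 2. Summary of Three- and Four-Way PARAFAC Sensitivity Expressions Relevant to the Present Work
data ways | expression (a) | description
3 | $\mathrm{SEN}_{\mathrm{FO3}} = s_n \{[(\mathbf{B}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{B,\mathrm{unx}} \mathbf{B}_{\mathrm{cal}}) * (\mathbf{C}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{C,\mathrm{unx}} \mathbf{C}_{\mathrm{cal}})]^{-1}\}_{nn}^{-1/2}$ | general case
3 | $\mathrm{SEN}_{\mathrm{MKL3}} = s_n \{[(\mathbf{B}^{\mathrm{T}}\mathbf{B}) * (\mathbf{C}^{\mathrm{T}}\mathbf{C})]^{-1}\}_{nn}^{-1/2}$ | special case of FO3 when no interferents occur
3 | $\mathrm{SEN}_{\mathrm{HCD3}} = s_n \{[(\mathbf{B}^{\mathrm{T}}\mathbf{B})^{-1}]_{nn} [(\mathbf{C}^{\mathrm{T}}\mathbf{C})^{-1}]_{nn}\}^{-1/2}$ | special case of FO3 for a single calibrated analyte + interferents
4 | $\mathrm{SEN}_{\mathrm{FO4}} = s_n \{[(\mathbf{B}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{B,\mathrm{unx}} \mathbf{B}_{\mathrm{cal}}) * (\mathbf{C}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{C,\mathrm{unx}} \mathbf{C}_{\mathrm{cal}}) * (\mathbf{D}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{D,\mathrm{unx}} \mathbf{D}_{\mathrm{cal}})]^{-1}\}_{nn}^{-1/2}$ | extension of FO3 to four-way
4 | $\mathrm{SEN}_{\mathrm{MKL4}} = s_n \{[(\mathbf{B}^{\mathrm{T}}\mathbf{B}) * (\mathbf{C}^{\mathrm{T}}\mathbf{C}) * (\mathbf{D}^{\mathrm{T}}\mathbf{D})]^{-1}\}_{nn}^{-1/2}$ | extension of MKL3 to four-way and special case of FO4
4 | $\mathrm{SEN}_{\mathrm{HCD4}} = s_n \{[(\mathbf{B}^{\mathrm{T}}\mathbf{B})^{-1}]_{nn} [(\mathbf{C}^{\mathrm{T}}\mathbf{C})^{-1}]_{nn} [(\mathbf{D}^{\mathrm{T}}\mathbf{D})^{-1}]_{nn}\}^{-1/2}$ | extension of HCD3 to four-way and special case of FO4
4 | $\mathrm{SEN}_{\mathrm{FO3BC}} = s_n \{[((\mathbf{B}_{\mathrm{cal}} \odot \mathbf{C}_{\mathrm{cal}})^{\mathrm{T}} \mathbf{P}_{BC,\mathrm{unx}} (\mathbf{B}_{\mathrm{cal}} \odot \mathbf{C}_{\mathrm{cal}})) * (\mathbf{D}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{D,\mathrm{unx}} \mathbf{D}_{\mathrm{cal}})]^{-1}\}_{nn}^{-1/2}$ | FO3 expression when unfolding along B and C modes
4 | $\mathrm{SEN}_{\mathrm{FO3BD}} = s_n \{[((\mathbf{B}_{\mathrm{cal}} \odot \mathbf{D}_{\mathrm{cal}})^{\mathrm{T}} \mathbf{P}_{BD,\mathrm{unx}} (\mathbf{B}_{\mathrm{cal}} \odot \mathbf{D}_{\mathrm{cal}})) * (\mathbf{C}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{C,\mathrm{unx}} \mathbf{C}_{\mathrm{cal}})]^{-1}\}_{nn}^{-1/2}$ | FO3 expression when unfolding along B and D modes
4 | $\mathrm{SEN}_{\mathrm{FO3CD}} = s_n \{[((\mathbf{C}_{\mathrm{cal}} \odot \mathbf{D}_{\mathrm{cal}})^{\mathrm{T}} \mathbf{P}_{CD,\mathrm{unx}} (\mathbf{C}_{\mathrm{cal}} \odot \mathbf{D}_{\mathrm{cal}})) * (\mathbf{B}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{B,\mathrm{unx}} \mathbf{B}_{\mathrm{cal}})]^{-1}\}_{nn}^{-1/2}$ | FO3 expression when unfolding along C and D modes
4 | $\mathrm{SEN}_{\mathrm{J4}} = s_n \| n\text{th row of } (\mathbf{P}_{\mathbf{Z}_{\mathrm{unx}}} \mathbf{Z}_{\mathrm{cal}})^{+} \|^{-1}$ | general Jacobian-based expression
(a) See text for symbol meanings.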
where sn is the pure analyte signal at unit concentration, Bcal and Ccal collect the loading matrices for the calibrated analytes, * is the element-wise or Hadamard matrix product, the subscript nn denotes the nth diagonal element of a matrix, and PB,unx and PC,unx are projection matrices given by

$$\mathbf{P}_{B,\mathrm{unx}} = \mathbf{I} - \mathbf{B}_{\mathrm{unx}} \mathbf{B}_{\mathrm{unx}}^{+} \tag{7}$$

$$\mathbf{P}_{C,\mathrm{unx}} = \mathbf{I} - \mathbf{C}_{\mathrm{unx}} \mathbf{C}_{\mathrm{unx}}^{+} \tag{8}$$

where I are appropriately dimensioned identity matrices, Bunx and Cunx collect the loading matrices for the so-called unexpected sample constituents, and the superscript + stands for the generalized inverse operation. Notice that sn in eq 6 can be measured from pure analyte data, or can be estimated from mixtures with other constituents as the slope of the pseudounivariate calibration graph (i.e., scores versus concentrations) which is regularly employed for PARAFAC analyte prediction.11 Two specific cases of eq 6 deserve attention: (1) when a test sample only contains calibrated analytes but no unexpected interferents, it reduces to the MKL sensitivity (after Messick, Kalivas, and Lang, see Table 2),23 and (2) for a single calibrated analyte in the presence of unexpected interferents, it reduces to the HCD sensitivity (after Ho, Christian, and Davidson, see Table 2).24

Four-Way PARAFAC Sensitivity Estimates. In this case each sample produces a three-way array, and a four-way object X is obtained by suitably joining data for a group of samples. The corresponding quadrilinear PARAFAC expression for an element of X (xijkl) is

$$x_{ijkl} \approx \sum_{n=1}^{N} a_{in} b_{jn} c_{kn} d_{ln} \tag{9}$$

where the symbols are analogous to those in eq 4, with dln describing the profile for the nth constituent in the third data mode. No general expression is known to account for the uncertainty in the predicted analyte concentration for this four-way PARAFAC model, but a first step toward such an expression would be an estimation of the analyte sensitivity. Various conjectures exist in this context: one of them implies a straightforward extrapolation of the three-way PARAFAC results. In the general case, this leads to the following FO-type four-way sensitivity expression

$$\mathrm{SEN}_{\mathrm{FO4}} = s_n \left\{ \left[ (\mathbf{B}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{B,\mathrm{unx}} \mathbf{B}_{\mathrm{cal}}) * (\mathbf{C}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{C,\mathrm{unx}} \mathbf{C}_{\mathrm{cal}}) * (\mathbf{D}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{D,\mathrm{unx}} \mathbf{D}_{\mathrm{cal}}) \right]^{-1} \right\}_{nn}^{-1/2} \tag{10}$$

where PD,unx is analogous to PB,unx and PC,unx in eqs 7 and 8, extended one further dimension. Equation 10 reduces to either the MKL4 or HCD4 sensitivities under the same circumstances detailed above in connection with three-way PARAFAC analysis (Table 2). The MKL4 expression has been previously confirmed to be correct in the absence of unexpected interferents by Monte Carlo simulations,12 and thus, it will not be further explored. The HCD4 expression did not account for simulated results,12 and the present report will show that the (previously untested) eq 10 is also unsuitable in this regard. This suggests that the intuitive extension of the three-way PARAFAC calibration approach to the four-dimensional world is not as straightforward as could be anticipated.

We found it appropriate to apply a methodology based on the computation of the uncertainty in the fitted PARAFAC parameters, with particular focus on the predicted analyte concentration. The procedure involves a careful analysis of the Jacobian matrix, i.e., the matrix of all partial derivatives of the unfolded test sample signal with respect to the fitted PARAFAC parameters, whose connection with the uncertainties in the modeled parameters is well-known.25 This new approach has a number of advantages: (1) it fully agrees with the known general FO3 sensitivity expression 6 in the case of three-way data, (2) it can be straightforwardly extended to data arrays of higher dimensions, and (3) it provides a rationale as to why the intuitive expression 10 is not appropriate for four-way data. The key to the new approach is the recognition of two different submatrix blocks in the Jacobian matrix J: (1) a block Zunx corresponding solely to the profiles for the unexpected constituents, and (2) a block Zcal corresponding to the profiles for the calibrated analytes (see Appendix):

$$\mathbf{J} = [\mathbf{Z}_{\mathrm{unx}} \,|\, \mathbf{Z}_{\mathrm{cal}}] \tag{11}$$

After some algebra (see Appendix), the sensitivity for analyte n is shown to be given by an expression which strongly resembles the use of the net analyte signal (NAS) concept for sensitivity estimation11

$$\mathrm{SEN}_{\mathrm{J4}} = s_n \left\| n\text{th row of } (\mathbf{P}_{\mathbf{Z}_{\mathrm{unx}}} \mathbf{Z}_{\mathrm{cal}})^{+} \right\|^{-1} \tag{12}$$

where PZunx is a matrix which projects orthogonally to the space spanned by Zunx, i.e., onto its orthogonal complement (the subscript J4 indicates that the four-way sensitivity is obtained through the Jacobian approach):

$$\mathbf{P}_{\mathbf{Z}_{\mathrm{unx}}} = \mathbf{I} - \mathbf{Z}_{\mathrm{unx}} \mathbf{Z}_{\mathrm{unx}}^{+} \tag{13}$$
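The Jacobian-based expression (eqs 12 and 13) can be sketched numerically as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the block Zunx for a four-way system with one analyte and one interferent is written here by analogy with the three-way structure given in the Appendix (derivatives of the unfolded interferent signal with respect to its three profiles), profiles are unit-length Gaussians with hypothetical positions and width, and sn = 1. All function and variable names are made up for the example. An explicit `rcond` is passed to the pseudoinverse because this Zunx has linearly dependent columns.

```python
import numpy as np

def gauss(center, n=10, width=1.5):
    """Unit-length Gaussian profile on channels 1..n (width is an assumed value)."""
    x = np.arange(1, n + 1)
    g = np.exp(-0.5 * ((x - center) / width) ** 2)
    return g / np.linalg.norm(g)

def kron3(d, c, b):
    """Kronecker product d (x) c (x) b, matching x = (d (x) c (x) b) a."""
    return np.kron(d, np.kron(c, b))

def sen_j4(Z_unx, Z_cal, n=0, s_n=1.0):
    """Eq 12: s_n / || nth row of (P Z_cal)^+ ||, with P = I - Z_unx Z_unx^+ (eq 13)."""
    P = np.eye(Z_unx.shape[0]) - Z_unx @ np.linalg.pinv(Z_unx, rcond=1e-10)
    return s_n / np.linalg.norm(np.linalg.pinv(P @ Z_cal)[n])

col = lambda v: v[:, None]
b1, c1, d1 = gauss(5), gauss(5), gauss(5)   # analyte profiles (three modes)
b2, c2, d2 = gauss(7), gauss(4), gauss(6)   # interferent profiles (hypothetical)
J, K, L = b1.size, c1.size, d1.size

# Assumed four-way analogue of the Appendix structure: derivatives with
# respect to the elements of b2, c2, and d2, respectively.
Z_unx = np.hstack([
    kron3(col(d2), col(c2), np.eye(J)),
    kron3(col(d2), np.eye(K), col(b2)),
    kron3(np.eye(L), col(c2), col(b2)),
])
Z_cal = col(kron3(d1, c1, b1))
sen_jac = sen_j4(Z_unx, Z_cal)

# FO4 (eq 10) and MKL4 (Table 2) for the same profiles, for comparison.
proj = lambda u, v: 1.0 - (u @ v) ** 2       # v' (I - u u') v for unit vectors
sen_fo4 = np.sqrt(proj(b2, b1) * proj(c2, c1) * proj(d2, d1))
B, C, D = (np.column_stack(p) for p in ((b1, b2), (c1, c2), (d1, d2)))
sen_mkl4 = np.linalg.inv((B.T @ B) * (C.T @ C) * (D.T @ D))[0, 0] ** -0.5
```

For this configuration the Jacobian-based value lies between the FO4 and MKL4 values, which matches the ordering the text reports for the Monte Carlo results.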
Details on the derivation of eq 12 can be found in the Appendix, including its relationship with the intuitive extension FO4 (eq 10). The approach is reminiscent of the one employed for deriving Cramér–Rao lower bounds for the decomposition of multidimensional arrays.14 However, it focuses on the uncertainty of the calibrated analyte scores instead of on all PARAFAC fitted parameters, and applies to cases relevant to analytical chemistry of complex systems having unknown constituents in test samples.

Unfolding the Data from Four-Way to Three-Way. Four-way data may be modeled by maintaining the four-dimensional structure, or may be unfolded to three-way data in three different manners. The latter correspond to combinations of two of the three dimensions of each sample array. This produces a three-way array for calibration, as has been done for chromatographic data with excitation–emission fluorescence matrix detection,26 using the generalized rank annihilation method (GRAM) for processing. Understandably, the unfolding operation was recommended to be performed in the way leading to the maximum sensitivity.26 On the other hand, Liu and Sidiropoulos found that the statistical efficiency of decomposing multiway arrays is higher (and consequently the sensitivity is larger) when the original data structure is maintained, in comparison with unfolding into arrays of lower dimensions.14 It remains to be uncovered, however, what the relationship is between the multiway sensitivity and those which can be achieved by unfolding the data into low-dimensional arrays, especially when unexpected constituents occur in the test samples.

If the three-dimensional data arrays for all samples (of size J × K × L, where J, K, and L indicate the number of recording channels in each data mode) were unfolded producing matrices of size JK × L, the three-way PARAFAC sensitivity which would be achieved can be precisely computed from the well-established, general three-way FO3 expression as

$$\mathrm{SEN}_{\mathrm{FO3BC}} = s_n \left\{ \left[ \left( (\mathbf{B}_{\mathrm{cal}} \odot \mathbf{C}_{\mathrm{cal}})^{\mathrm{T}} \mathbf{P}_{BC,\mathrm{unx}} (\mathbf{B}_{\mathrm{cal}} \odot \mathbf{C}_{\mathrm{cal}}) \right) * (\mathbf{D}_{\mathrm{cal}}^{\mathrm{T}} \mathbf{P}_{D,\mathrm{unx}} \mathbf{D}_{\mathrm{cal}}) \right]^{-1} \right\}_{nn}^{-1/2} \tag{14}$$

where the subscript FO3BC stands for FO-type sensitivity with unfolding along B and C modes. The symbol ⊙ is the Khatri–Rao matrix product,27 also known as column-wise Kronecker product, because for matrices A and B, the ith column of A ⊙ B follows from the ith columns of A and B as ai ⊗ bi = [a1i bi | a2i bi | ... ] (the symbol ⊗ indicates the Kronecker product). In eq 14, (Bcal ⊙ Ccal) represents the matrix of calibration loadings which would be obtained from the analysis in the mixed BC mode, and PBC,unx is a projection matrix generated from the mixed Bunx and Cunx loadings:

$$\mathbf{P}_{BC,\mathrm{unx}} = \mathbf{I} - (\mathbf{B}_{\mathrm{unx}} \odot \mathbf{C}_{\mathrm{unx}})(\mathbf{B}_{\mathrm{unx}} \odot \mathbf{C}_{\mathrm{unx}})^{+} \tag{15}$$

Likewise, data could be unfolded in two additional ways. Table 2 summarizes the corresponding expressions, with the subscripts FO3BD and FO3CD and symbols being analogous to those in eqs 14 and 15 after adequate permutation of B, C, and D. The unfolded values will be compared with the four-way sensitivity J4, which is achieved by processing the data in their original structure.

■ EXPERIMENTAL SECTION

Monte Carlo Simulations. To perform Monte Carlo noise addition simulations, noiseless Gaussian-shaped profiles for four different constituents (1, 2, 3, and 4) at unit concentration were first defined in three different data modes. The profiles span 10 data points in each mode, and were all normalized so that the area under the profile (the total signal for each pure constituent) is 1.

Simulated Systems. Four simulated systems were created, one binary (B1), two ternary (T1 and T2), and one quaternary (Q1). The constituent numbers were the following: (B1) analyte, 1, interferent, 3, (T1) analyte, 1, interferents, 3 and 4, (T2) analytes, 1 and 2, interferent, 3, and (Q1) analytes, 1 and 2, interferents, 3 and 4. In all cases, the peak maximum for the Gaussian profile of constituent 1 (the analyte of interest) was fixed at sensor 5 in all modes (see Figure 1), while the peak maxima for the remaining constituents were placed in each of the three modes in 100 different random positions. Figure 1 shows a particular situation for the four possible constituents. For a single analyte (systems B1 and T1), perfectly trilinear data for five calibration samples were created, with nominal calibration concentrations taken at random from the range 0–1. For two-analyte systems (T2 and Q1), 10 calibration samples were created, with random concentrations for both analytes in the range 0–1. For all systems and for each of the overlapping situations, a single test sample was produced, having constituent concentrations taken at random from the center of the corresponding calibration ranges (±10%).

Figure 1. Simulated profiles for the various components in the first data mode for a particular overlapping situation. Component 1 (black line) is the analyte of interest, with a peak maximum centered at sensor 5. The remaining components have peak maxima located at random positions.

PARAFAC Processing. All systems were processed by four-way PARAFAC, which was applied in the usual way: (1) data for the unknown sample are joined with calibration data to create a four-way array, (2) the latter array is decomposed, (3) the calibration analyte scores are regressed against nominal concentrations, and (4) the test analyte score is interpolated in the calibration graph to estimate its concentration. For further details see the relevant references.28–30 Instrumental uncertainty (0.01% of the mean calibration signal) was added to the unknown sample data arrays, while keeping the calibration precise, as in previous studies.12 This removes the effect of the leverage terms on the propagated uncertainty, and leaves the concentration uncertainty as dependent on the sensitivity and noise level (see eq 5).
Figure 2. Four-way expression-based PARAFAC sensitivities versus Monte Carlo values for all 100 cases of the binary system B1: (A) MKL4 (○) and FO4 (□), (B) Jacobian-based J4 from expression 12.
Figure 3. Four-way expression-based PARAFAC sensitivities versus Monte Carlo values for all 100 cases of the ternary system T1: (A) MKL4 (○) and FO4 (□), (B) Jacobian-based J4 from expression 12.
The calibration/prediction procedure was repeated 1000 times using different random seeds for the signal noise. The variance in the estimated concentrations for each test sample was registered over the 1000 Monte Carlo cycles, leading to sensitivities through eq 3.

Software. All calculations were implemented with MATLAB 7.10 routines,31 available from the authors on request. The PARAFAC algorithm was obtained from the Internet page maintained by Bro at http://www.models.kvl.dk/algorithms (accessed January 2011).

Experimental Systems. Two experimental systems will be discussed, previously studied using PARAFAC and other algorithms.32,33 They involve the measurement of the time evolution of excitation–emission fluorescence matrices, i.e., four-way data for which the modes are sample number, excitation wavelength, emission wavelength, and reaction time. In system 1, malondialdehyde (MLD) was measured in olive oils treated with methylamine, which led the analyte to develop a strongly fluorescent product.32,33 Calibration was performed with 14 samples with the analyte in the range 0.00–2.10 mg L−1, and 10 spiked olive oils were examined. When analyzing each test sample, the size of the four-dimensional array was 15 samples × 14 excitation wavelengths × 11 emission wavelengths × 21 times. This system shows a nonlinear behavior of signal versus analyte concentration, requiring a polynomial pseudounivariate calibration graph for PARAFAC quantitative analysis. System 2 corresponds to the determination of two analytes, the antineoplastics leucovorin (LEU) and methotrexate (MET), in human urine samples, carried out by monitoring their oxidation with permanganate.33 Four-way data were recorded for nine calibration samples and six different test urine samples, with analytes in the concentration ranges 0.00–0.98 mg L−1 for methotrexate and 0.00–0.68 mg L−1 for leucovorin. The four-dimensional arrays were of the following size when studying each experimental sample: 10 samples × 11 excitation wavelengths × 32 emission wavelengths × 10 times.
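The Monte Carlo route to a sensitivity (eqs 1–3, with noise added over 1000 cycles) can be illustrated with a deliberately simplified sketch. It is not the PARAFAC calibration/prediction procedure used in this work: a hypothetical linear predictor ŷ = wᵀx stands in for the full model, and the regression vector `w`, the noiseless signal `x0`, and the noise level `s_x` are arbitrary values chosen for the example. For such a linear predictor the expected sensitivity is the inverse of the length of w (the two-way entry of Table 1), which gives a value to compare against.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the calibration model: a fixed regression vector.
w = rng.normal(size=50)
x0 = rng.normal(size=50)      # noiseless test-sample signal (arbitrary)
s_x = 0.01                    # standard deviation of the added signal noise
n_cycles = 1000               # number of Monte Carlo cycles, as in the text

# Add fresh noise to the test signal in every cycle and predict each time.
predictions = np.array([
    (x0 + rng.normal(scale=s_x, size=x0.size)) @ w
    for _ in range(n_cycles)
])

s_y = predictions.std(ddof=1)         # spread of predicted concentrations
vif_mc = s_y ** 2 / s_x ** 2          # eq 1: output variance / input variance
sen_mc = s_x / s_y                    # eq 3
sen_expected = 1.0 / np.linalg.norm(w)
```

With 1000 cycles the statistical scatter of `sen_mc` around `sen_expected` is of the order of a few percent, which gives an idea of the dispersion to be expected when plotting expression-based values against Monte Carlo values.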
■ RESULTS AND DISCUSSION

Monte Carlo Simulations. The four-way data for all simulated systems were subjected to Monte Carlo/PARAFAC calculations. In each of the analyzed overlapping cases, analyte prediction proceeded as described above, leading to the estimation of the concentration values. Equation 3 then provided the Monte Carlo sensitivity estimate.

For the binary system B1, the four-way MKL4 and FO4 sensitivities are compared with the Monte Carlo MC values in Figure 2A. It is apparent that all MC values are smaller than the corresponding MKL4 values (lying in the gray triangular region in Figure 2A). This is expected, since the presence of unexpected interferents leads to a decreased sensitivity in comparison with a system with the same number of calibrated constituents.11 The MC values are also larger than expectations based on the FO4 expression (lying in the white triangular region in Figure 2A). This confirms, for a large number of systems with varying degrees of overlapping, that the plainly extrapolated FO4 expression (eq 10) seriously underestimates four-way PARAFAC sensitivities. Moreover, the difference between realistic sensitivity values and those expected from FO4 can be very large, highlighting the need for an adequate approach to sensitivity estimation in multiway calibration. Figure 2B shows a comparison of the Monte Carlo derived sensitivities with the ones estimated by the new approach (J4, eq 12), with good agreement between simulations and estimations based on the presently described methodology.

Similar results to those discussed above were obtained for the simulated ternary system T1, as can be seen in Figure 3A, which shows a plot of four-way MKL4 and FO4 PARAFAC sensitivity values versus the MC sensitivities. The degree of underestimation of the true sensitivity by the FO4 value is significantly larger than for system B1, which has a single unexpected interferent. The results from the new approach to the ternary system T1 are displayed in Figure 3B: a plot of the J4 values versus the Monte Carlo values shows a reasonably good correlation. This confirms the applicability of eq 12 in the case of samples of more complex composition. The results for the remaining two systems T2 (ternary) and Q1 (quaternary) are similar to those presented in Figures 2 and 3 (see Supporting Information, Figures S-1 and S-2). In sum, the presently proposed approach makes it possible to fill in the unknown entry in Table 1.

Comparison with Unfolding Data. With the new Jacobian approach, the J4 PARAFAC sensitivities for 10 000 different overlapping situations were estimated in all simulated systems using eq 12, and compared to the maximum of the three sensitivity values expected on unfolding. The latter ones were computed using the appropriate FO3BC, FO3BD, and FO3CD expressions of Table 2. The results indicate that the four-way sensitivities are all larger than the best of the three unfolded values, with increasing effects in going from one unexpected interferent to more complex systems having two unexpected interferents (see Supporting Information, Figure S-3). This implies that processing four-way data in their original structure is preferable to unfolding into three-way, since the former provides a higher sensitivity. If unfolding is mandatory, it should be properly made, i.e., into the best combination of modes (the ones providing the highest sensitivity according to the FO3BC, FO3BD, and FO3CD equations, see Table 2). This is important when four-way data are processed by algorithms which are not conceived for four-way data,34,35 such as MCR-ALS.20

Experimental Systems. Both experimental four-way data systems considered in this report were previously studied using PARAFAC for analyte determination.
In system 1 the analyte sensitivity was underestimated by employing the HCD4 version,32 whereas in system 2 the sensitivity was not reported.33 In the experimental system 1, a single calibrated analyte occurs (malondialdehyde), and the test samples contained a single unexpected interferent (the olive oil background). In this case the analyte signals (and also the PARAFAC analyte scores) display a significant nonlinear behavior toward the nominal analyte concentrations.32 Hence, for computing SEN, the value of sn was replaced by the local slope (dsn/dyn) at low concentrations. The resulting sensitivity is 1610 AFU mg−1 L (AFU = arbitrary fluorescence units). The limits of detection (LOD) and quantitation (LOQ) are 0.01 and 0.03 mg L−1, respectively, as estimated with the following crude approximations (ref 1)

$$\mathrm{LOD} = 3.3\, s_x / \mathrm{SEN}_n \tag{16}$$

$$\mathrm{LOQ} = 10\, s_x / \mathrm{SEN}_n \tag{17}$$

where the signal uncertainty sx can be estimated from blank replicate measurements. The obtained values are entirely compatible with the reported experimental root-mean-square error of prediction (RMSEP) of 0.06 mg L−1, estimated for a group of independent test samples.32 In the experimental system 2 there are two calibrated analytes (the antineoplastics leucovorin and methotrexate) and a single unexpected interferent (the background serum signal). The sensitivities estimated with expression 12 are 5200 and 7200 AFU mg−1 L for LEU and MET, respectively, while the LODs and LOQs are 0.01 and 0.03 mg L−1 for both analytes. The results are also compatible with the reported experimental RMSEP values of 0.10 mg L−1 for the analyte LEU and of 0.08 mg L−1 for MET.33

■ CONCLUSIONS

Analyte sensitivities achieved when processing four-way data with the quadrilinear parallel factor model can now be estimated using the expression developed in this work. This closes the gap between three- and four-way PARAFAC calibration, particularly when unexpected constituents occur in the unknown samples. It also provides the basic ingredient for future research concerning the estimation of additional figures of merit, such as the uncertainty in predicted concentrations, detection capabilities, etc. The present results also imply that the PARAFAC sensitivity obtained by maintaining the originally structured four-way data arrays is larger than those obtained by processing the data after unfolding them to three-way. If the latter operation is indeed required, it should be done by selecting the proper modes for unfolding; otherwise, the sensitivity would not be optimal. Further research is required to extend these promising results to the estimation of the multiway sensitivity when alternative non-multilinear algorithms are applied.

■ APPENDIX

Three-Way Data. These are generated by joining individual sample data matrices. For the simplest system having unexpected constituents (one analyte and one interferent), the generic element of the test data matrix X is given by

$$x_{jk} = a_1 b_{1j} c_{1k} + a_2 b_{2j} c_{2k} \tag{18}$$

where b1j, b2j are elements of the B matrix of profiles in the first data mode (J × 2) and c1k, c2k are elements of the C matrix in the second mode (K × 2). With no loss of generality, we take a2 = 1, and unfold X into a vector x

$$\mathbf{x} = (\mathbf{c}_1 \odot \mathbf{b}_1) a_1 + (\mathbf{c}_2 \odot \mathbf{b}_2) = \mathbf{s}_1 a_1 + \mathbf{s}_2 \tag{19}$$

We assume that the analyte columns (b1 in B and c1 in C) are well determined by calibration, so that their uncertainties are negligible. When fitting the matrix X to the PARAFAC model, there are (J + K + 1) adjustable parameters: the J elements of b2, the K elements of c2, and a1 (the test analyte score). If p is any of the adjustable parameters, its uncertainty (sp) is obtained from the Jacobian matrix J as

$$s_p = s_x \left\{ \left[ (\mathbf{J}^{\mathrm{T}} \mathbf{J})^{-1} \right]_{pp} \right\}^{1/2} \tag{20}$$

where sx is the uncertainty in instrumental signals. The sensitivity can be estimated as the ratio between sx and sp, with p corresponding to the analyte score a1 (sn is required for scaling)

$$\mathrm{SEN}_{\mathrm{J3}} = s_n (s_x / s_p) = s_n \left\{ \left[ (\mathbf{J}^{\mathrm{T}} \mathbf{J})^{-1} \right]_{pp} \right\}^{-1/2} \tag{21}$$
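where the subscript J3 indicates that the Jacobian approach is applied to three-way data. The $\mathbf{J}$ matrix contains $J$ columns with the derivatives $\partial\mathbf{x}/\partial b_{2j}$, $K$ columns with the derivatives $\partial\mathbf{x}/\partial c_{2k}$, and a single last column with the derivatives $\partial\mathbf{x}/\partial a_1$. The specific structure of $\mathbf{J}$ is as follows
$$\mathbf{J} = [\mathbf{c}_2 \otimes \mathbf{I}_b \mid \mathbf{I}_c \otimes \mathbf{b}_2 \mid \mathbf{s}_1] \tag{22}$$
where $\mathbf{I}_b$ and $\mathbf{I}_c$ are $J \times J$ and $K \times K$ identity matrices, respectively. It is convenient to divide the matrix $\mathbf{J}$ in two blocks: (1) the first $(J + K)$ columns ($\mathbf{Z}_{\mathrm{unx}}$), containing information for the unexpected component, and (2) the last column ($\mathbf{Z}_{\mathrm{cal}}$), associated to the calibrated analyte:
$$\mathbf{J} = [\mathbf{Z}_{\mathrm{unx}} \mid \mathbf{Z}_{\mathrm{cal}}] \tag{23}$$
Using the latter notation and a well-known property of the inversion of block matrices, eq 21 leads to
$$\mathrm{SEN}_{\mathrm{J3}} = s_n \left\| [(\mathbf{I} - \mathbf{Z}_{\mathrm{unx}} \mathbf{Z}_{\mathrm{unx}}^{+}) \mathbf{Z}_{\mathrm{cal}}]^{+} \right\|^{-1} = s_n \left\| \mathbf{P}_{Z\mathrm{unx}} \mathbf{Z}_{\mathrm{cal}} \right\| \tag{24}$$
where the orthogonal projection matrix $\mathbf{P}_{Z\mathrm{unx}}$ has been explicitly defined. If $(\mathbf{Z}_{\mathrm{unx}}^{\mathrm{T}} \mathbf{Z}_{\mathrm{unx}})^{-1}$ does not exist when computing $\mathbf{Z}_{\mathrm{unx}}^{+}$, it can be replaced by the Moore–Penrose pseudoinverse $\mathbf{Z}_{\mathrm{unx}}^{+}$. Equation 24 is formally identical to the previously derived expression for estimating the three-way net analyte signal (NAS) for an analyte in the presence of unexpected constituents. In fact, the $\mathbf{P}_{Z\mathrm{unx}}$ matrix in eq 24 is identical to the projection matrix leading to the HCD3 PARAFAC sensitivity
$$\mathbf{P}_{Z\mathrm{unx}} = \mathbf{P}_{c2} \otimes \mathbf{P}_{b2} \tag{25}$$
where
$$\mathbf{P}_{b2} = \mathbf{I}_b - \mathbf{b}_2 \mathbf{b}_2^{+} \tag{26}$$
$$\mathbf{P}_{c2} = \mathbf{I}_c - \mathbf{c}_2 \mathbf{c}_2^{+} \tag{27}$$
Inserting eq 25 in eq 24 leads to the HCD3 sensitivity definition (Table 2):
$$\mathrm{SEN}_{\mathrm{HCD3}} = s_n \left\| [(\mathbf{P}_{c2} \otimes \mathbf{P}_{b2}) \mathbf{Z}_{\mathrm{cal}}]^{+} \right\|^{-1} = s_n \left\| (\mathbf{P}_{c2} \otimes \mathbf{P}_{b2}) \mathbf{s}_1 \right\| = s_n \{[(\mathbf{B}^{\mathrm{T}} \mathbf{B})^{-1}]_{nn} [(\mathbf{C}^{\mathrm{T}} \mathbf{C})^{-1}]_{nn}\}^{-1/2} \tag{28}$$
These results can be easily generalized for several calibrated analytes and unexpected components. The general Jacobian matrix is
$$\mathbf{J} = [\mathbf{c}_{\mathrm{unx1}} \otimes \mathbf{I}_b \mid \mathbf{I}_c \otimes \mathbf{b}_{\mathrm{unx1}} \mid \mathbf{c}_{\mathrm{unx2}} \otimes \mathbf{I}_b \mid \mathbf{I}_c \otimes \mathbf{b}_{\mathrm{unx2}} \mid \dots \mid \mathbf{s}_{\mathrm{cal1}} \mid \mathbf{s}_{\mathrm{cal2}} \mid \dots] \tag{29}$$
where unx1, unx2, etc. identify the unexpected component numbers, and cal1, cal2, etc. the calibrated component numbers. The relevant blocks in $\mathbf{J}$ are (1) $\mathbf{Z}_{\mathrm{unx}}$, associated to the $N_{\mathrm{unx}}$ unexpected components, comprising the first $[(J + K) N_{\mathrm{unx}}]$ columns, and (2) $\mathbf{Z}_{\mathrm{cal}}$, ascribed to the $N_{\mathrm{cal}}$ calibrated analytes, comprising the last $N_{\mathrm{cal}}$ columns. For analyte $n$, the sensitivity is thus given by
$$\mathrm{SEN}_{\mathrm{J3}} = s_n \left\| n\text{th row of } (\mathbf{P}_{Z\mathrm{unx}} \mathbf{Z}_{\mathrm{cal}})^{+} \right\|^{-1} \tag{30}$$
where $n$ can be cal1, cal2, etc., and $\mathbf{P}_{Z\mathrm{unx}}$ is a projection matrix orthogonal to $\mathbf{Z}_{\mathrm{unx}}$. Expression 30 is formally identical to the so-called FO3 sensitivity expression for PARAFAC (Table 2).
Four-Way Data. A generalization of the above approach to four-way data is straightforward. It requires, in addition to the $\mathbf{B}$ and $\mathbf{C}$ profile matrices, the $\mathbf{D}$ matrix ($L \times N$), involving the elements of the constituent profiles in the third data dimension. Using similar arguments, the Jacobian matrix for the $[(J + K + L) N_{\mathrm{unx}} + N_{\mathrm{cal}}]$ fitted parameters takes the form
$$\mathbf{J} = [\mathbf{d}_{\mathrm{unx1}} \otimes \mathbf{c}_{\mathrm{unx1}} \otimes \mathbf{I}_b \mid \mathbf{d}_{\mathrm{unx1}} \otimes \mathbf{I}_c \otimes \mathbf{b}_{\mathrm{unx1}} \mid \mathbf{I}_d \otimes \mathbf{c}_{\mathrm{unx1}} \otimes \mathbf{b}_{\mathrm{unx1}} \mid \mathbf{d}_{\mathrm{unx2}} \otimes \mathbf{c}_{\mathrm{unx2}} \otimes \mathbf{I}_b \mid \mathbf{d}_{\mathrm{unx2}} \otimes \mathbf{I}_c \otimes \mathbf{b}_{\mathrm{unx2}} \mid \mathbf{I}_d \otimes \mathbf{c}_{\mathrm{unx2}} \otimes \mathbf{b}_{\mathrm{unx2}} \mid \dots \mid \mathbf{s}_{\mathrm{cal1}} \mid \mathbf{s}_{\mathrm{cal2}} \mid \dots] \tag{31}$$
where $\mathbf{I}_d$ is an $L \times L$ identity matrix, and $\mathbf{s}_{\mathrm{cal1}}$, $\mathbf{s}_{\mathrm{cal2}}$, etc. are the unfolded signals for each of the pure analytes. As an example, for analyte cal1
$$\mathbf{s}_{\mathrm{cal1}} = \mathbf{d}_{\mathrm{cal1}} \odot \mathbf{c}_{\mathrm{cal1}} \odot \mathbf{b}_{\mathrm{cal1}} \tag{32}$$
As with three-way data, the $\mathbf{J}$ matrix can be divided in two blocks, one spanning the contribution of the unexpected component profiles, and another one associated to the calibrated analyte profiles
$$\mathbf{J} = [\mathbf{Z}_{\mathrm{unx}} \mid \mathbf{Z}_{\mathrm{cal}}] \tag{33}$$
where
$$\mathbf{Z}_{\mathrm{unx}} = [\mathbf{d}_{\mathrm{unx1}} \otimes \mathbf{c}_{\mathrm{unx1}} \otimes \mathbf{I}_b \mid \mathbf{d}_{\mathrm{unx1}} \otimes \mathbf{I}_c \otimes \mathbf{b}_{\mathrm{unx1}} \mid \mathbf{I}_d \otimes \mathbf{c}_{\mathrm{unx1}} \otimes \mathbf{b}_{\mathrm{unx1}} \mid \mathbf{d}_{\mathrm{unx2}} \otimes \mathbf{c}_{\mathrm{unx2}} \otimes \mathbf{I}_b \mid \mathbf{d}_{\mathrm{unx2}} \otimes \mathbf{I}_c \otimes \mathbf{b}_{\mathrm{unx2}} \mid \mathbf{I}_d \otimes \mathbf{c}_{\mathrm{unx2}} \otimes \mathbf{b}_{\mathrm{unx2}} \mid \dots] \tag{34}$$
$$\mathbf{Z}_{\mathrm{cal}} = [\mathbf{s}_{\mathrm{cal1}} \mid \mathbf{s}_{\mathrm{cal2}} \mid \dots] \tag{35}$$
Using the latter notation, the sensitivity for analyte $n$ is given by an expression analogous to eq 24:
$$\mathrm{SEN}_{\mathrm{J4}} = s_n \left\| n\text{th row of } (\mathbf{P}_{Z\mathrm{unx}} \mathbf{Z}_{\mathrm{cal}})^{+} \right\|^{-1} \tag{36}$$
It should be noted that, for four-way data, the orthogonal projection matrix $\mathbf{P}_{Z\mathrm{unx}}$ is not the intuitive extension of the three-way NAS projection matrix. For a single unexpected component, it is not equal to the product $(\mathbf{P}_{d,\mathrm{unx1}} \otimes \mathbf{P}_{c,\mathrm{unx1}} \otimes \mathbf{P}_{b,\mathrm{unx1}})$, where
$$\mathbf{P}_{b,\mathrm{unx1}} = \mathbf{I} - \mathbf{b}_{\mathrm{unx1}} \mathbf{b}_{\mathrm{unx1}}^{+} \tag{37}$$
$$\mathbf{P}_{c,\mathrm{unx1}} = \mathbf{I} - \mathbf{c}_{\mathrm{unx1}} \mathbf{c}_{\mathrm{unx1}}^{+} \tag{38}$$
$$\mathbf{P}_{d,\mathrm{unx1}} = \mathbf{I} - \mathbf{d}_{\mathrm{unx1}} \mathbf{d}_{\mathrm{unx1}}^{+} \tag{39}$$
The reason is that the columns of $\mathbf{Z}_{\mathrm{unx}}$ (eq 34) are not composed of the $\mathbf{b}_{\mathrm{unx1}}$, $\mathbf{c}_{\mathrm{unx1}}$, and $\mathbf{d}_{\mathrm{unx1}}$ subspaces individually, but of combinations of pairs of subspaces, i.e., ($\mathbf{b}_{\mathrm{unx1}}$/$\mathbf{c}_{\mathrm{unx1}}$), ($\mathbf{c}_{\mathrm{unx1}}$/$\mathbf{d}_{\mathrm{unx1}}$), and ($\mathbf{b}_{\mathrm{unx1}}$/$\mathbf{d}_{\mathrm{unx1}}$). This also implies that the general sensitivity will not be adequately described by the plain extension of FO3 to FO4 (Table 2).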
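The appendix expressions lend themselves to a quick numerical check. The following is a minimal NumPy sketch, not the authors' implementation: it assumes one calibrated analyte and one unexpected component, random positive profiles, and a unit pure-analyte signal ($s_n = 1$). The function names, the `RCOND` cutoff, and the helper `col` are hypothetical and introduced only for illustration. It builds the Jacobian blocks with Kronecker products, evaluates the three-way sensitivity through eqs 21, 24, and 28, and then compares the four-way Jacobian sensitivity (eq 36) with the value obtained from the plain Kronecker product of the single-mode projectors (eqs 37 to 39).

```python
import numpy as np

RCOND = 1e-10  # assumed singular-value cutoff; Z_unx is rank deficient by construction


def col(v):
    """Reshape a 1-D profile into a column vector."""
    return np.asarray(v, dtype=float).reshape(-1, 1)


def complement_projector(Z):
    """I - Z Z^+, the orthogonal projector onto the complement of col(Z)."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.pinv(Z, rcond=RCOND)


def three_way_sensitivities(b1, c1, b2, c2):
    """Unit-signal (s_n = 1) sensitivities computed via eqs 21, 24 and 28."""
    J, K = len(b1), len(c1)
    s1 = np.kron(c1, b1)                                # unfolded pure analyte signal
    Z_unx = np.hstack([np.kron(col(c2), np.eye(J)),     # derivatives w.r.t. b2
                       np.kron(np.eye(K), col(b2))])    # derivatives w.r.t. c2
    jac = np.hstack([Z_unx, col(s1)])                   # eq 22, analyte column last
    cov = np.linalg.pinv(jac.T @ jac, rcond=RCOND)      # pseudoinverse of J^T J
    sen_eq21 = cov[-1, -1] ** -0.5
    sen_eq24 = np.linalg.norm(complement_projector(Z_unx) @ s1)
    B, C = np.column_stack([b1, b2]), np.column_stack([c1, c2])
    sen_hcd3 = (np.linalg.inv(B.T @ B)[0, 0] * np.linalg.inv(C.T @ C)[0, 0]) ** -0.5
    return sen_eq21, sen_eq24, sen_hcd3


def four_way_sensitivities(b1, c1, d1, b2, c2, d2):
    """Unit-signal SEN_J4 (eq 36) and the value from the plain projector product."""
    J, K, L = len(b1), len(c1), len(d1)
    s1 = np.kron(d1, np.kron(c1, b1))                   # eq 32
    Z_unx = np.hstack([                                 # eq 34, one unexpected component
        np.kron(col(d2), np.kron(col(c2), np.eye(J))),
        np.kron(col(d2), np.kron(np.eye(K), col(b2))),
        np.kron(np.eye(L), np.kron(col(c2), col(b2))),
    ])
    sen_j4 = np.linalg.norm(complement_projector(Z_unx) @ s1)
    Pb, Pc, Pd = (complement_projector(col(v)) for v in (b2, c2, d2))
    sen_product = np.linalg.norm(np.kron(Pd, np.kron(Pc, Pb)) @ s1)
    return sen_j4, sen_product


# Illustrative profiles only; sizes and values are arbitrary assumptions.
rng = np.random.default_rng(0)
b1, b2 = rng.random(6) + 0.1, rng.random(6) + 0.1
c1, c2 = rng.random(5) + 0.1, rng.random(5) + 0.1
d1, d2 = rng.random(4) + 0.1, rng.random(4) + 0.1
print("three-way (eq 21, eq 24, HCD3):", three_way_sensitivities(b1, c1, b2, c2))
print("four-way (SEN_J4, projector product):",
      four_way_sensitivities(b1, c1, d1, b2, c2, d2))
```

Under these assumptions the three three-way values should coincide to numerical precision, whereas the four-way Jacobian value should exceed the one from the projector product, in line with the remark that $\mathbf{P}_{Z\mathrm{unx}}$ is not the plain extension of the three-way projector. The pseudoinverse is used throughout because the columns of $\mathbf{Z}_{\mathrm{unx}}$ are linearly dependent.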
■ ASSOCIATED CONTENT
Supporting Information. Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
■ ACKNOWLEDGMENT
Financial support from the University of Rosario, CONICET (Project PIP 1950), and ANPCyT (Project PICT2010-0084) is acknowledged.
dx.doi.org/10.1021/ac202268k |Anal. Chem. 2012, 84, 186–193
■ REFERENCES
(1) Olivieri, A. C.; Faber, N. M.; Ferré, J.; Boqué, R.; Kalivas, J. H.; Mark, H. Pure Appl. Chem. 2006, 78, 633–661.
(2) Faber, N. M.; Lorber, A.; Kowalski, B. R. J. Chemom. 1997, 11, 419–461.
(3) Faber, N. M.; Ferré, J.; Boqué, R.; Kalivas, J. H. Chemom. Intell. Lab. Syst. 2002, 63, 107–116.
(4) Olivieri, A. C.; Faber, N. M. Chemom. Intell. Lab. Syst. 2004, 70, 75–82.
(5) Olivieri, A. C. J. Chemom. 2004, 18, 363–371.
(6) Olivieri, A. C.; Faber, N. M. In Comprehensive Chemometrics; Brown, S., Tauler, R., Walczak, B., Eds.; Elsevier: Amsterdam, 2009; Vol. 3, pp 91–120.
(7) Olivieri, A. C. Anal. Chem. 2008, 80, 5713–5720.
(8) Danzer, K.; Currie, L. A. Pure Appl. Chem. 1998, 70, 993–1014.
(9) Faber, K.; Kowalski, B. R. J. Chemom. 1997, 11, 181–238.
(10) Bro, R. Chemom. Intell. Lab. Syst. 1997, 38, 149–171.
(11) Olivieri, A. C.; Faber, N. M. J. Chemom. 2005, 19, 583–592.
(12) Olivieri, A. C. Anal. Chem. 2005, 77, 4936–4946.
(13) Shao, J.; Tu, D. The Jackknife and Bootstrap; Springer: New York, 1995.
(14) Liu, X.; Sidiropoulos, S. D. IEEE T. Signal Proces. 2001, 49, 2074–2086.
(15) Xia, A. L.; Wu, H. L.; Li, S. F.; Zhu, S. H.; Hu, L. Q.; Yu, R. Q. J. Chemom. 2007, 21, 133–144.
(16) Fu, H. Y.; Wu, H. L.; Yu, Y. J.; Yu, L. L.; Zhang, S. R.; Nie, J. F.; Li, S. F.; Yu, R. Q. J. Chemom. 2011, 25, 408–429.
(17) Olivieri, A. C.; Escandar, G. M.; Muñoz de la Peña, A. Trends Anal. Chem. 2011, 30, 607–617.
(18) Booksh, K. S.; Kowalski, B. R. Anal. Chem. 1994, 66, 782A–791A.
(19) Escandar, G. M.; Faber, N. M.; Goicoechea, H. C.; Muñoz de la Peña, A.; Olivieri, A. C.; Poppi, R. J. Trends Anal. Chem. 2007, 26, 752–765.
(20) De Juan, A.; Casassas, E.; Tauler, R. In Encyclopedia of Analytical Chemistry; Myers, R. A., Ed.; Wiley: Chichester, U.K., 2002; Vol. 11, pp 9800–9837.
(21) Kiers, H. A. L.; Ten Berge, J. M. F.; Bro, R. J. Chemom. 1999, 13, 275–294.
(22) Martens, H.; Næs, T. Multivariate Calibration; John Wiley: Chichester, U.K., 1989.
(23) Messick, N. J.; Kalivas, J. H.; Lang, P. M. Anal. Chem. 1996, 68, 1572–1579.
(24) Ho, C.-N.; Christian, G. D.; Davidson, E. R. Anal. Chem. 1980, 52, 1071–1079.
(25) Strutz, T. Data Fitting and Uncertainty. A Practical Introduction to Weighted Least Squares and Beyond; Vieweg and Teubner: Wiesbaden, Germany, 2010.
(26) Appellof, C. J.; Davidson, E. R. Anal. Chim. Acta 1983, 146, 9–14.
(27) Rao, C. R.; Mitra, S. Generalized Inverse of Matrices and its Applications; Wiley: New York, 1971.
(28) Rinnan, A.; Riu, J.; Bro, R. J. Chemom. 2007, 21, 76–86.
(29) Andersen, C. M.; Bro, R. J. Chemom. 2003, 17, 200–215.
(30) Olivieri, A. C.; Wu, H.-L.; Yu, R.-Q. Chemom. Intell. Lab. Syst. 2009, 96, 246–251.
(31) MATLAB 7.10; The MathWorks Inc.: Natick, MA, 2010.
(32) García-Reiriz, A.; Damiani, P. C.; Olivieri, A. C.; Cañada-Cañada, F.; Muñoz de la Peña, A. Anal. Chem. 2008, 80, 7248–7256.
(33) Olivieri, A. C.; Arancibia, J. A.; Muñoz de la Peña, A.; Durán-Merás, I.; Espinosa Mansilla, A. Anal. Chem. 2004, 76, 5657–5666.
(34) Jaumot, J.; Marchan, V.; Gargallo, R.; Grandas, A.; Tauler, R. Anal. Chem. 2004, 76, 7094–7101.
(35) Bailey, H. P.; Rutan, S. C. Chemom. Intell. Lab. Syst. 2011, 106, 131–141.