Anal. Chem. 2009, 81, 1208–1216
Spectral Simulation Protocol for Extending the Lifetime of Near-Infrared Multivariate Calibrations Yusuf Sulub† and Gary W. Small* Department of Chemistry and Optical Science and Technology Center, University of Iowa, Iowa City, Iowa 52242 Multivariate calibration models based on synthetic singlebeam near-infrared spectra are used to demonstrate the ability to maintain viable calibrations over extended time periods. Glucose is studied over the physiological concentration range of 1-30 mM in a buffered aqueous matrix containing varying levels of alanine, ascorbate, lactate, urea, and triacetin. By employing a set of 25 test samples measured 23 times over a period of 325 days, partial least-squares (PLS) calibrations based on synthetic spectra are observed to outperform conventional calibrations that use a set of 64 measured calibration samples. The key to the success of this approach is the use of a set of spectra of phosphate buffer collected on each prediction day to construct synthetic calibration spectra that are specific to that day. This allows the incorporation into the calibration model of nonanalyte spectral variance that is unique to a particular day. In this way, the adverse effects of instrumental drift or other sources of spectral variance on prediction performance can be minimized. Through the application of this methodology, values of the standard error of prediction (SEP) for glucose concentration are maintained to a range of 0.50-0.95 mM and an average of 0.68 mM over the 325 days of the experiment. These results are significantly better than those obtained with conventional models based on measured calibration samples. Over the same time period, a PLS model based on measured calibration spectra in absorbance units produced values of SEP that ranged from 0.41 to 2.02 mM and an average of 1.23 mM. Near-infrared (near-IR) spectroscopy has found significant use in a variety of quantitative determinations involving analytes in complex matrixes.1-5 Many important analytes have characteristic vibrational spectral signatures in the near-IR region and can be measured directly with little or no sample preparation. Furthermore, common background constituents such as water exhibit reduced absorption in the near-IR relative to the mid-infrared. This * To whom correspondence should be addressed. Phone: 1-319-335-3214. Fax: 1-319-353-1115. E-mail:
[email protected]. † Present address: Novartis Pharmaceuticals Corp., One Health Plaza, Bldg. 401/B244B, East Hanover, NJ 07936-1080. (1) Blanco, M.; Villarroya, I. Trends Anal. Chem. 2002, 21, 240–250. (2) Brearley, A. M. In Process Analytical Technology; Bakeev, K. A., Ed.; Blackwell Publishing Ltd.: Oxford, U.K., 2005; pp 392-423. (3) Lafrance, D.; Lands, L. C.; Burns, D. H. Vib. Spectrosc. 2004, 36, 195–202. (4) Larrechi, M. S.; Callao, M. P. Trends Anal. Chem. 2003, 22, 634–640. (5) Heise, H. M. In Near-Infrared Spectroscopy: Principles, Instruments, Applications; Siesler, H. W., Ozaki, Y., Kawata, S., Heise, H. M., Eds.; Wiley-VCH: Weinheim, Germany, 2002; pp 289-333.
1208
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
allows the direct determination of species in aqueous solutions with available optical path lengths in the millimeter range. This capability offers the potential for near-IR analyses of biological samples such as blood or plasma6 and also provides hope for in vivo measurements of analytes such as glucose directly in tissue.7 The utility of near-IR measurements in these application scenarios can only be realized, however, if reliable methods are available to calibrate the method and to update or maintain the calibration over time. The broad and overlapping nature of nearIR spectra dictates that calibrations must be based on measurements at multiple wavelengths. Consequently, multivariate calibration methods such as partial least-squares (PLS) regression are typically used to correlate output signals from the spectrometer with analyte concentrations. There are two key calibration issues that must be addressed if a practical near-IR analysis is to be developed. First, the instrumental configuration and sampling interface have to be designed to provide spectral measurements that minimize variation and are as stable as possible with respect to time. These factors will contribute to the stability of the calibration. Second, the requirements for the collection of calibration data must be practical from the standpoint of time and cost. Unfortunately, multivariate calibration models built from complex samples typically experience degraded prediction accuracy over time. Such models require many parameters to be estimated from the calibration data and thus there are many opportunities for the model to degrade. Factors such as instrumental drift, a change in the environmental conditions associated with the measurement, or a change in the background matrix may all contribute to reduce prediction accuracy. In such cases, a full recalibration is often required. To address this issue, numerous calibration updating and standardization methods have been reported that attempt to minimize the deterioration of the performance of calibration models over time. They include piecewise direct standardization (PDS)8,9 and prediction augmented classical least-squares (PACLS).10 Signal processing techniques such as orthogonal signal correction (OSC)11 can also be employed to the same effect. (6) Amerov, A. K.; Chen, J.; Small, G. W.; Arnold, M. A. Anal. Chem. 2005, 77, 4587–4594. (7) Maruo, K.; Tsurugi, M.; Tamura, M.; Ozaki, Y. Appl. Spectrosc. 2003, 57, 1236–1244. (8) Wang, Y.; Lysaght, M. J.; Kowalski, B. R. Anal. Chem. 1992, 64, 562–564. (9) Wang, Y.; Kowalski, B. R. Appl. Spectrosc. 1992, 46, 764–771. (10) Haaland, D. M.; Melgaard, D. K. Appl. Spectrosc. 2000, 54, 1303–1312. (11) Woody, N. A.; Feudale, R. N.; Myles, A. J.; Brown, S. D. Anal. Chem. 2004, 76, 2595–2600. 10.1021/ac801746n CCC: $40.75 2009 American Chemical Society Published on Web 01/08/2009
However, all of these techniques require the generation of samples and collection of spectra for calibration. To counter the detrimental effect of spectrometer drift and changes in physical parameters on the performance of calibration models without the expensive regeneration of samples and collection of calibration spectra, we have developed a synthetic spectral modeling approach that employs background spectra.12 This involves using background measurements and known purecomponent spectra to generate a synthetic calibration set and subsequently using these data to generate calibration models to predict analyte concentrations in measured spectra. By utilizing background measurements, we are in effect incorporating spectral variations that emanate from spectrometer drift and environmental fluctuations. Success of this methodology depends on the frequency and abundance of background measurements and their ability to capture the spectral information pertaining to fluctuations in the instrumental response and environment. In the initial work, the basic feasibility of this method was demonstrated, although the availability of only a small number of background spectra was thought to limit the ability of the method to account for variation in the measured prediction spectra. In the research presented here, we investigate the stability of calibration models by evaluating their prediction performance over a period of 325 days. A data set composed of aqueous samples containing physiological levels of glucose (analyte) with five other interfering species (lactate, urea, ascorbate, alanine, triacetin) is used. This sample matrix represents the types of components and degree of spectral overlap encountered in biological samples such as blood plasma or serum. The prediction results obtained with models built from measured calibration data are compared to the corresponding performance of calibration models based on synthetic spectra. Data sets are collected with two different light sources to incorporate a systematic component of instrumental variation into the analysis and thereby test the robustness of the calibration models to such variation. In addition, the viability of the simulation approach is evaluated by varying the composition of the sample matrix with the inclusion of an additional component (proline). Proline is chosen here as another potential constituent of biological samples that has a spectrum distinct from glucose and the other five matrix components. EXPERIMENTAL SECTION Reagents and Solution Preparation. Each sample was composed of a mixture of glucose, sodium lactate, urea, sodium ascorbate, alanine, and triacetin dissolved in a phosphate buffer. A uniform experimental design13 was used to generate an initial set of 2129 mixture combinations. Employing the subset selection method first proposed by Carpenter and Small,14 64 calibration and 25 prediction samples were picked separately from this pool of 2129 combinations. The concentration ranges for glucose, lactate, urea, ascorbate, alanine, and triacetin were 1.6-30, 0.8-32, 1-30, 1.2-19.8, 1.5-28.5, and 1.4-30 mM, respectively. The concentration values for each component per sample were assigned to minimize correlations among the constituents. For the calibration and prediction sets, respectively, absolute values (12) Sulub, Y.; Small, G. W. Appl. Spectrosc. 2007, 61, 406–413. (13) Fang, K. T. Uniform Design and Uniform Design Tables; Science Press: Beijing, 1994. (14) Carpenter, S. E.; Small, G. W. Anal. Chim. Acta 1991, 249, 305–321.
of the correlation coefficients between pairs of components ranged from 0.0021 to 0.1251 and from 0.0292 to 0.2841. An additional set of eight samples was prepared that contained an extra component (proline) in the sample matrix. Here, the concentration of proline was kept constant at 15 mM, while the concentration values of glucose, lactate, urea, ascorbate, alanine, and triacetin were picked from the set of 2129 design points using the Kennard-Stone15 algorithm. Across the eight samples, concentration ranges for glucose, lactate, urea, ascorbate, alanine, and triacetin varied between 1.5 and 30, 1.0 and 30, 2.0 and 30, 1.2 and 20.1, 1.7 and 29.7, and 1.4 and 30 mM, respectively. Samples were prepared by volumetric dilutions of stock solutions. The solvent was a 0.1 M, pH 7.4 phosphate buffer prepared in reagent-grade water obtained from a Labconco WaterPro PS Station (Labconco, Inc., Kansas City, MO). The buffer also contained 5 g/L of sodium benzoate as a preservative. Reagents used were glucose (ACS reagent, Fisher Scientific, Fair Lawn, NJ), sodium L-lactate (98%, Sigma, St. Louis, MO), urea (ACS reagent, Fisher), sodium ascorbate (ACS reagent, Aldrich, Milwaukee, WI), alanine (Aldrich), and triacetin (Sigma). Preparation of the eight samples that contained proline (99%, Sigma) involved a similar dilution procedure using stock solutions. The concentrations of glucose for all the samples were confirmed with a YSI model 2300 STAT Plus glucose analyzer (YSI, Inc., Yellow Springs, OH). Instrumentation and Apparatus. Spectra were collected with a Nicolet Nexus 6700 Fourier transform (FT) spectrometer (Thermo Fisher Scientific, Inc., Madison, WI) equipped with a calcium fluoride beam splitter and liquid nitrogen-cooled indium antimonide detector. In an effort to investigate the effect of changing the instrumental configuration on the performance of multivariatecalibrationmodels,twosources(globarandtungsten-halogen lamp) were used in this experiment. Data sets collected with the globar and tungsten-halogen sources will henceforth be termed experiments A and B, respectively. A K-band optical interference filter (Barr Associates, Westford, MA) was used to restrict the source radiation reaching the detector to the range of 5000-4000 cm-1. To ensure detector linearity, the incident light was further attenuated by use of a 63% transmittance thin-film-type neutral density filter (Rolyn Optics, Covina, CA). A demountable liquid transmission cell with a 20 mm diameter circular aperture and 1.5 mm path length (model 118-3, Wilmad Glass, Buena NJ) equipped with sapphire windows (Meller Optics, Providence, RI) was used to hold the samples for measurement. Temperatures of the cell were maintained at 37.0 ± 0.1 °C by employing a refrigerated temperature bath (model 9100, Fisher) to circulate water through an integrated water jacket. Sample temperatures were monitored with a copper-constantan thermocouple probe and thermocouple meter (Omega Engineering, Stamford, CT). Sampling Protocol. The 64 samples in the calibration set were collected over a period of 4 days for both experiments A and B. The prediction set composed of 25 samples was collected over 2 days. For experiment A (globar source), the first, second, and third prediction sets (prediction weeks 1-3) were collected 1 day, 1 week, and 2 weeks after the calibration set. Subsequent prediction sets were collected on an approximately biweekly basis. (15) Kennard, R. W.; Stone, L. A. Technometrics 1969, 11, 137–149.
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
1209
Table 1. Summary of Data Collection
days
prediction set
data type
no. of samples
no. of sample spectra
no. of buffer spectra
1-4 5-6 13-14 20-21 34-35 48-49 62-63 86-87 100-101 114-115 120-123 128-129 142-143 156-157 170-171 184-185 205-206 219-220 240-241 254-255 268-269 282-283 296-297 310-311 324-325
n/a 1 2 3 4 5 6 7 8 9 n/a 10 11 12 13 14 15 16 17 18 19 20 21 22 23
calibrationa prediction prediction prediction prediction prediction prediction prediction prediction prediction calibrationb prediction prediction prediction prediction prediction prediction prediction prediction prediction prediction prediction prediction prediction prediction
64 25 25 25 25 25 25 25 25 25 64 25 25 25 25 25 25 25 25 25 25 25 25 25 25
192 75 75 75 75 75 75 75 75 75 192 75 75 75 75 75 75 75 75 75 75 75 75 75 75
88 44 44 44 44 44 44 44 44 44 88 44 44 44 44 44 44 44 44 44 44 44 44 44 44
a Data collected with the globar source (experiment A). collected with the tungsten-halogen source (experiment B).
b
Data
For experiment B (tungsten-halogen source), the first prediction set was collected 1 week after the calibration set with subsequent prediction sets acquired on an approximately biweekly basis except for prediction set 17 which was collected 3 weeks after prediction set 16. Experiment A spanned approximately 4 months (115 days), while the data for experiment B were collected over 6 months (205 days). Triplicate measurements of each sample were made without replacing or reloading the sample. During the course of each measurement session, phosphate buffer spectra were collected periodically. This was implemented by acquiring 16, 3, and 3 buffer spectra at the beginning, middle, and end of each data collection day, respectively. Table 1 provides a summary of the sampling protocol for both experiments. The eight samples containing proline were measured in a similar fashion at the same time as the collection of prediction set 23. At the end of each measurement session, buffer and analyte samples were stored in the freezer. Prior to the commencement of the next data acquisition period, both buffer and analyte samples were removed from the freezer and thawed. Glucose concentrations were tested periodically with the YSI analyzer and were confirmed to be stable through the freeze/thaw cycles. Experimental run orders for all the samples were randomized to minimize the correlation of component concentrations and time. The raw data consisted of 256 coadded double-sided interferograms containing 8192 points collected at every zero crossing of the He-Ne reference laser (nominal 8 cm-1 spectral resolution, 15 803 cm-1 bandwidth). Density Measurements. Generation of synthetic spectra required data for the degree of water displacement upon dissolution of the seven solutes used in this work. Displacement factors for glucose, lactate, urea, ascorbate, alanine, and triacetin were 1210
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
obtained from previously reported work.16 To obtain the water displacement factor for proline, the densities of water and five pure-component solutions of proline (50-250 mM) were measured at 37.0 ± 0.1 °C with a densitometer (model 5000, Anton Paar USA, Ashland, VA). The water displacement coefficient of 6.39 was obtained from the slope of a function relating the concentration of water displaced (obtained from the measured density) to the proline concentration.16 Data Analysis. All interferograms were transferred to a Silicon Graphics Origin 200 computer (Silicon Graphics, Inc., Mountain View, CA) operating under Irix (version 6.5, Silicon Graphics, Inc.). The interferograms were subsequently Fourier processed on this system with one level of zero-filling, triangular apodization and Mertz phase correction resulting in single-beam spectra with a point spacing of 1.93 cm-1. All other computations were performed under MATLAB (version 6.5, The MathWorks, Inc., Natick, MA) running on a Dell Precision 670 workstation (Dell Computer Corp., Austin, TX) operating under Red Hat Enterprise Linux WS (Red Hat, Inc., Raleigh, NC). RESULTS AND DISCUSSION Assessment of Measured Data. The spectral quality of the measured data can be obtained by evaluating the noise estimates within the replicates for each sample. Ratios were taken among triplicate spectra in all possible combinations to obtain 100% lines which were subsequently converted to absorbance units (AU). Root-mean-square (rms) noise values in AU were computed about a second-order polynomial function fitted to the 100% lines across the 4500-4300 cm-1 spectral window. Noise estimates for the buffer spectra were computed similarly, dividing spectra 2-22 on each day into seven groups of three. The means and standard deviations of the rms noise estimates for glucose samples in experiments A and B were 6.5 ± 2.9 and 5.8 ± 2.7 µAU, respectively. The corresponding values for the buffer spectra were 4.9 ± 2.7 and 4.6 ± 2.7 µAU. The pure-component spectra for the seven components used in this study are depicted in Figure 1. Each spectrum in this plot is unique in terms of the position, magnitude, and number of absorption bands. Glucose has three bands in this spectral region, an O-H combination band at 4700 cm-1 and two C-H combination bands at 4300 and 4400 cm-1, respectively. The extensive degree of spectral overlap evidenced in these spectral traces demands the use of multivariate calibration techniques such as PLS regression to correlate the acquired spectra and reference glucose concentrations. To assess the integrity of the measured data further, PLS calibration models were generated from normalized single-beam spectra and spectra in absorbance units. For the absorbance calculation, the background was taken as the mean of the nine buffer spectra acquired closest in time to the sample spectra. This choice of background has been established in previous work with similar aqueous samples.17 Spectra were mean-centered before submission to the PLS algorithm. A leave-one-out cross-validation procedure was utilized to assess calibration model performance. This involved leaving out (16) Amerov, A. K.; Chen, J.; Arnold, M. A. Appl. Spectrosc. 2004, 58, 1195– 1204. (17) Kramer, K. E.; Small, G. W. Vib. Spectrosc. 2007, 43, 440–446.
Figure 1. Pure-component spectra (AU) of 150 mM glucose (solid line), 200 mM lactate (dotted line), 100 mM urea (dash-dot line), 60 mM ascorbate (dashed line), 150 mM alanine (large dashed line), 140 mM triacetin (dash-dot-dot-dot line), and 100 mM proline (large dash-small dash line). Spectra for each component were computed by use of a background spectrum of phosphate buffer. Negative absorbance values arise from differences in water concentrations between the sample and buffer spectra, and baseline fluctuations are caused primarily by slight differences in sample and buffer temperatures during the data collection.
a sample with its replicate measurements and formulating a calibration model from the spectra of the remaining samples. The computed model was then used to predict the analyte concentration corresponding to the spectra of the withheld sample. This procedure was repeated iteratively until every sample had been left out once. The model performance was assessed by computing the cross-validation standard error of prediction (CV-SEP). Because the glucose signature is heavily overlapped with the spectral features of the other five interfering species, a wavenumber search protocol was utilized. This involved investigating the 4883-4099 cm-1 spectral window with range sizes varying from 600 to 50 cm-1 in steps of 25 cm-1. Each range size was stepped across the spectral window in steps of 25 cm-1. This procedure resulted in an evaluation of 437 ranges. For each wavenumber range investigated, models based on 1-18 PLS latent variables were computed and the corresponding CV-SEP values obtained. The optimal wavenumber range was determined from a sort of the CV-SEP values. The corresponding number of PLS factors to use was established via the F-test statistic at the 95% confidence interval. For the 437 spectral ranges, the number of latent variables that produced a value of CV-SEP not statistically different from the minimum CV-SEP was taken as optimal. For models based on normalized single-beam spectra, the optimal spectral ranges were 4633-4233 and 4458-4158 cm-1 for experiments A and B, respectively. The corresponding numbers of latent variables were 13 and 11. For the models based on spectra in AU, the spectral ranges were 4533-4258 and 4458-42831 cm-1 for experiments A and B, respectively. The number of latent variables was 10 in both cases. The fact that the optimization procedure produced different spectral ranges for the two data sets is a reflection that many similar spectral ranges produce similar CV-SEP values. For example, with the models based on absorbance data using the globar source (experiment A), CV-SEP values ranged from 0.47 to 0.50 mM for the top 10
spectral ranges. The need for 10-13 latent variables in a system with only six independently varying chemical components reflects the small glucose spectral signals (10-3-10-4 AU), the significant nonlinear effects of temperature variation on the water background absorbance, and the presence of instrumental drift in the data. The performance of the optimized calibration models was then tested by evaluating prediction performance for glucose in the calibration and prediction sets. This produced values of the standard error of calibration (SEC) and standard error of prediction (SEP) for each data set. For the models based on normalized single-beam spectra, implementation of this procedure generated SEC values of 0.37 and 0.36 mM for experiments A and B, respectively. Corresponding SEP results for prediction sets 1 and 10 (closest to the respective calibration data) were 0.55 and 0.34 mM for experiments A and B, respectively. For the models based on data in AU, SEC values were 0.42 and 0.40 mM for experiments A and B, respectively. The corresponding SEP values for prediction sets 1 and 10 were 0.65 and 0.50 mM. Prediction results tended to degrade with time. For experiment A (globar source), the calibration model based on single-beam spectra performed well with prediction sets 1-3 with corresponding SEP values of 0.55, 0.47, and 0.53 mM, respectively. From prediction set 4 onward, the performance of the model began to degrade and ultimately failed in prediction set 9 with an SEP value of 3.10 mM. A similar deterioration was evidenced in experiment B (tungsten-halogen source) with respect to prediction sets 11-23. For the model based on absorbance data from experiment A, SEP values were 0.65, 0.73, 0.74, and 0.78 mM for prediction sets 1-4. The results degraded then, reaching 2.02 mM at prediction set 9. A similar trend was seen with experiment B. Values of SEP were below 0.61 mM for prediction sets 10-13, then degraded to an average of 0.95 mM over sets 14-16, and an average of 1.30 mM for sets 17-23. The overall degradation of the calibration models with respect to time evidenced in both experiments A and B and with both types of spectral data can be attributed to the existence of nonanalyte spectral variations in the prediction samples that were not present in the calibration spectra. These spectral perturbations are thought to emanate from changes in the instrumental response caused by instrumental drift and possible variations in environmental parameters associated with the instrument or sample (e.g., slight temperature effects). Even though the temperature was controlled during the data acquisition, near-IR spectra of aqueous samples are known to be very sensitive to temperature variations of less than ±0.1 °C.18 These results serve as a motivation to develop and implement a calibration protocol that incorporates auxiliary nonanalyte information pertaining to the prediction samples into the generation of the calibration models. Generation of Synthetic Spectra. To counteract and remedy the harmful effects of instrumental drift and environmental changes on the performance of calibration models over time, the synthetic spectral generation algorithm developed in our previous work was used.12 To review briefly, this procedure can be represented as (18) Hu, S.-K. B.; Arnold, M. A.; Wiencek, J. M. Anal. Chem. 2000, 72, 696– 702.
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
1211
I0(ν¯)
I(ν¯) )
n
10(
∑
(1)
ε(ν¯)ibci)
sample optical configuration, I(νj)water,high, to one measured using the same configuration as the blank cell, I(νj)water,low, as shown in eq 3.
i)1
In eq 1, I(νj) is the measured single-beam output as a function of wavenumber, νj, I0(νj) is the spectral response of the background, b is the optical path length, and ε(νj)i and ci correspond to the absorptivity and concentration, respectively, of component i in the sample. The summation represents an extension of the Beer-Lambert law for multicomponent systems, in which the overall absorbance measurement for a sample is given by the sum of absorbances for each constituent. Implementation of this methodology involved the use of buffer spectra to approximate I0(νj). Variables included into the simulation algorithm were a path length of 1.5 mm, reference concentrations of the six components, and the water concentration, adjusted for the solvent displacement effect of dissolving the six sample constituents. Calculation of Molar Absorptivities. The absorptivities of the components were obtained via an experimental protocol that utilized the explicit linear additive modeling capability of the Beer-Lambert law in conjunction with classical least-squares (CLS) regression analysis. Pure-component aqueous samples of all the seven components (glucose, lactate, urea, ascorbate, alanine, triacetin, and proline) were prepared at five concentration levels between 50 and 250 mM in water. Triplicate single-beam spectra were acquired for each sample in this analysis. Samples in this case were composed of the five pure-component concentration levels, phosphate buffer, and water. Corresponding absorbance measurements were obtained by taking the ratio of the measured single-beam spectra to the spectrum of the blank cell with the cell windows butted (i.e., spacers removed). This was done to minimize the occurrence of interference fringes caused by internal reflections of the source beam within the cell. Pure spectral components were obtained by computing the linear least-squares solution given by
(
-log
)
2
∑
I(ν¯) ) ε(ν¯)ibci ) ε(ν¯)solutebcsolute + ε(ν¯)waterbcwater RI0(ν¯) i)1 (2)
In eq 2, I(νj) corresponds to the single-beam spectral output for the sample, I0(νj) is the blank cell reference single-beam spectrum, R is a scaling term that corrects for differences in the optical configuration between the measurement of I(νj) and I0(νj), ε(νj)solute and ε(νj)water refer to the molar absorptivity of the solute and water, respectively, and csolute and cwater denote the concentrations of the solute and water, respectively. Because the same source intensity cannot be used to probe samples and the blank cell due to detector saturation concerns, the scaling parameter R was used to offset this difference in the optical configurations. Experimentally, the transmitted light intensity from the blank cell was attenuated through the use of a screen-type neutral density filter. This was in addition to the 63% transmittance neutral density filter mentioned in the experimental section. The value of the scaling term was evaluated by taking the ratio of a single-beam spectrum of water collected in the 1212
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
R(ν¯) )
I(ν¯)water,high I(ν¯)water,low
(3)
Because both single-beam spectra are wavenumber-dependent, R will also vary with wavenumber. Also, since the spectral responses of the neutral density filters are not constant (i.e., perfectly flat), there also exists a possibility of a nonlinear relationship in eq 3. To address this issue, plots of I(νj)water,high versus I(νj)water,low were examined over the 5000-4000 cm-1 spectral region. No evidence of nonlinear behavior was observed, and the linear correlation coefficient was 0.99. Accordingly, R was taken as the least-squares slope of a linear fit. To account for variance within measurements, both single-beam spectra (I(νj)water,high and I(νj)water,low) were collected in triplicates, and the mean over all nine possible estimates (R ) 62.5) was used. Concentrations of the solutes for each component were computed accurately from the solution preparations. Water concentration values depend on the density of water, concentrations of the solutes, and their corresponding solvent displacement factors. Using the density of water at 37 °C, the concentration of water, cwater, corresponding to each pure-component aqueous sample in this experiment is given by eq 4: cwater ) cpure water - dsolutecsolute
(4)
where cpure water is the concentration of pure water which was determined to be 55.138 ± 0.001 M from density measurements,19 dsolute is the water displacement factor for the purecomponent sample,16 and csolute is the concentration of the purecomponent sample. The relationship in eq 4 shows that the concentration of water decreases as a function of the product of the concentration of the solute and its corresponding displacement factor. The molar absorptivities for each component were obtained by representing eq 2 as a CLS model given by A ) CK + E
(5)
where A is a 15 × 518 matrix of spectral absorbances for 15 spectra (5 concentration levels × 3 replicates) at 518 resolution elements for the pure-component samples, C is a 15 × 2 matrix of concentration-path length products, K is a 2 × 518 matrix of leastsquares estimates of pure-component spectra of the solute and water, and E is a 15 × 518 matrix of errors or residuals about the fitted CLS model. The least-squares estimate of the pureˆ and is given by component spectra, K, is denoted by K ˆ ) (CTC)-1CTA K
(6)
where the superscript T represents the transpose of the matrix ˆ is also a 2 × 518 matrix in which the rows are the pureand K component spectra (absorptivities) of the solute and water. Figure (19) Amerov, A.; Chen, J.; Arnold, M. A. Appl. Spectrosc. 2004, 58, 1195–1204.
Figure 2. Molar absorptivities of glucose (solid line), lactate (dotted line), urea (dash-dot line), ascorbate (dashed line), alanine (large dashed line), triacetin (dash-dot-dot-dot line), and proline (large dash-small dash line) determined by CLS modeling. Negative values occur in spectral regions where the CLS fit degrades because of the presence of high noise. These regions occur where very little light reaches the detector because of the absorption of the light by water.
2 depicts the molar absorptivities for the seven components that were generated from this experiment and subsequently used in the calculation of the synthetic data. Assessment of Synthetic Data. Using the measured absorptivities, component concentrations, and buffer spectra, the synthetic spectral generation algorithm12 was implemented to compute simulated single-beam spectra. To validate the viability of this algorithm, the generated synthetic single-beam spectra were compared to corresponding measured single-beam spectra collected in the same time span as the buffer spectra used in the simulation. Figure 3A shows the mean-centered experimentally measured single-beam spectra of the first 39 calibration spectra in experiment A (13 groups of three replicates). The corresponding synthetic spectra are plotted in Figure 3B. Comparisons between the measured and synthetic data sets for experiment B produced similar results. Comparison of the magnitude and spectral shapes between the measured and synthetic data sets in Figure 3 clearly shows the success of this simulation approach to approximating the measured data. Variation in sample temperature is reflected in the greater variation within the groups of replicates in the measured data in Figure 3A. A further level of comparison between the measured and synthetic data sets was performed using principal component analysis.20 Figure 4A displays a score plot based on the first and second principal components computed from the normalized, mean-centered synthetic and measured data across the 4800-4200 cm-1 range for experiment A (globar source). Data from the calibration set and predictions sets 1-3 were used. These two principal components explained more than 79% of the total data variance. Figure 4B depicts an analogous plot for the data from experiment B (tungsten-halogen source). These principal components spanned 81% of the data variance. Examining these score (20) Jollife, I. T. Principal Component Analysis; Springer-Verlag: New York, 1986.
Figure 3. (A) Mean-centered single-beam spectra of the first 39 measured spectra (13 groups of three replicates) from experiment A (globar source). (B) Mean-centered single-beam spectra of the corresponding synthetic spectra. Increased variation among the replicates in the measured data reflects variation in sample temperature during the data acquisition.
plots clearly reveals the extent of overlap between the measured and synthetic data sets. Similar degrees of overlap were witnessed for all possible dual comparisons of 12 principal components computed. Data Analysis with Synthetic Spectra. Buffer spectra collected with each prediction set were used to generate a synthetic calibration data set corresponding to that measurement session only. The same experimental design used in the collection of the experimentally measured calibration set was employed in the generation of the synthetic spectra. Thus, a new calibration model based on synthetic spectra was computed for each prediction set. The motivation for this approach was to formulate an independent synthetic data set that represented both sample and nonsample variations inherent to a particular prediction set. In constructing the synthetic calibration spectra for a given prediction set the entire set of 44 buffer spectra collected in each prediction set was used in the approximation of the source spectra.12 In computing each calibration model, the same optimization steps described previously for the measured data were used. This included the grid search for wavenumber optimization and the determination Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
1213
Figure 4. Principal component (PC) score plots comparing measured (circles) and simulated (triangles) single-beam spectra in the calibration data sets from experiments A (A) and B (B). Scores along the second PC are plotted with respect to the scores along the first PC. The first two PCs accounted for 79% and 81% of the data variance for experiments A and B, respectively.
of the optimal number of PLS latent variables. The F-test was based on the 99% confidence level. Although it has been demonstrated that the simulation approach generates data that spans the same amount of variation as the measured data, the corresponding SEC values were below 0.1 mM, thereby requiring a stricter F-test rule. The optimized PLS calibration model for each prediction set was then tested by predicting the glucose concentrations in the measured spectra for that prediction set. The process was repeated and a new model developed for the next prediction set. Figure 5A is a bar plot that displays the SEP for each prediction set based on synthetic calibration data sets (striped bars) and the corresponding SEP values based on measured calibration data obtained with the globar source (experiment A). For the results obtained with measured calibration data, solid and open bars denote the models based on normalized single-beam spectra and spectra in AU, respectively. The SEP value for a given prediction set was obtained by pooling the results for the two prediction days 1214
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
Figure 5. (A) Prediction results corresponding to PLS calibration models optimized with measured single-beam spectra (solid black bars), measured spectra in AU (open bars), and synthetic spectra (striped bars) from experiment A (globar source). The y-axis in the plot is clipped at SEP ) 2.5. The clipped value for prediction data set 9 is 3.10 mM. (B) Corresponding results for experiment B (tungsten-halogen source).
that comprised the set. Figure 5B displays the corresponding results for data obtained with the tungsten-halogen source (experiment B). The trend of the SEP values in both Figure 5, parts A and B, clearly shows an improvement of the results when the synthetic data generation algorithm is used. This can be attributed to the inclusion of nonchemical spectral variations emanating from instrumental drift or environmental factors particular to each data collection session into the calibration model built exclusively for that session. Table 2 lists the values of SEC (synthetic data domain) and SEP (measured data domain) for the entire data set. The SEP values for experiments A and B range between 0.50 and 0.91 mM and 0.43 and 0.95 mM, respectively. From these SEP values, there is no apparent pattern that would suggest degradation of the synthetic calibration models. Calibration Performance across Different Instrumental Configurations. Since the proposed synthetic data generation and calibration protocol utilizes buffer spectra to accommodate up-todate nonchemical spectral components, the performance of this scheme was tested in predicting glucose concentrations in the
Table 2. Summary of Prediction Results from Models Computed with Synthetic Data days
prediction set
5-6 13-14 20-21 34-35 48-49 62-63 86-87 100-101 114-115 128-129 142-143 156-157 170-171 184-185 205-206 219-220 240-241 254-255 268-269 282-283 296-297 310-311 324-325
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
wavenumber range (cm-1) 4508-4233 4483-4108 4533-4208 4608-4183 4583-4183 4458-4133 4583-4183 4808-4208 4683-4233 4608-4133 4608-4233 4708-4233 4508-4108 4583-4208 4683-4183 4558-4233 4558-4233 4533-4108 4808-4233 4533-4108 4608-4158 4508-4108 4558-4208
PLS
SEC (mM)a
SEP (mM)b
13 14 14 13 13 14 14 14 14 14 14 14 12 14 14 14 14 13 14 14 13 14 14
0.12 0.09 0.09 0.15 0.04 0.07 0.09 0.19 0.13 0.11 0.07 0.16 0.11 0.09 0.11 0.10 0.10 0.09 0.13 0.08 0.18 0.10 0.09
0.55 0.55 0.50 0.60 0.67 0.50 0.65 0.91 0.70 0.54 0.43 0.95 0.71 0.56 0.63 0.76 0.76 0.74 0.90 0.48 0.87 0.71 0.94
a Calibration errors of the models based on synthetic data. b Prediction errors of the measured data obtained with calibration models based on synthetic spectra.
measured spectra for all prediction sets across both data sets A and B. Figure 6 displays a bar plot of the SEP values obtained when synthetic data generation and PLS were employed to predict glucose concentrations in each of the prediction sets in experiments A and B (striped bars). For comparison, the solid and open bars in the figure indicate the SEP values obtained when the optimized calibration models computed with the measured data from experiment A were used to predict glucose concentrations in the prediction sets for both experiments A and B. Solid and open bars again denote models based on measured single-beam and absorbance data, respectively. Panel A in Figure 6 displays the full range of the results, whereas panel B clips the y-axis at SEP ) 2.5 mM to allow the better results to be seen more clearly. For the models based on measured data, this plot reveals the same degradation of the results as observed previously in Figure 5A and evidenced by an increase in the SEP values as the time lapse between the calibration and prediction weeks widens. From prediction set 11 onward, the model based on single-beam data fails completely with an average SEP value of 32.3 mM. This failure is partly due to the effect of elapsed time, as the prediction sets in experiment B were collected between 128 and 325 days after the calibration set in experiment A was measured. This is an extremely long time, especially in a near-IR calibration. Another issue that is more prominent than the effect of time in this analysis is the change in instrumental configuration (i.e., from the globar source to the tungsten-halogen lamp). In this case, the calibration data are not representative of the prediction data. The model based on absorbance data performs better in this test, although errors exceed 1 mM for each prediction set past 14. The SEP reaches 1.85 mM at prediction set 23. Neither of the models based on measured calibration data is competitive with the models computed with synthetic spectra.
Figure 6. Prediction results across both experiments A and B. Solid, open, and striped bars correspond to calibration models based on single-beam spectra collected with the globar source (experiment A), absorbance data from experiment A, and synthetic spectra, respectively. Panel A displays the full range of the results, whereas panel B clips the y-axis at SEP ) 2.5 to allow the better results to be observed more clearly.
From these results, we conclude that the synthetic calibration approach clearly outperforms the measured calibration scheme and is unaffected by changes in the instrumental configuration. This stems from the fact that by constructing synthetic calibration data sets from the measured buffer spectra for each prediction day, both instrumental and environmental factors inherent to that prediction day are incorporated into the simulation. Effect of Changing Sample Matrix. The viability and robustness of this synthetic calibration approach was evaluated further by use of a set of eight samples that contained an extra component (proline). This analysis was oriented to diagnose the existence of an extra component by examining the prediction results of these eight samples. As mentioned previously, these samples were measured at the same time as prediction set 23, and hence, buffer spectra measured on that week were used in the generation of the corresponding synthetic calibration set. This corresponded to using the optimized PLS calibration model for prediction set 23 to predict glucose concentrations in the eight samples. The resultant SEP value for this operation was 38.0 mM. The models based on measured calibration data also failed in a similar manner. For example, the calibration model based on the single-beam data from experiment B produced an SEP value of 29.1 mM when applied to the samples that contained proline. An examination of concentration residuals revealed that samples containing proline had a large negative bias, indicating the presence of systematic error. Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
1215
To numerically identify and diagnose the presence of samples with a different chemical composition during prediction, Hotelling’s T2 statistic was employed.21 The T2 statistic in this implementation is the sum of normalized squared scores for each PLS factor. For this case, the number of PLS factors was held constant with respect to the optimized PLS calibration model corresponding to the synthetic data set used with prediction set 23. Samples corresponding to prediction set 23 and the eight samples containing proline were added sequentially, one sample at a time. After predicting each sample, the corresponding T2 value was computed and divided by the T2 limit at the 95% confidence interval, resulting in a parameter known as the T2 ratio. A plot of the T2 ratio with respect to the sample number revealed clear evidence that samples with proline were significantly different from the synthetic calibration and measured spectra in prediction set 23. Hence, the synthetic calibration set was not representative of the prediction samples containing proline, thereby resulting in large SEP values. The performance of the simulation approach was evaluated further with the inclusion of proline into the synthetic data generation algorithm. This meant inputting the absorptivity and water displacement factor of proline into the algorithm and ultimately constructing a seven-component synthetic calibration data set composed of glucose, lactate, urea, ascorbate, alanine, triacetin, and proline. An optimized PLS calibration model was evaluated by predicting glucose concentrations in the eight samples. This produced an acceptable SEP value of 0.92 mM. CONCLUSIONS Results in this study demonstrate the success of using the synthetic data generation algorithm employing buffer spectra to compute an up-to-date calibration data set and corresponding PLS calibration model. The key to the success of this approach is the incorporation into the calibration model of updated information pertaining to sources of nonchemical variation specific to the prediction day. By using measured buffer spectra for each data (21) Stork, C. L.; Kowalski, B. R. Chemom. Intell. Lab. Syst. 1999, 48, 151–166.
1216
Analytical Chemistry, Vol. 81, No. 3, February 1, 2009
collection session as the basis for generating the synthetic calibration data sets, this algorithm provides a simple and efficient strategy to incorporate nonchemical spectral variation into the calibration model. This approach outperformed conventional calibration models based on either single-beam or absorbance data. An added advantage of this protocol is its invariance to changes in the instrumental configuration. In the work described here, by utilizing buffer spectra collected on the prediction day in the synthetic data generation, the change from the globar (experiment A) to the tungsten-halogen source (experiment B) had no effect on the prediction results. By contrast, when the conventional calibration models built from the measured data in experiment A were applied to spectra from experiment B, significant degradation in performance was observed. The model based on single-beam data failed completely, with SEP values exceeding 30 mM. Attempts to predict glucose concentrations in samples containing an extra component were not successful with calibration models based on either measured or synthetic data. However, employing the Hotelling T2 statistic as an outlier testing scheme, the disparity between sample spectra composed of six and seven components was successfully identified. By incorporating information pertaining to proline into the simulation and subsequently generating an optimized calibration model, the SEP was reduced from 38.0 to 0.92 mM. A clear limitation of the synthetic calibration approach is the requirement that the sample matrix be well enough characterized such that realistic synthetic spectra can be computed. Although this limitation places some restrictions on the application of the proposed methodology, there are many situations in which repeated quantitative analyses are applied to well-characterized samples. Process monitoring or quality control measurements in industry may be particularly well-suited to this methodology.
Received for review August 19, 2008. Accepted November 25, 2008. AC801746N