Near-infrared spectroscopic measurement of glucose in a protein matrix

Anal. Chem. 1993, 65, 3271-3278


Lois A. Marquardt and Mark A. Arnold*
Department of Chemistry, University of Iowa, Iowa City, Iowa 52242

Gary W. Small
Center for Intelligent Chemical Instrumentation, Department of Chemistry, Clippinger Laboratories, Ohio University, Athens, Ohio 45701

A method is described for measuring clinically relevant levels of glucose in a protein matrix by near-infrared (near-IR) absorption spectroscopy. Results from an initial screening of major blood constituents identify protein as a major potential interference to the near-IR measurement of glucose in blood. The interference by protein is caused by relatively high concentrations coupled with strong near-IR absorption bands between 5000 and 4000 cm⁻¹ (2.0-2.5 μm). Calibration models based on a simple univariate calibration procedure are not capable of providing accurate glucose concentrations from an independent set of prediction spectra. By use of the multivariate technique of partial least-squares (PLS) regression, glucose concentrations can be determined with a standard error of prediction of 0.35 mM (6.3 mg/dL). The spectral range for this calibration model extends from 4600 to 4200 cm⁻¹, and the optimum number of PLS factors is 14. In addition, calibration models based on a combination of digital Fourier filtering and PLS regression have been constructed and evaluated. Superior calibration models are obtained by using a preprocessing digital filtering step to remove spectral features not associated with glucose. The best overall calibration model was obtained by using a Gaussian-shaped Fourier filter defined by a mean position of 0.03f and a standard deviation of 0.007f, coupled with a 12-factor PLS regression computed over the spectral range from 4600 to 4200 cm⁻¹. This model provided a standard error of prediction of 0.24 mM (4.3 mg/dL) for an independent set of prediction spectra. The effects of spectral range and number of PLS factors are considered for calibration models based on both PLS alone and PLS coupled with Fourier filtering. For both calibration methods, the 4400-cm⁻¹ (2.27-μm) glucose absorption band appears to provide most of the reliable glucose concentration information.
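The paired concentration units used throughout (for example, 0.35 mM = 6.3 mg/dL) follow from the molar mass of glucose. A minimal sketch of the conversion, assuming a molar mass of 180.16 g/mol (a standard value for D-glucose that the paper itself does not state):

```python
GLUCOSE_MOLAR_MASS = 180.16  # g/mol; assumed standard value, not given in the paper


def mm_to_mg_per_dl(conc_mm: float) -> float:
    """Convert a glucose concentration from mmol/L (mM) to mg/dL."""
    # mmol/L x mg/mmol gives mg/L; one litre is 10 dL, so divide by 10.
    return conc_mm * GLUCOSE_MOLAR_MASS / 10.0


for label, sep_mm in [("PLS alone", 0.35), ("Fourier filter + PLS", 0.24)]:
    print(f"{label}: SEP {sep_mm} mM = {mm_to_mg_per_dl(sep_mm):.1f} mg/dL")
```

Applied to the 1-20 mM working range, the same function gives approximately 18-360 mg/dL.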
INTRODUCTION

The tremendous potential of direct, noninvasive clinical analysis has stimulated considerable interest in assessing the utility of near-infrared (near-IR) spectroscopy to measure blood glucose noninvasively.(1) The concept is to pass a selected band of near-IR radiation through a vascular region of the body and then to determine the blood glucose level from the resulting transmission spectrum. The key is to measure glucose accurately in the presence of a widely changing and complex biological matrix solely from spectral information. Ultimately, the success of this approach will be governed by the quality of the spectra and by how effectively the analytical information pertaining to glucose can be discriminated from both noise and spectral features associated with other matrix components.

Several research groups have recently reported moderate success in using near-IR spectroscopy to measure blood glucose directly in human subjects.(2,3) Their basic approach has been to collect near-IR transmission spectra through the fingertips of individual subjects while simultaneously collecting blood samples for subsequent glucose determination. Generally, spectra and blood samples are collected during a routine glucose tolerance test in order to obtain a wide range of blood glucose values. Multivariate calibration procedures, such as partial least-squares (PLS) regression and principal components regression (PCR), are used to correlate variations in the spectra with the corresponding glucose concentrations. Other spectral-processing steps, particularly first and second derivatives, are sometimes used to enhance the signal-to-noise ratio by removing unwanted spectral features such as baseline drift. Haaland and co-workers have published results that reveal a strong correlation between features within these near-IR spectra and in vivo glucose levels.(2,4) Others have claimed similar success,(3,5) but insufficient data have been presented to substantiate these claims. Unfortunately, none of these calibration models appears capable of actually predicting in vivo glucose concentrations from independent spectra.

(1) Amato, I. Science 1992, 258, 892-893.
Our approach is to establish the utility of near-IR spectroscopy to measure glucose in complex biological matrices by systematically increasing the matrix complexity while critically evaluating the soundness of the measurement at each step. In this way, the critical chemical and physical parameters that affect the measurement can be identified, and steps to overcome potential problems can be developed in a rational manner. This approach is particularly amenable to assessing the functional limitations of a given method. In fact, a key feature of our experimental protocol is the generation of an independent data set that is used to evaluate the ability of a potential calibration model to predict glucose concentrations.

Our work started by establishing the feasibility of measuring clinically relevant levels of glucose in an aqueous matrix.(6) A procedure was developed that allowed the measurement of glucose in the 1-20 mM (18-360 mg/dL) concentration range in a 0.1 M phosphate buffer at pH 7.2. A univariate calibration model was used, based on the integrated area under a glucose absorption band centered at 4400 cm⁻¹ (2.27 μm). A key step in the spectral-processing scheme used a digital Fourier filter to enhance the analytical information by greatly suppressing spectral noise and baseline variations. The resulting calibration model predicted glucose concentrations from independent spectra with a maximum prediction error of 0.3 mM and a mean percent error of 2.5%. In addition, we have investigated the effect of temperature variation on the accuracy of near-IR-based glucose measurements in an aqueous matrix.(7) Changes in temperature cause large baseline variations because of temperature-dependent shifts in the positions of two strong water absorption bands. Temperature effects have been examined over the range from 32 to 41 °C. Different sample temperatures result in baseline variations that are typically orders of magnitude greater than the absorbance from clinical levels of glucose. A spectral-processing scheme based on the combination of digital filtering and multivariate calibration methods effectively eliminates the adverse effects caused by such temperature variations.

A series of two papers extends our approach from a relatively simple water matrix to undiluted bovine plasma. This first paper details our success in measuring glucose in a buffered-protein matrix. Results are presented that identify protein as a major potential interference. Digital Fourier filtering is combined with PLS to provide a calibration model that permits accurate measurements of glucose at clinically relevant concentrations. The second paper(8) describes a single calibration model capable of measuring glucose in samples prepared from three unique undiluted plasma matrices. In addition, the protocols used to couple digital filtering techniques with the PLS multivariate calibration method are detailed in this second paper.

(2) Robinson, M. R.; Eaton, R. P.; Haaland, D. M.; Koepp, G. W.; Thomas, E. V.; Stallard, B. R.; Robinson, P. L. Clin. Chem. 1992, 38, 1618-1622.
(3) Rosenthal, R. D.; Paynter, L. N.; Mackie, L. H. U.S. Patent 5,086,229, February 4, 1992.
(4) Haaland, D. M.; Robinson, M. R.; Koepp, G. W.; Thomas, E. V.; Eaton, R. P. Appl. Spectrosc. 1992, 46, 1575-1578.
(5) Barnes, R. H.; Brasch, J. W. U.S. Patent 5,070,874, December 10, 1991.

© 1993 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 65, NO. 22, NOVEMBER 15, 1993

EXPERIMENTAL SECTION

Apparatus and Reagents. All spectra were collected with a Nicolet 740 spectrometer configured for the near infrared. For the initial screening experiment, the spectrometer was equipped with a 75-W tungsten-halogen lamp, a quartz beam splitter, and a room-temperature lead selenide detector. For spectra of glucose solutions in a buffered-protein matrix, the spectrometer was configured with the same 75-W lamp, a calcium fluoride beam splitter, and a cryogenically cooled indium antimonide detector. In both cases, the aperture was fully open and a narrow-bandpass interference filter was placed in the optical path before the sample. This filter nominally passed light from 5000 to 4000 cm⁻¹ (2.0 to 2.5 μm) with a peak transmission of ca. 70%. Samples were placed in a 1-mm path-length rectangular cell composed of Infrasil quartz. This cell was positioned in a water-heated aluminum block cell holder in conjunction with a VWR Model 1140 refrigerated temperature bath. The sample temperature was measured by placing a copper-constantan thermocouple probe directly in the sample solution, with temperature readings obtained from an Omega Model 670 digital meter. Dried, CO₂-free air was provided by a Balston Model 75-60 air-drying system.

All solutions were prepared with appropriately dried, reagent grade materials obtained from common suppliers. The antimicrobial agent 5-fluorouracil, bovine serum albumin (BSA, fraction V), urea, glycine, and kanamycin were purchased from Sigma Chemical Co. (St. Louis, MO). Distilled-deionized water with a minimum resistance of 15 MΩ was used to prepare all solutions. This water was freshly generated by passing the building distilled water through a Milli-Q three-stage water purification unit.

(6) Arnold, M. A.; Small, G. W. Anal. Chem. 1990, 62, 1457-1464.
(7) Hazen, K. H.; Arnold, M. A.; Small, G. W. Submitted for publication in Appl. Spectrosc.
(8) Small, G. W.; Arnold, M. A.; Marquardt, L. A. Anal. Chem., following paper in this issue.

Procedures. Data Collection. Single-beam spectra were collected as double-sided interferograms with 16 384 points based on 256 coadded scans. Interferograms were triangularly apodized and Fourier transformed to produce single-beam spectra with 1.9-cm⁻¹ point spacing. Mertz phase correction was applied to the spectra, with the phase array based on 200 points on each side of the interferogram center burst. The resulting single-beam spectra were transferred for processing to either the Prime 9955 or the VAX 6400 computer system located at the Gerard P. Weeg Computing Center at the University of Iowa. All software was implemented in FORTRAN 77. Subroutines for Fourier filtering and multiple linear regression computations were obtained from the IMSL software package (IMSL, Inc., Houston, TX).

Screening Experiment. Individual solutions were prepared for each test compound by dissolving a suitable amount of the test compound in a 50-mL aliquot of a pH 7.2, 0.1 M phosphate buffer that contained 8 mM glucose. The following concentrations were used for the corresponding test compounds: 1.0 mM glycine, 0.2 mM ascorbic acid, 0.6 mM uric acid, 6.5 mM urea, 0.08 mM kanamycin, and 70 g/L BSA. In addition, a series of glucose standards was prepared in this buffer solution. Single-beam spectra were collected and processed for each test solution. Initially, samples were incubated in the sample holder until a constant temperature of 20.0 ± 0.2 °C was reached.
Spectra were then collected and processed according to our previously reported univariate calibration procedure.(6) Briefly, each single-beam spectrum was normalized over the range from 4700 to 4600 cm⁻¹, after which absorbance spectra were generated by ratioing to a background spectrum collected with plain phosphate buffer. Each absorbance spectrum was filtered by a Gaussian-shaped Fourier filter defined by a mean of 0.023f (f denotes digital frequency units) and a standard deviation of 0.005f. After a simple two-point baseline adjustment, the previously described dynamic area calculation(6) was used to compute the integrated area under the resulting absorbance band centered at 4400 cm⁻¹. Although the exact integration limits varied slightly according to the dynamic area calculation,(6) the area was essentially computed over the wavenumber range from 4369 to 4445 cm⁻¹.

Buffered-Protein Matrix Spectra. Sample solutions were prepared by diluting a 1.0024 M glucose standard with a buffered-protein solution. The buffered-protein solution was composed of 60.8 g/L BSA, 0.1 M sodium phosphate, and 0.04% 5-fluorouracil adjusted to pH 7.4. Glucose solutions were prepared so that the final protein concentration was constant across samples. Fourteen individual glucose solutions were prepared spanning the concentration range from 1.2 to 20.0 mM. A spectral data set consisting of 97 spectra of the 14 glucose standard solutions was collected over a period of 4 days. Buffered-protein background spectra were collected at the beginning of each day and then after every third standard. Three single-beam spectra were collected for each aliquot of a given solution, and multiple aliquots were run in random order for each sample. Spectra were collected after the solution had stabilized at 37.0 ± 0.2 °C. The spectra of the 14 glucose solutions (97 spectra) were then separated into individual calibration and prediction data sets.
The calibration set was composed of 67 spectra representing the following concentrations: 1.2, 3.2, 5.2, 6.0, 8.0, 12.0, 14.0, 18.0, and 20.0 mM. The remaining 34 spectra, corresponding to concentrations of 2.0, 4.0, 7.2, 10.0, and 16.0 mM, were placed into the prediction set. These spectra were analyzed by both univariate and multivariate calibration procedures coupled with digital Fourier filtering. The univariate procedure was the same as that described above, and the multivariate procedure is detailed below.
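The univariate procedure above (Gaussian Fourier filter, two-point baseline, integrated band area) can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' FORTRAN/IMSL code: the filter is applied by multiplying the real FFT of an absorbance spectrum by a Gaussian defined in digital frequency units (cycles per data point, so 0.5f is the Nyquist limit), the dynamic area calculation of ref 6 is replaced by a fixed 4369-4445 cm⁻¹ window, and the spectra are synthetic.

```python
import numpy as np


def gaussian_fourier_filter(absorbance, mean_f, sd_f):
    """Filter a spectrum with a Gaussian-shaped response defined in digital
    frequency units f (cycles per data point, from 0 to 0.5)."""
    n = absorbance.size
    coeffs = np.fft.rfft(absorbance)
    freqs = np.fft.rfftfreq(n)  # digital frequencies 0 ... 0.5
    response = np.exp(-0.5 * ((freqs - mean_f) / sd_f) ** 2)
    return np.fft.irfft(coeffs * response, n=n)


def band_area(wavenumbers, absorbance, lo, hi):
    """Trapezoidal area between lo and hi (cm^-1) above a straight baseline
    drawn through the two end points of the window (two-point baseline)."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    x, y = wavenumbers[mask], absorbance[mask]
    baseline = y[0] + (y[-1] - y[0]) * (x - x[0]) / (x[-1] - x[0])
    d = y - baseline
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(x)))


# Hypothetical synthetic data: 1.9 cm^-1 spacing, a glucose-like band at
# 4400 cm^-1, and a sloping background absorbance.
wn = np.arange(4000.0, 5000.0, 1.9)
band = np.exp(-0.5 * ((wn - 4400.0) / 25.0) ** 2)
background = 0.02 + 1e-5 * (wn - 4000.0)


def glucose_area(conc_mm):
    spectrum = background + 1e-4 * conc_mm * band  # absorbance scales with conc
    filtered = gaussian_fourier_filter(spectrum, 0.023, 0.005)
    return band_area(wn, filtered, 4369.0, 4445.0)


areas = {c: glucose_area(c) for c in (0.0, 5.0, 10.0)}
for c, a in areas.items():
    print(f"{c:5.1f} mM -> area {a:.3e}")
```

Because every step is linear, the filtered area grows in proportion to the glucose absorbance, while the slowly varying background, whose content sits near zero digital frequency, is almost entirely rejected by the band-pass response.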


RESULTS AND DISCUSSION

Screening Experiment. A simple screening experiment was used to identify the major blood constituents that most significantly interfere with our near-IR method for measuring glucose in an aqueous matrix.(6) This experiment involved comparing integrated areas obtained from solutions that contained 8 mM glucose alone and 8 mM glucose with the specific test compound. The concentration used for each test compound corresponds to the upper end of the normal range for this compound in human serum.(9) The instrument configuration used in this experiment was limited in terms of optical throughput, which restricted the temperature to 20 °C and resulted in relative standard deviations for measured integrated absorbances approaching 10%. Subsequent instrumental configurations provided higher optical throughput, which permitted measurements at 37 °C and provided significantly better reproducibility. Nevertheless, these experimental results clearly identify protein as the foremost interference of the compounds tested. Integrated areas in the presence of protein were 1 order of magnitude larger than areas without protein. In fact, the differences caused by protein were so large in this experiment that the results could be interpreted only semiquantitatively. Results from the other tested compounds were more reliable. Mean areas are summarized in Table I for all compounds tested except BSA. A t-test at the 95% confidence level indicates no significant difference caused by either uric acid or ascorbic acid. A slight positive interference is indicated for both urea and kanamycin, with relative errors of 10 and 8%, respectively. Glycine causes a significant negative interference with a relative error of -15%.

(9) Pesce, A. J.; Kaplan, L. A. Methods in Clinical Chemistry; C. V. Mosby: St. Louis, MO, 1987.

Table I. Interference from Selected Blood Constituents

compound        integrated area(a)   no. of observations   mean % error
glucose alone   5.06 ± 0.33          15                    -
glycine         4.30 ± 0.44          9                     15.0
urea            5.56 ± 0.34          9                     9.9
uric acid       4.94 ± 0.18          6                     2.4
ascorbic acid   4.95 ± 0.26          6                     2.2
kanamycin       5.46 ± 0.45          6                     7.9

(a) (Mean ± standard deviation) × 1000.

Figure 1. Near-IR absorbance spectra for aqueous solutions of bovine serum albumin (66.5 g/L) and glucose (10 mM). In both cases, the optical path length is 1 mm and the matrix is a pH 7.4 phosphate buffer.

Figure 2. Univariate calibration model for glucose in a buffered-protein matrix showing both calibration data (open circles) and prediction data (closed circles).

Buffered-Protein Matrix. Our subsequent efforts focused on protein because of the overwhelming interference indicated in the screening experiment. Protein interference is likely caused by several relatively strong absorption features between 5000 and 4000 cm⁻¹. Absorbance spectra for both BSA and glucose are presented in Figure 1 over this wavenumber range. The strong protein absorption band centered at 4370 cm⁻¹ (2.29 μm) overlaps significantly with the 4400-cm⁻¹ band of glucose, thereby increasing the measured integrated area. A more detailed experiment confirmed the severe interference by BSA and the inability of a univariate calibration procedure to compensate for protein effects. The univariate calibration procedure developed for glucose in water(6) was applied to a buffered-protein matrix in which the protein level was constant across samples. The resulting calibration model relating glucose concentration to the integrated area was linear, with a slope of 0.441 ± 0.026 mA/mM, a y-intercept of 1.74 ± 0.94 mA, and an R² of 0.9144. A corresponding correlation plot for both calibration and prediction data sets is presented in Figure 2. This figure shows the correlation between the glucose concentration estimated from the calibration model and the actual glucose concentration; the unity line is provided for comparison. The open circles represent data used to generate the regression parameters for the calibration model, and the closed circles represent data used to assess the prediction ability of the model. The correlation line for the prediction data set is defined by a slope of 1.103 ± 0.039 and a y-intercept of -1.6 ± 1.4. The R² is 0.9548, and the calculated standard error of prediction (SEP) is 1.3 mM across the entire concentration range. The large scatter at low concentrations indicates a limit of detection above the clinically relevant range. Overall, the considerable scatter about the calibration line limits the utility of this univariate model. PLS regression has been investigated as a means to enhance the selectivity for glucose. In this investigation, the protein concentration was constant across samples, and the ability of the PLS algorithm to extract glucose information in the presence of the strong protein absorbance features was evaluated. In addition, the utility of combining PLS with digital Fourier filtering has been evaluated.
When no digital filtering is used, the most critical design parameters for an optimal calibration model are the number of PLS factors and the spectral range. With Fourier filtering and PLS combined, the key parameters include the number of PLS factors and the spectral range, as well as the position and width of the Gaussian-shaped filter response function.
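The screening comparison in Table I can be rechecked from the tabulated summary statistics alone. The paper does not say which form of t-test was used; the sketch below assumes a two-sided, pooled-variance (Student) two-sample test at the 95% level, with critical values taken from a standard t table, and compares each test compound with glucose alone. The percent error is assumed to be 100 × (area with compound − area alone) / area alone, which matches the magnitudes listed in Table I.

```python
import math

# Table I: mean integrated area x 1000, standard deviation x 1000, observations.
TABLE_I = {
    "glucose alone": (5.06, 0.33, 15),
    "glycine": (4.30, 0.44, 9),
    "urea": (5.56, 0.34, 9),
    "uric acid": (4.94, 0.18, 6),
    "ascorbic acid": (4.95, 0.26, 6),
    "kanamycin": (5.46, 0.45, 6),
}

# Two-sided 95% critical values of Student's t for the degrees of freedom
# that occur here (15 + 9 - 2 = 22 and 15 + 6 - 2 = 19), from standard tables.
T_CRIT_95 = {19: 2.093, 22: 2.074}


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic (sample 2 minus sample 1) and df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


reference = TABLE_I["glucose alone"]
results = {}
for name, row in TABLE_I.items():
    if name == "glucose alone":
        continue
    pct_error = 100 * (row[0] - reference[0]) / reference[0]
    t, df = pooled_t(*reference, *row)
    significant = abs(t) > T_CRIT_95[df]
    results[name] = (round(pct_error, 1), round(t, 2), significant)
    print(f"{name:14s} {pct_error:+6.1f}%  t = {t:+.2f}  significant: {significant}")
```

Under these assumptions the sketch agrees with the conclusions stated in the text: glycine (t ≈ -4.8), urea (t ≈ 3.6), and kanamycin (t ≈ 2.3) differ significantly from glucose alone, whereas uric acid and ascorbic acid (|t| < 1) do not.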

Figure 3. Effect of number of PLS factors on the standard errors of calibration (A) and prediction (B) for the following spectral ranges: 5000-4002 (circles), 4900-4200 (down triangles), 4600-4200 (squares), 4500-4200 (up triangles), and 4500-4300 cm⁻¹ (diamonds).

PLS Alone. Seven spectral regions have been examined initially with the PLS algorithm alone. Analysis of absorbance spectra collected over the range from 5000 to 4000 cm⁻¹ reveals three predominant glucose absorption bands. A large band centered at 4700 cm⁻¹ and two smaller bands centered at 4400 and 4300 cm⁻¹ are clearly apparent in the sample spectrum provided in Figure 1. The noise level increases dramatically beyond 4900 and 4200 cm⁻¹ because the interference filter and water absorption limit transmission at these frequencies, thereby greatly lowering the amount of light reaching the detector. The first spectral range used (5000-4002 cm⁻¹) represents the entire frequency range collected. The second range (4900-4200 cm⁻¹) eliminates noisy regions on both ends while retaining all three glucose bands. The third range (4600-4200 cm⁻¹) cuts into the 4700-cm⁻¹ band while retaining both the 4400- and 4300-cm⁻¹ bands. The fourth range (4500-4200 cm⁻¹) completely eliminates the 4700-cm⁻¹ band while retaining both lower frequency bands. The fifth range (4500-4300 cm⁻¹) isolates the middle absorption band, which was found to be effective in the univariate calibration method.(6) The last two ranges (4480-4330 and 4450-4350 cm⁻¹) cut into this glucose band.

The optimum number of PLS factors has been determined for each spectral range by identifying the minimum SEP for each individual calibration model. The quality of a calibration has been judged by its ability to predict accurately from independent spectra. As expected, a minimum SEP was obtained as the number of PLS factors was increased. The resulting data are summarized in Figure 3, with standard errors plotted as a function of the number of PLS factors used to generate the calibration model. For all spectral ranges tested, the standard error of calibration (SEC) decreases continuously as the number of PLS factors increases (see Figure 3A). Lower SECs are expected as more of the variation within the spectral data set is accounted for by additional factors. Initially, the SEP also decreases with more PLS factors (see Figure 3B), which indicates that the variation attributed to these factors is associated in some way with glucose concentration. The SEP increases at the point where the system is overmodeled and additional factors no longer provide information about glucose, but rather about noise in the calibration data set. The result is a minimum in the SEP, which can be used to identify the optimum number of factors.

Table II. Results from the Best Calibration Models Obtained with PLS Alone

spectral range   optimum no.   SEC     SEP     mean %   mean %
(cm⁻¹)           of factors    (mM)    (mM)    error    rec
5000-4002        16            0.423   1.38    24.1     105.7
4900-4200        15            0.127   0.515   8.52     94.75
4600-4200        14            0.106   0.386   5.57     98.29
4500-4200        14            0.106   0.421   6.19     97.44
4500-4300        9             0.313   0.417   7.20     100.65
4480-4330        9             0.342   0.407   1.25     100.15
4450-4350        7             0.732   0.673   9.00     103.40

The salient features of the best calibration model for each spectral range are summarized in Table II. The worst of these calibration models was obtained by using the 5000-4002-cm⁻¹ spectral range with 16 PLS factors. The widest spectral range resulted in a calibration model with the highest prediction error, the largest mean percent error, and the poorest mean percent recovery of all the spectral ranges tested. The corresponding concentration correlation plot is provided in Figure 4A. The large scatter of points about the unity line, particularly at concentrations below 10 mM, indicates the poor utility of this model. In addition, the distribution of the prediction points is noticeably wider than that of the calibration points, which indicates the inability of this model to predict glucose concentrations accurately. Noise at the extremes of the 5000-4002-cm⁻¹ range adversely affects the ability of the PLS algorithm to extract the glucose information. Similar results were found when attempts were made to measure glucose with changing temperatures(7) and in undiluted plasma matrices.(8)

Improved calibration models are obtained by narrowing the spectral range to eliminate noisy regions that do not contain glucose information. The SEPs for each of the narrower spectral ranges are significantly better than the value of 1.38 mM obtained for the 5000-4002-cm⁻¹ range. The lowest SEP is 0.35 mM for the model consisting of 14 PLS factors with the 4600-4200-cm⁻¹ range. The 4900-4200-cm⁻¹ range required one additional factor to achieve an SEP of 0.52 mM, whereas the lowest SEP for the 4500-4200-cm⁻¹ range was 0.42 mM with 14 factors. Although the narrower spectral ranges provided superior SEPs with fewer factors compared to the entire 5000-4002-cm⁻¹ range, relatively small differences in SEP and number of factors were found within this group of narrower spectral ranges. Indeed, Figure 3B shows that the SEPs approached the same value for each of the narrower spectral ranges. These data suggest that the 4400-cm⁻¹ glucose band provides the most reliable information. The 4500-4300- and 4480-4330-cm⁻¹ ranges essentially isolate this absorption band, and the corresponding calibration models provide nearly the same SEPs with fewer factors compared to the wider 4600-4200-cm⁻¹ range. As the values in Table II reveal, the quality of the calibration model drops significantly as the spectral range is narrowed further (i.e., 4450-4350 cm⁻¹) and begins to cut into the 4400-cm⁻¹ glucose band.

The best calibration model, as judged by the lowest SEP, was obtained with a spectral range from 4600 to 4200 cm⁻¹ with 14 PLS factors. The concentration correlation plot in Figure 4B presents the ability of this model to predict glucose concentrations. Compared to the results shown in Figure 4A, the 4600-4200-cm⁻¹ model clearly demonstrates tighter clustering of both calibration and prediction data. The distribution of the prediction data is still wider than that for the calibration data, albeit only slightly. Nevertheless, the superiority of the 4600-4200-cm⁻¹ model is clearly evident. The inset in Figure 4B shows the distribution of the percent error as a function of glucose concentration. Because the absolute error is independent of concentration, larger percent errors are observed at lower concentrations, as indicated in the inset. The mean percent error is 8.2% at 2 mM, 2.8% at 10 mM, and 1.5% at 16 mM.

Figure 4. Calibration models for PLS alone for the spectral range from 5000 to 4002 cm⁻¹ with 16 PLS factors (A) and the spectral range from 4600 to 4200 cm⁻¹ with 14 factors (B). Circles and diamonds correspond to the calibration and prediction data points, respectively. The inset in plot B shows the concentration-dependent percent error in the glucose measurement for this model.

Fourier Filtering and PLS.
The specific features of the Fourier filtering process used in this work are detailed in the next paper.(8) Basically, a Gaussian-shaped filter response function is used in which the mean position and the width (standard deviation) of the Gaussian are the primary design parameters. The strategy used to identify the ideal position and width of the Gaussian-shaped filter is also detailed later.(8) Briefly, the overall set of spectra is split into a calibration set and two independent prediction sets (sets A and B). The calibration set and prediction set A are used to identify the ideal combination of mean and standard deviation for the Gaussian in terms of minimum values for the SEC and SEP. A response surface is constructed by generating calibration models from all combinations of means and standard deviations over a specified range. The response function weights equally the ability of the model to fit the calibration data and its ability to predict: response = 1/(mean square error + mean square prediction error). Prediction set B is used to assess the prediction accuracy of the ultimate calibration model. It is important to emphasize that spectra within prediction set B are never involved in generating the calibration parameters.

The quality of the calibration model is strongly influenced by both the number of PLS factors and the Fourier filter parameters. A key question centers on the interrelationship between the ideal filter parameters and the number of factors used in the PLS calibration model. A series of response surfaces was generated in order to address this critical point. In this experiment, the number of factors was incremented for each response surface, and the surface morphology was examined in order to ascertain the effect of the number of factors on the ideal mean and standard deviation for the Fourier filter. This experiment was performed for both the 4600-4200- and 4500-4200-cm⁻¹ spectral ranges.

Figure 5. Response surface maps for the 4600-4200-cm⁻¹ spectral range with 5 PLS factors (A and B) and 14 PLS factors (C and D). Plots A and C show front views, while plots B and D show side views.

The number of PLS factors has a significant effect on the morphology of the resulting response surface. Two representative surfaces are presented in Figure 5. Plots A and B of Figure 5 show the surface generated with five PLS factors from two different angles. Plots C and D show the surface obtained with 14 PLS factors from the same angles. In both cases, the spectral range was from 4600 to 4200 cm⁻¹. Inspection of both surfaces reveals a relatively simple surface when few PLS factors are used. Such surfaces are characterized by a single predominant peak with coordinates corresponding to a mean position of 0.03f and a standard deviation of 0.007f. Although the exact coordinates of the optimum response varied slightly for different numbers of factors, there was no dramatic variation when two to nine factors were used. As the number of factors increased, however, the region behind the 0.03-0.007f peak increased in magnitude. Beyond nine factors, the largest response function values were obtained at numerous combinations of means and standard deviations as the region behind the 0.03-0.007f peak continued to grow. Plots C and D of Figure 5 clearly illustrate the dramatic increase in this region behind the 0.03-0.007f peak. Although the 0.03-0.007f peak is still evident, it is nearly overshadowed by the magnitude of the background region. The background completely obscures the 0.03-0.007f peak when more than 15 factors are used.

Several important points are noteworthy. First, the similarity between the optimal filter parameters for a plain water matrix (0.023f and 0.005f),(6) where the 4400-cm⁻¹ band is used exclusively, and those for this buffered-protein matrix (0.03f and 0.007f) suggests that the 4400-cm⁻¹ band is the major spectral feature in this analysis. In addition, plots A and B of Figure 5 reveal a thin ridge-shaped surface with a narrow range of standard deviations and a relatively broad range of mean positions. This finding implies that the width of the Gaussian-shaped filter is critical for successful results. Finally, plots B and D show that small standard deviations give poor responses even with many factors, which is understandable because narrower filters pass less of the spectral information.
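The filter-selection step described above, in which every (mean, standard deviation) pair on a grid is scored by 1/(MSE + MSEP), can be sketched as a plain grid search. The scoring callback is a hypothetical stand-in for "filter the spectra, build the PLS model on the calibration set, and compute the mean square errors for the calibration set and prediction set A". The toy scorer is invented for illustration only: it places a peak at 0.03f and 0.007f and penalizes changes in width much more than changes in mean, mimicking the ridge described for plots A and B of Figure 5.

```python
from typing import Callable, Dict, Iterable, Tuple

Score = Callable[[float, float], Tuple[float, float]]


def response_surface(score: Score, means: Iterable[float],
                     sds: Iterable[float]) -> Dict[Tuple[float, float], float]:
    """Evaluate response = 1 / (MSE + MSEP) for every (mean, sd) filter pair.

    `score(mean, sd)` must return (MSE for the calibration set,
    MSEP for prediction set A) for a model built on filtered spectra.
    """
    sds = list(sds)  # allow any iterable to be reused for every mean
    surface = {}
    for m in means:
        for s in sds:
            mse, msep = score(m, s)
            surface[(m, s)] = 1.0 / (mse + msep)
    return surface


def toy_score(mean_f: float, sd_f: float) -> Tuple[float, float]:
    """Hypothetical scorer: errors grow quickly as sd leaves 0.007f but
    slowly as the mean leaves 0.03f, giving a ridge along the mean axis."""
    penalty = 20.0 * (mean_f - 0.03) ** 2 + 4000.0 * (sd_f - 0.007) ** 2
    return 0.05 + penalty, 0.08 + 1.5 * penalty


means = [round(0.010 + 0.005 * i, 4) for i in range(9)]  # 0.010f ... 0.050f
sds = [round(0.003 + 0.001 * j, 4) for j in range(9)]    # 0.003f ... 0.011f
surface = response_surface(toy_score, means, sds)
best = max(surface, key=surface.get)
print("best (mean, sd):", best, "response:", round(surface[best], 3))
```

In the procedure described above, such a surface is rebuilt for each number of PLS factors, which is how the growth of the background region beyond nine factors was observed.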

Table 111. Results from Calibration Model with Combined Fourier Filtering and PLS*

spectral range

(cm-9 5000-4002 4900-4200 46OU-4200 4500-4200

no.of SEC SEP(mM) factors (mM) A B

20 12 12 12

0.279 0.249 0.298 0.343

0.454 0.386 0.299 0.296

0.463 0.326 0.243 0.296

mean% error A B 7.55 6.76 5.93 6.12

7.81 7.05 4.95 5.68

mean% rec A

B

95.93 96.78 101.87 102.67

97.99 95.70 99.40 97.90

Filter parameters: mean, 0.03f;standard deviation, 0.007f. An experiment was performed to establish the analytical utility of using a digital Fourier filtering step prior to the PLS algorithm. In this experiment, calibration models were generated with different numbers of PLS factors for each of the four major spectral regions. The Fourier fiiter was the same in each case and this fiiter was defined by a mean position of O.03f and standard deviation of O.Oo7f. The salient features for the best calibration model obtained for each spectral range are tabulated in Table 111. Statistics for both prediction sets A and B are listed individually. Compared to PLS alone, the combination of Fourier filtering and PLS analysis reduces the number of PLS factors
needed while improving the quality of the calibration model. Figure 6 plots the standard errors of calibration and prediction for each spectral range as a function of the number of PLS factors used in the model. As was found with PLS alone, the 5000-4002-cm-1 range gave higher standard errors overall than the narrower ranges. For all other ranges, the SECs and SEPs drop steeply and reach limiting values after only a few factors. The overall minimum SEP (0.24 mM) was obtained from the 4600-4200-cm-1 range with 12 factors. Even so, only slight improvements in SEC and SEP were realized beyond the first 5 factors for each of these spectral ranges. These curves are much steeper with filtering than the analogous curves obtained with PLS alone (see Figure 3). Both methods give standard errors between 5 and 6 mM when only one factor is used. With five factors, however, the Fourier filtering/PLS method gives standard errors near 0.5 mM, compared to values between 2 and 3 mM with PLS alone.

Figure 6. Standard errors of calibration (circles) and prediction (squares for prediction set A and triangles for prediction set B) for models as a function of the number of PLS factors used after Fourier filtering the spectra. [Plot panels, one per spectral range, of standard error versus number of PLS factors (0-20).]

Compared to PLS alone, the quality of the calibration model is relatively insensitive to spectral range. The values in Table III and in Figure 6 indicate that each of the narrow ranges provides essentially the same calibration statistics with a similar number of factors. The best calibration model, as judged by the lowest SEP for prediction set B, was obtained by using the 4600-4200-cm-1 spectral range with 12 factors. The corresponding concentration correlation plot is provided in Figure 7. This calibration model is characterized by an SEC of 0.30 mM and SEPs of 0.30 and 0.24 mM for prediction sets A and B, respectively. The distribution of the prediction points closely matches that of the calibration points, as expected for a valid analytical model. This model has a lower SEP than the analogous model with PLS alone (see Figure 4B), which shows in the tighter clustering of the prediction points.
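This section does not define the statistics in Table III. The sketch below therefore uses common conventions, and these definitions are assumptions:

- SEC and SEP are root-mean-square differences between predicted and reference concentrations.
- Mean percent error is the average of |predicted - actual|/actual times 100.
- Mean percent recovery is the average of predicted/actual times 100.

The `ddof` argument is a hypothetical hook for SEC definitions that divide by a reduced number of degrees of freedom. All function names and the sample values are illustrative.

```python
import numpy as np


def standard_error(actual, predicted, ddof=0):
    """Root-mean-square error in mM; ddof > 0 reduces the divisor (assumed SEC variant)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - actual
    return float(np.sqrt(np.sum(residuals ** 2) / (actual.size - ddof)))


def mean_percent_error(actual, predicted):
    """Average of |predicted - actual| / actual, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / actual) * 100.0)


def mean_percent_recovery(actual, predicted):
    """Average of predicted / actual in percent; 100 means no average bias."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(predicted / actual) * 100.0)


if __name__ == "__main__":
    actual = [2.0, 10.0, 16.0]      # hypothetical reference concentrations (mM)
    predicted = [2.3, 9.8, 16.2]    # hypothetical model predictions (mM)
    print(standard_error(actual, predicted),
          mean_percent_error(actual, predicted),
          mean_percent_recovery(actual, predicted))
```

Because mean percent error divides each residual by its own concentration, low-concentration samples dominate it. This is why the percent-error and SEP rankings in Table III need not agree exactly.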
As with PLS alone, the absolute prediction error is independent of glucose concentration, which results in a strongly concentration-dependent percent error of prediction. The inset in Figure 7 shows how the percent error varies with glucose concentration. The mean percent errors at 2, 10, and 16 mM are 16.9, 2.21, and 1.26%, respectively.

Figure 7. Concentration correlation plot for the calibration model generated by using Fourier-filtered spectra over the 4600-4200-cm-1 spectral range and 12 PLS factors. The plot shows calibration points (circles), prediction set A points (up triangles), and prediction set B points (down triangles). The inset shows the concentration-dependent percent error in the glucose measurement for this model.
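A concentration-independent absolute error produces a strongly concentration-dependent percent error through simple arithmetic: percent error = 100 × |error| / concentration. The lines below assume a hypothetical fixed absolute error of 0.25 mM, close in size to the SEPs of the best model. They illustrate the 1/concentration trend only. They do not reproduce the reported mean percent errors, which are averages over individual samples.

```python
def percent_error_for_fixed_absolute_error(abs_error_mm, concentrations_mm):
    """Percent error implied by an absolute error that does not depend on concentration."""
    return [100.0 * abs_error_mm / c for c in concentrations_mm]


# Hypothetical 0.25 mM absolute error evaluated at the concentrations quoted above.
print(percent_error_for_fixed_absolute_error(0.25, [2.0, 10.0, 16.0]))
# prints [12.5, 2.5, 1.5625]
```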

CONCLUSIONS

The results presented here demonstrate the feasibility of measuring clinically relevant concentrations of glucose in a buffered-protein matrix. Protein absorbs strongly in this region of the spectrum. Nevertheless, both PLS alone and PLS combined with Fourier filtering can effectively discriminate glucose in the presence of a high level of protein. The combination of Fourier filtering and PLS regression, however, is superior to PLS alone: it provides slightly better calibration models with fewer PLS factors. The filtering step presumably removes noise and other sources of variation unrelated to glucose, thereby reducing the number of factors required to account for such variations.

Several results indicate that the 4400-cm-1 band of glucose provides most of the glucose concentration information. The optimum mean and standard deviation values found here for the Gaussian-shaped Fourier filter are similar to those found in an earlier investigation that focused solely on this 4400-cm-1 band.6 In addition, PLS alone gives improved calibration models when the spectral range is narrowed to focus essentially on this single band.

A large number of PLS factors is required compared to the number of chemical components in these sample solutions. Indeed, 14 and 12 factors were found to be optimal for PLS alone and for PLS with the Fourier filtering step, respectively. Although these numbers of factors provide the minimum SEPs, only slight improvements in SEP are realized after the first few factors. For PLS with Fourier filtering, in particular, the calibration quality is essentially unchanged after the first five factors. Rigorous interpretation of the loading vectors for each factor is difficult and inconclusive, but we speculate
that the latter factors account for minor baseline variations and nonlinearity in the calibration model. Such a large number of factors is justified as long as the additional factors improve prediction for a set of independent spectra, such as prediction set B. Further fine-tuning of the spectral range, number of PLS factors, and Gaussian filter parameters may yield slight improvements, but major improvements are unlikely. The next major question concerns the effect of a changing matrix on the accuracy of these glucose measurements. Changes in the concentrations of strongly absorbing matrix components, such as protein, will cause large variations in the background spectrum, and the spectral processing must be able to compensate for such variations.
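The criterion above, that extra factors are justified only while they improve prediction of independent spectra, suggests a simple selection rule. The following sketch applies it to synthetic data:

1. Filter the spectra.
2. Fit PLS models with 1 to 10 factors on a calibration set.
3. Keep the factor count with the lowest SEP on a separate prediction set.

Everything here is an assumption for illustration and not the authors' code or data. The two-band spectra (a narrow "glucose" band on a broad "protein" band with baseline offsets) are hypothetical. The single-response PLS is a NIPALS version written in NumPy rather than any specific package. The filter units are assumed to be cycles per data point. All function names are hypothetical.

```python
import numpy as np


def gaussian_fourier_filter(spectra, mean=0.03, sd=0.007):
    """Row-wise Gaussian band-pass filter in the Fourier domain (units assumed cycles/point)."""
    n = spectra.shape[-1]
    weights = np.exp(-0.5 * ((np.fft.rfftfreq(n, d=1.0) - mean) / sd) ** 2)
    return np.fft.irfft(np.fft.rfft(spectra, axis=-1) * weights, n=n, axis=-1)


def pls1_fit(X, y, n_factors):
    """Single-response PLS via NIPALS; returns centering terms and regression vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)            # weight vector
        t = E @ w                         # scores
        tt = t @ t
        p = E.T @ t / tt                  # X loadings
        q = (f @ t) / tt                  # y loading
        E = E - np.outer(t, p)            # deflate X
        f = f - q * t                     # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)   # regression vector in spectral space
    return x_mean, y_mean, b


def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


def sep(actual, predicted):
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def make_spectra(rng, n_samples, n_points=200):
    """Hypothetical mixtures: narrow glucose band on a broad protein band plus offsets."""
    x = np.arange(n_points)
    glucose_band = 0.001 * np.exp(-0.5 * ((x - 100) / 5.0) ** 2)   # absorbance per mM
    protein_band = np.exp(-0.5 * ((x - 90) / 40.0) ** 2)
    glucose = rng.uniform(1.0, 20.0, n_samples)                     # mM
    protein = rng.uniform(0.5, 1.0, n_samples)
    offset = rng.uniform(0.0, 0.1, n_samples)                       # baseline shifts
    X = (np.outer(glucose, glucose_band) + np.outer(protein, protein_band)
         + offset[:, None] + rng.normal(0.0, 1e-4, (n_samples, n_points)))
    return X, glucose


def select_factors(X_cal, y_cal, X_pred, y_pred, max_factors=10):
    """Return the factor count with the lowest SEP and the SEP for 1..max_factors."""
    seps = [sep(y_pred, pls1_predict(pls1_fit(X_cal, y_cal, k), X_pred))
            for k in range(1, max_factors + 1)]
    return int(np.argmin(seps)) + 1, seps


rng = np.random.default_rng(0)
X_cal, y_cal = make_spectra(rng, 40)
X_pred, y_pred = make_spectra(rng, 20)
best_k, seps = select_factors(gaussian_fourier_filter(X_cal), y_cal,
                              gaussian_fourier_filter(X_pred), y_pred)
print(best_k, round(seps[best_k - 1], 3))
```

The set used to pick the factor count should not also be used to report the final error. Choosing on one set and reporting on another keeps the SEP an honest estimate, much as an independent prediction set serves that role in the study.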

The next paper deals with background variations by constructing a calibration model that accurately predicts clinically relevant levels of glucose in a series of undiluted bovine plasma matrices.

ACKNOWLEDGMENT

This work was supported by grants from the National Institutes of Health (RR04583 and DK45126).

Received for review June 2, 1993. Accepted August 20, 1993.*

* Abstract published in Advance ACS Abstracts, October 1, 1993.