In the Laboratory
An Advanced Undergraduate Chemistry Laboratory Experiment Exploring NIR Spectroscopy and Chemometrics
Randall Wanke*
Department of Chemistry, Augustana College, Rock Island, IL 61201-2296; *[email protected]
Jennifer Stauffer
Hopkins Science Department, 986 Forest Road, New Haven, CT 06515
The coupling of NIR spectroscopy and chemometrics has become commonplace for the rapid analysis of ingredients in agricultural, pharmaceutical, polymer, petroleum, and food products and for their process control (1). Any chemist embarking on a career in these industries is likely to encounter a NIR–chemometrics “smart analyzer” designed to assess product quality or to monitor and control process conditions. The purpose of this experiment is to introduce students to the key concepts and considerations underlying the NIR–chemometrics measurement. In this experiment, NIR spectroscopy is used to measure the composition of toluene, heptane, and isooctane (tol/hep/iso) solutions using a student-generated chemometrics calibration model (2). Different experimental exercises are described to show not only the precision and accuracy of NIR spectroscopy–chemometrics but also its limitations and how to discern when a chemometrics calibration model is inappropriate for a given analysis.

Molecular absorbances (Figure 1) in NIR spectroscopy (4000–12,500 cm⁻¹; 2500–800 nm) are combination bands and overtones of fundamental absorbance bands in the mid-IR and are thus very weak (3). The weak NIR absorbances are well suited to chemometrics because they stay within the linear range of the Beer–Lambert law when using convenient and easily reproducible millimeter to centimeter path lengths and neat samples (4).

Chemometrics, as applied here, is a mathematical routine that uses principal component analysis (PCA) to derive as few factors as possible to predict component concentrations. PCA takes the wavelength versus absorbance matrix and finds a minimum set of orthogonal eigenvectors (factors or principal components) that can map these data. These factors are linear combinations of spectral wavelengths that account for the greatest variance in the spectra of the calibration standards.
Matrix algebra is then used to determine the multiple linear regression coefficients, or weightings matrix, that relates the reduced-dimensionality absorbance matrix to the concentration matrix (5):

C = F_cal A
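The calibration relation above can be illustrated with a minimal principal component regression (PCR) sketch in Python with NumPy. Everything in it is an assumption made for illustration: the Gaussian "spectra" are synthetic stand-ins rather than measured NIR data, the function names (`band`, `make_mixtures`, `fit_pcr`, `predict`) are hypothetical, and this is not the algorithm implemented in Quant+. Mean centering is taken here to mean subtracting the average calibration spectrum, a common convention that may differ from what the commercial software does. Columns of A are spectra and columns of C are volume fractions, so that C = F_cal A holds for the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wavenumber grid spanning the analysis range (cm^-1).
wavenumbers = np.linspace(5150.0, 6400.0, 200)


def band(center, width):
    """Gaussian stand-in for an absorbance band (not a measured spectrum)."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


# Made-up pure-component spectra; columns are toluene, heptane, isooctane.
pure = np.column_stack([
    band(5950, 60) + 0.3 * band(5700, 80),
    band(5800, 70) + 0.5 * band(5680, 60),
    band(5880, 70) + 0.4 * band(5720, 50),
])


def make_mixtures(n):
    """Return (C, A): random volume fractions (3 x n) and noisy spectra (200 x n)."""
    c = rng.dirichlet(np.ones(3), size=n).T
    a = pure @ c + rng.normal(scale=1e-4, size=(wavenumbers.size, n))
    return c, a


def fit_pcr(c, a, n_factors):
    """Fit F_cal so that C is approximated by F_cal @ (A - a_mean) + c_mean."""
    a_mean = a.mean(axis=1, keepdims=True)
    c_mean = c.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered spectra are the orthogonal factors.
    u, _, _ = np.linalg.svd(a - a_mean, full_matrices=False)
    loadings = u[:, :n_factors]
    scores = loadings.T @ (a - a_mean)  # reduced-dimensionality spectra
    # Least-squares weightings relating scores to centered concentrations.
    weights = np.linalg.lstsq(scores.T, (c - c_mean).T, rcond=None)[0].T
    return weights @ loadings.T, a_mean, c_mean


def predict(f_cal, a_mean, c_mean, a):
    """Apply the calibration to new spectra (columns of a)."""
    return f_cal @ (a - a_mean) + c_mean


# Sixteen synthetic standards for calibration, four for validation.
c_train, a_train = make_mixtures(16)
f_cal, a_mean, c_mean = fit_pcr(c_train, a_train, n_factors=2)
c_test, a_test = make_mixtures(4)
c_pred = predict(f_cal, a_mean, c_mean, a_test)
max_error = float(np.max(np.abs(c_pred - c_test)))
print(f"largest volume-fraction error: {max_error:.5f}")
```

Two factors are enough for this synthetic data only because the three volume fractions sum to one and the simulated spectra are perfectly linear; measured spectra generally need more factors to absorb nonideal behavior and noise.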
The NIR spectrum of an equivolume mixture of toluene, isooctane, and heptane is shown in Figure 1. The first overtone of the overlapped aromatic C–H and aliphatic C–H absorbance band covering 6400–5150 cm⁻¹ was chosen for the analysis because it gave an absorbance maximum of approximately 1, which is well within the measurement capability of the instrument, provides the best signal-to-noise ratio, and is within the linear range of the Beer–Lambert law. As a rule of thumb, at least five standards are needed for every variable in the calibration model. Therefore, for the present three-component analysis, at least 15 standards are needed. For this experiment, an abbreviated full-factorial training set of 16 standards was used (6). A full-factorial sample set would consist of all combinations of the low, average, and high component concentrations, giving 3^(number of components) standards, or 27 for three components.
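The standard counts in the paragraph above can be checked with a few lines of Python. This is only a counting sketch: the level names and variable names are placeholders, and it does not reproduce how the 16 standards of the abbreviated set were actually selected, which is not described here.

```python
from itertools import product

components = ("toluene", "heptane", "isooctane")
levels = ("low", "average", "high")

# Every combination of one level per component: 3 ** 3 = 27 standards.
full_factorial = list(product(levels, repeat=len(components)))

# Rule of thumb quoted in the text: five standards per calibration variable.
standards_per_variable = 5
minimum_standards = standards_per_variable * len(components)

print(len(full_factorial))   # 27
print(minimum_standards)     # 15
```

The abbreviated training set of 16 standards therefore sits just above the rule-of-thumb minimum and well below the 27 of the full design. Because volume percents in a mixture must sum to 100, not every level combination is independently realizable, which is one plausible reason for abbreviating the design.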
Figure 1. Stacked NIR spectra over the instrumental range of 4000–9000 cm⁻¹. From bottom to top: isooctane; heptane; an equivolume mix of toluene, heptane, and isooctane; and toluene. Combination bands and C–H stretch overtones are labeled.
In the calibration equation C = F_cal A, C is the concentration matrix, F_cal is a composite of both the coordinate transformation and weightings matrices, and A is the absorbance matrix.

Experimental
Analysis parameters are specified in Table 1. The various spectral and software parameters were chosen to optimize the accuracy of the toluene, isooctane, and heptane quantitation.

Table 1. Analysis Parameters for the NIR–Chemometrics Analysis of Toluene, Heptane, and Isooctane Solutions
Parameter                 Specifics
Spectrometer              Perkin-Elmer Spectrum GX FT-IR/NIR spectrometer
NIR Analysis Range        6400–5150 cm⁻¹
Spectrometer settings     4 cm⁻¹ resolution, 4 scans, 2 mm path length quartz cell
Quantitation Software     Perkin-Elmer Quant+ (version 4.1)
Calibration               Abbreviated full-factorial training set
Chemometrics algorithm    PLSR, mean-centered data
www.JCE.DivCHED.org • Vol. 84 No. 7 July 2007 • Journal of Chemical Education
Figure 2. Factor plots to discern the proper number of principal components for toluene, heptane, and isooctane in the chemometrics calibration. The standard error of estimation is the root-mean-square error between the estimated and actual volume percent concentrations of the training-set standards. Error is sufficiently minimized using 3, 4, and 4 principal components in the calibration model for toluene, heptane, and isooctane, respectively.
Figure 3. Measured and back-calculated NIR spectrum for the sample in Table 2 in which hexane is substituted for heptane in the solution. Subtle mismatches occur in the spectra between 5600 and 5900 cm⁻¹. Note that no mismatch was visually observed between the measured and back-calculated NIR spectra for samples that included only toluene, heptane, and isooctane, which are explicitly calibrated for in the chemometrics calibration.
Sample prep, NIR spectra acquisition, and chemometrics calibration testing were accomplished in two three-hour lab periods. During the first lab period, student teams of two or three students prepared the training set samples and acquired the NIR spectra. During the second lab period, students performed the calibration using the Perkin-Elmer Quant+ software and investigated the chemometrics calibration by running some validation standards and component substitution samples as described and discussed below.

Table 2. NIR–Chemometric Analyses of Four Validation Samples Spanning the Calibration Range
Prepared Vol of Tol/Hep/Iso (%)    Measured Vol of Tol/Hep/Iso (%)    RMS Difference(f)/au
30/40/30                           29.8/40.2/30.0                     0.00053
90/5/5                             90.1/5.1/4.8                       0.00041
5/5/90                             4.7/5.2/90.0                       0.00053
0/100/0                            −0.1/100.4/−0.2                    0.00049
25/25/25/25(a)                     39.9/2.0/61.0                      0.060
30/40/30(b)                        29.2/53.6/16.9                     0.0072
30/40(c)/30                        29.9/27.4/42.7                     0.0075
30/40(d)/30                        29.4/43.6/26.9                     0.0032
30(e)/40/30                        30.7/27.1/43.7                     0.050
NOTE: The last five rows are NIR–chemometrics analyses of component substitution samples and a sample with methanol added as an additional component: (a) methanol, (b) 2-methylpentane, (c) hexane, (d) octane, and (e) benzene. (f) The difference is between the back-calculated and measured spectrum.
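The RMS difference reported in Table 2 can be sketched in plain Python as follows. The function names and the flagging factor of 5 are hypothetical choices for illustration; how Quant+ computes its back-calculated spectrum and residual is not described here, so this shows only the generic root-mean-square comparison of two spectra sampled at the same points.

```python
import math


def rms_difference(measured, back_calculated):
    """Root-mean-square difference between two spectra of equal length."""
    if len(measured) != len(back_calculated):
        raise ValueError("spectra must have the same number of points")
    squared = [(m - b) ** 2 for m, b in zip(measured, back_calculated)]
    return math.sqrt(sum(squared) / len(squared))


def is_suspect(rms, typical_rms, factor=5.0):
    """Flag a sample whose residual is `factor` times the typical value.

    The default factor is an assumed threshold, not one given in the source.
    """
    return rms >= factor * typical_rms


# Residuals of the magnitude listed in Table 2, used as plain numbers.
typical = 0.0005
print(is_suspect(0.00041, typical))  # False: a calibrated mixture
print(is_suspect(0.0072, typical))   # True: a substitution sample
```

In practice the threshold would be set from the spread of residuals observed for the calibration standards themselves rather than chosen by hand.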
The Quant+ software offers the analyst the option of principal component regression (PCR) or partial least squares regression (PLSR) calibration. Equivalent results were obtained using either chemometrics algorithm. Mean centering allows for equal weighting of small and large absorbances: spectra are replotted about the average absorbance of each spectrum, and these “centered” spectra are used in the calculations. More accurate results, requiring fewer factors (eigenvectors), were obtained when the measured spectra were mean-centered. Other sources of noncommercial (7) and commercial (8) chemometrics PCR and PLSR software are available but were not tested.
Hazards
Sample preparation should be done in a fume hood because some petroleum distillates, such as toluene and benzene, are suspected carcinogens. Normal precautions for the flammability of the common organic solvents (toluene, heptane, isooctane, 2-methylpentane, methanol, benzene, hexane, and octane) are also advised. All of the organic solvents are harmful if inhaled or swallowed.

Results and Discussion
This experiment has been carried out in an instrumental analysis lab and has evolved to include specific analyses to critically investigate the chemometrics calibration. Calibration involves finding the minimum number of factors, also known as principal components or eigenvectors, that are needed to accurately predict the component concentrations. Typical factor plots are shown in Figure 2. Student results for instructor-provided midrange and upper and lower range “unknown” samples are shown in Table 2. Results are accurate and reproducible to within 0.6% throughout the entire concentration range.
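The factor-selection step described above, in which the standard error of estimation is examined as factors are added, can be sketched with NumPy. The data are synthetic, the helper names (`see_by_factor`, `pick_factors`) and the 0.1 vol % tolerance are assumptions for illustration, and a PCR-style fit is used as a stand-in rather than the PLSR routine in Quant+.

```python
import numpy as np

rng = np.random.default_rng(1)
axis = np.linspace(0.0, 1.0, 150)  # arbitrary spectral axis (synthetic)


def band(center, width):
    """Gaussian stand-in for an absorbance band."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)


# Made-up pure spectra and 16 synthetic standards (columns), in vol %.
pure = np.column_stack([band(0.30, 0.08), band(0.50, 0.10), band(0.65, 0.07)])
conc = rng.dirichlet(np.ones(3), size=16).T * 100.0
spectra = pure @ (conc / 100.0) + rng.normal(scale=1e-3, size=(axis.size, 16))


def see_by_factor(c, a, max_factors):
    """RMS error (vol %) on the training standards for 1..max_factors factors.

    Returns an array of shape (max_factors, n_components).
    """
    a_centered = a - a.mean(axis=1, keepdims=True)
    c_mean = c.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(a_centered, full_matrices=False)
    errors = []
    for k in range(1, max_factors + 1):
        scores = u[:, :k].T @ a_centered
        weights = np.linalg.lstsq(scores.T, (c - c_mean).T, rcond=None)[0].T
        c_est = weights @ scores + c_mean
        errors.append(np.sqrt(np.mean((c_est - c) ** 2, axis=1)))
    return np.array(errors)


def pick_factors(errors, tolerance=0.1):
    """Smallest factor count whose error is within `tolerance` of the minimum."""
    within = errors <= errors.min() + tolerance
    return int(np.argmax(within)) + 1


see = see_by_factor(conc, spectra, max_factors=6)
chosen = [pick_factors(see[:, j]) for j in range(3)]
for name, k in zip(("toluene", "heptane", "isooctane"), chosen):
    print(name, k)
```

Because the error is computed on the training standards themselves, it can only decrease or level off as factors are added, so the useful information is where the curve flattens. Cross-validation on held-out standards is a common alternative when overfitting is a concern.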
Calibration in chemometrics is specific to a certain sample type. Quantitation is inaccurate if an additional component contaminates the mixture or if components other than those calibrated for are substituted in the mixture. As shown in Table 2, quantitation is inaccurate when an extra component, methanol, is added to the mixture or when 2-methylpentane is substituted for 2,2,4-trimethylpentane (isooctane). Inaccuracies can be flagged by comparing measured spectra to back-calculated spectra. Notice in Table 2 that the root-mean-square (RMS) difference between the back-calculated and the measured spectrum is roughly 6 to more than 100 times greater for the substitution and added-component samples (0.0032–0.060) than for the intended toluene, heptane, and isooctane mixtures (about 0.0005), which were explicitly considered in the calibration. This mismatch is shown in Figure 3. The RMS difference can, therefore, be used to flag suspect samples.

Component substitution experiments also help in understanding the chemometrics calibration, specifically what the principal components or factors derived from PCA physically relate to. One factor in the present calibration is related to spectral features distinguishing CH3 from CH2 and their relative prominence in an alkane (9). For example, heptane is a straight-chain hydrocarbon with a high ratio of CH2 to CH3 (5:2), whereas isooctane (2,2,4-trimethylpentane) is a highly branched hydrocarbon with a much lower CH2/CH3 ratio (1:5). When hexane (Figure 3), a molecule with a lower CH2/CH3 ratio (4:2), is substituted for heptane, the chemometrics calibration sees a smaller ratio of CH2 to CH3 features in the spectrum. As a result, the chemometrics calibration predicts more isooctane and less heptane than for the unsubstituted sample. If octane, a molecule with a higher CH2/CH3 ratio (6:2), is substituted for heptane, the quantitation goes the other way (Table 2). This insight is further substantiated by the 2-methylpentane–isooctane substitution sample.
Here the less-branched 2-methylpentane results in more heptane and less isooctane being predicted by the calibration model. Furthermore, benzene is quantitated much like toluene. This substitution indicates that a spectral region associated with the aromatic ring weighs heavily in the quantitation of toluene.

Acknowledgments
We are grateful to Augustana College for faculty and student summer research grants that enabled this project.
Supplemental Material
Instructions for the students and notes for the instructor are available in this issue of JCE Online.

Literature Cited
1. McClure, W. F. Anal. Chem. 1994, 66, 43A.
2. (a) Kelly, J. J.; Barlow, C. H.; Jinguji, T. M.; Callis, J. B. Anal. Chem. 1989, 61, 313. (b) Chung, H.; Lee, J-S.; Ku, M-S. American Laboratory 1999, 31 (22), 24.
3. Crandall, E. W. J. Chem. Educ. 1987, 64, 466.
4. Weyer, L. G. Applied Spectroscopy Reviews 1985, 21, 1.
5. Kramer, R. Chemometric Techniques for Quantitative Analysis; Marcel Dekker: New York, 1998.
6. Bjorsvik, H.-R.; Martens, H. Practical Spectroscopy 1992, 13, 159.
7. Goicoechea, H. C.; Olivieri, A. C.; Pagani, A. P.; Ribone, M. E. J. Chem. Educ. 2000, 77, 1330.
8. Chemometrics toolbox for MATLAB. http://www.chemometrics.com/software/chemometrics.html (accessed Jan 2006).
9. Instrumental Methods in Food and Beverage Analysis; Wetzel, D., Charalambous, G., Eds.; Elsevier Science: New York, 1998.