Environ. Sci. Technol. 2000, 34, 1337-1345

Effect of Resolution on Quantification in Open-Path Fourier Transform Infrared Spectrometry under Conditions of Low Detector Noise. 1. Classical Least Squares Regression

BRIAN K. HART AND PETER R. GRIFFITHS*

Department of Chemistry, University of Idaho, Moscow, Idaho 83844-2343

The effects of resolution, spectral window, and background type on the predictive ability of classical least squares regression (CLS) on spectra measured by an open-path Fourier transform (OP/FT-IR) spectrometer were tested in some detail. It is shown that the most accurate quantitative results are obtained by using equidistant backgrounds, reduced spectral windows, and low resolution. The effect of interfering compounds is shown to be particularly serious when CLS regression is used to process OP/FT-IR spectra.

* Corresponding author phone: (208)885-5807; fax: (208)885-6173; e-mail: [email protected].
10.1021/es9904383 CCC: $19.00 Published on Web 02/23/2000
© 2000 American Chemical Society

Introduction

In 1990, through the Clean Air Act (CAA), the U.S. Environmental Protection Agency (EPA) mandated the inventorying, monitoring, and regulation of over 180 vapor-phase compounds. Over 130 of these compounds, designated hazardous air pollutants (HAPs), can be detected and quantified using open-path Fourier transform infrared (OP/FT-IR) spectrometry. OP/FT-IR spectrometry can perform real-time, nonobtrusive analysis of nonpoint sources that may be emitting one or more HAPs (1). OP/FT-IR would therefore appear to be an ideal method for public, industrial, and regulatory agencies to use when monitoring these HAPs. However, OP/FT-IR spectrometry is only a minor player in the analysis of HAPs. This is in part because of the large startup costs and in part because trained and skilled spectroscopists are needed to interpret OP/FT-IR spectra.

Protocols for OP/FT-IR monitoring are being developed for the situation where the spectrometer is permanently mounted and continuously characterizes a given locale. This approach is commonly used to detect fugitive gas releases in a controlled environment, such as fence-line monitoring of an industrial plant. In this case, a permanent stationary spectrometer is used. The sample spectrum is usually ratioed against a single-beam background spectrum measured over the same geographical path in the absence of any pollutants, but displaced in time. We will refer to such spectra as equidistant background spectra. Even though the path length for the sample and background spectra may be identical, the temperature, pressure, and humidity at the times the two spectra are measured may be quite different. It is therefore common to see residual spectral features due to atmospheric H2O and CO2 in OP/FT-IR spectra measured with equidistant backgrounds, even in the atmospheric windows (see Figure 1). On the other hand, the baseline of OP/FT-IR spectra calculated with an equidistant background spectrum is usually quite flat and is centered around zero absorbance units (AU) (see Figure 1). Consequently, equidistant background spectra are specified by the draft EPA OP/FT-IR operating protocol, Compendium Method TO-16.

FIGURE 1. “Equidistant background” spectrum obtained by ratioing two single-beam spectra measured over a path of 300 m. These two spectra were not measured on the same day, but conditions were very similar. This background is representative of the type used for the EQDI0 data. The noise in the region of strong atmospheric absorption would be greater were it not for the presence of about 1% stray light in the single-beam background spectra. Inset are baseline spectra in the atmospheric windows with the ordinate scale expanded by a factor of 10 (from -0.01 to 0.03 AU).

Permanently mounted OP/FT-IR spectrometers are normally set up to monitor a small number of known molecules, often only a single analyte. The possible interferences are also usually well defined in this type of scenario. As a result, spectra measured by this approach are well suited to most multivariate analytical techniques, including classical least squares (CLS) and partial least squares (PLS) regression. CLS is more commonly used in practice and is recommended in most EPA protocols. It is often necessary to measure a large number of pollutant-free background spectra over a wide range of humidity and temperature. Only then is there a reasonable probability of finding a background spectrum measured under the same meteorological conditions as a particular sample spectrum.
A second scenario in which OP/FT-IR spectrometry shows great promise is the field measurement of a geographical area of completely unknown atmospheric composition using a portable spectrometer, e.g., during the remediation of a Superfund site. In this case, acquiring an analyte-free equidistant single-beam background spectrum is often difficult, if not impossible. It would be far easier if a short-path (1-5 m total path) single-beam background spectrum could be used to calculate all absorbance spectra. However, short-path background spectra introduce baseline shifts. They also increase the number and intensity of features due to atmospheric H2O and CO2 in the spectrum after it is converted to a linear absorbance format (see Figure 2). These interferences are the major source of analytical error in many multivariate calibration routines. Several problems may be encountered when a multivariate calibration system is being designed, because the number and composition of the analytes contributing to spectra measured with portable instrumentation are often unknown. A trained spectroscopist is often required to interpret the spectrum and to identify all the compounds that contribute absorption features before the CLS calibration is set up.

VOL. 34, NO. 7, 2000 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

FIGURE 2. “Short-path baseline” spectra obtained by ratioing a single-beam spectrum measured over a total path of 300 m against a single-beam spectrum measured over a total path of 5 m. This background is representative of the type used for the SPI0 data.

In this paper we report a detailed investigation into the factors that affect the accuracy of CLS calibrations. For OP/FT-IR spectra measured with equidistant backgrounds, interference from atmospheric H2O and CO2 is significantly less than when short-path single-beam spectra are used to calculate the absorbance spectra. We examine this effect along with changes in resolution, the size and number of analytical windows, and interference from other compounds. The examination uses conditions that simulate true OP/FT-IR spectra but for which the concentrations of the analytes are known exactly.

Any factor that degrades CLS calibrations may be termed “noise” (2). Detector noise is often assumed to be the most important source of noise in most forms of FT-IR spectrometry (3-5). Consider, however, OP/FT-IR spectra measured with a mercury cadmium telluride (MCT) detector over total path lengths of up to 300 m. Their peak-to-peak detector noise is usually much lower than the height of the interferences caused by atmospheric H2O lines. These lines are not completely eliminated when the ratio of the sample and reference spectra is calculated, even in the atmospheric windows.
Besides detector noise and atmospheric interferences, quantitative errors may also arise from unanticipated interferents, baseline drift, and small wavenumber shifts. These shifts are caused by changes in the alignment of the interferometer or retroreflector. As noted above, most recent work has focused on the effect of detector noise and resolution on the signal-to-noise ratio (SNR) of OP/FT-IR spectra (3-6). Griffiths et al. (4) have shown that, when detector noise is dominant, the noise level on the baseline of an absorbance spectrum is inversely proportional to the nominal resolution, Δν. In this case, the optimum resolution depends on the full width at half-height (fwhh) of the analyte's absorption bands. When the fwhh of the analyte bands is greater than Δν, a spectrum measured at, say, 8-cm-1 resolution should have an SNR eight times greater than a spectrum measured at 1-cm-1 resolution. (In practice, the fwhh of bands in the spectra of most volatile organic compounds (VOCs) with a molecular weight greater than 100 is usually greater than 10 cm-1.) This conclusion has been confirmed by laboratory measurements of analytes in a gas cell made with a pyroelectric detector. In these measurements, interference by atmospheric H2O and CO2 is minimal, as are shifts in the baseline and wavenumber scale (7). On the other hand, for most OP/FT-IR measurements made in the field with an MCT detector over total path lengths of up to 300 m, detector noise is rarely the greatest source of variation in the baseline. The purpose of this paper is to investigate the effect of resolution on the predictive ability of CLS for OP/FT-IR spectra. These spectra are acquired under conditions of very low detector noise but significant levels of atmospheric interference. A basic assumption in any quantitative regression analysis is that the error resides primarily in the dependent variable (8, 9). CLS is based on the Lambert-Beer law, which, for the analysis of n components using p wavelengths and q spectra, may be expressed in matrix form as

$$\mathbf{A} = \mathbf{K}\mathbf{C} \tag{1}$$

where A is the p × q absorbance matrix and C is the n × q matrix of concentrations. K is a p × n matrix; each element of K is proportional to the product of the path length and the absorptivity of the corresponding component. Bold type denotes a matrix. In most CLS calibrations, A is the dependent variable and the main source of error is in the optical signal. In OP/FT-IR this is rarely the case, because the concentration of each analyte in the plume is never known accurately. For successful calibration by CLS, the presence and concentration of each analyte must be known very accurately for every measurement. Simple baseline variations may be modeled by including shifts that vary as $\nu^N$, with N = 0, 1, and 2. In OP/FT-IR spectrometry, however, the baseline variations are often far more complex and usually appear as errors in the predicted concentrations of the analytes being measured. The baseline can be corrected manually, but most forms of baseline correction introduce some operator bias; baseline correction reduces, but does not eliminate, errors. Overlapping spectral features from unmodeled analytes and atmospheric constituents also cause errors in the concentration values. These problems are particularly severe when OP/FT-IR spectra are calculated using short-path background spectra. Presumably for this reason, most analytical protocols for OP/FT-IR spectrometry call for equidistant backgrounds and the determination of only one compound at a time. The practice of creating a separate calibration for each analyte is, however, very time-consuming, and a trained spectroscopist is often needed to interpret each spectrum. In addition, water and carbon dioxide are often included in the calibration set in practice to minimize the effect of the atmosphere on concentration predictions.
Water and CO2 can be included successfully only under one of two conditions. Either a reference spectrum at approximately the same path length, humidity, and temperature must be available, or these spectra must be calculable exactly from a compilation of band parameters such as HITRAN. Either approach adds another step at which a trained spectroscopist is required. This step was omitted here to provide a more thorough test of the effects of atmospheric constituents, which even in the best cases are never completely compensated for. In this paper, we report how background type, resolution, and the size of the spectral window affect the accuracy of predictions from OP/FT-IR spectra. The OP/FT-IR spectra were synthesized from over 1000 measured single-beam background spectra. To these were added scaled reference spectra of two test sets of five vapor-phase compounds. (In this way, the exact contribution of each analyte to each spectrum was known.)
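The simplest case, a single analyte, shows why uncompensated atmospheric features matter so much for CLS. A least-squares fit of a spectrum a to a reference spectrum k gives ĉ = (k·a)/(k·k). Any residual feature w left in the spectrum therefore shifts the estimate by (k·w)/(k·k). The sketch below uses hypothetical five-point spectra in plain Python. It shows that a residual line under the analyte band biases the prediction. A residual line where the analyte does not absorb has no effect in this single-component case.

```python
def cls_single(k, a):
    """Single-analyte CLS estimate: the c that minimizes |a - c*k|^2."""
    kk = sum(x * x for x in k)
    return sum(x * y for x, y in zip(k, a)) / kk


k = [0.0, 0.1, 0.3, 0.1, 0.0]   # hypothetical analyte spectrum at 1 ACU (AU)
c_true = 0.5                    # true concentration (ACU)
clean = [c_true * x for x in k]

# Hypothetical residual water lines left after ratioing.
line_on_band = [0.0, 0.0, 0.02, 0.0, 0.0]   # overlaps the analyte peak
line_off_band = [0.02, 0.0, 0.0, 0.0, 0.0]  # where the analyte is transparent

c_clean = cls_single(k, clean)
c_on = cls_single(k, [a + w for a, w in zip(clean, line_on_band)])
c_off = cls_single(k, [a + w for a, w in zip(clean, line_off_band)])
```

With several analytes, the same residual propagates through (K^T K)^-1 K^T into every estimated concentration. That is one way of seeing why unmodeled features are so damaging to CLS.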

TABLE 1. Concentration and Peak Heights at 1 Arbitrary Concentration Unit (ACU)

                        concn of 1     maximum peak height at 1 ACU (AU), by resolution (cm-1)
compound                ACU (ppm-m)    1      2      4      8      16     32
methane                 ∼410           0.30   0.214  0.169  0.112  0.066  0.045
ethane                  706            0.30   0.193  0.140  0.132  0.122  0.113
propane                 332            0.30   0.276  0.222  0.210  0.179  0.116
butane                  378            0.30   0.297  0.284  0.262  0.250  0.228
pentane                 369            0.30   0.299  0.292  0.271  0.270  0.243
trichloromethane        127            0.30   0.297  0.295  0.287  0.261  0.238
1,4-dichlorobenzene     336            0.30   0.300  0.296  0.295  0.272  0.248
1,2-dichloroethane      880            0.30   0.300  0.300  0.298  0.283  0.282
dichloromethane         369            0.30   0.299  0.299  0.297  0.287  0.287
1,1,2-trichloroethane   525            0.30   0.299  0.287  0.281  0.257  0.237

FIGURE 3. Reference spectra of the five straight-chain alkanes used in the calibration and validation data at 1-cm-1 resolution: A, methane; B, ethane; C, propane; D, n-butane; E, n-pentane.

The first set consisted of spectra of the C1 to C5 straight-chain alkanes. Reference spectra of each compound are shown in Figure 3. These compounds were selected to demonstrate the effect of resolution on a group of compounds with overlapping bands and a wide range of line widths and bandwidths. The only atmospheric window in which the alkanes absorb significantly is between 3000 and 2800 cm-1. The fwhh of the lines in the rotation-vibration spectrum of methane is about a factor of 5 less than the highest resolution of our spectrometer. For n-butane and n-pentane, by contrast, the separation of the rotational lines is less than their air-broadened width, so only the rotational contour can be measured. The second set is a group of five chlorinated hydrocarbons taken from the list of HAPs defined by the Clean Air Act: trichloromethane, 1,4-dichlorobenzene, 1,2-dichloroethane, dichloromethane, and 1,1,2-trichloroethane. These compounds, whose spectra are shown in Figure 4, are representative of analytes with broad rotational contours and few overlapping spectral features. Table 1 lists, as a function of resolution, the peak height of the strongest spectral feature of each alkane and chlorinated molecule studied. In each case, the peak absorbance in an air-broadened reference spectrum measured at 1-cm-1 resolution was normalized to 0.3 AU. As would be expected, the spectra of the lighter alkanes show greater changes in peak height and width with resolution than the spectra of n-butane or n-pentane.
Since the rotational fine structure is not even resolved in the spectra of the chlorinated molecules at 1-cm-1 resolution, the spectra of these molecules are far less affected by changes in resolution than those of the lower alkanes.
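The contrast in Table 1 between narrow-lined and broad-banded spectra can be illustrated numerically. The sketch below rests on several assumptions. Both band shapes are taken as Lorentzians. Degrading the resolution is modeled as convolution with a triangular line shape whose fwhh equals Δν; this is neither the GRAMS deresolution nor the sinc line shape produced by interferogram truncation. The narrow line has a fwhh of 0.2 cm-1, from the statement that methane lines are about a factor of 5 narrower than 1 cm-1. The broad band has a fwhh of 20 cm-1, which is hypothetical but consistent with the >10 cm-1 quoted for VOCs. Peak heights are relative to the undegraded band, not to the 1-cm-1 spectrum as in Table 1.

```python
def lorentzian(x, fwhh, peak=0.3):
    """Lorentzian band centred at x = 0 with the given fwhh and peak height (AU)."""
    hw = fwhh / 2.0
    return peak * hw * hw / (x * x + hw * hw)


def triangle_kernel(delta_nu, step):
    """Area-normalized triangular line shape whose fwhh is delta_nu (an assumption)."""
    n = int(round(delta_nu / step))
    weights = [1.0 - abs(j) / n for j in range(-n, n + 1)]
    total = sum(weights)
    return [(j * step, w / total) for j, w in zip(range(-n, n + 1), weights)]


def degraded_peak(fwhh, delta_nu, step=0.01):
    """Peak height of a centred band after convolution with the line shape.
    Both shapes are symmetric, so the maximum stays at x = 0."""
    return sum(w * lorentzian(x, fwhh) for x, w in triangle_kernel(delta_nu, step))


for delta_nu in (1, 2, 4, 8, 16, 32):
    narrow = degraded_peak(0.2, delta_nu)   # methane-like rotational line
    broad = degraded_peak(20.0, delta_nu)   # hypothetical broad VOC contour
    print(f"{delta_nu:>2} cm-1: narrow {narrow:.3f} AU, broad {broad:.3f} AU")
```

The narrow line loses most of its height by 1 cm-1, while the broad band barely changes until Δν approaches its own width. This matches the qualitative trend in Table 1.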

FIGURE 4. Reference spectra of the five chlorinated molecules used in the calibration and validation data at 1-cm-1 resolution: A, trichloromethane; B, 1,4-dichlorobenzene; C, 1,2-dichloroethane; D, dichloromethane; E, 1,1,2-trichloroethane.

Experimental Section

It might be thought that the best way to study the effect of resolution on OP/FT-IR spectra would be a controlled release of the analyte into the atmosphere. Samples would be collected at various points along the path of the IR beam and analyzed later in the laboratory. In practice, however, this approach has never yielded unequivocal results, because meteorological conditions cause the concentration of the analyte(s) to vary with time in both the horizontal and vertical directions. Calibration by CLS requires the path-integrated concentration of each analyte to be known more accurately than is feasible, even under the best circumstances, with a controlled atmospheric release. In addition, many of the compounds being studied are considered HAPs by a number of government agencies, so their controlled release into the atmosphere would have severe regulatory ramifications. To circumvent these problems, all calibrations and validations were performed on independent sets of 500 synthesized spectra. Each was built from a real OP/FT-IR background spectrum to which analyte spectra with randomly selected (but known) scaling factors were added. Thus the concentration of each analyte contributing to a given spectrum was known exactly, and the precise effect of the background on prediction accuracy could be determined.

To obtain the background spectra, double-sided OP/FT-IR interferograms were collected over a period of 1 year at total path lengths ranging from 200 to 400 m. A Bomem MB-100 OP/FT-IR spectrometer was used, operated in a monostatic configuration at a maximum optical path difference of 1 cm (1-cm-1 nominal resolution). This procedure provided a widely varying basis set of single-beam background spectra. All interferograms were measured with a narrow-band MCT detector, so the noise level was low. When two sequentially measured spectra were ratioed and converted to absorbance, the typical baseline noise was less than 0.0005 AU. Features due to uncompensated water vapor and baseline drift were usually significantly larger than this. A set of 1000 1-cm-1 baseline absorbance spectra was created using the criteria outlined in the proposed EPA Compendium Method TO-16. This set used equidistant single-beam background spectra to calculate the absorbance spectra and is designated EQDI0. The procedure was repeated with the interferograms truncated to create 2-, 4-, 8-, 16-, and 32-cm-1 resolution data sets that were identical except for the change in resolution. Had separate spectra been measured at each resolution, the data acquisition time would have been inversely proportional to Δν. For the same number of scans, the 1-cm-1 spectrum would have taken 32 times longer to measure than the 32-cm-1 spectrum, neglecting changes in the duty-cycle efficiency of the interferometer. All spectra used for the work reported in this paper were measured with the same number of scans. Several interferograms were also collected over short paths (1-5 m) over the course of a year. For these spectra, the beam was slightly misaligned to avoid saturating the MCT detector. From the pool of 200-400 m path-length interferograms, 1000 absorbance spectra were created using short-path single-beam spectra as the reference. This set was then shuffled to remove any bias and divided into two sets of 500 OP/FT-IR absorbance spectra. Identical sets deresolved to 2, 4, 8, 16, and 32 cm-1 were again created; these will be referred to as SPI0 spectra.
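The lower-resolution data sets came from truncating the same interferograms. This relies on the relation implied above between nominal resolution and maximum optical path difference (OPD): 1 cm of OPD gives 1-cm-1 nominal resolution, so OPD_max = 1/Δν. The following is a minimal sketch under several assumptions. The double-sided interferogram is stored as a list with the centerburst at the middle index, and the sampling interval dx_cm is uniform. The function names are hypothetical. Apodization and the Fourier transform itself are left out.

```python
def max_opd_cm(resolution_cm1):
    """Maximum optical path difference (cm) for a nominal resolution (cm-1)."""
    return 1.0 / resolution_cm1


def truncate_interferogram(points, dx_cm, resolution_cm1):
    """Keep the samples with |x| <= 1/resolution around the centerburst."""
    center = len(points) // 2
    # small epsilon guards against floating-point round-off in the division
    half = int(max_opd_cm(resolution_cm1) / dx_cm + 1e-9)
    if half > center:
        raise ValueError("interferogram too short for the requested resolution")
    return points[center - half:center + half + 1]


# Hypothetical 201-point interferogram sampled every 0.01 cm (OPD of +/-1 cm).
ifg = list(range(201))
lengths = {res: len(truncate_interferogram(ifg, 0.01, res)) for res in (1, 2, 4, 8, 16, 32)}
```

For a rapid-scan interferometer, scan time scales with OPD_max. That is why, for the same number of scans, a 1-cm-1 measurement takes 32 times longer than a 32-cm-1 one.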
The SPI0 spectra were created from the same set of interferograms as the EQDI0 spectra and were processed in an identical manner to avoid any bias in the calibration and validation sets. Finally, the baselines of the SPI0 set were corrected with a simple two-point baseline correction routine. Including this baseline-corrected set (SPBI0), three different types of baseline spectra at six different resolutions were used to test the CLS multivariate analysis routine. These three data sets made it possible to test the effects of resolution and background type on CLS predictions. For calibration and prediction by CLS, the path length is assumed to be constant and only the concentration is assumed to change. The measurements used a variety of path lengths and were taken over a wide range of temperature and humidity. This introduced a “noise” term into the calibration/validation sets that reflects the changes in humidity and temperature one would expect to encounter in the field. The spectra were acquired in the relatively pristine atmosphere around Moscow, ID, where the average humidity is quite low. The chance of unanticipated analytes contributing to the baseline spectra was therefore quite low, but so was the number of background spectra measured under conditions of high humidity. Most industrial sites in the U.S.A. are located in the more humid eastern part of the country. The background spectra created in this project may therefore be expected to represent a “best case” for OP/FT-IR spectroscopy. In more humid regions, the amplitude of uncompensated water lines would be expected to be higher than in the spectra used in this study for the same path lengths.
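The text specifies only a “simple two-point baseline correction” for the SPBI0 set. One common form is sketched here under an assumption: the correction subtracts the straight line through the absorbance at two chosen wavenumbers. The anchor points nu1 and nu2 and the function name are hypothetical, because the paper does not say which points were used. A correction of this kind removes only a linear offset and tilt. Any curvature remains, which is consistent with the statement that baseline correction reduces, but does not eliminate, errors.

```python
def two_point_baseline_correct(wavenumbers, absorbance, nu1, nu2):
    """Subtract the straight line through the absorbance at the grid points
    closest to nu1 and nu2 (hypothetical anchor wavenumbers)."""
    def nearest(target):
        return min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - target))

    i1, i2 = nearest(nu1), nearest(nu2)
    if i1 == i2:
        raise ValueError("anchor points must map to different grid points")
    slope = (absorbance[i2] - absorbance[i1]) / (wavenumbers[i2] - wavenumbers[i1])
    return [a - (absorbance[i1] + slope * (nu - wavenumbers[i1]))
            for nu, a in zip(wavenumbers, absorbance)]


# Hypothetical tilted baseline with a 0.1-AU band at 1000 cm-1.
nu = [700.0 + 100.0 * i for i in range(7)]
spectrum = [0.02 + 1e-4 * (x - 700.0) + (0.1 if x == 1000.0 else 0.0) for x in nu]
corrected = two_point_baseline_correct(nu, spectrum, 700.0, 1300.0)
```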
Scaled reference spectra of all five analytes were added to each of these calibration/validation sets of analyte-free OP/FT-IR absorbance spectra. This simulated varying levels of contaminants whose concentrations are known accurately. Most of the reference spectra were taken from the EPA library of vapor-phase FT-IR spectra of HAPs (10). Reference spectra not present in the EPA library were measured in-house using a vacuum calibration system (11). Using reference spectra measured on a different spectrometer from the one used to collect the open-path spectra may introduce some error into the test procedure. However, accurate reference spectra could not be measured on the Bomem MB-100 spectrometer used for the OP/FT-IR spectra without reconfiguring its optics, which would also have introduced spectral shifts. Whenever possible, spectra from the EPA database were compared with spectra acquired on our in-house system. Apart from some small baseline drift, they were found to be nearly identical. This study was designed to measure how various factors affect the accuracy of the multivariate analysis routine, not to design a calibration protocol for field use. With this intent in mind, any errors from transferring calibration spectra from one instrument to another become one of the sources of error that affect the root-mean-square error of prediction (RMSEP) for each analyte. Before the reference spectra were added to the OP/FT-IR baseline spectra, each was scaled. Its maximum absorbance (measured at 1-cm-1 resolution) within the normal atmospheric windows, 700-1300, 2000-2150, and 2400-3000 cm-1, was set to 0.3 AU. This value was selected for two reasons. It remains well within the linear range described by Beer’s law (12), and it tests the ability of CLS to yield accurate predictions when the maximum analyte concentrations are less than about 2 ppm. For each standard, one arbitrary concentration unit (ACU) was defined as the path-integrated concentration at which the strongest band in the 1-cm-1 spectrum has a peak absorbance of 0.3 AU.
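The ACU normalization can be written as a one-line scaling under Beer's law. Suppose a reference spectrum recorded at a known path-integrated concentration has its strongest in-window peak at a_max. It is multiplied by 0.3/a_max, and 1 ACU corresponds to that same multiple of the reference concentration. In the sketch below, the window limits come from the text; the function names and the three-point spectrum are hypothetical.

```python
# Atmospheric windows (cm-1) quoted in the text.
WINDOWS = [(700.0, 1300.0), (2000.0, 2150.0), (2400.0, 3000.0)]


def in_windows(nu):
    """True if wavenumber nu lies inside one of the atmospheric windows."""
    return any(lo <= nu <= hi for lo, hi in WINDOWS)


def normalize_to_acu(wavenumbers, ref_absorbance, ref_conc_ppm_m, target_peak=0.3):
    """Scale a reference spectrum so its strongest in-window peak is target_peak AU.

    Returns (path-integrated concentration of 1 ACU in ppm-m, scaled spectrum).
    Assumes Beer's law is linear over the scaling range.
    """
    peak = max(a for nu, a in zip(wavenumbers, ref_absorbance) if in_windows(nu))
    scale = target_peak / peak
    return ref_conc_ppm_m * scale, [a * scale for a in ref_absorbance]


# Hypothetical reference at 100 ppm-m: the 1500 cm-1 band is strongest but lies
# outside the windows, so the 750 cm-1 band sets the normalization.
acu_ppm_m, ref_1acu = normalize_to_acu([750.0, 1500.0, 2900.0], [0.15, 0.90, 0.05], 100.0)
```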
The ACU is used as a concentration unit, rather than the actual path-integrated concentration, to allow direct comparison of results between analytes with different absorptivities in the various calibration and validation sets. The 1-cm-1 reference spectra were then deresolved in the wavenumber domain to resolutions of 2, 4, 8, 16, and 32 cm-1 using GRAMS-386 (Galactic Industries, Salem, NH). Table 1 gives the path-integrated concentration equivalent to 1 ACU and the maximum peak height at each resolution for each compound. The analyte spectra were then randomly scaled and added to the OP/FT-IR spectra deresolved by interferogram truncation. This created 36 calibration sets and 36 validation sets: alkane-SPI0, chlorinated-SPI0, alkane-SPBI0, chlorinated-SPBI0, alkane-EQDI0, and chlorinated-EQDI0, each at resolutions of 1, 2, 4, 8, 16, and 32 cm-1. Each spectrum in the data sets contained all five analytes from the relevant compound test group. The concentrations of the analytes in the calibration and validation sets were randomly scaled between 0 and 1 ACU. They were kept constant between data sets to avoid any bias from a fortuitous selection of concentrations. This allowed us to test for the effect of interference from other compounds. The validity of adding the calibration spectra to the background spectrum in this way was tested on our in-house vacuum system. Synthetic spectra were compared with the equivalent spectra acquired by mixing the analytes before measurement (13). The two methods produced nearly identical results as long as the true peak absorbance of the strongest band of the added analyte remained below 0.5 AU. (This is the main reason that an absorbance of 0.3 AU was selected as 1 ACU.)
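Each synthetic spectrum is an analyte-free background plus randomly scaled reference spectra. A minimal sketch follows, assuming that spectra are plain lists on a common grid and that the references are already normalized to 1 ACU. Reusing one random seed reproduces the same concentration matrix across data sets. This mirrors the statement that concentrations were kept constant between data sets (e.g., EQDI0 and SPI0 at a given resolution). The names and numbers are hypothetical.

```python
import random


def synthesize(backgrounds, references, seed=0):
    """Add randomly scaled 1-ACU reference spectra to analyte-free backgrounds.

    Returns (spectra, concentrations); concentrations[i][k] is the ACU value
    of analyte k in spectrum i, drawn uniformly from [0, 1].
    """
    rng = random.Random(seed)  # same seed -> same concentrations for every data set
    spectra, concentrations = [], []
    for background in backgrounds:
        c = [rng.uniform(0.0, 1.0) for _ in references]
        spectrum = [
            b + sum(ck * ref[j] for ck, ref in zip(c, references))
            for j, b in enumerate(background)
        ]
        spectra.append(spectrum)
        concentrations.append(c)
    return spectra, concentrations


# Hypothetical three-point backgrounds and two 1-ACU references.
refs = [[0.30, 0.00, 0.00], [0.00, 0.30, 0.10]]
eqd_spectra, eqd_conc = synthesize([[0.0, 0.0, 0.0], [0.01, 0.0, -0.01]], refs, seed=1)
sp_spectra, sp_conc = synthesize([[0.05, 0.04, 0.03], [0.02, 0.02, 0.02]], refs, seed=1)
```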
This whole procedure is possible only because we are working at low concentrations in the vapor phase, where intermolecular interactions do not cause deviations from Beer’s law. This method of creating calibration spectra was also tested in an actual controlled-release study, which verified the validity of the protocol.

To test for the effect of analytical windows, each calibration/validation set was initially limited to the spectral regions of the normal atmospheric windows (700-1300, 2000-2150, and 2400-3000 cm-1). Sets constructed with the full atmospheric windows will be referred to as Type 1 calibration/validation sets. This set offers the easiest window selection, but it also includes the largest amount of irrelevant spectral information. The wavenumber regions investigated were then narrowed to those where at least one of the analytes of interest had an absorbance of 0.015 AU or higher at 1 ACU and 1-cm-1 resolution. This “Type 2” set excludes the regions where the analytes have no significant absorption, reducing the effect of atmospheric bands that may adversely affect the calibration. If a given analyte had more than one strong band, all relevant regions were included. A third (“Type 3”) calibration/validation set used the spectral region corresponding to the top 90% of the most useful analytical band for each analyte. This was usually the strongest band lying in an atmospheric window. This procedure approximates the method prescribed in EPA Compendium Method TO-16 (14). Because the spectra of some analytes overlap extensively, the selection of the analytical peak relied heavily on operator experience. It therefore does not fully meet the conditions required for analysis in EPA Compendium Method TO-16, which states that “If all of the features in the target gas are rejected [because of band overlap], the gas concentration cannot be measured by FT-IR”. For the groups of analytes selected in this study, most of the analytes would be rejected as immeasurable by TO-16 criteria.
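The Type 1 and Type 2 selections can be expressed as boolean masks over the wavenumber grid. In this sketch, the window limits and the 0.015-AU threshold are taken from the text. The function names, the grid, and the reference values are hypothetical. The Type 3 selection is not sketched, because it relied on operator judgment.

```python
ATMOSPHERIC_WINDOWS = [(700.0, 1300.0), (2000.0, 2150.0), (2400.0, 3000.0)]


def type1_mask(wavenumbers):
    """Type 1: every point inside the normal atmospheric windows."""
    return [any(lo <= nu <= hi for lo, hi in ATMOSPHERIC_WINDOWS) for nu in wavenumbers]


def type2_mask(wavenumbers, refs_1acu, threshold=0.015):
    """Type 2: Type 1 points where at least one analyte absorbs >= threshold AU
    at 1 ACU (1-cm-1 reference spectra)."""
    inside = type1_mask(wavenumbers)
    return [
        keep and any(ref[j] >= threshold for ref in refs_1acu)
        for j, keep in enumerate(inside)
    ]


def apply_mask(spectrum, mask):
    """Keep only the selected points of a spectrum."""
    return [a for a, keep in zip(spectrum, mask) if keep]


nu = [710.0, 1200.0, 1500.0, 2900.0]
refs = [[0.020, 0.001, 0.500, 0.000], [0.000, 0.000, 0.000, 0.200]]
mask1 = type1_mask(nu)
mask2 = type2_mask(nu, refs)  # 1500 cm-1 is outside the windows; 1200 cm-1 is too weak
```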
The TO-16 rejection criterion is necessary when using CLS, where interfering bands can make analysis impossible; however, as is shown in paper 2 of this series, the use of PLS obviates this criterion. Thus a total of 108 separate calibration/validation sets (36 data sets × 3 window types) were created for testing by the CLS algorithm. The K matrix was calculated using eq 2

$$\mathbf{K} = \mathbf{A}_C\,\mathbf{C}_C^{T}\left(\mathbf{C}_C\,\mathbf{C}_C^{T}\right)^{-1} \tag{2}$$

where the subscript C designates the calibration data sets and bold font designates a matrix format. The concentrations were then predicted from the validation set using eq 3

$$\mathbf{C}_P = \left(\mathbf{K}^{T}\mathbf{K}\right)^{-1}\mathbf{K}^{T}\mathbf{A}_V \tag{3}$$

where the subscript V designates the validation set. The accuracy of the predictions was determined by the RMSEP shown in eq 4

$$\mathrm{RMSEP} = \left(\frac{\sum_{i=1}^{n}\left(C_{T,i} - C_{P,i}\right)^{2}}{n}\right)^{1/2} \tag{4}$$

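where $C_{T,i}$ and $C_{P,i}$ are the true and predicted concentrations for the ith validation measurement. In eq 4, n denotes the number of validation measurements, not the number of components. All manipulation of spectra and multivariate calibration was performed using MATLAB 5.1 (The MathWorks Inc., Natick, MA).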
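Equations 2-4 can be written out directly. The following sketch is in plain Python rather than the MATLAB 5.1 code actually used. The helper names are hypothetical. The inverse is a textbook Gauss-Jordan elimination suitable only for small, well-conditioned matrices. A is stored with one column per spectrum (p × q), as in eq 1. The noise-free test data mirror the “reality check” described under Results and Discussion.

```python
import math


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cls_calibrate(a_cal, c_cal):
    """Eq 2: K = A_C C_C^T (C_C C_C^T)^-1, with A_C p x q and C_C n x q."""
    ct = transpose(c_cal)
    return matmul(matmul(a_cal, ct), inverse(matmul(c_cal, ct)))


def cls_predict(k, a_val):
    """Eq 3: C_P = (K^T K)^-1 K^T A_V."""
    kt = transpose(k)
    return matmul(matmul(inverse(matmul(kt, k)), kt), a_val)


def rmsep(c_true_row, c_pred_row):
    """Eq 4 for one analyte across the n validation measurements."""
    n = len(c_true_row)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(c_true_row, c_pred_row)) / n)


# Noise-free check with hypothetical numbers: p = 4 points, n = 2 analytes.
k_true = [[0.30, 0.00], [0.10, 0.20], [0.00, 0.30], [0.05, 0.05]]
c_cal = [[0.1, 0.5, 0.9, 0.3, 0.7], [0.8, 0.2, 0.4, 0.6, 0.1]]
c_val = [[0.25, 0.60], [0.45, 0.15]]
k_est = cls_calibrate(matmul(k_true, c_cal), c_cal)
c_pred = cls_predict(k_est, matmul(k_true, c_val))
errors = [rmsep(t, p) for t, p in zip(c_val, c_pred)]
```

Once real OP/FT-IR backgrounds are added to the calibration and validation spectra, the same steps return nonzero RMSEP values. For larger or poorly conditioned problems, a least-squares solver such as numpy.linalg.lstsq is numerically preferable to forming explicit inverses.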

Results and Discussion

“Reality Check”. An additional data set was created for each of the test conditions. Its purpose was to verify the accuracy of the CLS programs and to confirm that the OP/FT-IR backgrounds were the only significant source of error. These sets contained the scaled alkane or chlorinated hydrocarbon reference spectra without the added OP/FT-IR atmospheric backgrounds. Analysis of these data sets with the CLS algorithm shown above gave an RMSEP of less than 1 × 10⁻⁷ ACU in all cases. All prediction errors calculated from the other data sets must therefore be a direct consequence of using actual OP/FT-IR atmospheric backgrounds. They must be attributed to the type of features that are ubiquitous in OP/FT-IR spectra.

Effect of Resolution. Table 2 shows the effect of changing the resolution from 1 to 32 cm-1 for all three windows investigated. The effect of the atmospheric interferences and other noise sources is immediately obvious. Adding the OP/FT-IR backgrounds increased the prediction errors by 4-7 orders of magnitude over CLS predictions on the same data sets without OP/FT-IR backgrounds. These results emphasize the strong impact of atmospheric spectral features on the performance of CLS in OP/FT-IR spectrometry. Previous studies have excluded this effect by considering only random noise, or the case in which the CO2 and water vapor concentrations do not change (7, 9). For most of the spectra calculated with equidistant backgrounds, predictive accuracy improves slightly as the resolution is degraded from 1 to 8 cm-1. For most compounds, accuracy then declines as the resolution is degraded further. The improvement from 1 to 8 cm-1 can largely be attributed to broadening of the sharp spectral features of atmospheric interferences without a concomitant broadening of the analyte bands. Had detector noise been the dominant source of error, degrading the resolution would have been expected to improve the RMSEP much more than is seen in Table 2. This result indicates that detector noise is only a minor factor in determining predictive accuracy for these data.
Thus, in cases where the observation time, rather than the number of scans, is constant, the results would be expected to be very similar. Increasing the number of scans would reduce the detector noise but would yield little benefit. Detector noise is already insignificant compared with the “noise” from uncompensated lines in the water and carbon dioxide spectra. Only for methane and ethane computed with equidistant backgrounds (EQDI0) are the predictions from 1-cm-1 resolution spectra better than those from 4-cm-1 resolution spectra. As can be seen in Figure 3, the rotational fine structure of methane and ethane is well resolved at 1-cm-1 resolution. As the resolution is degraded, the height of the rotational lines in the methane spectrum decreases significantly, from 0.3 AU in the 1-cm-1 spectrum to 0.045 AU in the 32-cm-1 spectrum. This line broadening reduces the ability of CLS to model the concentration in the presence of overlapping spectral bands. An analogous effect can be seen as the resolution of the ethane spectrum is degraded. Figure 5 plots the effect of resolution on the RMSEP for the alkane data set measured with an equidistant background and computed with the TO-16 analytical window. For propane, n-butane, and n-pentane, the accuracy of the CLS predictions improves as the resolution is degraded from 1 to 4 cm-1. It then remains fairly constant to 16-cm-1 resolution and worsens as the resolution is lowered further. These data suggest that the optimum resolution depends strongly on the type of analyte. A small molecule has resolvable rotational fine structure. In a larger molecule, the spacing of the rotational lines is less than their collision-broadened fwhh. An analogous trend is observed for the chlorinated hydrocarbons, as illustrated in Figure 6.
None of the chlorinated molecules has rotational lines that can be resolved at 1-cm⁻¹ resolution. Thus it is not surprising that the accuracy of the CLS predictions from spectra measured at 1- and 2-cm⁻¹ resolution is quite low. The best results are found from spectra measured at 4-, 8-, and 16-cm⁻¹ resolution. This result is presumably caused by the fact that all bands in the spectra of these molecules have widths of about 20 cm⁻¹. Hence the shapes of these bands are only slightly affected as the resolution is degraded, while the amplitude of the residual features near the baseline is reduced significantly. Again, had detector noise been the principal source of error in these spectra, the improvement in RMSEP on decreasing the resolution would have been expected to be much greater.

FIGURE 5. RMSEP for the five straight-chain alkanes as a function of resolution for the equidistant baseline, TO-16 analytical window validation data set.

TABLE 2. RMSEP (ACU) of CLS Calibrations as a Function of Resolution and Spectral Window

full atmospheric window, EQDI0
resolution (cm⁻¹)       1       2       4       8       16      32
trichloromethane        0.0623  0.0333  0.0223  0.0184  0.0188  0.0302
1,4-dichlorobenzene     0.0100  0.0113  0.0103  0.0113  0.0126  0.0172
1,2-dichloroethane      0.0599  0.0662  0.0365  0.0363  0.0385  0.0476
dichloromethane         0.0347  0.0416  0.0402  0.0442  0.0500  0.0557
1,1,2-trichloroethane   0.0732  0.0807  0.0599  0.0624  0.0689  0.0945
methane                 0.2466  0.2112  0.3082  0.3783  0.4966  0.7095
ethane                  0.2222  0.2537  0.3032  0.3406  0.3547  0.2800
propane                 0.8715  0.6922  0.8967  1.0686  1.2751  1.7633
butane                  1.1913  1.0300  1.2723  1.5146  1.7656  2.5884
pentane                 0.6776  0.6822  0.8003  0.9385  1.0461  1.4538

analytical windows corresponding to absorption features, EQDI0
resolution (cm⁻¹)       1       2       4       8       16      32
trichloromethane        0.0325  0.0219  0.0182  0.0176  0.0193  0.0276
1,4-dichlorobenzene     0.0076  0.0091  0.0072  0.0078  0.0086  0.0121
1,2-dichloroethane      0.0505  0.0637  0.0363  0.0368  0.0389  0.0446
dichloromethane         0.0316  0.0338  0.0307  0.0326  0.0360  0.0318
1,1,2-trichloroethane   0.0266  0.0487  0.0191  0.0177  0.0176  0.0183
methane                 0.1078  0.1026  0.1590  0.2513  0.3648  0.5314
ethane                  0.0519  0.0573  0.0700  0.0863  0.0999  0.0923
propane                 0.4427  0.2648  0.3017  0.2357  0.2609  0.2427
butane                  0.5181  0.2811  0.3461  0.2707  0.3156  0.4021
pentane                 0.1596  0.0902  0.1175  0.1000  0.1186  0.2510

TO-16 analytical windows, EQDI0
resolution (cm⁻¹)       1       2       4       8       16      32
trichloromethane        0.0344  0.0226  0.0186  0.0179  0.0196  0.0290
1,4-dichlorobenzene     0.0070  0.0083  0.0068  0.0073  0.0082  0.0112
1,2-dichloroethane      0.0505  0.0634  0.0368  0.0374  0.0394  0.0447
dichloromethane         0.0312  0.0339  0.0306  0.0326  0.0361  0.0281
1,1,2-trichloroethane   0.0286  0.0502  0.0197  0.0187  0.0195  0.0204
methane                 0.1248  0.1220  0.1709  0.2569  0.3799  0.5394
ethane                  0.0520  0.0577  0.0709  0.0832  0.0856  0.0798
propane                 0.5607  0.3555  0.3536  0.2399  0.2634  0.2987
butane                  0.6922  0.4159  0.4243  0.2818  0.3362  0.5120
pentane                 0.2359  0.1301  0.1434  0.1086  0.1443  0.3207

FIGURE 6. RMSEP for the five chlorinated molecules as a function of resolution for the equidistant baseline, TO-16 analytical window validation data set.

TABLE 3. RMSEP (ACU) of CLS Calibrations at 1-cm⁻¹ Resolution as a Function of Type of Spectral Window and Background (Type 1 = full atmospheric window)

window type             Type 1  Type 1  Type 2  Type 2  Type 3  Type 3
background              SPBI0   SPI0    SPBI0   SPI0    SPBI0   SPI0
trichloromethane        0.2714  0.2599  0.1950  0.2505  0.2042  0.3251
1,4-dichlorobenzene     0.1765  0.7688  0.1457  0.5400  0.1492  0.6197
1,2-dichloroethane      0.2239  0.4803  0.2263  0.2079  0.2265  0.3633
dichloromethane         0.2241  0.1722  0.1273  0.1073  0.1349  0.1395
1,1,2-trichloroethane   0.2288  0.4468  0.1728  0.2977  0.1794  0.4649
methane                 0.4853  0.7865  0.2176  0.4852  0.2232  0.5644
ethane                  0.3264  1.6172  0.1086  0.7486  0.1083  0.9501
propane                 0.9370  3.1028  0.8202  1.8058  0.8358  1.8229
butane                  1.6089  2.0924  1.3180  1.7855  1.3512  1.6165
pentane                 0.9531  0.9266  0.6572  0.6757  0.6799  0.6374

Effect of Background Type. The RMSEP values for the EQDI0 data sets are given in Table 2. The corresponding values for the data sets measured with a short-path background (SPBI0 and SPI0) at 1-cm⁻¹ resolution are given in Table 3. With short-path backgrounds, the RMSEPs were typically in the range of 0.1-0.7 ACU for the chlorinated molecules and 0.1-1.4 ACU for the alkanes. Since the maximum path-integrated concentration for any compound was 1 ACU, the results obtained with short-path background spectra are clearly unacceptably poor. The effect of interference by atmospheric water vapor can be seen most readily by comparing the results obtained from the EQDI0 and SPBI0 data sets (Figure 7). This figure shows the results of using all three types of spectral window selection (see below) but is limited to spectra measured at a resolution of 1 cm⁻¹, since this is the resolution specified in the EPA Compendium Method TO-16. All points below the diagonal line represent data sets for which the EQDI0 predictions were more accurate, while points above the diagonal represent cases where the SPBI0 data sets gave more accurate predictions. It is evident that under all conditions the EQDI0 data sets outperform the SPBI0 data sets. An RMSEP of 0.5 ACU corresponds roughly to a relative error of 100%. The points below the horizontal dashed line in Figure 7 represent those instances in which CLS predicted the concentrations with an average error of less than 100% for the EQDI0 data sets; the points to the left of the vertical dashed line represent the corresponding accuracy for the SPBI0 data sets. The errors are always lowest when equidistant backgrounds are used. The effect of background is typified by the results for ethane, measured at 1-cm⁻¹ resolution using the TO-16 window: the RMSEPs for the EQDI0, SPBI0, and SPI0 data sets increase from 0.0520 to 0.1083 to 0.8467 ACU, respectively. This shows that equidistant backgrounds, or some other background correction scheme that removes or minimizes the effects of atmospheric constituents, are necessary if reliable results are to be obtained using CLS on OP/FT-IR spectra.
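The rule of thumb used here (an RMSEP of 0.5 ACU corresponds to roughly 100% error, and by the same arithmetic 0.05 ACU to about 10%) amounts to dividing the RMSEP by the mean path-integrated concentration. The sketch below assumes that mean is 0.5 ACU; this is inferred from the statement that the maximum concentration was 1 ACU and is not given explicitly. The ethane RMSEPs are the values quoted in the text.

```python
# Assumption (inferred, not stated in the paper): validation concentrations
# span roughly 0-1 ACU, so the mean path-integrated concentration is ~0.5 ACU.
MEAN_CONC_ACU = 0.5

def relative_error(rmsep_acu, mean_conc_acu=MEAN_CONC_ACU):
    """Approximate average relative prediction error as a fraction."""
    return rmsep_acu / mean_conc_acu

# Ethane RMSEPs (ACU) at 1-cm-1 resolution with the TO-16 window, from the text.
ethane = {"EQDI0": 0.0520, "SPBI0": 0.1083, "SPI0": 0.8467}
for background, value in ethane.items():
    print(f"{background}: RMSEP {value:.4f} ACU, roughly {relative_error(value):.0%} error")
```

On this rough scale, only the equidistant-background result for ethane comes near the 10% level, and the SPI0 result exceeds 100%.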


TABLE 4. RMSEP (ACU) as a Function of Number of Data Points. Columns: no. of data points; trichloromethane; 1,4-dichlorobenzene; 1,2-dichloroethane; dichloromethane; 1,1,2-trichloroethane.

1-cm-1 Resolution proportional

289

0.0336

0.0469 0.0585

0.0362

0.0413

0.0335 0.0322

0.0264 0.0289

0.0321 0.0307

0.0219 0.0191

0.0354 0.0377

0.0249 0.0265

0.1440 0.0377

0.0715 0.0236

0.0877 0.0527

0.0798 0.0618

2-cm-1 Resolution proportional interpolated

145 289

0.0378 0.0337

proportional interpolated

73 289

0.0334 0.0312

proportional interpolated

37 289

0.0320 0.0366

proportional interpolated

19 289

0.2341 0.0464

proportional interpolated

10 289

0.0902 0.0679

0.0508 0.0713 0.0443 0.0527

4-cm-1 Resolution 0.0475 0.0473 0.0465 0.0474

8-cm-1 Resolution

0.0487 0.0555 0.0487 0.0561

16-cm-1 Resolution 0.0542 0.0586 0.0598 0.0561

32-cm-1 Resolution 0.0756 0.0871 0.0796 0.0983

FIGURE 7. Comparison of the RMSEP for the EQDI0 and SPBI0 data sets. The diagonal line represents the case where the RMSEP for the EQDI0 data sets equals the RMSEP for the SPBI0 data sets. In the region above the diagonal line, the SPBI0 data sets show superior predictions; in the region below the diagonal, the EQDI0 data sets show superior predictions. Dashed lines mark the point at which the average prediction error is 100%.

An RMSEP of less than 0.05 ACU is required for the average prediction error to be less than 10%. Thus most of the predictions were found to be very unreliable unless the EQDI0 data set (for which spectral features due to atmospheric water vapor are minimized) was used. The alkane data set does not meet this level of accuracy under any circumstances; ethane is the only analyte that comes close. This result is not unexpected, given the high degree of spectral overlap in the C-H stretching bands of propane, n-butane, and n-pentane (the only region in which the spectra of these compounds carry significant information in OP/FT-IR spectra), and it points to the seriousness of any unmodeled nonrandom noise when CLS regression is used.
Effect of Spectral Window Selection. Further evidence of the effect of atmospheric spectral features on the modeling efficiency of CLS is provided by the increase in accuracy in the order Type 1 < Type 3 < Type 2. Type 1 data sets, which use all the information across the full atmospheric windows, include a large number of atmospheric features that carry no concentration information. This large amount of irrelevant data in the calibration model reduces the accuracy of the predictions. Reducing the amount of irrelevant information presented to the CLS algorithm improves the predictions, as seen for the Type 2 and Type 3 data sets. However, limiting the information in the calibration model to a single absorption feature also reduces the quality of the calibrations.
The Type 3 analytical window, which contains only one spectral region corresponding to the single most isolated peak for each analyte, yields poorer predictions than the Type 2 windows, which include all absorption features with an intensity greater than 5% of that of the strongest band. The use of Type 2 analytical windows also removes any operator bias in window selection and allows the selection to be easily automated. Both the alkanes, which have few or no isolated strong spectral features (see Figure 3), and the chlorinated solvents, which have several spectral features (see Figure 4), benefit from the inclusion of more than one band in the Type 2 data set. The Type 2 analytical window allows the calibration model to give greater weight to those spectral features that correlate most strongly with concentration and less weight to regions where the absorption features provide little usable concentration information. The result is more useful spectral information in the calibration model, leading to more accurate predictions. However, increasing the spectral bandwidth also adds information that cannot be modeled, e.g., residual atmospheric spectral features, thereby reducing the improvement from that which would be expected if the atmospheric bands were not present. The effect of including information that has no correlation with concentration in the calibration model is clearly visible in the results for the Type 1 (full atmospheric window) data set. This decrease in predictive ability is most pronounced for the alkanes, where there is already a large amount of spectral overlap.
Effect of Data Spacing. As postulated above, reducing the resolution beyond ∼8 cm⁻¹ may reduce the chance of having a set of unique data points related to an individual analyte. The sparseness of the data could lead to an increased error in the concentration matrix and hence reduce the precision of the CLS predictions. This last point was tested by creating two additional EQDI0 data sets incorporating the spectra of the chlorinated molecules; the calculations were performed using the TO-16 spectral windows. It should be noted that these data sets are not the same as the EQDI0 data sets used in the rest of this paper. The equidistant baseline spectra created for this study were all made by ratioing single-beam spectra measured with the spectrometer setup unchanged between measurements and acquired less than 20 min apart. This creates an optimum set of conditions, under which the CLS results are expected to be more accurate than those obtained with the EQDI0 data set. Spectra in the first set all have two data points per resolution element (the default condition in most FT-IR spectrometers), so the number of data points in this set is inversely proportional to the resolution. The second set was interpolated by zero-filling the interferograms to produce the same number of data points at all resolutions.
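The Type 2 rule described earlier in this section (keep every absorption feature whose intensity exceeds 5% of the strongest band) is straightforward to automate. The sketch below is one hypothetical way to do it: it thresholds each analyte's reference spectrum at 5% of that spectrum's own maximum, takes the union over analytes as the analytical region (an assumption about how per-analyte windows are combined), and reports the contiguous windows. The reference spectra are synthetic Gaussian bands, and the function names are illustrative only.

```python
import numpy as np

def type2_mask(references, fraction=0.05):
    """True wherever any analyte's reference absorbance exceeds `fraction`
    of that analyte's own strongest band (hypothetical Type 2-style rule)."""
    mask = np.zeros(references.shape[1], dtype=bool)
    for ref in references:
        mask |= ref > fraction * ref.max()
    return mask

def mask_to_windows(nu, mask):
    """Convert a boolean mask into (start, end) wavenumber pairs."""
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.flatnonzero(np.diff(padded))   # rising and falling edges
    starts, stops = edges[0::2], edges[1::2] - 1
    return [(float(nu[s]), float(nu[e])) for s, e in zip(starts, stops)]

def gauss(nu, center, height, sigma):
    return height * np.exp(-0.5 * ((nu - center) / sigma) ** 2)

nu = np.linspace(700.0, 1300.0, 601)          # 1 cm-1 spacing
# Synthetic analyte A: strong band at 750 cm-1 plus a weak band (3% of max)
# at 1200 cm-1 that falls below the 5% cutoff.
ref_a = gauss(nu, 750, 1.0, 5) + gauss(nu, 1200, 0.03, 5)
# Synthetic analyte B: bands at 1000 and 1100 cm-1.
ref_b = gauss(nu, 1000, 0.2, 4) + gauss(nu, 1100, 0.5, 8)

windows = mask_to_windows(nu, type2_mask(np.vstack([ref_a, ref_b])))
print(windows)
```

Because the cutoff is relative to each analyte's own strongest band, a weak absorber still contributes its main features, while minor features of a strong absorber are dropped.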
The results, shown in Table 4, indicate that for trichloromethane, 1,2-dichloroethane, and 1,1,2-trichloroethane the prediction errors increase significantly at 16-cm⁻¹ resolution when there are two points per resolution element. When interpolated points are used, however, the prediction errors at 16 cm⁻¹ are similar to the errors at higher resolution, and a reduction in predictive ability is not encountered until 32-cm⁻¹ resolution. These three compounds show the largest amount of spectral overlap, so their data points have a large interdependence between analytes for the CLS algorithm. Interpolation provides a larger selection of data points with which to model the spectral overlap, resulting in more accurate predictions. This result is surprising because the interpolated data points contain no new spectroscopic information. When compounds have isolated absorption bands, no benefit is derived from interpolation. For example, 1,4-dichlorobenzene is determined using a band that is completely isolated from all of the bands due to the other four analytes, and the accuracy with which its path-integrated concentration can be estimated shows no significant difference between the data sets constructed with and without interpolation. These results indicate that at lower resolution the CLS algorithm is affected by a decrease in the amount of independent information present in the data and not by the number of data points presented to the algorithm.
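The statement that interpolated points carry no new information can be checked numerically. The sketch below is a generic Fourier-domain (zero-filling) interpolation, not the authors' processing: it treats a coarsely sampled synthetic band as one period of a periodic signal, transforms it to a stand-in "interferogram", pads that with zeros, and transforms back. The original samples reappear exactly at every `factor`-th point, and the new points in between are fixed entirely by those originals.

```python
import numpy as np

def zero_fill_interpolate(spectrum, factor):
    """Trigonometric interpolation of a real, evenly sampled spectrum by
    zero-filling its Fourier transform `factor`-fold."""
    n = spectrum.size
    coeffs = np.fft.rfft(spectrum)            # stand-in for the interferogram
    m = n * factor
    padded = np.zeros(m // 2 + 1, dtype=complex)
    padded[: coeffs.size] = coeffs
    if n % 2 == 0:
        # The old Nyquist term must be split between +/- frequencies.
        padded[n // 2] *= 0.5
    return np.fft.irfft(padded, m) * factor   # rescale for the longer inverse FFT

x = np.arange(64)
coarse = np.exp(-0.5 * ((x - 30.3) / 3.0) ** 2)  # synthetic band, 64 points
fine = zero_fill_interpolate(coarse, 4)          # 256 points
print(coarse.size, fine.size, np.allclose(fine[::4], coarse))
```

Increasing `factor` multiplies the number of points without changing the information content, which is the situation of the interpolated data sets in Table 4 (289 points at every resolution).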

Acknowledgments The authors would like to acknowledge the assistance of Husheng Yang and R. James Berry, who assisted in the acquisition of the OP/FT-IR spectra used in this study. This work was supported in part by a grant from the Idaho National Engineering and Environmental Laboratory (INEEL) University Research Consortium. The INEEL is managed by Lockheed Martin Idaho Technologies Company for the U.S. Department of Energy, Idaho Operations under Contract No. DE-AC07-94ID13223.

Literature Cited
(1) Newman, A. R. Anal. Chem. 1997, 69, 43A.
(2) Ingle, J. D., Jr.; Crouch, S. R. Spectrochemical Analysis; Prentice Hall: Upper Saddle River, NJ, 1988.
(3) Jaakola, P.; Tate, J. D.; Paakkunainen, M.; Kauppinen, J.; Saarinen, P. Appl. Spectrosc. 1997, 51, 1159.
(4) Griffiths, P. R.; Richardson, R. L., Jr.; Qin, D.; Zhu, C. Open-Path Atmospheric Monitoring with a Low Resolution FT-IR Spectrometer. In Proceedings of Optical Sensing for Environmental and Process Monitoring; Simpson, O. A., Ed.; A&WMA SPIE: Bellingham, WA, 1995; Vol. VIP-37, SPIE Vol. 2365, pp 274-284.
(5) Xiao, H.-K.; Levine, S. P.; D’Arcy, J. B. Anal. Chem. 1989, 61, 2708.
(6) Childers, J. W.; Thompson, E. L. Resolution Requirements in Long-Path FT-IR Spectrometry. In SP-89 Optical Sensing for Environmental Monitoring; A&WMA: Pittsburgh, PA, 1994; pp 38-46.
(7) Russwurm, G. M.; Childers, J. W. FT-IR Open-Path Monitoring Guidance Document; ManTech Environmental Technology, Inc.: Research Triangle Park, NC, 1995.
(8) Mark, H.; Workman, J. Statistics in Spectroscopy; Academic Press: New York, 1991.
(9) Martens, H.; Naes, T. Multivariate Calibration; John Wiley and Sons: New York, 1989.
(10) U.S. Environmental Protection Agency Technology Transfer Web, Emission Control Center, http://www.epa.gov/ttn/emc/ftir/ignam.html.
(11) Richardson, R. L., Jr.; Griffiths, P. R. Appl. Spectrosc. 1998, 52, 143-153.
(12) Zhu, C.; Griffiths, P. R. Appl. Spectrosc. 1998, 52, 1403-1413.
(13) Richardson, R. L., Jr.; Yang, H.; Griffiths, P. R. Appl. Spectrosc. 1998, 52, 565-571.
(14) Compendium Method TO-16: Long-Path Open-Path Fourier Transform Infrared Monitoring of Atmospheric Gases; EPA/625/R-96/010b; U.S. Environmental Protection Agency: Research Triangle Park, NC, 1999.

Received for review April 19, 1999. Revised manuscript received January 3, 2000. Accepted January 3, 2000. ES9904383
