Anal. Chem. 1986, 58, 1156-1162
RECEIVED for review August 26, 1985. Accepted December 10, 1985.
Computerized Infrared Spectral Identification of Compounds Frequently Found at Hazardous Waste Sites Mark A. Puskar and Steven P. Levine* The University of Michigan, School of Public Health-11, Ann Arbor, Michigan 48109
Stephen R. Lowry Nicolet Instrument Corporation, 5225 Verona Road, Madison, Wisconsin 53711
The existence of large numbers of abandoned hazardous waste sites in this country has become an issue of national concern. Government legislation requires the EPA to regulate the cleanup activities on hazardous waste remedial action sites. Implementation of these activities requires the development of reliable monitoring techniques that can identify hazardous substances in complex mixtures. This paper describes the application of a direct infrared spectral interpretation technique to chemical mixtures. The system is based on a branch tree rule algorithm, which utilizes the peak information contained in an infrared spectrum. Thirty of the most frequently identified organic compounds on hazardous waste sites were selected as the training set for the rules development. To document this method’s ability to identify components in mixtures, four groups of mixtures were prepared. Of the 600 decisions made determining the presence of components in the mixtures, there were 57 true positives, 12 false positives, 3 false negatives, and 528 true negatives.
The Resource Conservation and Recovery Act of 1976 (RCRA) and its amendments (1984) require the U.S. Environmental Protection Agency (EPA) to regulate hazardous waste activities (1, 2). The Comprehensive Environmental Response, Compensation and Liability Act of 1980 (CERCLA or Superfund Act) requires the EPA to regulate cleanup activities of abandoned hazardous waste sites (3). Implementation of both RCRA and CERCLA requires the development of reliable monitoring techniques that can identify hazardous substances in complex mixtures at concentrations ranging from pure compounds to trace levels. During hazardous waste remedial actions, a need exists to identify the major components of the waste mixtures present. This identification information is used to help set worker and community protection programs, correctly determine which compounds can be bulked together, and help identify appropriate disposal techniques. The two most common methods used to identify/classify hazardous wastes found in drums, tanks, and ponds are compatibility testing and gas
chromatography/mass spectrometry (GC/MS) (4). Compatibility testing is defined as a group of basic qualitative methods that separate the waste into disposable categories. The advantages of compatibility testing procedures are their speed and relatively low cost. The major disadvantage of compatibility testing is that the method separates unknown wastes into general categories (i.e., halogenated organic, acid, base, sulfide) but does not identify specific compounds (4). GC/MS offers the ability to identify many volatile and semivolatile organic hazardous substances; however, recent findings (5, 6) have documented major compound classes, for example, the isocyanates, that are not detected by this method. In addition to chromatographic problems, other disadvantages of GC/MS as a waste screening tool on remedial action sites include high cost and time constraints (4). The emergence of Fourier transform infrared (FT-IR) spectrometry has provided a spectral method of sufficient sensitivity and rapidity for environmental analysis, especially as a method for the identification of major components of hazardous waste. Once the identities of the components of a complex mixture are known, FT-IR spectrometry can also be used to analyze the mixture quantitatively (7). Preliminary reports of its use in identifying components of hazardous waste samples suggest a promising role for this technique in the analysis of environmental samples (8-13). A primary area where the advantages of qualitative FT-IR spectrometry can be realized is direct identification of major components of hazardous waste mixtures. Several research groups have reported the application of GC/FT-IR to the analysis of hazardous waste mixtures and priority pollutants (9-11, 13). However, the same problems of cost, timeliness, and thermal decomposition apply to this technique as to GC/MS.
Some of these problems can be overcome by liquid chromatography-infrared spectrometry (LC-IR), but problems associated with the opacity of solvents to IR radiation limit the usefulness of this technique. A major advantage of these combined techniques is the availability of search algorithms for the identification of pure components after separation. Most of these techniques employ “forward” search algorithms where the unknown spectrum is compared to each member of the spectral data base (14-16). Although
0003-2700/86/0358-1156$01.50/0 © 1986 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 58, NO. 6, MAY 1986
these methods are quite successful for spectra of pure compounds, they tend to fail when the compound is heavily contaminated by a second compound. At best, a forward search might identify the two major components in an equal mixture. Interpretation techniques that can identify the components of a mixture have been reported (17-23). Most of these techniques utilize curve fitting or reverse search techniques to determine if a reference spectrum could be present in the mixture. However, these techniques frequently suffer from the basic assumption that the mixture can be constructed from a linear combination of reference spectra of the pure components. This is rarely true because an infrared spectrum is very sensitive to the local environment of the molecules, particularly for mixtures containing polar compounds. In these cases, the infrared peak absorption maxima and width may be changed by the presence of other molecules in the sample being measured. These differences make the use of intensity information particularly difficult when looking for low concentration components in the mixture. In this paper, we will describe the development of a rule-based decision tree program designed to identify the major components in a complex mixture from the infrared spectrum. Because of the difficulty in determining relative intensities of peaks for a single component in the spectrum of a mixture, only peak location information is used. This software is based on a three-level filter algorithm designed to compensate for potential peak shifts in the mixture spectrum. The algorithm has properties of both a “hard window” and a “fuzzy data set” approach. Both of these approaches are described in detail elsewhere (24). Basically, if a peak falls in a relatively wide frequency window assigned to a certain compound, a percentage of the overall “goodness” value will be added to the total.
As the window size is reduced, the amount of “goodness” is increased for each level where the peak in the unknown spectrum still meets the criteria. Statistical studies have been performed to evaluate the quality of the final goodness values for actual compound assignment. Based on these results, a goodness value greater than 0.60 out of a possible 0.99 generally indicated the presence of the compound in the mixture.
EXPERIMENTAL SECTION Thirty of the most commonly identified organic compounds on hazardous waste sites (6, 25, 26) were selected as the training set for the rules development. Emphasis was placed on common paint solvents and degreasers. Table I lists the compounds selected. The computer program PAIRS (27-33) was adapted to run on a Nicolet 1280 system and the computer language CONCISE (29) was used to convert the programming logic into English-like rules. The rules were compiled prior to interpretation to shorten interpretation time. The length of time required to interpret a spectrum through the 30 rules was 30 s. All solvents were Aldrich Spectrometric Grade or equivalent. Mixtures were prepared using equal volumes of each component. All spectra were generated on a Nicolet 20-SX system using two 13 × 2 mm KBr plates, with a background and sample signal averaging of 64 scans. The instrument resolution was 4 cm⁻¹. Rules Development. PEAK PICKER, the automated peak selection routine used in selecting peaks for interpretation, has a threshold value that is set manually by the operator. Only peaks that maximize above the threshold value are selected by the program. Using this routine, the operator makes the decision between actual peaks and noise. To totally automate the peak selection process, a FORTRAN routine was written to calculate the threshold value needed by PEAK PICKER prior to peak selection. This routine calculated the mean absorption of each spectrum and stored it in the threshold buffer. The low-resolution Aldrich Neat Library spectrum of each compound in the training set was individually processed to write the mixture analysis rules. The processing involved using the Nicolet PEAK PICKER routine to identify the ten largest peaks,
Table I. Compounds Selected for Training Set
compound name: acetone, anthracene, benzene, 2-butanone, chlorobenzene, chloroform, o-cresol, p-cresol, dibutyl phthalate, o-dichlorobenzene, m-dichlorobenzene, 1,1-dichloroethane, 1,1-dichloroethene, dichloromethane, 1,2-dichloropropane, 2,4-dimethylphenol, ethylbenzene, hexachlorocyclopentadiene, 2-hexanone, 4-methyl-2-pentanone, pentachloroethane, phenol, styrene, 1,1,2,2-tetrachloroethane, toluene, 1,2,4-trichlorobenzene, 1,1,1-trichloroethane, 1,1,2-trichloroethane, 1,2,3-trichloropropane, o-xylene
no. of peaks selected for rules development:
10
10
6 8 7 9 9 4
10 10
10 6 9 10 7 6
10 10 9 7 8
10 10 9 10 10 10 10
10 10
by intensity, from each spectrum. For compounds in the training set, e.g., 4-methyl-2-pentanone, that do not have ten peaks whose intensities are at least 15% of the largest peak, rules were written for the fewer than ten peaks present and the goodness values assigned per peak were increased. Table I lists the compounds selected for the training set and the number of peaks selected for rules development from each compound spectrum. Goodness values were assigned equally to all selected peaks because each peak has an equal probability of occurring in a mixture. A goodness of 0.99 was set as the maximum value for correct matches of all peaks. For example, the compound 4-methyl-2-pentanone has seven peaks with intensity greater than 15% of the largest peak in the spectrum. Thus, each of the seven peaks used to write rules for 4-methyl-2-pentanone identification was assigned one-seventh of the total goodness (0.15). To control for the effect of peak location shifting, a three-level branch tree algorithm was developed where the goodness value assigned to each peak was split into three values. At each branch in the logic diagram a correct decision added the goodness value to the total for the compound and decreased the peak window length for the next rule decision. The peak windows were set at ±10 cm⁻¹ for the first decision, ±5 cm⁻¹ for the second decision, and ±3 cm⁻¹ for the third decision, with the first decision receiving 20% of the peak’s goodness, the second decision receiving 30% of the peak’s goodness, and the final decision receiving 50% of the peak’s goodness. The selection of window size was based on a trial and error study, where the advantages of tightening the windows (decrease of false positives) were weighed against the advantages of loosening the windows (increase in true positives). Figure 1 details the logic diagram for the largest peak (at 1715 cm⁻¹) in the spectrum of 4-methyl-2-pentanone.
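The peak selection and goodness assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Nicolet PEAK PICKER routine or the authors' CONCISE rules: the function names and the example peak list are hypothetical, and the plain division 0.99/7 ≈ 0.14 differs slightly from the 0.15 per peak quoted in the text, so the authors' rounding presumably differs.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (wavenumber in cm^-1, absorbance)


def select_rule_peaks(peaks: Sequence[Peak],
                      max_peaks: int = 10,
                      min_relative_intensity: float = 0.15) -> List[Peak]:
    """Keep at most max_peaks peaks, largest first, whose intensity is at
    least min_relative_intensity times that of the strongest peak."""
    if not peaks:
        return []
    strongest = max(intensity for _, intensity in peaks)
    kept = [p for p in peaks if p[1] >= min_relative_intensity * strongest]
    kept.sort(key=lambda p: p[1], reverse=True)
    return kept[:max_peaks]


def per_peak_goodness(n_peaks: int, max_goodness: float = 0.99) -> float:
    """Equal share of the maximum goodness for each selected peak."""
    return max_goodness / n_peaks


def split_goodness(peak_goodness: float,
                   shares: Sequence[float] = (0.20, 0.30, 0.50)) -> List[float]:
    """Split one peak's goodness over the three window decisions."""
    return [share * peak_goodness for share in shares]


# Hypothetical peak list (not taken from the paper): three of the ten
# peaks fall below 15% of the strongest peak, so seven rules remain.
example = [(1715.0, 1.00), (2960.0, 0.80), (1368.0, 0.60), (1170.0, 0.45),
           (1470.0, 0.40), (2875.0, 0.35), (1240.0, 0.20), (960.0, 0.10),
           (820.0, 0.08), (600.0, 0.05)]
rule_peaks = select_rule_peaks(example)
weight = per_peak_goodness(len(rule_peaks))
```

With fewer qualifying peaks, each peak carries a larger share of the 0.99 maximum, which is the effect the text describes for 4-methyl-2-pentanone.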
The algorithm operates such that if there is a peak in the spectrum of a sample in the window of 1725-1705 cm⁻¹, the total goodness value for 4-methyl-2-pentanone is set to 0.03 and the peak window is shortened to 1720-1710 cm⁻¹ for the next decision. If a peak is found in that window, 0.05 is added to the 4-methyl-2-pentanone total goodness and the window is shortened to 1718-1712 cm⁻¹. If a peak is in that tightened window, 0.07 is added to the 4-methyl-2-pentanone goodness and the rules move on to the next peak in the 4-methyl-2-pentanone spectrum, where this logic repeats. If at any point in the rules a peak is not found within a window, the rules continue to the beginning of the next peak. This logic was repeated for each peak selected for the 30 compounds in the training set. The rules were written in CONCISE using the Nicolet text editor to reduce the rule writing time and compiled on the 1280 system for use by the PAIRS program. The purpose of the PAIRS program was to process spectra through the compiled rules. PAIRS outputs the names of the analyzed compounds in decreasing order of goodness. Table II is an example of PAIRS output when a spectrum of benzene was interpreted.

Figure 1. Logic diagram for rule-based mixture analysis, example peak at 1715 cm⁻¹.

Table II. Example of Rule-Based Mixture Analysis on a Spectrum of Benzene Sorted by Goodness Value
compound                      goodness value
benzene                       0.99
chloroform                    0.25
toluene                       0.22
dichloromethane               0.17
o-dichlorobenzene             0.17
m-dichlorobenzene             0.16
styrene                       0.14
chlorobenzene                 0.13
o-xylene                      0.10
1,2-dichloropropane           0.10
1,2,4-trichlorobenzene        0.10
ethylbenzene                  0.07
phenol                        0.04
p-cresol                      0.02
1,1,2-trichloroethane         0.02
1,2,3-trichloropropane        0.02
hexachlorocyclopentadiene     0.02
2-hexanone                    0.01
pentachloroethane             0.01
1,1-dichloroethane            0.01
1,1-dichloroethene            0.01
acetone                       0.01
1,1,2,2-tetrachloroethane     0.01
4-methyl-2-pentanone          0.01
o-cresol                      0.01
2-butanone                    0.01
dibutyl phthalate             0.01
1,1,1-trichloroethane         0.01
anthracene                    0.01
2,4-dimethylphenol            0.01
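The three-level window logic and the ranked output can be sketched as follows. This is a minimal Python illustration rather than the compiled CONCISE rules run by PAIRS: the names `LEVELS`, `peak_goodness`, and `rank_compounds` are hypothetical, the window limits are assumed to be inclusive (Table VI assigns 0.02 to a 10 cm⁻¹ difference, which suggests this), and the exact 20/30/50% shares are used instead of the rounded 0.03/0.05/0.07 quoted for the 1715 cm⁻¹ example.

```python
from typing import Dict, List, Sequence, Tuple

# (window half-width in cm^-1, share of the peak's goodness) per decision
LEVELS: Tuple[Tuple[float, float], ...] = ((10.0, 0.20), (5.0, 0.30), (3.0, 0.50))


def peak_goodness(rule_peak: float,
                  observed: Sequence[float],
                  peak_weight: float) -> float:
    """Walk the three nested windows; stop at the first window with no peak."""
    total = 0.0
    for half_width, share in LEVELS:
        if not any(abs(p - rule_peak) <= half_width for p in observed):
            break  # move on to the next rule peak
        total += share * peak_weight
    return total


def rank_compounds(rules: Dict[str, Sequence[float]],
                   observed: Sequence[float],
                   max_goodness: float = 0.99) -> List[Tuple[str, float]]:
    """Score every compound and list them in decreasing order of goodness."""
    scores = []
    for name, rule_peaks in rules.items():
        weight = max_goodness / len(rule_peaks)  # equal goodness per peak
        score = sum(peak_goodness(r, observed, weight) for r in rule_peaks)
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# The 1715 cm^-1 rule peak with the 0.15 per-peak goodness from the text:
full = peak_goodness(1715.0, [1716.0], 0.15)     # inside all three windows
partial = peak_goodness(1715.0, [1722.0], 0.15)  # only inside the +/-10 window
missed = peak_goodness(1715.0, [1730.0], 0.15)   # outside every window
```

Because a miss at any level ends the scoring for that peak, a shifted peak still earns partial credit while an absent peak earns none.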
RESULTS AND DISCUSSION It was necessary to calculate a goodness value where a decision would be made regarding the presence or absence of a compound. To accomplish this, the 30 training set spectra were analyzed by the rules to determine the mean goodness value and the standard deviation for all negative results. When the 30 spectra were analyzed by each of the 30 rules, the mean of the 870 non-0.99 goodness values reported was 0.21, with a standard deviation of 0.19. A 90% confidence value of 0.59 was calculated as 2 times the standard deviation plus the mean. To define the presence of a compound in a mixture, a goodness value ≥0.60 was used. With ≥0.60 as the definition for the presence of a compound, three false positives and zero false negatives were identified in the 870 results. The false positives were a function of spectral similarity and will be discussed in detail later. To test the method’s ability to identify components in actual mixtures, four groups of test mixtures were prepared and spectra were acquired and interpreted. Each group contained a single-component, a two-component, a three-component, a four-component, and a five-component mixture of compounds from the training set. As an example, Table III lists the results of the PAIRS analysis with the first group of mixtures. For each mixture, the top five results returned by the rules are listed. When a neat spectrum of ethylbenzene was analyzed by the rules, a goodness of 0.99 was returned for ethylbenzene. The second largest goodness value was 0.62 returned for toluene, which is above the defined 0.60 decision level and thus considered a false positive. The reason for this false positive is spectral similarity. When an ethylbenzene/benzene two-component mixture was analyzed, a 0.99 goodness was returned for benzene and 0.87 goodness was returned for ethylbenzene. The third largest goodness value, 0.44, was returned for toluene, which is classified as a true negative. The major reason for the drop in toluene goodness between the one- and two-component mixtures was loss of minor peaks during the PEAK PICKER routine; however, peak shifting also had an effect. When the ethylbenzene/benzene/chloroform three-component mixture was analyzed, 0.99 goodness was again returned for benzene, 0.64 goodness was returned for ethylbenzene, and 0.60 goodness was returned for chloroform. The PAIRS results for the four-component mixture of ethylbenzene/benzene/chloroform/1,2-dichloropropane were 0.83 goodness for benzene, 0.64 goodness for ethylbenzene, 0.62 goodness for 1,2-dichloropropane, and 0.55 goodness for chloroform (a false negative by definition). The fifth largest goodness value was 0.52 for o-xylene (a true negative). When the ethylbenzene/benzene/chloroform/1,2-dichloropropane/1,1,2-trichloroethane five-component mixture was analyzed, a goodness value of 0.75 was returned for 1,1,2-trichloroethane, 0.71 goodness for 1,2-dichloropropane, 0.68 goodness for benzene, 0.64 goodness for ethylbenzene, and a goodness of 0.55 was again returned for chloroform (a false negative). Figure 2 compares the absorption spectra of the one-component and five-component mixtures analyzed. Table IV lists the components of the four groups of mixtures interpreted and summarizes the results of the mixture interpretation experiment.

Table III. Results of PAIRS Analysis on Mixture Group 1
one component (ethylbenzene)
  ethylbenzene         0.99a
  toluene              0.62b
  o-xylene             0.55
  dibutyl phthalate    0.53
  styrene              0.52
two-component mixture (ethylbenzene, benzene)
  benzene              0.99a
  ethylbenzene         0.87a
  toluene              0.44
  dichlorobenzene      0.42
  o-xylene             0.39
three-component mixture (ethylbenzene, benzene, chloroform)
  benzene              0.99a
  ethylbenzene         0.64a
  chloroform           0.60a
  toluene              0.44
  o-xylene             0.44
four-component mixture (ethylbenzene, benzene, chloroform, 1,2-dichloropropane)
  benzene              0.83a
  ethylbenzene         0.64a
  dichloropropane      0.62a
  chloroform           0.55c
  o-xylene             0.52
five-component mixture (ethylbenzene, benzene, chloroform, 1,2-dichloropropane, 1,1,2-trichloroethane)
  trichloroethane      0.75a
  dichloropropane      0.71a
  benzene              0.68a
  ethylbenzene         0.64a
  chloroform           0.55c
a True positives. b False positives. c False negatives.

Figure 2. Spectral comparison of ethylbenzene and five-component mixture.
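The decision level and the bookkeeping behind the true/false positive and negative counts can be sketched as below. This is an illustrative Python outline, not part of the PAIRS program; the function names are hypothetical, and the example values are the four-component results of mixture group 1 reported above.

```python
from typing import Dict, Set


def decision_cutoff(mean: float, sd: float, k: float = 2.0) -> float:
    """Mean of the negative results plus k standard deviations."""
    return mean + k * sd


def classify(goodness: Dict[str, float],
             present: Set[str],
             cutoff: float = 0.60) -> Dict[str, str]:
    """Label each compound TP, FP, FN, or TN against the known composition."""
    labels = {}
    for name, value in goodness.items():
        called = value >= cutoff   # rules report the compound as present
        actual = name in present   # compound really is in the mixture
        if called:
            labels[name] = "TP" if actual else "FP"
        else:
            labels[name] = "FN" if actual else "TN"
    return labels


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of compounds actually present that were identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of absent compounds correctly reported as absent."""
    return tn / (tn + fp)


cutoff = decision_cutoff(0.21, 0.19)  # 0.21 + 2 * 0.19 = 0.59
four_component = {"benzene": 0.83, "ethylbenzene": 0.64,
                  "1,2-dichloropropane": 0.62, "chloroform": 0.55,
                  "o-xylene": 0.52}
labels = classify(four_component,
                  {"ethylbenzene", "benzene", "chloroform", "1,2-dichloropropane"})
```

Applied to the totals reported in the abstract (57 true positives, 12 false positives, 3 false negatives, 528 true negatives), these ratios give 57/60 = 95% and 528/540 ≈ 98%.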
Of the 600 decisions made determining the presence of components in the 20 spectra, there were 57 true positives, 12 false positives, 3 false negatives, and 528 true negatives identified. The sensitivity of the rule-based analysis method calculated from these results was 95%. Sensitivity is defined as the percentage of compounds actually present in the mixtures that were identified by the rules. The specificity of this method, defined as the percentage of compounds not actually present in the mixture and identified as such, was 98%. The 12 false positives identified during the mixture interpretation were caused by two phenomena, peak shifting and spectral similarity. A false positive caused by peak shifting is documented in Table V, which shows the goodness values calculated for chloroform during the analysis of mixture group 3. Chloroform is not one of the components of the mixture but is identified as a false positive (goodness value of 0.62) in the three-component mixture chlorobenzene/tetrachloroethane/1,2,4-trichlorobenzene. Because chloroform has only four peaks for rule writing (Table V), each peak is assigned 25% of the total goodness value. Thus, when peak location shifting or spectral similarity effects of a mixture cause a peak to shift into one of its decision windows, the resulting goodness added to the chloroform total is significantly greater than for compounds that have rules based on a larger number of peaks. Table V shows the location of the four chloroform peaks used in writing rules and, for the closest peak in each mixture listed, the location, the absolute value of the difference between the closest peak and the rule, and the corresponding goodness value assigned to each difference. Figure 3 shows the absorption spectra, between 1260 and 580 cm⁻¹, of chloroform, the three-component mixture, and the five-component mixture. This figure shows the loss of peaks at 759 and 1216 cm⁻¹ into the shoulders of larger peaks between the three- and five-component mixtures. This resolution effect, the loss of these two peaks, was responsible for a goodness value loss to a resultant value of 0.24 and for chloroform not being considered a false positive in the five-component mixture.

Figure 3. Spectral comparison of chloroform, three-component, and five-component mixtures of group 3.

Figure 4. Spectral comparison of ethylbenzene and toluene.
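The chloroform totals in Table V can be retraced with a short calculation. The sketch below is an assumption-laden illustration, not the authors' rules: it uses the rule peaks and closest-peak locations as printed in Table V, a per-peak goodness of 0.25, inclusive window limits, and exact 20/30/50% shares. The unrounded totals (0.05, 0.625, and 0.25) therefore differ in the second decimal from the tabulated 0.05, 0.62, and 0.24, which appear to sum per-peak values cut to two decimals.

```python
from typing import Dict, Sequence, Tuple

LEVELS: Tuple[Tuple[float, float], ...] = ((10.0, 0.20), (5.0, 0.30), (3.0, 0.50))
CHLOROFORM_RULE_PEAKS = (670.0, 759.0, 1216.0, 3020.0)  # cm^-1, from Table V

# Closest peaks found in each mixture-group-3 spectrum, as printed in Table V
CLOSEST_PEAKS: Dict[str, Tuple[float, ...]] = {
    "1-component": (662.0, 740.0, 1122.0, 3059.0),
    "3-component": (675.0, 756.0, 1216.0, 2983.0),
    "5-component": (675.0, 742.0, 1203.0, 3016.0),
}


def goodness_for_difference(difference: float, peak_weight: float) -> float:
    """Goodness earned by one rule peak given its distance to the closest peak."""
    total = 0.0
    for half_width, share in LEVELS:
        if difference > half_width:
            break
        total += share * peak_weight
    return total


def total_goodness(rule_peaks: Sequence[float],
                   closest: Sequence[float],
                   peak_weight: float = 0.25) -> float:
    return sum(goodness_for_difference(abs(r - c), peak_weight)
               for r, c in zip(rule_peaks, closest))


totals = {name: total_goodness(CHLOROFORM_RULE_PEAKS, peaks)
          for name, peaks in CLOSEST_PEAKS.items()}
false_positive = {name: value >= 0.60 for name, value in totals.items()}
```

Only the three-component spectrum places enough peaks inside the chloroform windows to cross the 0.60 decision level, which matches the single chloroform false positive discussed in the text.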
Table IV. Results of PAIRS Analysis on Mixtures
mixture    no. of components  last component added          true       false      false      true
group no.  in mixture         to mixture                    positives  positives  negatives  negatives
1          1                  ethylbenzene                  1          1          0          28
           2                  + benzene                     2          0          0          28
           3                  + chloroform                  3          0          0          27
           4                  + 1,2-dichloropropane         3          0          1          26
           5                  + 1,1,2-trichloroethane       4          0          1          25
2          1                  o-xylene                      1          0          0          29
           2                  + 1,1,1-trichloroethane       2          1          0          27
           3                  + o-dichlorobenzene           3          0          0          27
           4                  + 1,1-dichloroethane          4          1          0          25
           5                  + hexachlorocyclopentadiene   4          0          1          25
3          1                  chlorobenzene                 1          0          0          29
           2                  + tetrachloroethane           2          0          0          28
           3                  + 1,2,4-trichlorobenzene      3          1          0          26
           4                  + pentachloroethane           4          0          0          26
           5                  + o-xylene                    5          1          0          24
4          1                  1,1,2-trichloroethane         1          0          0          29
           2                  + toluene                     2          2          0          26
           3                  + 1,1-dichloroethane          3          2          0          25
           4                  + m-dichlorobenzene           4          1          0          25
           5                  + 1,2-dichloropropane         5          2          0          23
totals                                                      57         12         3          528
Table V. Location of Chloroform Peaks in Mixture Group 3 Documenting False Positive Results with a Three-Component Mixture
location of peaks    mixture group 3: 1-component(a)   mixture group 3: 3-component(b)   mixture group 3: 5-component(c)
in chloroform        closest    absolute    goodness   closest    absolute    goodness   closest    absolute    goodness
rules, cm⁻¹          peak, cm⁻¹ difference  value      peak, cm⁻¹ difference  value      peak, cm⁻¹ difference  value
670                  662        08          0.05       675        05          0.12       675        05          0.12
759                  740        19          0.00       756        03          0.25       742        14          0.00
1216                 1122       94          0.00       1216       00          0.25       1203       13          0.00
3020                 3059       39          0.00       2983       37          0.00       3016       04          0.12
totals                          160(d)      0.05(e)               45(d)       0.62(e,f)             36(d)       0.24(e)
a Chlorobenzene. b Footnote a plus tetrachloroethane and 1,2,4-trichlorobenzene. c Footnote b plus pentachloroethane and o-xylene. d Total difference, cm⁻¹. e Total goodness. f Chloroform indicated as a false positive.

The second reason for false positives is that the spectra of some compounds are extremely similar. For example, Figure 4 compares the absorption spectra of toluene and ethylbenzene. Because of spectral similarity, the rules written from the spectra are not exclusive. When a neat sample of ethylbenzene is analyzed by the rules, the toluene goodness is calculated to be 0.62. When a neat sample of toluene is analyzed, the ethylbenzene goodness is calculated to be 0.67. This effect, originally identified when the neat spectra were analyzed by the rules to define the 0.60 goodness cutoff, is responsible for 7 of the 12 false positives identified. This effect is also seen between 2-hexanone and 4-methyl-2-pentanone. False negatives (actual components with goodness scores below 0.60) and a general loss of goodness for compounds present as the mixture became more complex are a function of two phenomena: spectral shifting and the loss of peaks due to inadequate threshold selection by the PEAK PICKER routine. The goodness value drop for ethylbenzene between the one-component (0.99) and five-component (0.64) mixtures of mixture group 1 is an example of how spectral shifting can lower goodness values. Table VI lists the peak locations used to write the rules for ethylbenzene, the closest peak in each mixture listed, the absolute value of the difference between the rule location and the closest peak, the total difference, and the calculated goodness value. Peak loss appears in the three-component mixture when the ethylbenzene peaks at 743 and 774 cm⁻¹ are obscured by a chloroform peak at 759 cm⁻¹. This peak loss/shifting causes a goodness value loss of 0.20. Figure 5 compares, in the 1000-400 cm⁻¹ range, the absorption spectra of ethylbenzene and the three-component mixture, showing how the 759-cm⁻¹ chloroform peak obscures the two smaller ethylbenzene peaks.

Figure 5. Spectral comparison of ethylbenzene and the three-component mixture of group 1.

Table VI. Location of Ethylbenzene Peaks in Mixture Group 1 Documenting Loss of Goodness Values due to Spectral Loss/Shifting
location of peaks    mixture group 1: 1-component(a)   mixture group 1: 3-component(b)   mixture group 1: 5-component(c)
in ethylbenzene      closest    absolute    goodness   closest    absolute    goodness   closest    absolute    goodness
rules,(d) cm⁻¹       peak, cm⁻¹ difference  value      peak, cm⁻¹ difference  value      peak, cm⁻¹ difference  value
2965                 2966       01          0.10       2967       02          0.10       2967       02          0.10
698                  697        01          0.10       698        00          0.10       699        01          0.10
2931                 2932       01          0.10       2933       02          0.10       2933       02          0.10
3026                 3027       01          0.10       3036       10          0.02       3036       10          0.02
743                  746        03          0.10       758        15          0.00       731        12          0.00
1456                 1453       03          0.10       1453       03          0.10       1453       03          0.10
2871                 2874       03          0.10       2874       03          0.10       2874       03          0.10
1495                 1496       01          0.10       1496       01          0.10       1496       01          0.10
3061                 3064       03          0.10       3071       10          0.02       3071       10          0.02
774                  772        02          0.10       758        16          0.00       760        14          0.00
totals                          19(e)       0.99(f)               62(e)       0.64(f)               58(e)       0.64(f)
a Ethylbenzene. b Footnote a plus benzene and chloroform. c Footnote b plus 1,2-dichloropropane and 1,1,2-trichloroethane. d Peak locations from the deresolved Aldrich library. e Total difference. f Total goodness.

The second reason for false negatives is the routine used to select the threshold for peak selection. Because the routine selects peaks that maximize above the mean absorption in the spectrum, and because the mean absorption increases as the number of components in the mixture increases, minor peaks of mixture components are occasionally lost during peak selection. Because these minor peaks have the same goodness weighting as larger peaks, the total goodness value assigned is significantly reduced when they are lost. An example of this is observed with benzene in mixture group 1. Although benzene is correctly identified in the four mixtures in which it is present, its goodness drops from 0.99 in the two-component mixture to 0.68 in the five-component mixture. The two benzene peaks with the lowest intensity used in the rule writing were at 1814 and 1959 cm⁻¹. As the number of other components in the mixture increases, the mean absorption in the spectrum increases. This causes the two smallest peaks in the benzene spectrum to be lost during peak selection. Figure 6 shows the absorption spectra, between 2370 and 1330 cm⁻¹, for the two-component, four-component, and five-component mixtures in mixture group 1. The straight lines drawn through the three spectra represent the location of the mean absorption in the spectra (threshold) used to select peaks for PAIRS interpretation. Because the two-component mixture of ethylbenzene/benzene has fewer peaks than the more complex mixtures, the two benzene peaks at 1814 and 1959 cm⁻¹ maximize above the threshold selected and are thus selected by the PEAK PICKER routine and interpreted by the rules, and the benzene goodness is calculated to be 0.99. However, in the four-component mixture, the number of peaks in the spectrum has significantly increased.
When the threshold level, calculated as the mean absorption of the spectrum, is used to select peaks for interpretation, the benzene peak at 1814 cm⁻¹ is selected, but the benzene peak at 1959 cm⁻¹ maximizes below the threshold level and is not selected. During interpretation of this spectrum, when the computer program reaches the benzene 1959-cm⁻¹ rule, no peak is found and the benzene goodness total is calculated to be 0.83. When the threshold value is calculated for the five-component mixture, neither benzene peak at 1814 or 1959 cm⁻¹ is selected for interpretation and the benzene goodness lowers to 0.68. Limitations of this method include the inability to identify chemicals with few peaks. For example, the spectrum of carbon tetrachloride, a compound identified frequently on hazardous waste sites, has only one peak. If rules were written and the entire 0.99 goodness was assigned to that single peak, carbon tetrachloride would become a false positive in the interpretation of many neat and mixture spectra.

Figure 6. Location of mean absorbance value in a two-, four-, and five-component mixture spectrum.
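The mean-absorbance threshold behind the benzene peak losses discussed above can be illustrated with a small sketch. The code is a hypothetical stand-in for the FORTRAN threshold routine and the Nicolet PEAK PICKER, neither of which is listed in the paper: a peak is taken here to be a local maximum that lies above the mean absorbance, and the two toy spectra are invented for the demonstration.

```python
from typing import List, Optional, Sequence, Tuple


def mean_threshold(absorbance: Sequence[float]) -> float:
    """Threshold taken as the mean absorbance of the whole spectrum."""
    return sum(absorbance) / len(absorbance)


def pick_peaks(wavenumbers: Sequence[float],
               absorbance: Sequence[float],
               threshold: Optional[float] = None) -> List[Tuple[float, float]]:
    """Return (wavenumber, absorbance) for local maxima above the threshold."""
    if threshold is None:
        threshold = mean_threshold(absorbance)
    peaks = []
    for i in range(1, len(absorbance) - 1):
        a = absorbance[i]
        if a > threshold and a > absorbance[i - 1] and a >= absorbance[i + 1]:
            peaks.append((wavenumbers[i], a))
    return peaks


# Toy spectra (not measured data): the same weak band at 1980 cm^-1 sits in
# a sparse spectrum and in one crowded with strong bands from other components.
wavenumbers = list(range(2000, 1900, -10))
sparse = [0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
crowded = [0.0, 0.0, 0.2, 0.0, 0.9, 0.0, 1.0, 0.0, 0.8, 0.0]

sparse_peaks = [w for w, _ in pick_peaks(wavenumbers, sparse)]
crowded_peaks = [w for w, _ in pick_peaks(wavenumbers, crowded)]
```

In the sparse spectrum the mean is 0.12 and the weak band is kept; in the crowded one the mean rises to 0.29 and the same band drops out, mirroring the loss of the 1814 and 1959 cm⁻¹ benzene peaks as components are added.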
CONCLUSIONS New rules were written for PAIRS that adapted it from a program that identifies functional groups of pure compounds into a rule-based artificial intelligence program designed to identify the major components in an unknown mixture by interpreting the infrared spectrum of the mixture. This method was shown to have qualitative ability within the scope of the study parameters. False positives were determined to be a function of spectral shifting and similarity. False negatives were shown to be a function of spectral shifting and the inadequate threshold selection routine used in peak selection. An initial study set of 30 compounds was used in this experiment. Developments of this method that are in progress include an expansion of the study set; developing and testing an automated threshold selection technique for optimizing peak selection in mixtures; a method of subtracting the effect of spectral similarity, either before or after interpretation; comparing spectral generation techniques to identify an optimal technique for unknown organic mixtures; and identifying the limits of detection for this technique by preparing mixtures at varying concentrations. These improvements will be followed by comparing this method to presently available analysis techniques by analyzing actual organic hazardous waste samples.
ACKNOWLEDGMENT The authors wish to express their sincere appreciation to Sterling Tomellini for his help in adapting INTERP to our Nicolet 1280 computer system. Registry No. Acetone, 67-64-1; anthracene, 120-12-7; benzene, 71-43-2; 2-butanone, 78-93-3; chlorobenzene, 108-90-7; chloroform, 67-66-3; o-cresol, 95-48-7; p-cresol, 106-44-5; dibutyl phthalate, 84-74-2; o-dichlorobenzene, 95-50-1; m-dichlorobenzene, 541-73-1; 1,1-dichloroethane, 75-34-3; 1,1-dichloroethene, 75-35-4; dichloromethane, 75-09-2; 1,2-dichloropropane, 78-87-5; 2,4-dimethylphenol, 105-67-9; ethylbenzene, 100-41-4; hexachlorocyclopentadiene, 77-47-4; 2-hexanone, 591-78-6; 4-methyl-2-pentanone, 108-10-1; pentachloroethane, 76-01-7; phenol, 108-95-2; styrene, 100-42-5; 1,1,2,2-tetrachloroethane, 79-34-5; toluene, 108-88-3; 1,2,4-trichlorobenzene, 120-82-1; 1,1,1-trichloroethane, 71-55-6; 1,1,2-trichloroethane, 79-00-5; 1,2,3-trichloropropane, 96-18-4; o-xylene, 95-47-6.
RECEIVED for review August 20, 1985. Accepted December 23, 1985. This work was supported by Grant 1-R01OH02066-01 from the National Institute for Occupational Safety and Health of the Centers for Disease Control.
Fast Algorithm for the Resolution of Spectra
Richard A. Caruana, Roger B. Searle, Thomas Heller, and Saul I. Shupack*
Departments of Chemistry and Computer Science, Villanova University, Villanova, Pennsylvania 19085
An algorithm is presented that resolves spectra into Gaussian bands by iteratively applying an efficient linear least-squares analysis to the individual bands. The advantages of the method are that the initial parameterization of the spectrum is often simpler and the procedure is very fast, thereby enabling practical implementation on microcomputers. Tests performed on microcomputers indicate that the algorithm is capable of resolving difficult spectra at least 5-10 times faster than the conventional error space minimization techniques and can resolve simpler spectra about 50-100 times faster. Some of the techniques employed may be of benefit to users of other resolution methods. Novel measures of the goodness of fit are presented, which aid interpretation of the convergence path and the final fit obtained.
0003-2700/86/0358-1162$01.50/0 © 1986 American Chemical Society
Methods have been developed to resolve spectra (or other data sets of overlapped bands) into their component bands to obtain physically meaningful interpretations or to compact the information. Characteristically, these methods present trade-offs in accuracy, speed, computational complexity, and power. Computers with a significant amount of memory and processing power allow the use of nonlinear least-squares techniques, iterative methods that attempt to improve upon an initial set of parameter estimates by direct minimization of some measure of error. Most commonly employed are the gradient methods (1-4), simplex methods (5), and the many variations of these (1, 2, 6, 7). While many of these can be adapted to microcomputers, the amount of memory and/or time required for the analysis increases significantly with increasing problem size, making their use on microcomputers impractical for all but the simplest problems. Some of these methods also suffer from precision problems, and the use of double precision further aggravates the memory and time problems.
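For contrast with the iterative nonlinear methods described above, the following sketch shows one way a single Gaussian band can be fitted with a single linear least-squares solve. It assumes the fit is done on the logarithm of the data, where a Gaussian becomes a parabola, ln y = a + b x + c x^2. This is an illustrative assumption: the text so far states only that a linear least-squares analysis is applied to the individual bands. The function name is hypothetical, the data are assumed strictly positive, and the sketch does not cover the iteration over overlapping bands.

```python
import math


def fit_gaussian_loglinear(xs, ys):
    """Fit y = A*exp(-(x - mu)**2 / (2*sigma**2)) via least squares on ln(y).

    ln(y) = a + b*x + c*x**2 with c < 0, so the coefficients follow from one
    3x3 linear solve. Assumes every y is strictly positive.
    """
    # Power sums S[k] = sum(x**k) and moments T[k] = sum(x**k * ln y)
    S = [sum(x ** k for x in xs) for k in range(5)]
    lny = [math.log(y) for y in ys]
    T = [sum((x ** k) * l for x, l in zip(xs, lny)) for k in range(3)]

    # Augmented normal equations, solved by Gauss-Jordan with partial pivoting
    M = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    a, b, c = (M[i][3] / M[i][i] for i in range(3))

    if c >= 0:
        raise ValueError("data are not peak-shaped (quadratic coefficient >= 0)")
    sigma = math.sqrt(-1.0 / (2.0 * c))
    mu = -b / (2.0 * c)
    amplitude = math.exp(a - b * b / (4.0 * c))
    return amplitude, mu, sigma


# Synthetic, noise-free band with A = 3.0, mu = 2.0, sigma = 1.5
xs = [-1.0 + 0.25 * i for i in range(25)]
ys = [3.0 * math.exp(-(x - 2.0) ** 2 / (2 * 1.5 ** 2)) for x in xs]
amplitude, mu, sigma = fit_gaussian_loglinear(xs, ys)
print(amplitude, mu, sigma)
```

On noise-free data the fit should recover values close to those used to generate the band. With noisy data the logarithm amplifies errors in the low-intensity wings, so a practical version would probably need weighting or a cutoff on small values.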