Expert System Based on Principal Component Analysis for the Identification of Molecular Structures from Vapor-Phase Infrared Spectra. 1. Theory. Identification of Alcohols

Jonathan H. Perkins, Erik J. Hasenoehrl, and Peter R. Griffiths*

Center for Hazardous Waste Remediation Research and Department of Chemistry, University of Idaho, Moscow, Idaho 83843

The concept of an expert system for the determination of molecular structure from a vapor-phase FTIR spectrum using principal component analysis is described. The principal component analysis is applied to a spectral training set of two classes of compounds (namely alcohols and non-alcohols). The scores of the components can be used to describe a discrimination rule that classifies unknowns to either class (i.e., identifies the presence of the alcohol functionality in the structure). Data scaling and weighting techniques applied to the training set spectra are shown to improve the separation power of the first and second principal components. For the example shown, namely testing for the presence of the alcohol functionality, a rule is described that correctly identifies the presence or absence of the alcohol functionality 96% of the time in a validation set. An expert system to derive an estimate of molecular structure solely from vapor-phase infrared spectra could be derived from a series of such rules.
* Author to whom correspondence should be addressed. Department of Chemistry. Center for Hazardous Waste Remediation Research. Present address: Mobil Research and Development Corporation, Paulsboro, NJ 08066.

INTRODUCTION An expert system can be described as a set of rules that process a given set of information to answer a specific set of questions. An expert system is an inference algorithm that attempts to duplicate the abilities of a human expert (1). Two difficulties arise in the generation and application of expert systems. The first is the “knowledge acquisition bottleneck” (2), which arises from the difficulty of translating the human expert’s knowledge into a computer format. The second is the translation of the available information for a given problem into a relevant format.

The use of a training set is designed to address the first difficulty. The expert system is generated (trained) using a set of problems for which the input information and correct answers are available. A large number of methods (known collectively as classification techniques) can be used to correlate the input information to the correct answer. Systematic methods that correlate input information to answers have advantages over directly translating a human expert’s knowledge. Correlation methods are simpler and do not require an expert in the field of the proposed expert system. Furthermore, rules can be generated that were previously unknown even to the experts.

Multivariate linear data reduction techniques, such as principal component analysis, can overcome the second difficulty. Large amounts of quantitative data for the training set can be reduced to a few pertinent linear combinations of the initial data that adequately describe the input information.

One field that is suited to the application of expert systems is the interpretation of chemical data for qualitative analysis. For example, a skilled spectroscopist can derive a very good estimate of a molecular structure from a mid-infrared spectrum. Nevertheless, despite the great increase in popularity of infrared spectrometry with the development of inexpensive, sensitive Fourier transform infrared (FTIR) spectrometers, only a very small percentage of users can identify a structure without resorting to reference books or library searching. This work focuses on the derivation of an expert system for automating the process of structural elucidation from infrared spectra.

An expert system that can objectively determine a molecular structure from its infrared spectrum would be of great utility. It would be a useful adjunct to a chromatographic hyphenated system, such as GC/FTIR, where it could assist with the identification of each component separated by the chromatography. This would greatly reduce the workload associated with the identification of the hundreds of components. It would also be helpful in determining whether a chromatographic peak is truly a single component or two unresolved peaks. The assigned structure of an unresolved chromatographic peak might be nonsensical, or the expert system might be able to report that the “compound” does not fit into the rules that it has learned.

An expert system would also be useful in doing quality control on large spectral libraries (3). It is very difficult to check each spectrum in a large library by hand. The expert system can run through the library and mark the entries where the spectra and structures do not match within certain tolerances. These library entries can then be checked by hand.

It is unlikely that any expert system could derive an exact representation of the structure of a large molecule from its infrared spectrum. Some questions (such as the number of carbons and carbon skeleton arrangement) are less amenable to infrared spectral determination than others. It is more likely that the expert system can identify broad classes to which the unknown molecule belongs (e.g., alcohols).
Such an “incomplete” expert system would still be quite useful, particularly in conjunction with a library searching technique. The library can be sorted into classes of molecules, and the expert system can tell which lists should be searched for a match. This should greatly reduce search times and increase the efficiency of using extremely large libraries (perhaps to the point of using them in real time with a chromatographic technique). Previous work has involved the use of techniques such as principal component analysis to compress and sort libraries (4, 5) and prefilter unknowns (6), but these techniques are based on the abstract principal component scores to direct structural assignments. Zupan and Munk (7) used a hierarchical tree structure based on similarity of low-order Fourier component scores to sort a library and aid in the search routine.

ANALYTICAL CHEMISTRY, VOL. 63, NO. 17, SEPTEMBER 1, 1991. © 1991 American Chemical Society

The expert system can be examined in two regimes: the structure or arrangement of the rules and the actual rules themselves. The arrangement of the rules in an expert system is usually thought of as a tree formation, where the answer to a question (or rule) directs which branch to follow (that is, which question to ask next). The final answer to the problem lies at the end of the branch. This approach is appropriate for an expert system with a finite set of answers. For the determination of a molecular structure, a more efficient arrangement is a simple list. For example, (1) does the structure contain an alcohol functionality, (2) does the structure contain a carbonyl functionality, (3) does the structure contain an amine functionality, etc. The reason for the simple list format is that the answers to the questions are not exclusive; i.e., a structure can contain both an alcohol and a carbonyl functionality. However, there will be some tree structure in this expert system. For example, it is unnecessary to ask whether the analyte is a primary, secondary, or tertiary aliphatic alcohol or a phenol until the presence of an OH group is established. Additionally, there may be some tree structures within any specific question. For example, if the result from a rule is uncertain, secondary rules can be used to remove or quantify the uncertainty.

This work is concerned more with the individual rules than with the entire structure of the expert system. Previous expert systems that derive molecular structures from infrared spectra, such as the Program for the Analysis of Infrared Spectra (PAIRS) (8-12), use rules based on lists of peak locations and amplitudes (and sometimes widths). These types of rules present a number of difficulties. The automated determination of peak parameters (location, amplitude, and width) can be accomplished via curve fitting (13). Accurate determination of the peak parameters can take a great deal of computer time. This not only slows the processing of an unknown but also makes the generation of a large training set difficult. Either a small training set must be used or an experienced spectroscopist must develop the rule directly. The rules in this expert system are based on principal component analysis (PCA) (14) and classification or pattern recognition techniques (15-18).
For each rule (such as whether or not the structure contains an alcohol functionality), the PCA is used to reduce the dimensionality of the training-set spectra down to a manageable number (e.g., 2). It provides a cross-sectional window for examining the spectral space. The location of the training-set spectra in that window is then correlated to the structural information by some classification technique. A similar approach for mass spectra is described by Harrington et al. (19) and for gas chromatographic data by Derde et al. (20). Frankel (3) described the use of PCA for aiding the interpretation of IR spectra of environmental mixtures. Shah and Gemperline (21) described the use of PCA combined with Mahalanobis distance calculations on near-infrared spectra for identifying pharmaceutical raw materials.

By using a large well-established infrared library as a training set, it is possible to generate a simple linear rule relating spectral information to structural information. Any unknown sample can then be projected onto that window and thus classified (e.g., as to whether it is an alcohol or not). Thus, there is some linear transform of a spectrum that can qualitatively answer a molecular structure question. In principle, each question in the expert system has its own transform, optimized to provide the best separation between the two classifications (presence or absence of the functional group). This method of generating an expert system offers several advantages over the peak list approach, such as bypassing the need for subjective expert determination of the characteristic features of a class (22, 23) and bypassing the need for arbitrary peak thresholds (24). Some work has been done on overcoming these limitations of a peak list approach by developing a self-training PAIRS algorithm (12, 25, 26).
In this paper, we discuss the use of principal component analysis as the basis of an expert system for determining the functional groups in a particular molecule from its vapor-phase infrared spectrum. The discrimination of alcohols and non-alcohols will be used to exemplify the techniques being discussed.
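The arrangement of rules described in the Introduction (a simple list of non-exclusive questions, with follow-up questions asked only once a parent question is answered "yes") might be sketched as follows. This is only an illustration of the arrangement: the names `Rule` and `apply_rules` are hypothetical, and the threshold tests are placeholders rather than the PCA-based rules developed in this paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Spectrum = Sequence[float]


@dataclass
class Rule:
    """One yes/no structural question plus follow-ups asked only on a 'yes'."""
    name: str
    test: Callable[[Spectrum], bool]
    follow_ups: List["Rule"] = field(default_factory=list)


def apply_rules(rules: List[Rule], spectrum: Spectrum) -> Dict[str, bool]:
    """Ask every rule in the list; descend into follow-ups only when present."""
    answers: Dict[str, bool] = {}
    for rule in rules:
        present = rule.test(spectrum)
        answers[rule.name] = present
        if present:
            answers.update(apply_rules(rule.follow_ups, spectrum))
    return answers


# Placeholder tests (hypothetical): each looks at one made-up spectral value.
rules = [
    Rule("alcohol", lambda s: s[0] > 0.5,
         follow_ups=[Rule("phenol", lambda s: s[1] > 0.5)]),
    Rule("carbonyl", lambda s: s[2] > 0.5),
]

with_oh = apply_rules(rules, [0.9, 0.1, 0.8])
without_oh = apply_rules(rules, [0.1, 0.9, 0.8])
```

In this sketch the answers are not exclusive (a spectrum can be flagged as both alcohol and carbonyl), and the phenol question is never asked for the second spectrum because no OH group was found.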
THEORY If one considers a set of infrared spectra of 1000 data points each, the individual spectra can be thought of as 1000-dimensional vectors. Each spectrum can be represented as a point in a 1000-dimensional space (spectral space). Each axis of the space corresponds to a wavenumber. The coordinate of a spectral vector along one axis is the absorbance at the corresponding wavenumber. Each spectrum maps to one and only one point in the spectral space and each point maps to one and only one spectrum. No information is lost in the mapping to the space. It is simply a different representation of the same information. Because the shape of a spectrum (location and amplitudes of absorption bands) is dependent on the structure of the molecule, the position of the spectral vector in the 1000-dimensional space is also determined by the structure of the molecule.

If the set of spectra is associated with compounds of two different classes (e.g., alcohols and non-alcohols), the points in the spectral space can be labeled as such. Since the classification of the compounds can be determined from the spectrum (by an expert spectroscopist), it can be inferred that the same determination can be accomplished by an examination of the spectral space. A characteristic spectral shape of alcohols (for example, the O-H stretching band near 3600 cm⁻¹) translates to a specific pattern of positions in the spectral space. If the pattern can be identified and quantified, then an objective algorithm (rule) can be devised that determines the class of an unknown compound from its position in the spectral space.

The major difficulty in the procedure described above is the high dimensionality of the spectral space. It is difficult or impossible for a human to envision a space with a dimension greater than three, and that makes it difficult to devise a set of pattern recognition rules.
Furthermore, if the pattern is simple (e.g., the classes are separated by a single hyperplane), then the high dimensionality is redundant and simply an impediment to the solution. PCA can be used to reduce the dimensionality of the data set by defining axes of major importance and projecting the spectral vectors onto those axes. The spectra can be pretreated by wavenumber-dependent scaling factors to rotate the principal components to enhance the classification capabilities of the first few components. Thus, the spectral space can be represented with one or two dimensions that optimize the separation of two classes (such as alcohols and non-alcohols).

Principal component analysis has been thoroughly discussed elsewhere (14, 27) and will be given only a cursory treatment in this paper. The PCA done in this work was accomplished by singular value decomposition (SVD), which was executed by the nonlinear iterative partial least-squares (NIPALS) algorithm (28). A training set of n spectra with representatives from each of two classes is collected. Each spectrum has m measurements. The training-set spectra comprise an n by m data matrix, X, where each row is a sample spectrum and each column corresponds to a wavenumber. SVD decomposes X to the product of three matrices
$$ X = U \Sigma V^{T} + E \qquad (1) $$

where U (n by p) is the matrix of left singular vectors arranged as columns, $V^{T}$ (p by m) is the matrix of right singular vectors arranged as rows, $\Sigma$ (p by p) is a diagonal matrix of singular values, p is the number of components into which X was decomposed, and E (n by m) is the matrix of residuals. If p equals the rank of X, then E is null. The set of column vectors in U is orthonormal and the same is true of the row vectors in $V^{T}$. The values in $\Sigma$ are proportional to the relative importance (or scale) of the components in determining X. The order of the vectors in U and V and the values in $\Sigma$ is set such that each value in $\Sigma$ is greater than those to the right and below. Thus, the first vectors (or components) describe most of X (i.e., they model most of the variance). The values in U (multiplied by the corresponding elements from $\Sigma$) are referred to as scores, and the values in V are loadings.

There is a row in U for each sample. The rows in U can be thought of as reduced dimensionality spectra (of dimension p instead of m). If p equals 2, then each sample will have two scores ($[u_{i,1}, u_{i,2}]$ for sample i). The training-set spectra can be represented in a two-dimensional scatter plot. This scatter plot can be thought of as a two-dimensional window (or cross section) of the spectral space. The values $[u_{i,1}, u_{i,2}]$ are plotted rather than the true scores $[\sigma_1 u_{i,1}, \sigma_2 u_{i,2}]$ for plotting convenience, as the abscissa and ordinate ranges are approximately the same after the removal of the $\sigma$ values. The scaling causes no real effect on the information content of the plots. Of course p can be greater than 2 and the scatter plot can be done with any pair of components. If the first two components are used, then the plot represents the cross section with the greatest width (variance), because the first components model the greatest variance in the matrix. One would hope that the two classes of samples are well separated in one of these scores plots.

One of the most useful aspects of PCA is that a new, unknown spectrum can easily be projected onto one of these scores plots. For a given component, i, the dot product of the spectrum of the unknown and row i from $V^{T}$ is determined and then divided by element $\sigma_{ii}$ from $\Sigma$. This provides the u value for this unknown and this component. By placing the unknown on the plot, the class of the unknown can be determined by one of several different classification techniques.
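The decomposition of eq 1 and the projection of an unknown onto the scores plot can be illustrated with a short sketch. It is written under stated assumptions: it uses NumPy's `numpy.linalg.svd` rather than the iterative algorithm used in this work, the function names `pca_svd` and `project` are hypothetical, and the data are random numbers standing in for spectra.

```python
import numpy as np


def pca_svd(X, p=2):
    """Truncated SVD of the n-by-m matrix X, so that X ~ U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :p], s[:p], Vt[:p, :]


def project(spectrum, s, Vt):
    """u values of a spectrum: dot product with each row of Vt, divided by sigma."""
    return (Vt @ spectrum) / s


rng = np.random.default_rng(0)
X = rng.random((52, 460))      # random stand-in for 52 training spectra of 460 points
U, s, Vt = pca_svd(X, p=2)

# Projecting a training spectrum should reproduce its own row of U.
u_first = project(X[0], s, Vt)
```

If the training spectra were pretreated before the decomposition, an unknown would presumably need the same pretreatment (using the training-set mean and scale factors) before being projected.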
As described above, there is one difficulty in looking at the plotted scores. The class separation is not necessarily optimized in the first two component scores or, for that matter, in any pair of components. Two data pretreatment steps have been applied to rotate the scores plots toward separating the two classes.

The first step in the data pretreatment is autoscaling. Autoscaling consists of two steps: mean centering and variance scaling. Mean centering consists merely of determining the mean spectrum of the training set and subtracting that mean spectrum from each member of the training set. If the data set is considered as a collection of points in the spectral space, mean centering moves the collection of points such that the mean is at the origin. Variance scaling consists of determining the variance of each measurement (finding the variance spectrum) across the training set and dividing each measurement in each sample by the square root of the corresponding variance. In this work, the square root of the sum of squares (SQSS) was used instead of the square root of the variance, but the two cases are equivalent, differing only by a constant, $(n - 1)^{-1/2}$, across the spectrum. Graphically, variance scaling is more difficult to interpret, at least in the spectral space. In the sample space, where each axis corresponds to a sample and each measurement corresponds to a vector, dividing the measurements by the SQSS sets the vectors to unit length.

As far as the PCA is concerned, autoscaling the data has the effect of giving all the measurements equal weight in the determination of the principal components. With the original data, a wavenumber where there is an absorbance maximum has a greater effect on the “shape” of the collection of sample points in the spectral space than a wavenumber in a baseline region. After autoscaling, the magnitudes of the variance for each measurement (size of the collection of sample points in that direction) are all equal.

Autoscaling is normally used in cases where the measurements have different units (for example, samples of seawater could be characterized by collection depth and magnesium concentration). To compare distance and concentration fairly in a multivariate technique such as PCA, the data are autoscaled to make them dimensionless. Measurements are expressed as standard deviations away from the mean. It can be argued that autoscaling is inappropriate for spectrometric data because the measurements for which the weight factor is small are those where there is a high absorbance that corresponds to a high signal. Autoscaling amplifies the baseline at the expense of the signal. While that is true, in this analysis it is not known a priori what the spectral distinction between two classes will be. The difference between two classes may be subtle and that subtle change may be swamped by the apparently random variations in the large peaks in the spectra. Thus, after autoscaling, all the measurements have an equal chance at showing their ability to distinguish the two classes.

The next step in the pretreatment is feature weighting. Feature weighting scales the autoscaled measurements such that they no longer have equal variances. The feature weight is the ratio of the intercategory variances to the sum of the intracategory variances (1). The measurements are now weighted not by their original magnitudes but by their ability to separate the two classes. A feature weight $w_k$ is derived for each measurement k. Each measurement is then multiplied by the feature weight. The feature weight for two classes of compounds, I and II, is calculated as follows
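$$ w_k(\mathrm{I},\mathrm{II}) = \frac{\sum x_{\mathrm{I}}^{2}/N_{\mathrm{I}} + \sum x_{\mathrm{II}}^{2}/N_{\mathrm{II}} - 2\sum x_{\mathrm{I}} \sum x_{\mathrm{II}}/(N_{\mathrm{I}} N_{\mathrm{II}})}{\sum (x_{\mathrm{I}} - \bar{x}_{\mathrm{I}})^{2}/N_{\mathrm{I}} + \sum (x_{\mathrm{II}} - \bar{x}_{\mathrm{II}})^{2}/N_{\mathrm{II}}} \qquad (2) $$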
where $x_{\mathrm{I}}$ and $x_{\mathrm{II}}$ are measurements at wavenumber k for the samples of classes I and II, respectively, and $\bar{x}_{\mathrm{I}}$ and $\bar{x}_{\mathrm{II}}$ are the corresponding class means. The numbers of samples in the training set in classes I and II are $N_{\mathrm{I}}$ and $N_{\mathrm{II}}$, respectively. The greater the discriminating ability of a measurement at a particular wavenumber, the greater the feature weight. If a measurement has no discriminating power, then the feature weight is one. The feature weight is analogous to the resolution parameter of chromatography. In a histogram of the values of a measurement for two classes of samples, a discriminatory measurement will produce a bimodal distribution. The feature weight is the ratio of the variance between the two modes and the variance inside each mode.

The effect of the feature weighting can be interpreted graphically. In the spectral space, the training-set samples define a certain shape. The PCA will define the “major axes” of the shape. The feature weighting of the spectra stretches the shape of the data envelope in the direction of those original axes (measurements) where the classes are well separated. In stretching the shape, the PCA is forced to rotate the principal components toward these directions and thus the separation in the scores plots is improved. The window is turned toward a more efficient direction for the separation. Prior to the feature weighting, all samples are equivalent and the PCA has no information about the two classes. Feature weighting is the mechanism for including classification information into the PCA model.

A decade ago it was claimed (29) that the determination of the proper features from multivariate data was the key problem holding pattern recognition back. Feature weighting provides an avenue toward that end. While feature weighting has been applied to other types of chemometric applications, the use of feature weights as an aid to the interpretation of IR spectra has not been reported in the literature.
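The two pretreatment steps, autoscaling with the SQSS and feature weighting by eq 2, can be sketched as below. This is a minimal illustration on synthetic data, not the software used in this work; the function names are hypothetical, and the sketch assumes no measurement is constant across the training set (which would give a zero divisor).

```python
import numpy as np


def autoscale(X):
    """Mean-center each column, then divide by the square root of its sum of squares."""
    Xc = X - X.mean(axis=0)
    sqss = np.sqrt((Xc ** 2).sum(axis=0))
    return Xc / sqss


def feature_weights(X1, X2):
    """Feature weight of eq 2 for every column; X1 and X2 hold the two classes."""
    n1, n2 = len(X1), len(X2)
    numerator = ((X1 ** 2).sum(axis=0) / n1 + (X2 ** 2).sum(axis=0) / n2
                 - 2.0 * X1.sum(axis=0) * X2.sum(axis=0) / (n1 * n2))
    denominator = (((X1 - X1.mean(axis=0)) ** 2).sum(axis=0) / n1
                   + ((X2 - X2.mean(axis=0)) ** 2).sum(axis=0) / n2)
    return numerator / denominator


# Synthetic stand-in data: 26 + 26 samples, 5 measurements, column 2 discriminates.
rng = np.random.default_rng(0)
class_1 = rng.normal(0.0, 1.0, (26, 5))
class_2 = rng.normal(0.0, 1.0, (26, 5))
class_1[:, 2] += 4.0

X = autoscale(np.vstack([class_1, class_2]))
w = feature_weights(X[:26], X[26:])
X_weighted = X * w          # each measurement multiplied by its feature weight

# Equivalent form of eq 2: 1 + (difference of class means)^2 / (sum of class variances)
m1, m2 = X[:26].mean(axis=0), X[26:].mean(axis=0)
v1, v2 = X[:26].var(axis=0), X[26:].var(axis=0)
w_alt = 1.0 + (m1 - m2) ** 2 / (v1 + v2)
```

Expanding the numerator of eq 2 shows that it equals the sum of the two class variances plus the squared difference of the class means, which is why a measurement with no discriminating power receives a weight of one.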
Many techniques and elements in techniques have been used, however, that are close in nature to the feature weighting approach. Visser and van der Maas (30, 31) propose the terms “specific”, “selective”, and “pseudospecific” as descriptors of a peak’s ability to discriminate the presence of a functionality. Feature weights can be compared to the numbers generated by fuzzy set theory (32, 33). The occurrence distributions (25), class spectrum profiles, and region weight constants (34) generated by automated PAIRS rule generators are also similar to feature weights but differ because they will include peaks common to both classes whereas feature weights are large only at distinctive peaks. Woodruff et al. (18) described pattern recognition methods for classifying binary-coded spectra. They described a normalized sum spectrum that is similar to the feature weight spectrum.

The feature weights can be modified in a number of ways. The first is to raise them to some power (e.g., the square). By raising the feature weights to a power greater than 1 before applying them, the larger weights are emphasized relative to the smaller ones. This can be advantageous in the cases where the smaller feature weights are spurious or the separation is based mostly on one wavenumber. Spurious feature weight peaks can be created by the use of a training set of finite size. Nominally the training set should consist of representative samples from both classes under investigation (alcohols and non-alcohols in this example). If the non-alcohol group should contain more carbonyl-containing compounds than the alcohol group, then the carbonyl stretching frequency (1700 cm⁻¹) will generate a feature weight greater than 1. This feature weight is spurious in that it was created entirely by an asymmetry in the training set. These feature weights will improve the separation for the training set compounds but will be a detriment for predicting the class of unknown compounds. A perfect training set for alcohols would have to contain equal quantities of carbonyls, amines, aromatics, etc., in both halves of the training set (alcohols and non-alcohols). This is quite difficult in practice, particularly with a small training set.
Fortunately, these spurious feature weights will be small compared to the feature weights generated by “true” alcohol effects. The difference in magnitudes can be magnified by squaring the feature weights. This process could obviously be extended to cubing and further. As the power is increased, one wavenumber (the one with the highest feature weight and best separation) will become far more important than the others, to the point of essentially zeroing out the rest of the spectrum. This case would be applicable to the determination of a certain functional group having a unique absorption band, a very rare occurrence. Most functional groups lead to the presence of several significant bands in the spectrum, and increasing the power provides a continuum from the pure multivariate case (power = 1) to a univariate case (power = ∞).

Another way to modify the feature weights is to subtract 1 from each calculated value of $w_k(\mathrm{I},\mathrm{II})$. The minimum feature weight that can be calculated from eq 2 is 1. A wavenumber that provides no separation at all is unmodified by feature weighting. It still is present in the data and still has an effect on the results of the PCA. If 1 is subtracted from the feature weights, then those wavenumbers that provide no separating power are essentially removed from the data set. The process of subtracting 1 and raising to a power can be combined in any order desired.

Once the spectra have been projected onto a two-dimensional scatter plot, a classification rule that separates the two classes must be derived. In order to derive a classification rule, the scores of the samples in each of the two classes are assumed to be normally distributed about the class means. The distributions are bivariate (along components 1 and 2) and each is normalized such that the total volume under the surface is equal to the a priori probability that a random draw from the population of all organic molecules will fall in that class.
These probabilities for alcohols/non-alcohols were estimated by the number of members of these classes in the Sadtler Library of Vapor-Phase Infrared Spectra as 0.2 for the alcohols and 0.8 for the non-alcohols. These numbers are only rough estimates. This expert system will be used for organic vapor-phase IR spectra, so the parent population is not really all possible organic compounds but only those that can be analyzed by vapor-phase IR, and the library population is therefore not an unreasonable representative of that parent population. If an application arises where the a priori probabilities are different (e.g., presence of an alcohol is more likely), then the Bayesian classification rule would need to be recalculated to accurately model the situation. If the probabilities are unknown, then values must be assumed in order to use the Bayesian approach.

The bivariate distributions define the probabilities of the position of a random organic molecule spectrum in this two-dimensional space. Given a location of an unknown in this space, the probabilities of a random draw from each class landing in that spot can be calculated. Bayes’ rule simply states that the unknown should be assigned to the class with highest probability (35). This translates to the class whose distribution has the greatest height at the location. The bivariate distribution height for class i, $H_i(x)$, at a given location is given by

$$ H_i(x) = \frac{P_i}{2\pi\,|\Sigma_i|^{1/2}} \exp\left[-\tfrac{1}{2}(x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)\right] \qquad (3) $$
where x is the two-dimensional vector of the location, $\mu_i$ is the mean vector for class i, $\Sigma_i$ is the covariance matrix of the class i training set, and $P_i$ is the a priori probability of a sample being in class i. The locus of points x that satisfy the equality

$$ H_1(x) = H_2(x) \qquad (4) $$

defines a quadratic form in the two dimensions (a line, parabola, hyperbola, or ellipse). A simple discriminant score, $d_i(x)$, can be defined as

$$ d_i(x) = \ln |\Sigma_i| + (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) - 2 \ln P_i \qquad (5) $$
For each unknown with a position x in a two-dimensional scatter plot, a discriminant score can be calculated for each class. The unknown is assigned to the class with the lowest discriminant score. The discriminant score is a measure of the distance of the unknown to the class mean in terms of the class variance in the direction to the mean. The relative values of $d_1$ and $d_2$ define whether an unknown is inside or outside the locus defined by eq 4. The use of eq 5 assumes that the costs of a false positive and a false negative are equal. For a given application, this may not be the case (e.g., missing the presence of a toxic material in an environmental sample can cost more than a false alarm). Under these circumstances, Bayes’ rule can be modified to give the most cost-effective rule.
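The classification step can be sketched as follows: compute the score of eq 5 for each class and assign the unknown to the class with the lowest value. The class means and covariance matrices below are made-up numbers chosen only for illustration (they are not results from this work); the a priori probabilities are the 0.2 and 0.8 quoted in the text, and the function names are hypothetical.

```python
import numpy as np


def discriminant_score(x, mean, cov, prior):
    """Score of eq 5 for one class: ln|cov| + Mahalanobis distance^2 - 2 ln(prior)."""
    diff = x - mean
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return np.log(np.linalg.det(cov)) + mahalanobis_sq - 2.0 * np.log(prior)


def classify(x, classes):
    """Return the name of the class with the lowest discriminant score."""
    scores = {name: discriminant_score(x, *params) for name, params in classes.items()}
    return min(scores, key=scores.get)


# Hypothetical class statistics in a two-dimensional scores plot: (mean, covariance, prior).
classes = {
    "alcohol": (np.array([0.2, 0.1]),
                np.array([[0.01, 0.0], [0.0, 0.02]]), 0.2),
    "non-alcohol": (np.array([-0.1, 0.0]),
                    np.array([[0.02, 0.005], [0.005, 0.01]]), 0.8),
}

label_a = classify(np.array([0.22, 0.08]), classes)    # point near the alcohol mean
label_b = classify(np.array([-0.12, 0.01]), classes)   # point near the non-alcohol mean
```

In practice the means and covariance matrices would be estimated from the training-set scores of each class, for example with `numpy.mean(..., axis=0)` and `numpy.cov(..., rowvar=False)`.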
EXPERIMENTAL SECTION For this paper, only one type of classification (alcohols and non-alcohols) has been used for the purpose of illustration. Subsequent papers will show the application to several other functional groups. Fifty-two spectra (26 alcohols and 26 non-alcohols) were extracted from the Sadtler library of vapor-phase FTIR spectra (Sadtler Research Division of Bio-Rad Laboratories, Philadelphia, PA). The samples were chosen by hand with the intent to provide a wide range of molecular types within each classification in order to minimize the errors of spurious feature weighting and to maximize the robustness of the predictive ability of the model. The spectral dimension was reduced by keeping every fourth data point, resulting in 460 data points between 4000 and 470 cm⁻¹. A number of runs of the PCA were done by using different pretreatment techniques. The spectra were pretreated and principal component analyzed by software written in Turbo Pascal and Turbo C (Borland International, Scotts Valley, CA).
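The data reduction step, together with the two feature-weight modifications from the Theory section that distinguish pretreatment variants (subtracting 1 and raising to a power), might look like the sketch below. The function names are hypothetical, the array width of 1837 points is an assumed value chosen only so that the reduced spectra have 460 points, and applying the subtraction before the power is just one of the possible orders.

```python
import numpy as np


def keep_every_fourth(spectra):
    """Reduce the spectral dimension by keeping every fourth data point (column)."""
    return spectra[:, ::4]


def modify_weights(w, power=1.0, subtract_one=False):
    """Optionally subtract 1 from the feature weights, then raise them to a power."""
    w = np.asarray(w, dtype=float)
    if subtract_one:
        w = w - 1.0
    return w ** power


# Placeholder full-resolution array (assumed width, see lead-in).
spectra = np.zeros((52, 1837))
reduced = keep_every_fourth(spectra)

w = np.array([1.0, 1.2, 5.0])                    # made-up feature weights
squared = modify_weights(w, power=2)             # emphasizes the largest weight
shifted = modify_weights(w, subtract_one=True)   # zeroes non-discriminating points
```

With `subtract_one=True`, a measurement whose weight is exactly 1 is multiplied by zero and so drops out of the subsequent PCA.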
Figure 1. Scatter plot of the sample spectra in the two dimensions of the third and fifth components from the PCA performed on the raw spectral data. Alcohols are designated with O, and non-alcohols are designated with X. The ellipses are isoaltitudes containing 50% of the bivariate normal distributions fitted to each class of samples.

Figure 3. Scatter plot of the sample spectra in the two dimensions of the first and second components from the PCA performed on the autoscaled data. Alcohols are designated with O, and non-alcohols are designated with X. The 50% and 95% isoaltitudes are also plotted.
Figure 2. Feature weights (w) plotted against wavenumber. The maxima near 3880, 1300, and 1000 cm⁻¹ can be attributed to spectral features due to alcohols (O-H stretch, C-O-H bend, and C-O stretch, respectively). The plots were generated by using Lotus 1-2-3 (Lotus Development Corp., Cambridge, MA) and Windows Draw Plus (Micrografx, Richardson, TX).
RESULTS In the fmt trial, the PCA was run on the raw spectral data. None of the scatter plots showed a good separation between the two classes. The best separation (which can be characterized as only fair a t best) was achieved on the plot of component 3 versus component 5 (Figure 1) where 0 and X designate alcohols and non-alcohols,respectively. Each class has an ellipse drawn centered on the class means. The ellipses are "isoaltitudes" of the bivariate normal distributions fit to each class. The ellipse is set such that 50% of the volume of the bivariate normal distribution is included inside the ellipse. Note that these ellipses do not take the class probabilities into account. That is, the 50% confidence limit is defined given that one has randomly drawn from the population of alcohols and not the population of all organic compounds. There is a substantial overlap between the classes. These results are inadequate for describing a rule for an expert system to determine the presence of an alcohol functionality in an unknown. T o improve the separation, autoscaling and feature weighting were performed on the data. The feature weights, as calculated by eq 2, are plotted versus wavenumber as a "spectrum" in Figure 2. The largest feature weight is at the 3600-an-'region as one might expect since the 0-H stretching
Figure 4. Scatter plot of the sample spectra in the two dimensions of the first and second components from the PCA performed on the feature weighted data. Alcohols are designated with O, and non-alcohols are designated with X. The 50% and 95% isoaltitudes are also plotted.
mode absorbs in this region (36). Similarly, the peak in the feature weight spectrum in the 1300-1400-cm⁻¹ region can be attributed to the C-O-H bend (36), and the peak in the 1000-1100-cm⁻¹ region can be attributed to the C-O stretch (37). As was described above, it is unlikely that the smaller structure in the feature weight spectrum can be attributed to spectral features of an alcohol. Any wavenumber that discriminates between the 2 sets of 26 samples will generate a large feature weight. This happens regardless of whether the wavenumber is associated with alcohols or with some other functionality that was not evenly distributed between the sample sets. The reason for the discriminating ability is invisible to the feature weight generation algorithm. The data pretreatment steps of autoscaling and feature weighting were tried separately to gain insight into their respective effects on the PCA. Figure 3 shows the two-dimensional scatter plot with the best separation of the autoscaled training-set data. One should note that principal components 1 and 2 were used (as opposed to 3 and 5 for the untreated data) and that the separation has improved. Because autoscaling equalizes the variances of the measurements, the small significant peaks (such as the OH stretch) can affect the scatter plot, as opposed to the large peaks (such as the CH stretch), which carry no relevant information. The plot also includes the bivariate ellipses (50% and 95% confidence limits). Figure
ANALYTICAL CHEMISTRY, VOL. 63, NO. 17, SEPTEMBER 1, 1991
Figure 5. Scatter plot of the sample spectra in the two dimensions of the first and second components from the PCA performed on the autoscaled and feature weighted data. Alcohols are designated with O, and non-alcohols are designated with X. The 50% isoaltitudes are also plotted. The data were weighted with plain feature weights (w).
4 shows the two-dimensional scatter plot with the best separation of the feature weighted (but not autoscaled) training-set data. The measurements in the training-set spectra were multiplied by the weights as calculated by eq 2. Principal components 1 and 2 showed the best separation, with the first being more important. All of the first component scores are positive because the data were not mean centered. The alcohol samples have been extended in a positive direction because of the feature weighting. Figure 5 shows the scatter plot for the data that were both autoscaled and feature weighted. For all three cases (autoscaled, feature weighted, and both), the slight overlap of the 50% ellipses was very similar. In the case where both pretreatment steps were used, the number of outliers (i.e., alcohols in amongst the non-alcohols and vice versa) was minimized. Both autoscaling and feature weighting are useful data pretreatment steps. Because the separation was based mostly on one absorption band, it is reasonable to try raising the feature weights to some power (e.g., squaring) to emphasize the 3600-cm⁻¹ region even more. It is also worth subtracting 1 from the feature weights in order to remove the nondiscriminatory regions from the data set. A number of combinations of these operations were investigated, including feature weights squared (w²), feature weights minus one (w - 1), feature weights squared minus one (w² - 1), and feature weights minus one squared ((w - 1)²). All these approaches gave better results than the plain feature weighted data. The function w² - 1 gave the best discrimination between alcohols and non-alcohols. It is entirely possible that some other function (such as (w² - 1)²) could give even better results, but there is little point in pursuing this optimal function. There is no systematic way of finding this function, and the significance of the improvements between functions diminishes as more functions are tried.
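As an illustration of the pretreatment sequence described above, the following Python sketch autoscales a data matrix, applies one of the weight functions tried in the text, and extracts two principal components by singular value decomposition. The data, the weight values, and the function names (autoscale, transform_weights, pca_scores) are hypothetical stand-ins. The feature weights of eq 2 are taken as given rather than computed, and the use of NumPy's SVD without further centering is an assumption about how the PCA might be carried out, not a description of the original software.

```python
import numpy as np

def autoscale(X):
    """Scale every channel (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # guard against flat channels
    return (X - mean) / std

def transform_weights(w, kind="w2-1"):
    """Return one of the weight functions tried in the text."""
    options = {
        "w": w,
        "w2": w ** 2,
        "w-1": w - 1.0,
        "w2-1": w ** 2 - 1.0,
        "(w-1)2": (w - 1.0) ** 2,
    }
    return options[kind]

def pca_scores(X, n_components=2):
    """PCA by SVD of the pretreated matrix; returns scores and loadings."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    loadings = Vt[:n_components]      # each row is one loading vector
    scores = X @ loadings.T           # projection of every sample
    return scores, loadings

# Hypothetical stand-in data: 52 "spectra" with 8 channels each.
rng = np.random.default_rng(0)
X = rng.random((52, 8))
w = np.array([1.0, 1.0, 3.0, 1.2, 1.0, 1.5, 1.0, 1.0])  # made-up weights

Xs = autoscale(X)
Xw = Xs * transform_weights(w, "w2-1")   # weight every channel
scores, loadings = pca_scores(Xw)
print(scores.shape, loadings.shape)
```

With w² - 1, any channel whose weight equals 1 is multiplied by zero, which corresponds to the removal of nondiscriminatory regions mentioned in the text.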
For this work, the function w² - 1 provided a sufficiently adequate separation to warrant the cessation of testing new functions. The scatter plot (Figure 6) of the scores for the first and second components of the w² - 1 weighted data shows an excellent separation of alcohols and non-alcohols. The plot also includes the bivariate normal distribution ellipses (50% and 95% confidence limits). The locus of points where the two distributions have equal height defines an ellipse and is plotted in Figure 7. Every unknown that falls within this ellipse can be classified as a non-alcohol, and every unknown that falls outside the ellipse can be classified as an alcohol. There is one outlier alcohol, namely o-ethoxyphenol, in the non-alcohol region. The O-H stretching band of this molecule is shifted to lower wavenumber because of the intramolecular hydrogen bonding of the alcohol hydrogen to the ether oxygen
Figure 6. Scatter plot of the sample spectra in the two dimensions of the first and second components from the PCA performed on the autoscaled and feature weighted data. The data were weighted as w² - 1. Isoaltitudes that contain 50% and 95% of the distribution volumes are also plotted.
Figure 7. Same plot as Figure 6 (autoscaled and weighted (w² - 1) data) with the discrimination ellipse line plotted. The heights of the two distributions are equal along the edge of the ellipse. The non-alcohol distribution is higher inside the ellipse, and the alcohol distribution is higher outside.
atom. The O-H stretching band of o-ethoxyphenol was shifted to 3585 cm⁻¹, which removes it completely from the 3660-cm⁻¹ peak in the feature weight spectrum. Thus, this rule missed the presence of the alcohol functionality in this molecule. A different rule would presumably be needed for intramolecularly hydrogen-bonded alcohols. The non-alcohols are more tightly grouped together than the alcohols. This may seem counterintuitive because the allowed structural variation is greater for non-alcohols than it is for molecules constrained to be alcohols. However, this small variation is explainable: the non-alcohols all give scores near zero because the overlap between the spectrum (with no peak at 3660 cm⁻¹) and the loading vector (with most of the emphasis in the 3660-cm⁻¹ region) is small. The scores for the alcohols depend on the amplitude and location of the O-H stretching peak and thus show a much wider variation. The loadings determined from the autoscaled and feature weighted (w² - 1) data for components 1 and 2 are plotted versus wavenumber (as spectra) in Figure 8. It is important to remember that these loading vectors are mathematical components of the spectra and not real spectra themselves. This is especially true of scaled and weighted data. However, it is still possible to interpret features in the loadings in terms of their wavenumber and relationship to expected peaks in the original spectra. The complicated structure of these vectors can be attributed to the autoscaling, which amplifies
Table I. Classification Matricesᵃ

                          classified    classified as    not           total correct
                          as alcohol    non-alcohol      classified    rate, %

A. 996 Library Spectra Identified by the Bayesian Discrimination Line Determined from the 52 Training-Set Spectra
authentic alcohol            172             8                          85
authentic non-alcohol        142           674

B. 996 Library Spectra Identified by the Bayesian Discrimination Line Determined from the 996 Library Spectra
authentic alcohol            154            26                          94
authentic non-alcohol         34           782

C. 996 Library Spectra Identified by the Linear Discrimination Line Determined Visually from the 996 Library Spectra
authentic alcohol            158            22                          97
authentic non-alcohol         11           805

D. Second 1000 Library Spectra Identified by the Linear Discrimination Line Determined Visually from the First 996 Library Spectra
authentic alcohol            135            21                          96
authentic non-alcohol         20           824

E. Second 1000 Library Spectra Identified by the Linear Discrimination Line Determined Visually from the First 996 Library Spectra with a No-Choice Region Determined Visually from the Second 1000 Spectra
authentic alcohol            126            19            11            98 (of classified compounds)
authentic non-alcohol          4           823            17             3 (no-choice rate)

ᵃ All matrices determined from projection of Sadtler Library spectra onto principal components 1 and 2 determined from the autoscaled and feature weighted (w² - 1) training-set spectra.
Figure 8. Loading values plotted versus wavenumber for the first and second components from the PCA performed on the autoscaled and feature weighted (w² - 1) data.
the variance of baseline regions in the spectra. In future fully developed expert systems, these loadings will provide a method of reconstructing the original IR spectrum. Each functionality deemed present by the expert system will provide a component spectrum (similar to the loading) associated with that functionality. A classical least-squares approach might then be used to combine these functionality components to estimate the unknown's spectrum. Nonrandom deviations will indicate a deficiency in the expert system's interpretation (38). The shape of the first component loading vector is very similar to the feature weight vector and is mostly positive, which is not surprising since the spectral bands are for the most part uncorrelated with each other. The autoscaled spectral vectors define a nearly hyperspherical shape in the spectral space centered on the origin, with the alcohols tending toward one side of the hypersphere and the non-alcohols tending toward the other. The feature weight vector pulls the sphere into an ellipsoid with the alcohols and non-alcohols arranged at opposite ends of the major axis (i.e., the first principal component). Because the first principal component is almost entirely of one sign, one can infer that all the spectral features indicating the presence of the alcohol are positively correlated; i.e., the alcohol functionality is indicated by the presence or absence of a full set of bands as opposed to the presence of some bands and absence of others. Physically, this makes sense, and furthermore, it is obvious that the
alcohol functionality should be indicated by the presence of bands and not their absence. The maximum values in both loading vectors are in the 3660-cm⁻¹ region (the O-H stretch). The presence of this absorption band in the vapor-phase infrared spectrum is most indicative of the presence of an alcohol, as all alcohols must contain an O-H group (by definition) and no non-alcohols (except for carboxylic acids) contain an O-H group. The first 996 spectra from the Sadtler library of vapor-phase infrared spectra (including the 52 used in the training set) were projected into the space defined by the first and second loading vectors from the PCA of the autoscaled and weighted (w² - 1) training set. The spectra were examined visually for quality, and the presence of the alcohol functionality was determined from the Wiswesser line formula and from the name of the molecule. A Wiswesser line formula is a set of characters (supplied in the file headers in the Sadtler library) that nominally identify the structure of the compound. For example, 1-hexanol is represented by Q6, where the Q stands for the OH functionality and the 6 stands for the six-carbon alkane backbone. Within this large data set, there were 180 alcohols and 816 non-alcohols. The spectra were tested for the presence of the alcohol functionality by using the discrimination ellipse developed from the 52 training-set spectra. The classification algorithm correctly identified 172 of the alcohols and misidentified the remaining 8. The classification algorithm correctly identified 674 of the non-alcohols and misidentified the remaining 142. These results are represented in a classification matrix (15) in part A of Table I. The total correct rate is determined by the ratio of the sum of the diagonal elements of the classification matrix to the total entries in the matrix. The total error rate is determined by the ratio of the sum of the off-diagonal elements to the total entries in the matrix.
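The two rates defined above can be checked against the counts quoted for part A of Table I. The sketch below is only a worked example; the array layout (rows for the authentic class, columns for the assigned class) and the function names are assumptions made for illustration.

```python
import numpy as np

# Rows: authentic alcohol, authentic non-alcohol.
# Columns: classified as alcohol, classified as non-alcohol.
# Counts are those quoted in the text for part A of Table I.
matrix_a = np.array([[172, 8],
                     [142, 674]])

def total_correct_rate(m):
    """Sum of the diagonal elements divided by the total number of entries."""
    return np.trace(m) / m.sum()

def total_error_rate(m):
    """Sum of the off-diagonal elements divided by the total number of entries."""
    return (m.sum() - np.trace(m)) / m.sum()

correct = total_correct_rate(matrix_a)
error = total_error_rate(matrix_a)
print(f"total correct rate: {correct:.1%}")  # 84.9%, reported as 85 in Table I
print(f"total error rate:   {error:.1%}")    # 15.1%
```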
The entries in the classification matrix can be combined to estimate certain probabilities. For example, the probability that an unknown is truly an alcohol given that it has been classified as an alcohol is 172/(172 + 142) = 55%. The pertinent probabilities are as follows. I. If an unknown is classified as a non-alcohol, there is a 99% chance that it is a non-alcohol. II. If an unknown is classified as an alcohol, there is a 55% chance that it is an alcohol. III. There is an 83% chance that a non-alcohol will be classified as a non-alcohol.
Figure 9. Scatter plot of the projection of the first 996 samples from the Sadtler library onto the window defined by Figure 6. The spectra were pretreated with the autoscaling and weighting (w² - 1) data from the training set. The 50% and 95% confidence ellipses are shown on this plot. The non-alcohol 50% ellipse is plotted in white. Because of the density of the points, it is difficult to see the points in the region of overlap.
IV. There is a 96% chance that an alcohol will be classified as an alcohol. The high probabilities for I and IV are due to the low rate of misclassifying alcohols as non-alcohols. These probabilities need to be examined in light of the corresponding probabilities for trivial rules. Consider a rule that classifies the unknown based on a coin flip. For a given unknown, there is a 50% probability of a correct classification. Probabilities III and IV are both 50% for this trivial model. Probabilities I and II depend on the a priori probabilities of the two classes (in this case, approximately 20% of the samples are alcohols and 80% are non-alcohols). For probability I, since the classification contains no information, the unknown can be anything; thus, there is an 80% chance that it is a non-alcohol. Similarly, probability II is 20%. Therefore, while 55% seems like a low value (and in fact is too low to be of analytical use), the rule based on the projection is doing almost 3 times better than simply guessing. Another trivial rule is to classify all unknowns as non-alcohols. The total error rate and total correct rate are very close to those of the rule based on the projection. Given that the unknown is an alcohol, there is a 0% chance of correct classification (probability IV), but given that the unknown is a non-alcohol, there is a 100% chance of a correct classification (probability III). If an unknown is classified as a non-alcohol, there is an 80% chance that it truly is a non-alcohol (probability I). Since no unknowns are classified as alcohols, probability II is undefined. The presence of some high probabilities and a low total error rate is not proof of the utility of the model. For a perfect rule, probabilities I and II should sum to 200% and III and IV should sum to 200%. For both of these trivial models, the probabilities summed to 100%, except where the probability was undefined.
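The four probabilities and the indicator sums discussed here follow directly from the four entries of a classification matrix. The following is a minimal sketch, assuming the counts quoted for part A of Table I and the 180 alcohols and 816 non-alcohols of the large data set. The function names are hypothetical, and the sums are formed from the rounded percentages because that appears to reproduce the values quoted in the text.

```python
def rule_probabilities(aa, an, na, nn):
    """Probabilities I to IV from the four classification counts.

    aa: alcohols classified as alcohol
    an: alcohols classified as non-alcohol
    na: non-alcohols classified as alcohol
    nn: non-alcohols classified as non-alcohol
    A probability is None when its denominator is zero (undefined).
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "I": ratio(nn, nn + an),    # classified non-alcohol and truly non-alcohol
        "II": ratio(aa, aa + na),   # classified alcohol and truly alcohol
        "III": ratio(nn, nn + na),  # non-alcohol recognized as non-alcohol
        "IV": ratio(aa, aa + an),   # alcohol recognized as alcohol
    }

def indicator_sum(p, q):
    """Sum of two probabilities as rounded percentages (None if undefined)."""
    if p is None or q is None:
        return None
    return round(100 * p) + round(100 * q)

# Rule based on the projection (part A of Table I).
projection = rule_probabilities(aa=172, an=8, na=142, nn=674)
# Trivial rule: every unknown is called a non-alcohol.
trivial = rule_probabilities(aa=0, an=180, na=0, nn=816)

for name, probs in (("projection", projection), ("trivial", trivial)):
    print(name,
          indicator_sum(probs["I"], probs["II"]),
          indicator_sum(probs["III"], probs["IV"]))
```

For the trivial rule, probability II comes out undefined and probabilities III and IV sum to 100%, in line with the argument above.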
Thus, 100% is a minimum and 200% is a maximum for this indicator of the utility of a rule. For the rule based on the projection developed above, the sum of probabilities I and II was 154% and the sum of probabilities III and IV was 179%. While these indicators are well above the minimum, the rule as it stands has too large an error rate (particularly for those unknowns classified as alcohols) to be useful. The large data set can be used to derive a better estimate of the true distributions of alcohols and non-alcohols. Figure 9 shows the scatter plot of the spectra from the large data set projected onto the same space as shown in Figure 6. The ellipses that contain 50% and 95% of the distribution volume for each class are included. These ellipses were generated from
Figure 10. Expansion of the overlap region of Figure 9. This plot includes the elliptical decision line derived from fitting normal distributions to the large data set.
Figure 11. Same scatter plot as shown in Figure 10 with a straight discrimination line determined visually.
the distributions fitted to the large data set. The 50% ellipse for the non-alcohol class is plotted in white to stand out on top of the density of points. Because of the density of points, it is difficult to see the region of overlap, and this region has been expanded in Figure 10. This plot also shows the ellipse that separates the two distributions. By classifying all of the samples in the large training set by this rule (non-alcohols inside the ellipse and alcohols outside the ellipse), a total error rate of 6% incorrect classifications is derived. The classification matrix is shown in part B of Table I. While these error rates are not bad, they can be improved by dropping the assumption of normally distributed samples. Furthermore, by dropping the assumption of normality, the dubious estimate of the a priori probabilities can also be dropped. By examining the plot of the large data set (Figure 9), it can be seen that the two classes do not fall into simple elliptical shapes. One can apply the human eye (a strong pattern recognition device) to draw a better separating line than the ellipse. Figure 11 shows one possible separating line (with the function y = 1.666x + 0.1). All unknowns falling above and to the left of the line are classified as non-alcohols. There are two difficulties in generating this type of line. First, a large training set is needed (in this case nearly 1000 samples were used), and second, it is subjective and thus may not be the optimal rule. Part C of Table I shows the classification matrix for this line. Since the separation rule was generated subjectively, it is important to test it against a prediction set (i.e., a set of spectra not used in the generation of the line). The next 1000 samples
from the Sadtler library were checked for quality and identified from their Wiswesser line formulae. None of the spectra were rejected. The spectra were projected onto the space and classified by the line described above. The resulting classification matrix is shown in part D of Table I. The corresponding probabilities are as follows. I. If an unknown is classified as a non-alcohol, there is a 98% chance that it is a non-alcohol. II. If an unknown is classified as an alcohol, there is an 87% chance that it is an alcohol. III. There is a 98% chance that a non-alcohol will be classified as a non-alcohol. IV. There is an 87% chance that an alcohol will be classified as an alcohol. This model’s indicator sums are both 185% (for III and IV and for I and II), which is close to the theoretically optimal value of 200%. The error rate can be improved by recognizing that there is a region between the two distributions where it is difficult to classify the unknowns. It is desirable for the rule to report that it cannot classify an unknown with confidence rather than simply making a choice (essentially at random in this region of overlap). By defining two lines (both with a slope of 1.666 and with intercepts of 0.11 and 0.06), there are now three possible choices. All samples falling above the upper line (intercept 0.11) are classified as non-alcohols, all samples falling below the lower line (intercept 0.06) are classified as alcohols, and those falling between the lines are not classified (no choice). Obviously, the size of the no-choice region can be increased, which would drive the error rate to 0 but at the same time would drive the no-choice rate up to an unacceptable level. The two lines described above bracket 28 samples from the second 1000 spectra, thereby creating a no-choice rate of 3%. The classification matrix is shown in part E of Table I. The probabilities are as follows. I.
If an unknown is classified as a non-alcohol, there is a 98% chance that it is a non-alcohol. II. If an unknown is classified as an alcohol, there is a 97% chance that it is an alcohol. III. There is a 98% chance that a non-alcohol will be classified as a non-alcohol. IV. There is an 81% chance that an alcohol will be classified as an alcohol. V. There is a 2% chance that a non-alcohol will not be classified. VI. There is a 7% chance that an alcohol will not be classified. Note that the classification matrix in part E of Table I is very similar to that in part D. The number of incorrectly classified non-alcohols dropped by 16 out of 20 at the expense of losing 9 out of 135 correctly identified alcohols. The change in the second column is minimal, as should be expected since the upper line is close to the previous line. Of the 19 alcohols that were misidentified as non-alcohols, the majority showed a large downward shift of the O-H stretching frequency. This shift is attributable to intramolecular hydrogen bonding. Eight of the samples were short-chain alcohols with an amino group. The shifted band in the spectrum does not overlap the peak in the component loading, and thus, the sample is identified as a non-alcohol. This effect was noticed for the misidentification of the o-ethoxyphenol in the training set. The importance of this observation is that in a more complex expert system such misidentifications may be caught by secondary rules. For example, the determination of the presence of the alcohol functionality can be accomplished by two separate rules, one rule as described in this paper and a second rule for alcohols with internal hydrogen bonding, developed in a similar fashion by using a training
set with examples of this structure type. Visser and van der Maas (31) reported difficulty in determining the presence of OH functionalities from condensed-phase IR spectra. These difficulties resulted from trying to find spectral regions where alcohols and carboxylic acids both showed specific absorption bands. Luinge and van’t Klooster (39) reported a classification rate of 97% of alcohols using a peak table approach on combined infrared and mass spectral data. Luinge et al. (26) reported classification rates of 100-81% of alcohols depending on the degree of the alcohol. Their expert system was based on condensed-phase IR spectra and automatically generated peak location rules.
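The two-line rule with a no-choice region described in this section can be written as a short function. This is a sketch under stated assumptions: x is taken to be the score on principal component 1 and y the score on principal component 2 (as the figure axes suggest), points lying exactly on either line are treated as unclassified, and the names are hypothetical.

```python
SLOPE = 1.666            # slope shared by both lines (from the text)
UPPER_INTERCEPT = 0.11   # above this line: non-alcohol
LOWER_INTERCEPT = 0.06   # below this line: alcohol

def classify(pc1, pc2):
    """Classify an unknown from its scores on components 1 and 2.

    Assumes pc1 is the x coordinate and pc2 the y coordinate of the
    separating lines; this mapping is inferred, not stated in the text.
    """
    if pc2 > SLOPE * pc1 + UPPER_INTERCEPT:
        return "non-alcohol"
    if pc2 < SLOPE * pc1 + LOWER_INTERCEPT:
        return "alcohol"
    return "no choice"

# Hypothetical score pairs chosen only to exercise the three outcomes.
for point in [(0.0, 0.20), (0.0, 0.00), (0.0, 0.08)]:
    print(point, classify(*point))
```

Widening the gap between the two intercepts trades a lower error rate for a higher no-choice rate, which is the compromise discussed above.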
CONCLUSION An expert system for the determination of molecular structure from vapor-phase FTIR spectra can be devised. The difficult steps in techniques such as PAIRS can be circumvented by a simple principal component analysis approach. By the appropriate choice of a training set and by scaling the measurements based on their discriminatory power, it is possible, via PCA, to derive loading vectors that define a spectral subspace where the molecular classes (such as alcohol and non-alcohol) are separated. The set of scores generated by these loading vectors accurately represents a spectrum in terms of predicting the presence or absence of a given functional group. The decision-making process of classifying an unknown sample from its scores (location in the spectral subspace) can be based on a simple Bayesian probability approach. In this preliminary work, alcohols were examined. Alcohols are actually a special case because they have a distinctive, single absorption band (O-H stretch). In future papers from this laboratory, results obtained with more difficult cases, such as molecules containing carbonyl groups and their subclasses (ketones, aldehydes, esters, etc.) and aromatics with different substitution patterns, will be described. For more complex cases, it may be necessary to use larger training sets to derive a robust model. A simple linear rule was achieved that could identify an unknown alcohol with a 98% probability of being correct and a 3% chance of not being able to decide. The probability that an unknown classified as an alcohol is actually an alcohol is 97%, and the probability that an unknown classified as a non-alcohol is actually a non-alcohol is 98%. These probabilities are sufficiently high for the rule to be included in an expert system.
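A minimal sketch of the Bayesian decision step summarized above: a bivariate normal distribution is fitted to the scores of each class, and an unknown is assigned to the class whose (optionally prior-weighted) density is higher at its position. The synthetic scores, the default equal priors, and all function names are assumptions made for illustration. The last function uses the standard result that an ellipse of squared Mahalanobis radius -2 ln(1 - p) encloses a fraction p of a bivariate normal distribution, which gives the 50% and 95% isoaltitudes.

```python
import numpy as np

def fit_bivariate_normal(scores):
    """Return the mean vector and covariance matrix of an (n, 2) score array."""
    return scores.mean(axis=0), np.cov(scores, rowvar=False)

def log_density(point, mean, cov):
    """Natural log of the bivariate normal density at `point`."""
    diff = np.asarray(point, dtype=float) - mean
    maha2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * maha2 - 0.5 * np.log(np.linalg.det(cov)) - np.log(2.0 * np.pi)

def classify(point, fit_alcohol, fit_non_alcohol, prior_alcohol=0.5):
    """Pick the class with the larger prior-weighted density at `point`."""
    log_alc = log_density(point, *fit_alcohol) + np.log(prior_alcohol)
    log_non = log_density(point, *fit_non_alcohol) + np.log(1.0 - prior_alcohol)
    return "alcohol" if log_alc > log_non else "non-alcohol"

def isoaltitude_radius2(fraction):
    """Squared Mahalanobis radius of the ellipse enclosing `fraction`
    of the volume of a bivariate normal distribution."""
    return -2.0 * np.log(1.0 - fraction)

# Synthetic scores (not from the paper): a tight non-alcohol cluster
# near the origin and a broader alcohol cluster displaced along PC 1.
rng = np.random.default_rng(1)
non_alcohol_scores = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(26, 2))
alcohol_scores = rng.normal(loc=[0.3, 0.0], scale=0.20, size=(26, 2))

fit_alc = fit_bivariate_normal(alcohol_scores)
fit_non = fit_bivariate_normal(non_alcohol_scores)

print(classify([0.0, 0.0], fit_alc, fit_non))
print(classify([1.0, 0.5], fit_alc, fit_non))
print(isoaltitude_radius2(0.50), isoaltitude_radius2(0.95))
```

With equal priors the decision boundary is the locus where the two fitted densities have equal height; supplying a different prior_alcohol shifts that boundary toward the less probable class.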
In future papers in this series, we will discuss the structure of expert systems built on these PCA-derived rules and compare various pattern recognition and classification techniques used in this application.
LITERATURE CITED
(1) Sharaf, M. A.; Illman, D. L.; Kowalski, B. R. Chemometrics; John Wiley & Sons: New York, 1986.
(2) Harrington, P. B.; Voorhees, K. J. Anal. Chem. 1990, 62, 729-734.
(3) Frankel, D. S. Anal. Chem. 1984, 56, 1011-1014.
(4) Hangac, G.; Wieboldt, R. C.; Lam, R. B.; Isenhour, T. L. Appl. Spectrosc. 1982, 36 (1), 40-47.
(5) Wang, C. P.; Isenhour, T. L. Appl. Spectrosc. 1987, 41 (2), 185-195.
(6) Bjerga, J. M.; Small, G. W. Anal. Chem. 1990, 62, 226-233.
(7) Zupan, J.; Munk, M. E. Anal. Chem. 1985, 57, 1609-1616.
(8) Woodruff, H. B.; Smith, G. M. Anal. Chem. 1980, 52, 2321-2327.
(9) Puskar, M. A.; Levine, S. P.; Lowry, S. R. Anal. Chem. 1986, 58, 1156-1162.
(10) Gribov, L. A.; Elyashberg, M. E.; Serov, V. V. Anal. Chim. Acta 1977, 95, 75-96.
(11) Tomellini, S. A.; Hartwick, R. A.; Woodruff, H. B. Appl. Spectrosc. 1985, 39 (2), 331-333.
(12) Ying, L.; Levine, S. P.; Tomellini, S. A.; Lowry, S. R. Anal. Chem. 1987, 59, 2197-2203.
(13) Maddams, W. F. Appl. Spectrosc. 1980, 34 (3), 245-266.
(14) Jolliffe, I. T. Principal Component Analysis; Springer-Verlag: New York, 1986.
(15) James, M. Classification Algorithms; John Wiley & Sons: New York, 1985.
(16) Varmuza, K. Anal. Chim. Acta 1980, 122, 227-240.
Anal. Chem. 1991, 63, 1747-1754
(17) Derde, M. P.; Massart, D. L. Anal. Chim. Acta 1986, 191, 1-16.
(18) Woodruff, H. B.; Ritter, G. L.; Lowry, S. R.; Isenhour, T. L. Appl. Spectrosc. 1976, 30 (2), 213-216.
(19) Harrington, P. B.; Street, T. E.; Voorhees, K. J.; di Brozolo, F. R.; Odom, R. W. Anal. Chem. 1989, 61, 715-719.
(20) Derde, M. P.; Buydens, L.; Guns, C.; Massart, D. L.; Hopke, P. K. Anal. Chem. 1987, 59, 1868-1871.
(21) Shah, N. K.; Gemperline, P. J. Anal. Chem. 1990, 62, 465-470.
(22) Tomellini, S. A.; Stevenson, J. M.; Woodruff, H. B. Anal. Chem. 1984, 56, 67-70.
(23) Hippe, Z. Anal. Chim. Acta 1983, 150, 11-21.
(24) Dupuis, J. F.; Cleij, P.; van’t Klooster, H. A.; Dijkstra, A. Anal. Chim. Acta 1979, 112, 83-93.
(25) Tomellini, S. A.; Hartwick, R. A.; Stevenson, J. M.; Woodruff, H. B. Anal. Chim. Acta 1984, 162, 227-240.
(26) Luinge, H. J.; Kleywegt, G. J.; van’t Klooster, H. A.; van der Maas, J. H. J. Chem. Inf. Comput. Sci. 1987, 27, 95-99.
(27) Golub, G. H.; Van Loan, C. F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, 1983.
(28) Geladi, P.; Kowalski, B. R. Anal. Chim. Acta 1986, 185, 1-17.
(29) Zupan, J. Anal. Chim. Acta 1978, 103, 273-288.
(30) Visser, T.; van der Maas, J. H. Anal. Chim. Acta 1980, 122, 357-361.
(31) Visser, T.; van der Maas, J. H. Anal. Chim. Acta 1980, 122, 363-372.
(32) Blaffert, T. Anal. Chim. Acta 1984, 161, 135-148.
(33) Blaffert, T. Anal. Chim. Acta 1986, 191, 161-168.
(34) Trulson, M. O.; Munk, M. E. Anal. Chem. 1983, 55, 2137-2142.
(35) Box, G. E. P.; Tiao, G. C. Bayesian Inference in Statistical Analysis; Addison-Wesley: Reading, MA, 1973.
(36) Welti, D. Infrared Vapour Spectra; Heyden & Son: London, 1970.
(37) Nyquist, R. A. The Interpretation of Vapor-Phase Infrared Spectra; Sadtler Research: Philadelphia, 1984; Vol. 1.
(38) Saperstein, D. D. Appl. Spectrosc. 1986, 40 (3), 344-348.
(39) Luinge, H. J.; van’t Klooster, H. A. Trends Anal. Chem. 1985, 4 (10), 242-243.
RECEIVED for review November 13, 1990. Accepted May 6, 1991. This work was partially supported by Grant No. 60NANB7D0736 from the Center for Fire Research of the National Institute for Standards and Technology and the University of Idaho Center for Hazardous Waste Remediation Research.
Development of a Coherent Forward Scattering Resonance Monochromator for the Rejection of Continuum Background and Neighboring Lines in Emission Spectra Hideyuki Matsuta* and Kichinosuke Hirokawa Institute for Materials Research, Tohoku University, Sendai, 980 Japan
Optimum operating conditions of a coherent forward scattering resonance monochromator (CFSRM), such as applied magnetic field strength, pulse duration, and peak current of pulsed glow discharge atomization, were studied. Correction of performance-limiting factors, such as leakage of the crossed polarizer and analyzer, was tried by two different methods. One used the time dependence of atomic vapor density produced by pulsed glow discharge atomization and the other used magnetic field modulation. In the present apparatus, the method using time dependence of atomic vapor density was superior. Experiments to obtain the rejection of continuum background and neighboring lines of copper resonance lines (324.8 and 327.4 nm) in emission spectra were performed, and satisfactory results were obtained.
INTRODUCTION In emission spectrometry for trace analysis, the presence of continuum background and neighboring lines of target elements restricts the detection limits. In a spectrometer such as a conventional grating monochromator or an echelle monochromator, it is difficult to establish extremely narrow band-pass and high throughput simultaneously (1). More than 20 years ago, Walsh and his group proposed a resonance monochromator (RM) (2, 3) to select resonance lines of particular atoms from polychromatic radiation. They used the resonance fluorescence of atomic vapor and detected the signal in a lateral direction. The spectral band-pass of this resonance monochromator was extremely narrow, comparable to the line width of atomic emission lines (i.e., on the order of 0.001 nm, or 1 pm), but the transmission efficiency of the resonance monochromator was not so good. If a resonance coherent forward scattering (CFS) signal is used, the performance of a resonance monochromator will be
greatly improved, since the intensity of a resonance CFS signal is many orders of magnitude larger than that of a resonance fluorescence signal (4) and it is feasible to collect an available CFS signal completely, since CFS radiation can be collimated. According to the above scheme, we have developed a coherent forward scattering resonance monochromator (CFSRM) with a permanent magnet (5, 6). In our previous experiments, although it was confirmed that the spectral band-pass of a CFS resonance monochromator was very narrow, part of the nonresonance radiation passed through the CFS resonance monochromator as a leakage of the crossed polarizer and analyzer. This leakage is a result of imperfections in the polarizing prisms. Since the leakage will limit the ability of a CFSRM, such as the detection power of weak emission lines of target elements from a mass of neighboring and objective lines, the leakage should be corrected to obtain the inherent performance of a CFSRM. In this report, two correction methods for leakage will be reported and evaluated. Besides those experiments, dependence of CFS intensity on a change of magnetic field strength, correction of afterglow emission, rejection capability of continuum background and extremely close emission lines, and so forth will be presented.
EXPERIMENTAL SECTION Copper is known to show a strong line-crossing effect (i.e., constructive and destructive interference between π and σ Zeeman components) (4). Thus, it is a very convenient element for understanding the general features of CFS intensity variation with changing applied magnetic field strength. In this experiment, copper was used as an experimental element. The theory of CFS intensity predicts that if the length of the magnetooptical interaction region increases, then the CFS intensity increases quadratically (4). To verify the above, two types of cathodes 10 and 30 mm in length were used. These cathodes will