Simplex Optimization of PCA-Based Infrared Expert Systems

Anal. Chem. 1999, 71, 960-967

Jyisy Yang,* Fon-Shiu Lee, and Su-Juan Shou

Department of Chemistry, Chung-Yuan Christian University, Chung-Li, Taiwan

This paper describes an improved principal component analysis (PCA)-based infrared expert system. The simplex algorithm is used to optimize the feature weight spectrum, thereby increasing the separation between different classes of compounds. Classification in two-class systems is significantly improved, and analysis indicates that compounds belonging to more than two classes can also be separated in a single step using the same approach. This can simplify the structural elucidation tree in an expert system. Also reported here are the factors that influence optimization of the feature weight spectra, including the initial data matrix and the step size of the optimization.

Developing an expert system to identify molecular structures from their infrared spectra requires the manipulation of a large number of spectra. Although infrared spectra provide abundant molecular structural information, interpreting them accurately requires detailed knowledge of the relationships among vibrational energies, intensities, and molecular structure. Translating a human expert's knowledge into a computer-executable format would allow researchers to interpret infrared structural information rapidly and efficiently. An expert system to interpret infrared spectra can be constructed in two ways. The first derives rules directly from the infrared spectra and correlates them with molecular structures. Several expert systems have been developed on the basis of the location, amplitude, and width of infrared absorption peaks and their relationships to molecular structure.1-5 However, this type of expert system has several limitations. For instance, determining a peak's location, width, and amplitude is a complicated task, particularly for the spectra of unknown samples.
The second method is based on mathematical classification techniques such as principal component analysis (PCA)6-9 or neural networks.10-13

(1) Woodruff, H. B.; Smith, G. M. Anal. Chem. 1985, 57, 1609-1616.
(2) Puskar, M. A.; Levine, S. P.; Lowry, S. R. Anal. Chem. 1986, 58, 1156-1162.
(3) Gribov, L. A.; Elyashberg, M. E.; Serov, V. V. Anal. Chim. Acta 1977, 95, 75-96.
(4) Tomellini, S. A.; Hartwick, R. A.; Woodruff, H. B. Appl. Spectrosc. 1985, 39, 331-333.
(5) Ying, L.; Levine, S. P.; Tomellini, S. A.; Lowry, S. R. Anal. Chem. 1987, 59, 2197-2203.
(6) Perkins, J. H.; Hasenoehrl, E. J.; Griffiths, P. R. Anal. Chem. 1991, 63, 1738-1747.
(7) Hasenoehrl, E. J.; Perkins, J. H.; Griffiths, P. R. Anal. Chem. 1992, 64, 705-710.

960 Analytical Chemistry, Vol. 71, No. 5, March 1, 1999

Figure 1. Typical structural elucidation tree used in an infrared expert system.

PCA is a data reduction method that concentrates most of the variance of a data set into a few principal components. Using a feature weight spectrum, Perkins et al.6-8 successfully applied this method to classify compounds from their vapor-phase IR spectra. Yang et al.9 further applied it to distinguish foreign compounds in cellulosic materials from the condensed-phase IR spectra of the separated extracts. Together with the structural elucidation tree in Figure 1, the PCA-based two-class method provides a simple and efficient means of constructing an IR expert system; in earlier work, the same investigators successfully classified six to seven types of compounds.8 This method offers several advantages over the peak-list approach of the first method, in particular eliminating the need to characterize the features of each class of compounds and to define a threshold for peaks. In contrast to methods based on neural networks, the PCA-based method also allows visual examination of the relationship between class features and spectra.

(8) Hasenoehrl, E. J.; Perkins, J. H.; Griffiths, P. R. Anal. Chem. 1992, 64, 656-663.
(9) Yang, J.; Hasenoehrl, E. J.; Griffiths, P. R. J. Chromatogr., A 1997, 785, 111-119.
(10) Klawun, C.; Wilkins, C. L. Anal. Chem. 1995, 67, 374-378.
(11) Klawun, C.; Wilkins, C. L. J. Chem. Inf. Comput. Sci. 1996, 36, 69-81.
(12) Klawun, C.; Wilkins, C. L. J. Chem. Inf. Comput. Sci. 1996, 36, 249-257.
(13) Novic, M.; Zupan, J. J. Chem. Inf. Comput. Sci. 1995, 35, 454-466.

DOI: 10.1021/ac980808q

© 1999 American Chemical Society Published on Web 01/23/1999

According to Perkins et al.,6 a feature weight spectrum (w) can improve the separation of two classes of compounds, leading to higher prediction accuracy. The feature weight spectrum plays a prominent role in enhancing the difference between classes of compounds and, hence, their separation. Perkins et al. calculated the feature weight spectrum as the ratio of the intercategory variance to the sum of the intracategory variances:

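$$w_k(\mathrm{I},\mathrm{II}) = \frac{\sum X_\mathrm{I}^2/N_\mathrm{I} + \sum X_\mathrm{II}^2/N_\mathrm{II} - 2\sum X_\mathrm{I}\sum X_\mathrm{II}/(N_\mathrm{I}N_\mathrm{II})}{\sum (X_\mathrm{I} - M_\mathrm{I})^2/N_\mathrm{I} + \sum (X_\mathrm{II} - M_\mathrm{II})^2/N_\mathrm{II}} \tag{1}$$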

where X_I and X_II are the measurements at wavenumber k for the samples of classes I and II, respectively; N_I and N_II are the numbers of samples of classes I and II in the training set; and M_I and M_II are the average values of X_I and X_II, respectively. According to Perkins et al.,6 the effect of feature weighting can be interpreted as stretching the data envelope along the original axes (wavenumbers) at which the classes are well separated. This stretching forces the principal components to rotate, thereby improving the separation. Several forms of the feature weight spectrum have been tested,6-8 such as w, w², w² - 1, and (w - 1)². In these approaches, the feature weight spectrum was selected on the basis of its performance in separating the classes of compounds, typically by visually judging the separation distance between classes. Because different forms of the feature weight spectrum give different classification performance, better performance may well be obtainable. Therefore, an optimization method is employed in this work to obtain an improved feature weight spectrum. As will be demonstrated later, the optimized feature weight spectrum not only improves separation in a two-class system (e.g., alcohol/nonalcohol) but can also be applied to separate more than two classes of compounds simultaneously.

Simplex optimization14-16 is used to optimize the feature weight spectrum because it is a practical method that can easily be extended to many factors. Since Spendley et al.14 proposed the sequential simplex optimization algorithm, several modifications have been made. For instance, Nelder and Mead17 introduced expansion and contraction factors, Routh et al.18 proposed a supermodified simplex (SMS), and Ryan et al.19 presented a weighted centroid method (WCM). However, these methods have typically been applied to problems with few variables, such as chromatographic separations. When they are applied to the construction of an expert system, the data matrices are approximately 50 × 500 for a two-class system, which makes optimization of the feature weight spectrum quite slow. Because the optimization need not be repeated, however, a long calculation time is not prohibitive. As described in the next section, the optimization process requires a numerical criterion to assess the performance of a feature weight spectrum; such a criterion is also proposed here.

(14) Spendley, W.; Hext, G. R.; Himsworth, F. R. Technometrics 1962, 4, 441.
(15) Deming, S. N.; Morgan, S. L. Anal. Chem. 1973, 45, 278A.
(16) Berridge, J. C. J. Chromatogr. 1989, 485, 3.
(17) Nelder, J. A.; Mead, R. Comput. J. 1965, 7, 308.
(18) Routh, M. W.; Swartz, P. A.; Denton, M. B. Anal. Chem. 1977, 49, 1422.
(19) Ryan, P. B.; Barr, R. L.; Todd, H. D. Anal. Chem. 1980, 52, 1460.

THEORY

This work focuses only on PCA-based expert systems; details of how PCA is used to construct an infrared expert system can be found elsewhere.6-9 Consider an infrared data set containing N spectra, each with M data points. PCA reduces this high-dimensional data set by defining principal axes and projecting the spectral vectors onto them. The N × M data matrix X can be decomposed into several matrices (eq 2) by the nonlinear iterative partial least-squares (NIPALS) algorithm.20

$$X = T_1P_1 + T_2P_2 + T_3P_3 + \cdots + T_nP_n + E \tag{2}$$

where T_n (an N × 1 vector) contains the scores on the loading vector P_n (a 1 × M vector) of the nth principal component, and E is the residual matrix. The spectra in the training set are pretreated by autoscaling so that each spectrum carries a similar weight. In addition, each spectrum is multiplied by the feature weight spectrum to emphasize the features that distinguish the classes of compounds.

The simplex algorithm is a hill-climbing optimization technique. For a two-variable system, three experiments (the number of variables plus one) are needed to define the initial simplex and indicate the direction of improvement. After the responses of these experiments are compared, the condition with the worst response is discarded, and a new experiment is performed at the point reflected away from the worst point. A new simplex is thus formed, and the process is repeated until the termination criterion is met. The SMS method proposed by Routh et al.18 was used in this work because of the simplicity of its optimization strategy. It differs from the original simplex method mainly in that the step size is variable: when the new position is to be determined, several positions along the direction from the worst position through the mean center of the remaining vertices are compared, and the one with the best response is selected.

The optimization criterion translates the degree of separation into a numerical value. The critical optimization function (COF) is based on the resolution between classes. As illustrated in Figure 2, the function measures how well the classes are separated along a projection line, which is determined from the projections of all points in the training set onto the principal components. By comparing the spread of each class with the distance between the class means, the direction of the optimal projection line is determined.
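The variable-step replacement just described can be illustrated with a minimal sketch. It is not the authors' MATLAB program and not the full SMS of Routh et al., which chooses its step by its own rules; it simply assumes a response function to be maximized (in this paper it would be the COF), represents vertices as lists of floats, and compares a fixed set of candidate positions Cw + R(Cw - W) along the line from the worst vertex W through the mean center Cw of the remaining vertices. The default R values (-0.5, 0, 0.5, 1, 2, and 3) are the ones this paper reports using; the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def centroid(vertices: Sequence[Vector]) -> Vector:
    """Mean center (coordinate-wise average) of a set of vertices."""
    n = len(vertices)
    return [sum(coords) / n for coords in zip(*vertices)]


def variable_step(
    simplex: List[Vector],
    response: Callable[[Vector], float],
    r_values: Sequence[float] = (-0.5, 0.0, 0.5, 1.0, 2.0, 3.0),
) -> List[Vector]:
    """Replace the worst vertex W by the best candidate Cw + R(Cw - W).

    Simplified sketch: a fixed set of R values is compared, and the worst
    vertex is always replaced, even if no candidate beats it.
    """
    scores = [response(v) for v in simplex]
    # Index of the vertex with the lowest response (first one on ties).
    worst_index = min(range(len(simplex)), key=lambda i: scores[i])
    worst = simplex[worst_index]
    # Cw: mean center of all vertices except the worst one.
    cw = centroid([v for i, v in enumerate(simplex) if i != worst_index])
    candidates = [[c + r * (c - w) for c, w in zip(cw, worst)] for r in r_values]
    best = max(candidates, key=response)
    new_simplex = [list(v) for v in simplex]
    new_simplex[worst_index] = best
    return new_simplex
```

For example, with the toy response `lambda v: -((v[0] - 1) ** 2 + (v[1] - 1) ** 2)` and the simplex [[0, 0], [2, 0], [0, 2]], all three vertices score -2, the first is treated as the worst, and it is replaced by the mean center (1, 1) of the other two (R = 0). Note that choosing R = 0 places the new vertex inside the face spanned by the other vertices, which can flatten the simplex, so a practical implementation would need a safeguard against that.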
The COF thus translates the separation into a numerical value; the form used in this work is calculated from the following equation:

$$\mathrm{COF} = \log\left[\frac{M_\mathrm{I} - M_\mathrm{II}}{SD_\mathrm{I} + SD_\mathrm{II}}\right] \tag{3}$$

where M_I and M_II denote the average projected values on the separation line for classes I and II, respectively, and SD_I and SD_II are the standard deviations of the projected values of each class, which are assumed to be normally distributed. The logarithm is used to suppress further gains in the COF once a certain degree of separation has been reached. For more than two classes, eq 3 is extended by calculating an individual COF between each pair of classes. When unknown compounds are predicted, a classification line is drawn visually on the projection plots.

(20) Geladi, P.; Kowalski, B. R. Anal. Chim. Acta 1986, 185, 1-17.

Figure 2. Response function used in the optimization of the feature weight spectrum. (A) Each compound in the training set is projected onto principal components 1 and 2 to obtain the best projection line. (B) The mean value and standard deviation of each class of compounds are determined and used to calculate the COF response.

The simplex method was originally proposed for systems with only a few dimensions. In systems containing hundreds of variables, the starting simplex cannot be calculated directly, but it can be chosen so as to speed up the optimization. Although a random starting matrix could be used to initiate the optimization, the feature weight spectrum proposed by Perkins et al.6 (eq 1) is assumed to be a better starting point. Therefore, the initial simplex is formed from the feature weight spectrum derived from eq 1. During formation of the initial simplex matrix, a window is used to change ("zap") the intensity over a defined range of wavenumbers in the original feature weight spectrum. In this manner, a new feature weight spectrum is formed that differs from the original at only a few points. Besides the feature weight spectrum generated from eq 1, other forms of feature weight spectra, such as w², w² - 1, or (w - 1)², can also be used to generate the starting simplex. Because most of the variance is retained in the first few principal components, the optimization of the feature weight spectra focuses on the first two principal components only. Tests of starting simplexes built with different window sizes and from different base feature weight spectra are discussed below.

EXPERIMENTAL SECTION

Data manipulation was carried out with programs written in MATLAB (The MathWorks, Inc., Natick, MA). A total of 350 vapor-phase spectra were measured at a resolution of 8 cm-1 over the spectral region from 4000 to 450 cm-1; each spectrum contains 444 data points. Twenty-five spectra were selected to represent each class of compounds. Figure 3 illustrates the calculation procedure used in this work. This method differs from that proposed by Perkins et al.6 in that an iterative procedure is used to search for the optimal feature weight spectrum. All calculations were performed on a 586 IBM-compatible personal computer equipped with a 166-MHz CPU and 32 MB of RAM. For comparison, the method proposed by Perkins et al.6 was also tested with feature weight spectra of the forms w, w², w² - 1, and (w - 1)².

Figure 3. Flowchart for optimizing the separation of different classes of compounds.

RESULTS AND DISCUSSION

The benefits of optimized feature weight spectra for classification were investigated, and the feasibility of simplifying the tree structure of the classification was demonstrated. Eight two-class systems (listed in Table 1) were tested for separation. The alcohol/carbonyl/not either and acid/ester/ketone/aldehyde systems were also examined to demonstrate that compounds belonging to more than two classes can be classified simultaneously.

Table 1. COF Responses in the Classification of Different Classes of Compounds

system                           no feature weight       w      w²    w² - 1   (w - 1)²   optimized
carbonyl/non-carbonyl                      -0.671     0.143   0.228    0.200      0.165       0.499
alcohol/nonalcohol                         -0.145     0.044   0.145    0.113      0.130       0.230
alcohol/carbonyl                           -0.190     0.153   0.228    0.207      0.190       0.521
carbonyl/non-carbonyl-alcohol              -0.461     0.078   0.232    0.200      0.162       0.526
alcohol/non-carbonyl-alcohol               -0.174     0.024   0.128    0.106      0.113       0.217
acid-ester/ketone-aldehyde                 -0.120     0.259   0.298    0.301      0.286       0.544
acid/ester                                  0.373     0.529   0.536    0.536      0.535       0.608
ketone/aldehyde                            -0.508     0.168   0.200    0.294      0.210       0.547
average                                    -0.237     0.175   0.249    0.245      0.224       0.462

Construction of Parameters in Simplex Optimization of Feature Weight Spectra. A starting matrix is required to initiate the simplex optimization algorithm. This starting matrix should contain one more vector than the number of variables being optimized; for instance, if the feature weight spectrum contains 444 data points, the initial matrix should contain 445 vectors. To increase the speed of convergence, the feature weight spectrum calculated from eq 1 is used to form the starting matrix. A series of new feature weight spectra (vectors) is generated from it by moving a zapping window, which sets the intensities within the window to one, across the original feature weight spectrum. Zapping windows of 1, 3, 5, 7, and 9 data points were studied. With a 1-point window, the first new feature weight spectrum is obtained by changing only the intensity of the first data point of the original feature weight spectrum to one, the second by changing only the intensity of the second data point to one, and so on; changing every data point in this way yields 444 new feature weight spectra, and adding the original feature weight spectrum gives an initial matrix of 445 feature weight spectra. With a 3-point window, the first new feature weight spectrum is obtained by changing the intensities of the first three data points to one, the second by changing those of the second, third, and fourth data points to one, and so on; shifting the window across every data point and wrapping around at the end of the spectrum again yields a starting matrix of 445 feature weight spectra. Wider zapping windows are handled in the same manner.

Tests with the carbonyl/non-carbonyl system indicate that different zapping window sizes give different convergence speeds. Figure 4A displays the COF curves for this system with zapping windows of 1 to 9 data points. Only slight differences in the final COF values are observed after 5000 replacements, but windows of about 5 data points converge faster than the other sizes. This suggests that zapping 5 data points effectively removes an important feature from the feature weight spectrum, causing a large variation in the COF value. Typical absorption bandwidths at half-height in vapor-phase infrared spectra are approximately 40 cm-1, and the data spacing is about 8 cm-1 (444 points over 3550 cm-1); thus, replacing 5 points corresponds to removing roughly one band.
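The two ingredients of the starting simplex, the eq 1 feature weight spectrum and its zapping-window variants, can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' MATLAB code: spectra are equal-length lists, the sums in eq 1 run over the training spectra of each class at a single wavenumber k, zero intracategory variance is not handled, and the function names are hypothetical. Expanding the numerator of eq 1 gives (M_I - M_II)² plus the two intracategory (1/N) variances, so w_k = 1 + (M_I - M_II)²/(σ_I² + σ_II²); a weight of one therefore marks a wavenumber with no class discrimination, which is consistent with zapping to one.

```python
from typing import List, Sequence

Spectrum = Sequence[float]


def feature_weight(class_1: Sequence[Spectrum], class_2: Sequence[Spectrum]) -> List[float]:
    """Two-class feature weight spectrum w_k(I, II) of eq 1."""
    n1, n2 = len(class_1), len(class_2)
    weights = []
    for k in range(len(class_1[0])):
        x1 = [s[k] for s in class_1]
        x2 = [s[k] for s in class_2]
        m1, m2 = sum(x1) / n1, sum(x2) / n2
        # Intercategory term (numerator of eq 1).
        inter = (sum(x * x for x in x1) / n1 + sum(x * x for x in x2) / n2
                 - 2 * sum(x1) * sum(x2) / (n1 * n2))
        # Sum of the intracategory variances (denominator of eq 1).
        intra = (sum((x - m1) ** 2 for x in x1) / n1
                 + sum((x - m2) ** 2 for x in x2) / n2)
        weights.append(inter / intra)
    return weights


def zapping_simplex(w: Sequence[float], window: int = 5) -> List[List[float]]:
    """Starting simplex: the original w plus one copy per data point in which
    `window` consecutive points, starting there, are set to one (wrapping
    around at the end of the spectrum). Returns len(w) + 1 vertices."""
    m = len(w)
    simplex = [list(w)]
    for start in range(m):
        zapped = list(w)
        for offset in range(window):
            zapped[(start + offset) % m] = 1.0
        simplex.append(zapped)
    return simplex


# Toy example: one wavenumber, two spectra per class.
print(feature_weight([[0.0], [2.0]], [[4.0], [6.0]]))  # [9.0]
```

With 444-point spectra and the default 5-point window, `zapping_simplex` returns the 445 vertices described above.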
In contrast to a 5-point window, a smaller window cannot completely remove the feature of a band, whereas a larger window smooths out the features of some bands. Therefore, zapping windows 5 points wide were used in all of the following calculations.

The starting matrix can also be prepared from other forms of feature weight spectra. The forms w², w² - 1, and (w - 1)² were also investigated for their speed of convergence, and the COF responses for the carbonyl/non-carbonyl system obtained with these forms are plotted in Figure 4B. Two conclusions can be drawn from the figure. First, all of these initial matrices converge effectively. Second, an initial feature weight spectrum of the form w² generally gives faster convergence, possibly because the form w² enlarges the difference between classes of compounds. The forms w² - 1 and (w - 1)² shift the baseline of the feature weight spectrum to zero, which can slow the optimization at points with no intensity. Because the convergence rates for w and w² are similar, the form w was selected for the following investigations for computational simplicity and speed.

Figure 4. (A) COF response vs zapping window width of 1, 3, 5, 7, and 9 data points. (B) COF response curves obtained with initial data of the forms w, w², w² - 1, and (w - 1)². The zapping window is 5 data points wide in these calculations.

The simplex method is conventionally used to optimize a few variables, such as the conditions for chromatographic separations, extractions, and reactions. To extend it properly to a high-dimensional system such as IR spectra, the response surface of the COF was studied. Let W be the worst position and Cw the mean center; if R is the multiple of the distance from W to Cw, the new position is Cw + R(Cw - W). To study the response surface along this direction, R was varied from -100.0 to +100.0 in steps of 0.5 for the separation of the carbonyl/non-carbonyl system. Typical response curves obtained after 800-1200 replacements are shown in Figure 5, expanded in the region of larger variation. According to this figure, the response along this direction is a smooth function with a maximum located close to the mean center Cw, on the positive side. Although the R giving the maximum response could always be calculated for a given surface, the calculation time is long. To speed up the calculation, only positions with R values of -0.5, 0, 0.5, 1, 2, and 3 were calculated and compared in all of the following investigations.

Figure 5. COF response for R values ranging from -100 to +100 in steps of 0.5. The plot is enlarged in the region of larger variation.

Predictability Improvement for Two-Class Systems. Although two classes of compounds can be separated by the PCA method with a calculated feature weight spectrum, the optimization further increases class separation and facilitates the prediction of unknown spectra. Here, the initial data matrix was generated with zapping windows close to the width of a peak (5 data points), and R values of -0.5, 0, 0.5, 1, 2, and 3 were used as step sizes in the optimization. The initial simplex matrix was generated from eq 1 in the form w. The simplex algorithm was terminated when the difference between the COFs of the worst and best feature weight spectra was less than 0.001.

Table 1 summarizes the optimized COF responses for the tested systems, together with the COFs obtained with the forms of feature weight spectra originally used by Perkins et al. According to this table, the separation of the different classes of compounds is much improved. For instance, the COF for the carbonyl/non-carbonyl system increases from 0.228 to 0.499, and that for the alcohol/carbonyl system from 0.228 to 0.521. The average COF for the tested systems increases from 0.249 to 0.462. Because the COF is a logarithm, such an increase is significant.

Class prediction accuracy is also increased by the optimized feature weight spectra, as summarized in Table 2. For the carbonyl/non-carbonyl system, the number of misclassified compounds decreases from 24 to 12; for the alcohol/nonalcohol system, it decreases from 12 to 1. Figures 6 and 7 present typical projections of the compounds in the training and prediction sets onto principal components 1 and 2 for the carbonyl/non-carbonyl and carbonyl/alcohol systems, respectively. These figures clearly show that the optimization algorithm enhances the classification of compounds, and, judging from the prediction pattern, the optimization algorithm does not introduce any systematic bias. Generally speaking, classification is difficult for systems in which one class contains compounds with complicated chemical functionality, such as the carbonyl/non-carbonyl system, because only the features of the well-defined class of compounds can be used for the separation. In contrast, in the separation of well-defined classes, such as the alcohol/carbonyl system, the features of both classes can be used, leading to high separability.

Table 2. Prediction of Compounds Using Original and Optimized Feature Weight Spectra

                                                                without optimization      optimized
system                             sample                    no.   wrong   correct (%)    wrong   correct (%)
carbonyl/non-carbonyl              carbonyl                  298       4      98.7            2      99.3
                                   non-carbonyl              273      20      92.7           10      96.3
alcohol/nonalcohol                 alcohol                   108       3      97.2            0     100
                                   nonalcohol                463       9      98.1            1      99.8
alcohol/carbonyl                   alcohol                   108       6      94.4            0     100
                                   carbonyl                  298       8      97.3            1      99.7
carbonyl/non-alcohol-non-carbonyl  carbonyl                  298       9      97.0            2      99.3
                                   non-alcohol-non-carbonyl  164      18      89.0            5      97.0
alcohol/non-alcohol-non-carbonyl   alcohol                   108       1      99.1            0     100.0
                                   non-alcohol-non-carbonyl  164       9      94.5            0     100.0
acid-ester/ketone-aldehyde         acid-ester                150       1      99.3            1      99.3
                                   ketone-aldehyde           148       3      98.0            1      99.3
acid/ester                         acid                       72       0     100              0     100
                                   ester                      78       0     100              0     100
ketone/aldehyde                    ketone                     75       1      98.7            0     100
                                   aldehyde                   73       4      94.5            2      97.3
av correct prediction rate (%)                                               96.78                  99.21

wrong, no. of wrong classifications; correct, correct prediction rate.
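The COF values compared in Table 1 come from eq 3; a minimal sketch of eq 3 and of its pairwise extension to more than two classes is given below. Several details are assumptions not stated in the text: the logarithm is taken to base 10, the absolute difference of the class means is used so that the order of the two classes does not matter, the sample standard deviation (n - 1 denominator) is used, and the projected values are supplied as plain lists. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def cof(proj_1: Sequence[float], proj_2: Sequence[float]) -> float:
    """Eq 3: log10 of the mean separation over the summed standard deviations."""
    separation = abs(mean(proj_1) - mean(proj_2))
    return math.log10(separation / (stdev(proj_1) + stdev(proj_2)))


def pairwise_cof(classes: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Individual COF for every pair of classes (the more-than-two-class case)."""
    return {(a, b): cof(classes[a], classes[b]) for a, b in combinations(classes, 2)}


# Means 3 and 43, standard deviations 2 and 2: log10(40 / 4) = 1.0
print(cof([1.0, 3.0, 5.0], [41.0, 43.0, 45.0]))
```

Because the COF is a log ratio, a value of zero means the class means are separated by exactly the sum of the two standard deviations; the negative entries in the "no feature weight" column of Table 1 therefore correspond to mean separations smaller than that sum. How the individual pairwise COFs are combined into a single response is not specified here, so `pairwise_cof` simply returns them all.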
For instance, in the separation of the carbonyl/alcohol system, the COF increases substantially (from 0.228 to 0.521). For such well-defined classes, the original feature weight spectrum is already sufficient to enhance the difference between the classes, so an acceptable separation can be obtained with a feature weight spectrum of the form w² (Figure 7A) for the carbonyl/alcohol system. Even so, for this type of system, any increase in the distance between the two classes reduces the ambiguity in predicting unknown compounds; for instance, any compound that projects outside the 95% confidence interval is difficult to assign. As Figure 7B shows, further separation of two well-defined classes such as carbonyl and alcohol significantly reduces ambiguous assignments of unknown compounds whose projected values fall between the classes.

Figure 8 shows, for the carbonyl/non-carbonyl system, the feature weight spectrum of the form w² (A) and the optimized feature weight spectrum (B), each with the corresponding principal components 1 and 2. With the original feature weight spectrum, principal components 1 and 2 represent 22.4% and 17.6% of the variance, respectively; with the optimized feature weight spectrum, they represent 21.4% and 19.6%. The optimized feature weight spectrum contains several features that differ from the original one. For instance, an intense band appears around 3600 cm-1, which is due to the O-H band of acids. Also, a broader band of similar intensity appears around 1700 cm-1 for the C=O stretching mode; this broader band allows irregular carbonyl compounds to be classified accurately.

Predictions of unknown compounds were also made for the other systems that would be required to construct a complete structural elucidation tree such as that in Figure 1; Table 2 summarizes the results. The average prediction accuracy for the systems in this table increases from 96.78% to 99.21%. Even an improvement of only about 2% in a single classification is significant for the overall prediction rate, because an assignment requires several classifications in sequence and their errors accumulate. For instance, at least three steps are required to assign a compound as an acid, and over these steps the correct prediction rate increases from 90.62% to 95.86%. To examine the increase in prediction accuracy more closely, the total correct prediction rates were calculated using the elucidation tree in Figure 1. Table 3 summarizes the results; according to this table, the prediction accuracy increases by approximately 5%.

Figure 6. Projection of the training and prediction sets onto principal components 1 and 2 for the carbonyl/non-carbonyl system using (A) the original feature weight spectrum, w², and (B) the optimized feature weight spectrum. Training set: carbonyl (9) and non-carbonyl ([); prediction set: carbonyl (×) and non-carbonyl (+).

Figure 7. Projection of the training and prediction sets onto principal components 1 and 2 for the carbonyl/alcohol system using (A) the original feature weight spectrum, w², and (B) the optimized feature weight spectrum. Training set: carbonyl (9) and alcohol ([); prediction set: carbonyl (×) and alcohol (+).

Table 3. Total Correct Classification (%) of Six Types of Compounds by the Flowchart in Figure 1 with Two-Class and Multiclass Methods

              two-class method                               multiclass method
compound      no optimization    optimized feature weight
alcohol            97.2                 100                        96.7
acid               94.5                  98.4                      98.0
ester              94.5                  98.4                      98.0
ketone             92.0                  98.4                      94.3
aldehyde           88.1                  95.7                      98.0
none               87.3                  96.8                      95.7
average            91.29                 97.96                     96.58

Figure 8. (A) Original feature weight spectrum and its corresponding principal components 1 and 2 for the carbonyl/non-carbonyl system. (B) Optimized feature weight spectrum and its corresponding principal components 1 and 2 for the carbonyl/non-carbonyl system.

Simplification of the Structural Elucidation Tree in the PCA-Based IR Expert System. Optimization of the feature weight spectrum increases the separation of different classes of compounds. The structural identification process can be simplified if this advantage can be extended to classifying compounds into more than two classes. Therefore, two systems, alcohol/carbonyl/not either and acid/ester/ketone/aldehyde, were investigated with the algorithm discussed above to test the feasibility of eliminating or simplifying the structural elucidation tree. For example, as Figure 1 indicates, several steps are required to classify an unknown spectrum as alcohol, carbonyl, or not either, and if the unknown is a carbonyl compound, several further steps are needed to determine its subclass. After the optimization algorithm is applied, alcohol, carbonyl, and not either can be distinguished in one step, and acid, ester, ketone, and aldehyde can be distinguished in one additional step. Equation 1 was originally intended for two-class separations; for separation into more than two classes, it is modified as follows:4

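$$w_k(\mathrm{I},\mathrm{II},\ldots,C) = \frac{\sum X_\mathrm{I}^2/N_\mathrm{I} + \sum X_\mathrm{II}^2/N_\mathrm{II} + \cdots + \sum X_C^2/N_C - C\sum X_\mathrm{I}\sum X_\mathrm{II}\cdots\sum X_C/(N_\mathrm{I}N_\mathrm{II}\cdots N_C)}{\sum (X_\mathrm{I} - M_\mathrm{I})^2/N_\mathrm{I} + \sum (X_\mathrm{II} - M_\mathrm{II})^2/N_\mathrm{II} + \cdots + \sum (X_C - M_C)^2/N_C} \tag{4}$$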

where X_C is the measurement at wavenumber k for the samples of class C, N_C is the number of samples in class C, and M_C is the average value of X_C. Figure 9 shows the test results for the three- and four-class systems, with the available spectra projected onto principal components 1 and 2 obtained with the optimized feature weight spectra.

Figure 9. (A) Projection of the training set and prediction set onto principal components 1 and 2 for the carbonyl/alcohol/not either system using the optimized feature weight spectrum. Prediction of alcohol (9), carbonyl (2), and not either ([). (B) Projection of the training set and prediction set onto principal components 1 and 2 for the acid/ester/ketone/aldehyde system using the optimized feature weight spectrum. Prediction of acid (9), ester ([), ketone (1), and aldehyde (2).

According to Figure 9A, the prediction accuracy is high for all three classes of compounds. One of these three classes is the catch-all non-alcohol/non-carbonyl class. Separating well-defined classes such as acid/ester/ketone/aldehyde should therefore be much easier, as indicated by Figure 9B. In general, the simplex algorithm can be used to optimize the feature weight spectrum to enhance the separation of more than two classes of compounds. As stated above, this enhancement can simplify the tree structure used in structural elucidation. The simplified tree structure can also increase prediction accuracy. For instance, in the two-class PCA method, two questions must be answered for each compound. The first question is, "Is it carbonyl?" If the answer is no, the second question is, "Is it alcohol?" If each question is answered correctly 95% of the time, both assignments will be correct for only about 90% of compounds (0.95 × 0.95 ≈ 0.90). Table 3 lists the prediction accuracy obtained with the original and the optimized feature weight spectra using the two-class tree structure. Large improvements in predicting compound classes can be observed with the simplified tree structure.

Figure 10 shows the optimized feature weight spectra for the three- and four-class systems and the corresponding principal components 1 and 2. For the carbonyl/alcohol/not either system, principal components 1 and 2 account for 23.4% and 17.4% of the variance, respectively. For the acid/ester/ketone/aldehyde system, they account for 27.5% and 13.1%, respectively. According to Figure 10A, features of alcohols and carbonyls appear in the principal components in a way that increases the separation. Principal components 1 and 2 are rotated to increase the separation of these classes of compounds. Comparing the two components reveals that principal component 2 makes the major contribution to the separation. It shows a large positive band in the C=O stretching region (1700 cm⁻¹) and a negative band in the O-H stretching region (3600 cm⁻¹). Similarly, the optimized feature weight spectrum for the acid/ester/ketone/aldehyde system enlarges the slight differences among these four classes of compounds. For instance, the O-H stretching of acids is enlarged in the 3600 cm⁻¹ region, and the C-H stretching of aldehydes appears in the 2800 cm⁻¹ region. Examining the principal components for these four classes reveals that principal component 1 makes the major contribution to their separation. Principal component 2, however, enlarges the difference between acids and aldehydes on one side and esters and ketones on the other.

Figure 10. (A) Optimized feature weight spectrum and corresponding principal components 1 and 2 for the carbonyl/alcohol/not either system. (B) Optimized feature weight spectrum and its corresponding principal components 1 and 2 for the acid/ester/ketone/aldehyde system.

Table 4 summarizes the prediction accuracy of the classification for these systems. According to this table, high prediction accuracy can be obtained for unknown compounds. Table 3 presents the total rate of correct prediction using these multiclass separation techniques. The rate of correct prediction for the multiclass separation method is slightly lower than that obtained with the tree structure of two-class systems. The prediction accuracy is therefore not improved. Nevertheless, applying the simplex algorithm to optimize the feature weight spectrum for separating multiple classes of compounds simultaneously is quite feasible. This confirms that a much simpler structural elucidation tree can be constructed.

Table 4. Prediction of Compounds Using Optimized Feature Weight Spectra

system                        classification   no. of samples   no. of wrong predictions   correct prediction rate (%)
alcohol/carbonyl/not either   alcohol          108              4                          96.3
                              carbonyl         298              6                          98.0
                              not either       164              7                          95.7
acid/ester/ketone/aldehyde    acid             72               0                          100
                              ester            75               0                          100
                              ketone           78               3                          96.2
                              aldehyde         73               0                          100

av correct prediction rate (%): 98.02
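The entries in Table 4 can be checked directly from the sample counts. This short Python sketch recomputes each class's correct prediction rate as (samples - wrong)/samples. The dictionary name and layout are illustrative only.

The reported 98.02% average is reproduced if it is read as the unweighted mean of the seven class rates. Pooling all 868 compounds instead gives about 97.70%. The sketch also shows the two-step arithmetic behind the 90% figure in the text, assuming the two questions err independently.

```python
# Table 4 counts: class -> (number of samples, number of wrong predictions)
table4 = {
    "alcohol": (108, 4), "carbonyl": (298, 6), "not either": (164, 7),
    "acid": (72, 0), "ester": (75, 0), "ketone": (78, 3), "aldehyde": (73, 0),
}

# Correct prediction rate (%) for each class
rates = {name: 100.0 * (n - wrong) / n for name, (n, wrong) in table4.items()}

# Unweighted mean of the seven class rates (rounds to the 98.02 in Table 4)
mean_rate = sum(rates.values()) / len(rates)

# Pooled rate over all compounds, for comparison
pooled_rate = 100.0 * sum(n - w for n, w in table4.values()) / sum(n for n, _ in table4.values())

# Two sequential questions ("carbonyl?" then "alcohol?"), each 95% accurate,
# assuming independent errors: probability that both answers are right
two_step = 0.95 * 0.95

for name, rate in rates.items():
    print(f"{name:>10}: {rate:.1f}%")
print(f"mean of class rates: {mean_rate:.2f}%  pooled: {pooled_rate:.2f}%  two-step: {two_step:.4f}")
```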

CONCLUSION

The simplex algorithm has been successfully applied to optimize a PCA-based IR expert system. Optimizing the feature weight spectrum can significantly increase classification accuracy in a two-class system and is also feasible for improving the separation of multiclass systems. The so-called optimized feature weight spectrum in this work may not correspond to the global maximum. Even so, this work demonstrates that the approach can still improve classification significantly. In addition, classifying compounds through a tree of two-class systems leads to higher accuracy. For systems with more than two classes of compounds, complete separation of three and four classes of compounds is obtained as well. This confirms that the tree structure used to classify compounds can be simplified when constructing an IR expert system based on PCA.

ACKNOWLEDGMENT

The authors thank the National Science Council of the Republic of China for financially supporting this work under Contract NSC86-2113-M-033-008. The authors also thank Dr. Christopher J. Manning for his help in preparing the manuscript.

Received for review July 21, 1998. Accepted December 10, 1998. AC980808Q

Analytical Chemistry, Vol. 71, No. 5, March 1, 1999
