Wavelet Orthogonal Signal Correction-Based Discriminant Analysis

Anal. Chem. 2009, 81, 8962–8967

Wangdong Ni,† Steven D. Brown,*,‡ and Ruilin Man†

†School of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan, 410083 P.R. China
‡Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716

We report the use of wavelet orthogonal signal correction (WOSC) for multivariate classification. This new classification tool combines a wavelet prism decomposition of a spectral response with orthogonal signal correction to significantly improve classification performance, reducing both classification errors and model complexity. Two spectroscopic data sets are examined in this paper. We show that a discriminant analysis based on WOSC effectively removes information irrelevant to classification from spectral responses. WOSC-based discriminant analysis performs favorably compared to a wavelength-domain filtering approach, such as that used in orthogonal partial least-squares discriminant analysis (OPLS-DA).

Highly multivariate data plays an increasingly important role in classification, but the most common classifier, one based on conventional linear discriminant analysis (LDA),1,2 is of limited use when the number of variables exceeds the number of samples available for analysis. A dimension reduction, as obtained in principal component analysis (PCA) or in partial least-squares regression (PLS), has been applied in connection with discriminant analysis in this situation. PLS-DA, where PLS regression is used to model the linear discriminant distinguishing two classes, has received attention recently.3,4 When spectroscopic data is dominated by a varying background, obtaining the discriminant by using a PLS-DA classifier is difficult. Various signal preprocessing techniques, including variable selection,5,6 orthogonal signal correction (OSC),7 and frequency-domain processing using a wavelet prism transformation,8-10 have been employed to suppress the background and improve the robustness and reliability of the multivariate PLS model in both calibration and classification.

* Corresponding author. Tel.: +1-302-831-6861. Fax: +1-302-831-6335. E-mail: [email protected].
† Central South University.
‡ University of Delaware.
(1) James, M. Classification Algorithms; Wiley: New York, 1985; p 125.
(2) McLachlan, G. J. Discriminant Analysis and Statistical Pattern Recognition; Wiley: New York, 1992.
(3) Rayens, W.; Barker, M. J. Chemom. 2003, 17, 166–173.
(4) Musumarra, G.; Barresi, V.; Condorelli, D. F.; Gortuna, G. C.; Scirè, S. J. Chemom. 2004, 18, 125–132.
(5) Xu, L.; Schechter, I. Anal. Chem. 1996, 68, 2392–2400.
(6) Ballabio, D.; Skov, T.; Leardi, R.; Bro, R. J. Chemom. 2008, 22, 457–463.
(7) Wold, S.; Antti, H.; Lindgren, F.; Öhman, J. Chemom. Intell. Lab. Syst. 1998, 44, 175–185.
(8) Tan, H. W.; Brown, S. D. J. Chemom. 2002, 16, 228–240.
(9) Tan, H. W.; Brown, S. D. J. Chemom. 2003, 17, 111–122.
(10) Woody, N. A.; Brown, S. D. J. Chemom. 2007, 21, 357–363.



The preprocessing step can be incorporated directly into the classifier as well, as in the case of orthogonal PLS discriminant analysis, OPLS-DA,11 an extension of orthogonal PLS regression (OPLS)12 to discriminant analysis. Uncorrelated information in the multivariate response is removed in the process of performing a classification with OPLS-DA, but this removal does not improve classification performance. Similarly, combinations of OSC preprocessing with PLS regression, or with the closely related OPLS regression, have been shown to offer little improvement over direct application of PLS regression.13 Frequency-domain preprocessing of multivariate spectral signals with wavelet transforms, for example using a wavelet prism decomposition,8 requires a decision about which wavelet components to retain. Retaining only those wavelet frequency components that clearly describe signal may not improve the robustness of the multivariate model, because of the possibility of information loss, as has been demonstrated for regression.9 Wavelet orthogonal signal correction (WOSC), a method combining OSC preprocessing with a wavelet prism transformation, can be used to discover and remove unwanted background effects in calibration. Its performance was shown to be superior to OSC filtering combined with PLS regression in removing background while retaining information useful to a multivariate calibration.14 In this manuscript, we couple WOSC preprocessing with discriminant PLS to create a new algorithm, WOSC-DA. Two spectral data sets are examined to show the improvement in classification performance achieved using a PLS-DA classifier on spectral data preprocessed by WOSC filtering.

THEORY

Notation. In the following text, matrices are represented by bold capital characters (e.g., X), column vectors by bold lower case characters (e.g., y), and scalars by lightface italic letters (e.g., R). The transposition operation is indicated by a superscript T.

PLS-DA. A discriminant separating two classes can be developed by linear regression. When classes are described by more features than there are objects, a conventional PLS regression can be used to model a linear discriminant for classification, an approach known in the literature as PLS-DA.

(11) Bylesjö, M.; Rantalainen, M.; Cloarec, O.; Nicholson, J. K.; Holmes, E.; Trygg, J. J. Chemom. 2007, 20, 341–351.
(12) Trygg, J.; Wold, S. J. Chemom. 2002, 16, 119–128.
(13) Westerhuis, J. A.; de Jong, S.; Smilde, A. K. Chemom. Intell. Lab. Syst. 2001, 56, 13–25.
(14) Feudale, R. N.; Liu, Y.; Woody, N. A.; Tan, H. W.; Brown, S. D. J. Chemom. 2005, 19, 55–63.

Binary classification is done by encoding class membership in the property vector y, and PLS-1 regression is used to model the single discriminant and its threshold value. The sense in which PLS-DA can be regarded as an extension of conventional linear discriminant analysis has been explored theoretically by Barker and Rayens.3 When more than two classes are present, membership in each class can be treated as a binary classification, and PLS-1 used to develop a set of discriminant boundaries between each target class and all other classes. It is also possible to encode the set of class identities in a matrix, Y, where each column represents the binary membership of each sample in one class, and to use a PLS-2 regression to develop a set of discriminant functions separating each class from all others. We follow Barker and Rayens in our use of the binary discriminant here.
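As a concrete illustration of the binary coding and thresholding described above, the sketch below implements a minimal PLS-1 discriminant in Python/NumPy. The computations in this work were performed in MATLAB with the PLS Toolbox; this sketch, including the hypothetical function names pls1_da_fit and pls1_da_predict and the 0.5 threshold on a 0/1-coded class vector, is only an illustrative reading of the procedure, not the code used here.

```python
import numpy as np

def pls1_da_fit(X, y, n_lv):
    """Fit a PLS-1 discriminant to a 0/1 class vector (simple NIPALS loop).
    X: spectra (n_samples x n_channels); y: 1 = target class, 0 = other classes."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y.astype(float) - y_mean   # mean-center by feature
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)            # weights
        t = Xr @ w                        # scores
        p = Xr.T @ t / (t @ t)            # X loadings
        q = (yr @ t) / (t @ t)            # y loading
        Xr -= np.outer(t, p)              # deflate X
        yr -= t * q                       # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.inv(P.T @ W) @ np.array(Q)    # regression vector
    return b, x_mean, y_mean

def pls1_da_predict(X, b, x_mean, y_mean, threshold=0.5):
    """Samples whose predicted value lies closer to 1 than to 0 are assigned to the class."""
    y_hat = (X - x_mean) @ b + y_mean
    return (y_hat > threshold).astype(int)
```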

OPLS-DA. OPLS regression,12 a PLS regression method closely related to OSC preprocessing, addresses the problem of uncorrelated variation that is often present in data sets subjected to multivariate calibration. The OPLS decomposition is described by the following equations:

X = X_O + X_OPLS + E    (1)

and

X_OPLS = X - T_O P_O^T    (2)

where X is the response matrix, X_O is the portion of the response matrix orthogonal to the target property Y, T_O is the score matrix orthogonal to Y, P_O is the loading matrix orthogonal to Y, and X_OPLS is the OPLS-preprocessed matrix. The main advantage of OPLS regression is a PLS model that is easier to interpret, obtained by eliminating undesired effects and removing variation in the response matrix X that is uncorrelated with the property matrix Y. When unwanted variation is present in the spectral data, OPLS can be used in place of PLS to define the discriminant between classes. The OPLS regression used to define the discriminant here is slightly different from that used in the algorithm described by Bylesjö et al.,11 which contained a resampling phase to estimate the distribution of the predictions and then used this distribution to determine a threshold for assigning samples in overlapping classes. The resampling serves as a means of estimating class probabilities from the estimated class distributions, but because OPLS-DA models can be used to predict the class property of each training sample and to assign new samples to classes based on the predicted class results, the resampling step is not needed here. Using the binary class vector coding that is standard with PLS-1, samples whose predicted values are closer to 1 are assigned to the class, and those with predicted values closer to 0 are assigned as not in the class, just as in classification using the PLS-DA algorithm.
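For concreteness, eqs 1 and 2 can be realized as a single orthogonal-component removal step. The following Python/NumPy sketch (the helper name remove_orthogonal_component is hypothetical, not code from this work) removes one component of X that is orthogonal to y, in the style of OPLS/OSC filtering, and returns the weight and loading vectors needed later to filter new samples.

```python
import numpy as np

def remove_orthogonal_component(X, y):
    """Deflate one y-orthogonal component from mean-centered X (cf. eqs 1 and 2).
    Returns the filtered matrix plus the orthogonal weight and loading vectors."""
    w = X.T @ y / (y @ y)              # y-predictive weights
    w /= np.linalg.norm(w)
    t = X @ w
    p = X.T @ t / (t @ t)              # loadings of the predictive component
    w_o = p - (w @ p) * w              # part of p orthogonal to the predictive weights
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                      # orthogonal scores (a column of T_O)
    p_o = X.T @ t_o / (t_o @ t_o)      # orthogonal loadings (a column of P_O)
    return X - np.outer(t_o, p_o), w_o, p_o
```

Removing several orthogonal components simply repeats this step on the deflated matrix.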

WOSC-DA. WOSC14 is a signal preprocessing technique aimed at removing undesirable background effects and enhancing the subsequent PLS regression model. It combines a wavelet prism transformation of the data into a set of frequency-domain spectra with OSC preprocessing of the frequency-domain data to remove signal uncorrelated with the target property, followed by reconstruction of the filtered spectra from the processed frequency components. The WOSC processing is described in detail in ref 14, and only a brief overview is given below. First, a wavelet prism (WP) decomposition8 is performed:

WP(X) → F_j    (3)

where the spectra in the response matrix X are decomposed by the wavelet prism transform into l levels, generating a set of l + 1 frequency components F_j (j = 1, ..., l + 1). Subsequently, OSC filtering is used to process each of the frequency components F_j, generating a set of OSC-filtered frequency components F_OSC,j and a set of orthogonal weights W_j and loadings P_j:

OSC(F_j, y) → F_OSC,j, W_j, P_j    (4)

After OSC filtering of each frequency component, the WOSC-corrected spectrum X_WOSC is reconstructed from the remaining spectral signal in each frequency component:

X_WOSC = Σ_{j=1}^{l+1} F_OSC,j    (5)

New samples in X_test are decomposed by the wavelet prism transform into the same l + 1 components, and then the orthogonal weights W_j and loadings P_j are applied to the corresponding frequency components F_j^test to obtain the OSC-corrected frequency components F_OSC,j^test:

F_OSC,j^test = F_j^test - F_j^test W_j (P_j^T W_j)^{-1} P_j^T    (6)

All F_OSC,j^test are then summed to obtain the WOSC-corrected spectrum for each of the test samples. WOSC processing can be extended to classification problems by combining the WOSC and PLS-DA algorithms:

PLS-DA(X_WOSC, y) → ŷ    (7)

where X_WOSC is the spectrum filtered by WOSC from eqs 3-6, y is the reference binary class vector, and ŷ is the binary class vector predicted by the PLS-DA classifier. Filtering each frequency component separately with OSC to remove irrelevant classification information, rather than discarding entire frequency components of the wavelet prism-preprocessed data, benefits the classification because all frequency components contain some information relevant to classification, though the amount of that information varies considerably among the components. By performing separate OSC processing on the different frequency components present in the data, the WOSC algorithm removes much of the irrelevant information but retains most of the information useful to locating the discriminant.
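The overall WOSC filtering of eqs 3-6 can be sketched as follows. This Python sketch uses PyWavelets for a prism-style decomposition and reuses the remove_orthogonal_component helper shown earlier; the function names (wavelet_prism, wosc_fit, wosc_apply), the per-component sequential deflation, and the trimming of reconstruction padding are illustrative assumptions rather than the MATLAB implementation used in this work.

```python
import numpy as np
import pywt

def wavelet_prism(x, wavelet="sym4", level=9):
    """Split one spectrum into level+1 same-length frequency components whose sum
    reconstructs the spectrum (one simple reading of the wavelet prism).
    Note: pywt may warn that deep levels exceed its recommended maximum for short
    spectra; the paper's own prism implementation differs."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    parts = []
    for i in range(len(coeffs)):
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        parts.append(pywt.waverec(keep, wavelet)[: len(x)])   # trim any padding
    return np.array(parts)                                    # (level+1, n_channels)

def wosc_fit(X, y, n_osc=2, wavelet="sym4", level=9):
    """Eqs 3-5: prism-decompose each spectrum, OSC-filter every frequency
    component against the class vector y, and sum the filtered components."""
    F = np.stack([wavelet_prism(x, wavelet, level) for x in X])   # (n, level+1, p)
    X_wosc = np.zeros_like(X, dtype=float)
    filters = []                                  # per-component (w_o, p_o) pairs
    for j in range(level + 1):
        Fj = F[:, j, :]
        pairs = []
        for _ in range(n_osc):
            Fj, w_o, p_o = remove_orthogonal_component(Fj, y)
            pairs.append((w_o, p_o))
        filters.append(pairs)
        X_wosc += Fj
    return X_wosc, filters

def wosc_apply(X_new, filters, wavelet="sym4", level=9):
    """Eq 6: filter new spectra with the stored orthogonal weights and loadings."""
    F = np.stack([wavelet_prism(x, wavelet, level) for x in X_new])
    X_out = np.zeros_like(X_new, dtype=float)
    for j, pairs in enumerate(filters):
        Fj = F[:, j, :]
        for w_o, p_o in pairs:
            Fj = Fj - np.outer(Fj @ w_o, p_o)     # remove the stored component
        X_out += Fj
    return X_out
```

The filtered training matrix returned by wosc_fit would then be passed to the PLS-DA classifier of eq 7, for example pls1_da_fit(X_wosc, y, n_lv) from the earlier sketch.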


EXPERIMENTAL SECTION

Data Sets. Two data sets, collected on two different instruments, were selected to demonstrate the improved classification performance and easier interpretation of WOSC-DA. First, a set of Fourier transform infrared (FTIR) spectra of various bacterial colonies on agar plates is analyzed here. This set was previously reported15 by Goodacre, where full experimental details are available. The bacterial identification data set contained five classes (the bacterial species Escherichia coli, Proteus mirabilis, Klebsiella spp., Pseudomonas aeruginosa, and Enterococcus spp.) described by 236 spectra. Each spectrum was collected from a dried bacterial sample placed on a sandblasted aluminum plate by Fourier transform infrared spectrometry in diffuse reflectance mode from 4000 to 600 cm-1 at an increment of 4 cm-1, producing a total of 882 spectral channels. A single spectrum obtained from 256 coadded scans was measured for each sample.

The second data set examined here consisted of samples of green coffee beans, with 87 spectral samples measured by near-infrared (NIR) spectrometry and representing four classes (different geographical coffee-growing regions: Africa, Asia, North America, and South America). This data set was collected in-house and is described in detail elsewhere.16 Each NIR spectrum was obtained from 1100 to 2498 nm in increments of 2 nm in diffuse reflectance mode on a NIRSystems 6500 spectrometer, producing a total of 700 spectral channels. Each sample spectrum examined here was the average of 64 replicate spectra, with the sample rotated 90° after every 16 scans to reduce any specular reflection effects present in the data.

Classification Procedure. For each data set, a 10-fold Venetian blind cross-validation procedure was used on the training set, where 50% of the samples in each data set were selected and employed to evaluate the classification performance while optimizing the number of latent variables used in the PLS-DA, OPLS-DA, and WOSC-DA classifiers. These classifiers were then used to identify the class membership of all remaining samples. Prior to building the PLS-DA, OPLS-DA, and WOSC-DA classifiers, each of the training sets was mean-centered by feature. No other preprocessing was used. When the bacterial data set was preprocessed by the wavelet prism transform8 in the WOSC-DA classifier, the training and test sets for the bacterial data were decomposed to 9 levels (10 scales) using the Symlet 4 mother wavelet (sym4). Although the choice of mother wavelet slightly affects the classification performance of the WOSC-DA classifier, only the sym4 mother wavelet was explored here because previous work has shown that the sym4 basis offers a good approximation to spectral responses.17 The green coffee training and test sets were decomposed to 8 levels (9 scales) using the sym4 wavelet because of the smaller wavelength range of the green coffee data.

(15) Goodacre, R.; Timmins, E. M.; Burton, R.; Kaderbhai, N.; Woodward, A. M.; Kell, A. V.; Rooney, P. J. Microbiol. 1998, 144, 1157–1170.
(16) Myles, A. J.; Zimmerman, T. A.; Brown, S. D. Appl. Spectrosc. 2006, 60, 1198–1203.
(17) Mittermayr, C. R.; Tan, H. W.; Brown, S. D. Appl. Spectrosc. 2001, 55, 827–833.
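A Venetian blind split like the one used above simply interleaves samples across folds. The short Python sketch below illustrates the idea; the function name venetian_blind_folds is hypothetical, and the exact fold assignment used with the PLS Toolbox is not specified in the text.

```python
import numpy as np

def venetian_blind_folds(n_samples, n_folds=10):
    """Interleaved ('venetian blind') fold assignment: sample i goes to fold i mod n_folds."""
    return np.arange(n_samples) % n_folds

# Example of one cross-validation loop for a candidate number of latent variables:
# folds = venetian_blind_folds(len(y_train))
# for k in range(10):
#     train, val = folds != k, folds == k
#     ... fit the classifier on X_train[train], y_train[train];
#     ... count misclassifications on X_train[val], y_train[val]
```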



Measure of Success for Multivariate Classification Models. The removal of the dominant irrelevant information in the data sets should allow construction of a more robust classifier with improved classification performance and a less complex classification model, requiring fewer latent variables than the classifier models developed from PLS-DA or OPLS-DA discriminant modeling. The classification error, defined as the number of misclassified samples divided by the number of original in-class samples, is used here as one quantitative measure of the classification performance of each classifier. The sensitivity and specificity metrics often used in classification are also employed to evaluate the performance of each classifier. Sensitivity (SENS) and specificity (SPEC), respectively,18 can be expressed as

SENS = TP / (TP + FN)    (8)

and

SPEC = TN / (FP + TN)    (9)

where TN represents the number of true negatives, TP the number of true positives, FN the number of false negatives, and FP the number of false positives. A discussion of these metrics, as well as their use in assessing the performance of the three classifiers, is provided in the Supporting Information.

(18) Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, 2001; p 277.
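For reference, the sketch below computes these metrics from 0/1 class labels. The sensitivity and specificity lines follow eqs 8 and 9 directly; the classification-error line is one reading of the definition given above (misclassified samples counted against the number of in-class samples), since the exact counting is not spelled out in the text.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Sensitivity and specificity per eqs 8 and 9, plus one reading of the
    classification error defined in the text (0/1 labels for a single class)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    sens = tp / (tp + fn)                     # eq 8
    spec = tn / (fp + tn)                     # eq 9
    error = (fn + fp) / (tp + fn)             # misclassified / in-class samples (assumed)
    return sens, spec, error
```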

Computation and Software. All computations were performed in Matlab version 7.4 (The Mathworks, Inc., Natick, MA) on an AMD Opteron running at 2.0 GHz with 2.0 GB of memory. PLS-DA Matlab code was modified from code in the PLS Toolbox version 3.54, obtained from Eigenvector Research. Algorithms for OPLS-DA and WOSC-DA were written in-house in the Matlab language, using some parts of the PLS Toolbox version 2.01f, obtained from Eigenvector Research, and the Wavelet Toolbox, available from the Mathworks.

RESULTS AND DISCUSSION

Green Coffee Data. The NIR diffuse reflectance spectra of the green coffee data are dominated by a variable baseline offset that contributes heavily to the spectral variation in this data set (see Figure S-1 in the Supporting Information). The baseline offset has adverse effects on the final classifier, and successful removal of this major variation from the coffee spectral data should allow a clearer relationship between the response matrix X and the class vector y, a simpler discriminant model, and more robust classification performance. Even though OPLS-DA analysis eliminated some variation uncorrelated with the classification information when two OPLS components were removed, parts of the spectrum, such as the ranges from 1100 to 1400 nm and from 2300 to 2498 nm, still showed background effects, suggesting that OPLS-DA modeling does not remove background variation effectively. Other attempts at removing uncorrelated effects in a wavelength-domain spectrum through OPLS-DA preprocessing over the full spectral range have also been shown to be unsuccessful, because the OPLS and PLS regressions model the same space.13 In contrast, the WOSC-DA processing removed the background effects almost entirely, since frequency-dependent filtering suppresses or even eliminates the effects of background and noise in the spectra and helps enhance the signal of interest, because accidental correlation to noise is restricted. The WOSC filtering approach permits local removal of effects isolated by frequency, rather than complete removal of a range of frequencies by discarding some scales or by zeroing wavelet coefficients based on their size, which can lead to a loss of useful information.9

Figure 1. Classification error curves for class 1 (Africa) in the coffee data from PLS-DA, OPLS-DA with one OPLS component removed, and WOSC-DA with two OSC components removed: (A) cross-validation and (B) validation.

To illustrate the improvement in classification after application of WOSC preprocessing to identify class 1 (Africa) samples in the coffee data, classification error curves for the first 14 latent variables in the cross-validation of the three classifiers, PLS-DA, OPLS-DA, and WOSC-DA, are depicted in Figure 1A. As shown in Figure 1A, classification errors for the PLS-DA classifier, applied without other preprocessing, decline slowly with added latent variables until reaching a minimum classification error of 13.83% at 12 latent variables. The OPLS-DA curve, with one OPLS component removed, requires 11 latent variables, one fewer than PLS-DA, to attain the same cross-validation classification error as the PLS-DA classifier applied to unprocessed data. Consequently, there is no improvement in the classification error, though there is a reduction in the classifier's complexity: OPLS modeling of the classification performs as in a calibration, and the variation removed by OPLS preprocessing lies within the space spanned by the PLS model. In contrast, as observed in this figure, the classifier based on WOSC-filtered spectra reaches a minimum classification error of 0% at six latent variables, suggesting a substantial improvement in both classification performance and model simplicity, a result consistent with the spectral changes apparent in Figure S-1 of the Supporting Information. Figure 1B shows the external prediction classification errors from all three classifiers, with and without preprocessing, for comparison. At the numbers of latent variables determined by cross-validation, the PLS-DA, OPLS-DA, and WOSC-DA classifiers reach prediction classification errors of 5.71%, 5.71%, and 0%, respectively. Again, the cross-validation and prediction classification error curves from WOSC-DA are much lower than those from PLS-DA and OPLS-DA, and a considerable enhancement in both classification performance and model simplicity is observed for the WOSC-DA classifier, with two OSC components removed from each component of the wavelet prism-processed data, for class 1 in the green coffee data set. In addition, as depicted in Figure 1, the cross-validation and prediction curves from WOSC-DA track each other well, unlike the corresponding curves produced by the classification models generated from PLS-DA and OPLS-DA, a result that strongly suggests greater consistency between cross-validation and validation. Table 1 shows that the variation retained in each frequency component after WOSC-DA filtering varies considerably from component to component.

Table 1. Percent Variation Retained in Each Frequency Component for All Classes in the Green Coffee Data after WOSC-DA Processing

                                variation retained (%)
  frequency component      class 1a   class 2a   class 3a   class 4a
  F1 (high frequency)       45.28      43.12      41.29      44.64
  F2                        37.08      23.14      19.66      33.68
  F3                        23.52      17.37      10.66      18.25
  F4                        26.65      16.44      12.56      23.48
  F5                        20.17      11.69       9.42      19.17
  F6                        18.82       6.33       6.09      19.17
  F7                        13.44       4.44       4.23      12.37
  F8                         5.52       0.09       0.09       4.06
  F9 (low frequency)         3.10       0.30       0.11       2.80

a Two, three, three, and two OSC components removed in modeling classes 1-4, respectively.
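The quantity tabulated above can be illustrated with a small sketch. The exact normalization used in the paper is not stated, so the sum-of-squares ratio below (and the hypothetical function name percent_variation_retained) is only a plausible reading.

```python
import numpy as np

def percent_variation_retained(F_before, F_after):
    """Percent of a frequency component's sum of squares surviving OSC filtering
    (one plausible definition of the quantity reported in Table 1)."""
    return 100.0 * np.sum(F_after ** 2) / np.sum(F_before ** 2)
```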

Because each frequency component produced by the wavelet prism transform of the original spectrum contains information both beneficial and detrimental to the classification, and because the spectral information useful to the classification is distributed unevenly over the frequency components, it is necessary to remove locally uncorrelated information in the frequency domain while retaining correlated information in each frequency component. The last two low-frequency components largely represent background signals for all classes, and they retain little variation relevant to the classification after WOSC processing. As a consequence, the baseline offset that dominated the spectral data and played an adverse role in the subsequent PLS-based classifier has been almost completely removed in the WOSC-DA classifier. It is not surprising that much more information is retained in the high- and middle-frequency components, since they are the major contributors to the classes' spectral fingerprints and contain the majority of the spectral information relevant to class. In the high-frequency components, contributions to the signal are expected mainly from noise, but because the noise is close to randomly distributed, a portion of these frequency components is accidentally correlated to the property and cannot be removed by OSC filtering. Since none of the frequency components has its variation reduced to zero by an OSC filtering step using class as the target property, each contains some information relevant to the classification. It is therefore likely that information loss occurs when an entire component is deleted, and that the amount of information lost will depend on the specific frequency component deleted.10 It is also clear from the discussion above that a filtering method based on conventional denoising of wavelet scales will not efficiently remove irrelevant information while retaining information useful to a classification.


Table 2. Classification Errors for PLS-DA, OPLS-DA, and WOSC-DA for the Green Coffee Data Set

  method          LV in model   components removeda   class 1, %   class 2, %   class 3, %   class 4, %
  PLS-DA (CV)b    12/15/11/11          -                 13.82        12.70        20.71        27.27
  PLS-DA (P)b     12/15/11/11          -                  5.71        18.14        33.08        16.03
  OPLS-DA (CV)    11/13/10/10       1/2/1/1              13.82        12.70        24.29        28.79
  OPLS-DA (P)     11/13/10/10       1/2/1/1               5.71        18.14        33.08        16.03
  WOSC-DA (CV)     6/5/6/8          2/3/3/2               0            0           10.48         9.09
  WOSC-DA (P)      6/5/6/8          2/3/3/2               0            7.03        14.36         5.00

a The number of OSC components removed for OPLS-DA and for WOSC-DA (in each frequency component generated from the wavelet prism transform); entries with slashes are listed per class (classes 1-4). b CV refers to the classification error of cross-validation on the training set, and P refers to the classification error of prediction on an external test set.

Table 2 provides a summary of the cross-validation and external prediction classification errors from the PLS-DA, OPLS-DA, and WOSC-DA classifiers for all classes in the green coffee data set. In all cases, the WOSC-DA classifiers outperformed the PLS-DA and OPLS-DA classifiers, while all results obtained from OPLS-DA were similar to those obtained from the PLS-DA classifiers. Table 2 also shows that the classification performance from WOSC-DA varies over the different classes. The information retained in the higher- and middle-frequency components, especially in frequency components 1-4, appears to be important in differentiating class 1 and class 4 (after two OSC components are removed) and class 2 and class 3 (after three OSC components are removed). Not surprisingly, the information for classification of the different classes is distributed unevenly over the frequency components, because each class has its own spectral signature in the local frequency domain. That difference in the distribution over frequencies and in the locations of the spectral signatures is the basis for the discrimination of the different classes.

Bacterial Data Set. The bacterial data set is employed here to further demonstrate the improved performance of the WOSC-DA classifier. Figure 2A depicts the cross-validation results for class 3 in the bacterial data from the PLS-DA, OPLS-DA (with one OPLS component removed), and WOSC-DA (with three OSC components removed) classifiers. A PLS-DA classifier applied to the spectra without preprocessing reaches an optimal classification error of 3.52%, as determined by cross-validation, using six latent variables. The OPLS-DA classifier with one OPLS component removed again reaches the same cross-validation classification error as the PLS-DA classifier applied to the raw spectra, but using one fewer latent variable than the PLS-DA model. The classification error from WOSC-DA is 8% at one latent variable and decreases to 0% at four latent variables. Comparison of the cross-validated errors at the first latent variable among the PLS-DA, OPLS-DA, and WOSC-DA classifiers for class 3 makes clear that a significant improvement in classification performance has been attained through application of the WOSC-DA classifier. Also depicted in Figure 2B are the error curves from the PLS-DA, OPLS-DA, and WOSC-DA classifiers applied to the external test set. For all of these curves, the same tendency seen in cross-validation is observed for the external prediction sets.

Figure 2. Classification error curves for class 3 (bacterial species comprising the Klebsiella spp.) in the bacterial data from PLS-DA, OPLS-DA with one OPLS component removed, and WOSC with two OSC components removed: (A) cross-validation and (B) validation.

Table 3. Classification Error from PLS-DA, OPLS-DA, and WOSC-DA for the Bacterial Data Set

  method         LV in model   components removeda   class 1, %   class 2, %   class 3, %   class 4, %   class 5, %
  PLS-DA (CV)b    7/5/6/5/4          -                  3.26         1.02         3.52         4.03         0
  PLS-DA (P)b     7/5/6/5/4          -                  2.07        11.02         4.54         6.58         0
  OPLS-DA (CV)    6/4/5/4/3      1/1/1/1/1              1.79         1.02         3.52         3.52         0
  OPLS-DA (P)     6/4/5/4/3      1/1/1/1/1              2.07        11.02         4.54         6.58         0
  WOSC-DA (CV)    5/4/4/2/1      2/1/3/2/2              0.60         0            0            0            0
  WOSC-DA (P)     5/4/4/2/1      2/1/3/2/2              0.60         0            0            0.51         0.53

a The number of OSC components removed for OPLS-DA and for WOSC-DA (in each frequency component generated from the wavelet prism); entries with slashes are listed per class (classes 1-5). b CV represents the classification error of cross-validation on the training set, and P represents the classification error of prediction on the test set.



It is important to note that the classification curves from the WOSC-DA classifier are very stable over almost the entire range of latent variables in both cross-validation and prediction, in comparison with the classification curves from PLS-DA and OPLS-DA. Note also that the classification errors from the WOSC-DA classifier are much lower than those from the PLS-DA and OPLS-DA classifiers. These results indicate the robustness of the WOSC-DA classification. The performance in cross-validation and prediction of the PLS-DA, OPLS-DA, and WOSC-DA classifiers is summarized in Table 3. The performance of the WOSC-DA classifier in cross-validation and prediction is again significantly better in nearly all cases. Results from the bacterial data mirror those observed for the green coffee data, even though the two data sets differ in appearance and in the distribution of class information. For class 5 in this table, however, the predictions from WOSC-DA are slightly poorer than those from PLS-DA and OPLS-DA, because the PLS-DA model already classifies the class 5 samples perfectly and no single WOSC-DA model can improve on that classification. Nevertheless, the curves from the WOSC-DA classifier for class 5 are more stable than those from the PLS-DA and OPLS-DA classifiers; the WOSC-DA cross-validation classification error is 0% at all latent variables. To be consistent with our other choices, we select the WOSC-DA model with 1 LV for class 5 because it has the fewest latent variables and the best cross-validation performance, though a model with one more latent variable performs equally well in cross-validation and better in external prediction.

CONCLUSIONS

The extension of WOSC regression to classification problems is presented in this paper to show that the performance of subsequent classification using the PLS-DA algorithm is significantly enhanced by the combination of the wavelet prism transform and OSC preprocessing of the wavelet-domain signals, as demonstrated both by improved classification performance and by the consistency between cross-validation and prediction. It is worth noting that the WOSC-DA classifier also shows improved sensitivity and specificity in these classifications (see the Supporting Information), while the PLS-DA and OPLS-DA classifiers perform equivalently. It is likely that the WOSC-DA classifier takes advantage of the multiscale nature of the data through use of the wavelet prism transformation and allows better removal of variation irrelevant to the classification in each frequency component, a process that avoids the information losses that occur with other techniques. Because each frequency-component spectrum may not require removal of the same number of OSC components, it is also likely that a further modification of WOSC allowing the number of OSC components to vary with scale would lead to an even more effective classifier, but automated optimization of that process is challenging.

ACKNOWLEDGMENT

This work was supported by the Center for Process Analytical Chemistry. Wangdong Ni acknowledges the China Scholarship Council (CSC) for financial support.

SUPPORTING INFORMATION AVAILABLE

Additional information as noted in the text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review June 3, 2009. Accepted September 16, 2009.

