Two-Stage Variable Selection Using the Wavelet Transform of Batch

Oct 3, 2007 - We propose a two-stage variable-selection strategy performed in the wavelet domain in order to extract quality-related information from ...

Ind. Eng. Chem. Res. 2007, 46, 7188-7197

Two-Stage Variable Selection Using the Wavelet Transform of Batch Trajectories for Data Interpretation and Construction of Parsimonious Quality-Estimation Models

Young-Hwan Chu,† Daeyoun Kim,‡ Chonghun Han,*,‡ and En-Sup Yoon‡

Samsung Petrochemical, Bugok-Dong 500, Nam-ku, Ulsan 680-110, Korea, and School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul 151-742, Korea

We propose a two-stage variable-selection strategy performed in the wavelet domain in order to extract quality-related information from batch trajectories of process variables and to build parsimonious quality-estimation models. The proposed variable-selection method proceeds in two stages and uses the discrete wavelet transform of the batch trajectories. This approach greatly reduces the computation time required for finding those wavelet coefficients related to product quality. In the case study we discuss, a quality-estimation model built with the selected subset of wavelet coefficients exhibits satisfactory estimation accuracy. When applied to an industrial PVC polymerization process, the method was found to achieve prediction accuracy comparable to that of a multiway partial least-squares (MPLS) based model that uses all variables in the time domain.

1. Introduction

In batch production systems, the most important concern is the final product quality rather than its yield. However, a critical problem for the measurement of quality variables is that such measurements are time-consuming and costly. Thus, there is significant interest in quality-estimation techniques that can predict the behavior of the quality variables using quality-related information. The most popular quality-estimation method is to use empirical models to predict the values of quality variables from the known values of the process variables.1 The multiway partial least-squares (MPLS) method, which relates the quality variables to all variables unfolded from three-way batch data by a projection technique, is currently accepted as the most popular modeling technique for quality estimation because of its robust prediction results.2,3 Nevertheless, the use of too many variables in modeling has disadvantages in regard to model simplicity, ease of maintenance, and time required for model building. Thus, selection of these variables based on a systematic method should precede any modeling.
Another benefit of variable-selection preprocessing is the identification of the dynamic characteristics of a batch process that determine the final product quality. If we investigate the variables selected from the batch trajectories, the stages that are critical to quality can be identified. The implementation of a variable-selection algorithm using wavelet coefficients in the wavelet domain is a good candidate for this approach. The time-scale representation capability of the wavelet transform enables the effective analysis of frequency-specific information as well as of the time-specific characteristics of a batch process. The importance of variable selection for interpreting trajectory data and efficient empirical modeling has been discussed in several papers,4-7 and remarkable algorithmic methods have been proposed.8-12 In addition, novel approaches have been proposed for the design and optimization of trajectories of manipulated process variables by analyzing the relationship between product quality and each section of the trajectories based on experimental-design methods.13,14 The critical-to-quality stages can be correctly identified without painstaking experiments by applying a bootstrapping-based variable-selection method to limited amounts of historical data.15 However, interpreting batch process data has been limited in previous research to the identification of the quality-determining time stages.

In this paper, we propose a two-stage variable-selection strategy based on the wavelet transform and demonstrate that it is more informative and faster than previous techniques. An industrial polymerization process for producing poly(vinyl chloride) (PVC) is used as a case study in this paper to test the proposed method. The use of the wavelet transform makes it possible to clearly distinguish the variables with large variances from those with small variances. The contribution of the proposed method to parsimonious quality-estimation modeling is then demonstrated by comparing the performance of a quality-estimation model built with the proposed method with one constructed using MPLS, both in terms of the prediction performance and the number of predictor variables.

* To whom correspondence should be addressed. Tel.: +82-2-8801887. Fax: +82-2-888-7295. E-mail: [email protected].
† Samsung Petrochemical.
‡ Seoul National University.

2. Variable Selection for Extraction of Quality-Related Information from Process Trajectories

In Figure 1, we show a simple example of how a variable-selection technique can be used to identify the quality-related sections within process trajectories. In this figure, each section of the trajectories for the three process variables is treated as an individual variable, producing a total of 33 variables. In this example, ten variables corresponding to the gray sections have been selected by implementing a variable-selection technique. From these results, we can draw the following two conclusions.
First, the three process variables are more important than variables from which nothing would be selected, because a similar number of sections was selected from each of them. However, when the interrelations among the three variables are considered, one of them may be more important than the other two. Second, the initial stage of process variable 2 has a critical impact on the quality variables, and the last stage of process variable 3 is also

10.1021/ie0614475 CCC: $37.00 © 2007 American Chemical Society Published on Web 10/03/2007

Ind. Eng. Chem. Res., Vol. 46, No. 22, 2007 7189

Figure 1. Identification of the critical-to-quality sections within process trajectories with the variable-selection technique.

important. On the other hand, this procedure does not provide additional information such as the relative importance of the selected sections or frequency-specific information about the selected sections that could be used to establish detailed control guidelines for each section. This is why use of the wavelet transform is required prior to the implementation of the variable-selection technique.

3. Wavelet Transform

3.1. Overview of the Wavelet Transform. The wavelet transform is a popular signal-processing technique for use in data compression and feature extraction. It has been successfully applied to various kinds of analyses of chemical process data.1,16-21 Further, variable selection has previously been applied to wavelet coefficients to extract information from raw data and to build parsimonious empirical models for spectrum calibration and classification.22,23 In our study, the wavelet transform is used to extract frequency- and time-specific information from batch trajectories of process variables by using a novel two-stage variable-selection strategy.

3.2. Continuous Wavelet Transform (CWT). The results of CWT are wavelet coefficients that describe the degree of similarity between each section of the original signal and the scaled wavelet function. CWT is expressed mathematically as eq 1

$$C(s,b) = \int_{-\infty}^{\infty} |s|^{-1/2}\, \psi\!\left(\frac{t-b}{s}\right) f(t)\, dt \qquad (1)$$
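As a rough numerical illustration of eq 1, the following Python sketch approximates a single coefficient C(s,b) by a Riemann sum. The Haar mother wavelet, the step-shaped test signal, and the names `haar` and `cwt_coefficient` are assumptions chosen for illustration and are not taken from the original study.

```python
import numpy as np

def haar(u):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0) & (u < 0.5), 1.0,
                    np.where((u >= 0.5) & (u < 1.0), -1.0, 0.0))

def cwt_coefficient(f, t, s, b):
    """Approximate C(s, b) = integral of |s|^(-1/2) psi((t - b)/s) f(t) dt."""
    dt = t[1] - t[0]                       # uniform sampling step (assumed)
    return np.sum(np.abs(s) ** -0.5 * haar((t - b) / s) * f) * dt

t = np.arange(0.0, 8.0, 0.001)
step = np.where(t >= 4.0, 1.0, 0.0)        # test signal with a jump at t = 4

# Wavelet support [3, 5) straddles the jump: large-magnitude coefficient.
c_at_jump = cwt_coefficient(step, t, s=2.0, b=3.0)
# Wavelet support [5, 7) lies on a flat region: coefficient near zero.
c_flat = cwt_coefficient(step, t, s=2.0, b=5.0)
```

Under these assumptions the coefficient is large in magnitude only where the scaled, shifted wavelet overlaps a change in the signal, which is the "degree of similarity" reading of the coefficients given in the text.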

Equation 1 shows that convolving a signal, f(t), with a wavelet function of scale s that is shifted by b in the time domain produces a specific wavelet coefficient, C. With these wavelet coefficients, the characteristic information of a signal can be completely re-expressed. Inversely, the original signal can be reconstructed in the time domain from the wavelet coefficients. However, it is a critical drawback of CWT that too many shift and scale parameters can be required, and there are no clear guidelines for selecting the number of these parameters.

3.3. Discrete Wavelet Transform (DWT) Based on Mallat’s Algorithm: Multiresolution Analysis (MRA). The use of the DWT solves the problems of the CWT by determining the optimal number of scale and shift parameters of a wavelet function with an MRA approach. This approach was proposed by Mallat24 and successively decomposes the original signal into approximation and detail sections to obtain wavelet functions with different scales and shift parameters. The approximation sections are high-scale components of the signal, while the detail sections are its low-scale components. These two sections are generated by passing the original signal through low-pass and high-pass filters derived from scaling and wavelet functions, respectively. The number of scaled and shifted versions of a chosen wavelet function is automatically determined by the number of decomposition levels. Coifman and Wickerhauser25 have proposed various criteria for determination of the levels. Once the wavelet basis functions are determined, the wavelet coefficients are calculated.

4. Proposed Method

The proposed method is performed according to the framework shown in Figure 2. The first step is the application of DWT based on Mallat’s algorithm to the process trajectories. At the second and third steps, the variances of all the wavelet coefficients are calculated and the wavelet coefficients with statistically significant variances above certain thresholds are selected. The variances are calculated in the wavelet domain instead of the time domain because of the enhanced discrimination effect provided by the wavelet transform. Significant variances are defined as those larger than the threshold values, which are the mean of the variances of the wavelet coefficients in each process variable category. The removal of the wavelet coefficients with insignificant variances is important because they would cause singularity problems in the empirical modeling based on regression analysis that is performed in the next variable-selection stage.
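The first three steps can be sketched as follows. This is a minimal illustration, assuming trajectories whose length is a power of two, an orthonormal Haar filter pair, and synthetic random-walk data; the function names `haar_dwt` and `select_by_variance` are hypothetical, and the use of the sample variance (`ddof=1`) is an assumed choice.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multilevel Haar DWT (Mallat pyramid), returned as [cA_n, cD_n, ..., cD_1]."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # high-pass branch: detail
        approx = (even + odd) / np.sqrt(2.0)          # low-pass branch: approximation
    return np.concatenate([approx] + details[::-1])

def select_by_variance(coeffs):
    """coeffs has shape (batches, variables, n_coeffs).

    A coefficient is kept when its variance across batches exceeds the mean
    variance of all coefficients belonging to the same process variable.
    """
    variances = coeffs.var(axis=0, ddof=1)               # (variables, n_coeffs)
    thresholds = variances.mean(axis=1, keepdims=True)   # one threshold per variable
    return variances > thresholds

# Synthetic stand-in for batch data: 32 batches, 3 variables, 32 time samples.
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(32, 3, 32)).cumsum(axis=2)

coeffs = np.apply_along_axis(haar_dwt, 2, trajectories, 5)   # 5 decomposition levels
mask = select_by_variance(coeffs)                            # True where a coefficient is kept
```

Because the threshold is computed per process variable, every variable retains at least its most variable coefficient, while near-constant coefficients that could make a regression matrix singular are discarded.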
Another important role of this selection step is to reduce the search space of the next variable-selection stage, which requires considerable computation time. In theory, the optimal feature set cannot be determined exactly beforehand. Therefore, instead of fixing the size of the feature set in advance, the set is retained and updated flexibly as the search proceeds. This search strategy is called sequential forward-floating selection (SFFS).12 In


Figure 2. Overall framework of the proposed method.

Table 1. Descriptions of the Process and Quality Variables

variable   description                                                  type
T1         inner temperature of reactor                                 process variable
T2         input temperature of cooling water                           process variable
T3         output temperature of cooling water                          process variable
F4         input flow rate of cooling water                             process variable
P5         inner pressure of reactor                                    process variable
R6         RPM of agitator                                              process variable
A7         electric current of reactor                                  process variable
P8         pressure of agitator                                         process variable
F9         input flow rate of chilled water (side)                      process variable
F10        input flow rate of chilled water (top)                       process variable
F11        input flow rate of chilled water (bottom)                    process variable
T12        input temp of cooling water for condenser                    process variable
T13        output temp of cooling water for condenser                   process variable
F14        input flow rate of cooling water for condenser               process variable
T15        top temperature of condenser                                 process variable
Y1         mass fraction of PVC particles below 35 µm                   quality variable
Y2         average size of PVC particles                                quality variable
Y3         average number of flaws generated when PVC mass is spread    quality variable

Figure 3. Procedure of the second variable-selection stage.

the fourth step, the wavelet coefficients that are strongly correlated with quality variables are extracted from those selected in the first variable-selection stage by using the sequential forward-floating selection (SFFS) algorithm,12 which is known to be one of the most efficient variable-selection algorithms.26 The correlation of the wavelet coefficients with the quality variables is the major concern in this selection stage. In this step, the wavelet coefficients that might minimize the prediction error of the quality-estimation models are selected. The selected wavelet coefficients can almost completely represent the quality-related characteristics of a batch process. The detailed procedure used in this second variable-selection stage is shown in Figure 3. The main search direction of SFFS is bottom-up. Its basic concept can be summarized as augmentation followed by repeated depletion. As shown in Figure 3, starting from the current feature set, SFFS adds the feature judged most significant according to the criterion value. Then, the

algorithm excludes the worst features from the newly updated set. In this way, each later set improves on the former one.26 As a criterion for evaluating the prediction performance of a subset of the selected wavelet coefficients, the root-mean-square error in prediction (RMSEP) calculated with the multiple linear regression (MLR) model is examined. The resulting wavelet coefficients can be used for either data interpretation or quality estimation. The selected wavelet coefficients can also be used as predictor variables in quality-estimation models. This use of the selected wavelet coefficients enables the construction of parsimonious quality-estimation models with reduced computational costs and maintenance.

5. Case Study

5.1. PVC Polymerization Process. The analysis of an industrial PVC polymerization process is described here as a case study. The aim of this process is to determine how to safely produce PVC products with uniform quality by controlling the operating conditions of batch runs. The most important operating condition is the inner temperature of the reactor, which should follow a particular trajectory in order to produce low operating


Figure 4. Process flow diagram for the industrial PVC polymerization process.

Figure 5. Cross-validation scheme for calculating the RMSEP of a variable subset.

costs and acceptable product quality. The temperature trajectory consists of three stages: heat-up, main reaction, and cooling. In the heat-up stage, the temperature of the reactants is increased to a specific level to initiate polymerization. Once this exothermic reaction starts, heat is continuously generated. A cooling jacket and chilled water are used to maintain the inner temperature of the reactor at a constant level until the end of the main reaction. After the reaction, the chilled water is discharged from the reactor together with the products. The temperature sharply drops once the exothermic reaction has finished, because of the continued functioning of the cooling system. The process flow diagram for the PVC polymerization process is shown in Figure 4. The process conditions are monitored online with 15 sensors, shown in Figure 4 as black circles. The data set we used in this case study provided data at every minute for 350 min. Product quality is specified with three quality variables: Y1, Y2, and Y3. These three quality variables are

measured off-line in a separate laboratory after each batch run, so considerable time and costs are required for this step. Descriptions of the 15 process variables and the 3 quality variables are given in Table 1.

5.2. Data. Forty samples of batch data were collected from the PVC process for use in our study. In order to treat all process trajectories on the same scale near zero, mean-centering and scaling to unit variance were performed. Eight of the 40 samples were set aside for use in testing of the accuracy of the predictions of the quality-estimation model. The remaining 32 samples were used in variable selection. In particular, a cross-validation scheme is used in the second variable-selection stage to obtain a more generalized RMSEP value for each variable subset. In this scheme, the 32 samples are divided into four groups consisting of eight samples, and the RMSEP for each group is calculated by building an MLR model with the other three groups. The mean of the four RMSEP values is then used as the selection-criterion value for the given variable subset to compare


Figure 6. (a) Results of data validation with PCA of all process trajectories; (b) sum of squares of the residuals (Q) of PCA.

the prediction capability of the variable subset with that of the other subsets. This cross-validation scheme is shown in Figure 5. The batch trajectories of the 15 process variables were obtained for each run; each trajectory consists of 350 measurements. Figure 6a shows the results of the principal component analysis (PCA) performed for the 40 batch data samples, which treated all measurement values in the process trajectories as independent variables. In Figure 6b, the Q statistic represents the squared perpendicular distance for the batches from the reduced space. Figure 6b shows that all the batches have been explained adequately since none of them have unusually large residuals exceeding the 95% confidence limit. All 40 batch data samples were used in our study.

5.3. Implementation of the Proposed Wavelet-Based Two-Stage Variable Selection. In the first step of the proposed method, DWT was performed on the standardized process

trajectories, and 5250 wavelet coefficients were obtained. We used the Haar (Daubechies 1) wavelet in the DWT because of its simplicity and orthogonality. Using an entropy criterion proposed by Coifman and Wickerhauser,25 we found that five decomposition levels were sufficient for creating a wavelet representation of the original trajectory data. The values of the wavelet coefficients for the 15 process variable trajectories of the 32 batch runs are shown in Figure 7. Because of the smooth shapes of the trajectories, the values of the wavelet coefficients for the early parts (high-scale components) of each process variable are large. In contrast, those located in the late parts (low-scale components) of the process variables are nearly zero, except for F9 and F14, which have large amounts of high-frequency components in their trajectories.

In the second and third steps, the variances of the wavelet coefficients for the 32 batch data samples were calculated, and those with statistically insignificant variances were removed


Figure 7. Wavelet coefficients for the 15 process-variable trajectories of the 32 batches.

according to the thresholds derived from the average variances. Figure 8 shows the variances of the wavelet coefficients for the 15 process variables in the 32 batch data samples. The dotted lines in this figure denote the thresholds determined for the corresponding process variables. By applying the second and third steps, the 526 wavelet coefficients that had variances larger than the thresholds were selected. It should be noted that this number corresponds to only 10% of the total number of wavelet coefficients (5250).

In the fourth step, the SFFS algorithm was applied to these selected wavelet coefficients using the RMSEP minimization criterion. The purpose of this step is to select the wavelet coefficients that are highly predictive of the quality variables from the 526 wavelet coefficients. Figure 9 shows the changes in the RMSEP values as the number of wavelet coefficients selected in the second variable-selection stage increases. In this figure, we can see that the 16 selected wavelet coefficients produce the smallest RMSEP and that further selection of the wavelet coefficients dramatically increases the RMSEP. The list of the 16 selected wavelet coefficients is shown in Table 2. The results show that a large number of wavelet coefficients have been selected for the process variables F4 and F14. Therefore, it can be inferred that the rates of input flow of cooling water into the reactor and condenser have a critical impact on the quality variables. Since control of the reactor temperature is known to be very important for product quality, and the two selected process variables play a key role in temperature control, this variable-selection result is clearly reasonable.

5.4. Data Interpretation by Reconstruction of the Original Signals. To interpret the variable-selection results, the 16 selected wavelet coefficients were used to reconstruct the

original signals in the time domain. Figure 10 shows the results obtained from the reconstructions using the 16 selected wavelet coefficients. Because each wavelet coefficient represents the correlation between a process trajectory and a Haar wavelet function with specific scale and shift parameters, 16 kinds of signals with different amplitudes are shown in specific positions, showing various scales in a shape of the Haar wavelet function. We can extract three kinds of important information about the quality variables from these results.

First, time regions where nonzero signals are observed are commonly found for T2, F4, P5, R6, F9, and F14 about 300 min after the start of a batch run. Therefore, we can infer that important chemical phenomena that determine the final values of the quality variables take place at this stage. In fact, this stage corresponds to the finalization of the particle growth phase of the polymerization reaction. Because all the quality variables are related to particle size, it is reasonable that the time region in which particle size is determined has been identified by the proposed variable-selection method.

Second, the shapes of the reconstructed signals give us not only the ranges of the critical-to-quality regions but also control guidelines for those regions. The signals with large width (high-scale components) indicate that the corresponding regions broadly influence the quality variables. The F4 and T15 signals near 100 min and the P5 and F14 signals near 300 min belong to this category. On the other hand, signals with small width (low-scale components) such as those of the T2, F4, R6, and F9 signals near 300 min or the F10 and F14 signals near 220 min indicate that the corresponding regions are narrow and that the corresponding process variables almost instantaneously affect the quality variables in these regions.
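The link between the scale of a selected coefficient and the width of its reconstructed signal can be illustrated with a small inverse Haar transform. The sketch below is an assumption-laden toy example (signal length 32, three decomposition levels, coefficient layout [cA_n, cD_n, ..., cD_1], hypothetical function name `haar_idwt`), not the reconstruction code used in the study.

```python
import numpy as np

def haar_idwt(coeffs, levels):
    """Invert a multilevel Haar DWT stored as [cA_n, cD_n, ..., cD_1]."""
    coeffs = np.asarray(coeffs, dtype=float)
    n_approx = len(coeffs) >> levels          # length of the coarsest approximation
    approx = coeffs[:n_approx]
    pos = n_approx
    for _ in range(levels):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2.0)   # even samples
        out[1::2] = (approx - detail) / np.sqrt(2.0)   # odd samples
        approx = out
    return approx

n, levels = 32, 3

# Keep a single high-scale (level-3) detail coefficient and zero all others.
broad = np.zeros(n)
broad[5] = 1.0
broad_signal = haar_idwt(broad, levels)

# Keep a single low-scale (level-1) detail coefficient and zero all others.
narrow = np.zeros(n)
narrow[20] = 1.0
narrow_signal = haar_idwt(narrow, levels)
```

In this toy setting the high-scale coefficient reconstructs to a Haar-shaped pulse spanning eight samples, whereas the low-scale coefficient reconstructs to a pulse spanning only two, mirroring the broad and narrow signals discussed above.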


Figure 8. Variances of the wavelet coefficients for the 15 process variables in the 32 batches and their thresholds.

Table 2. List of the Selected Wavelet Coefficients

involved process variables: T2, F4, P5, R6, F9, F10, F14, T15
wavelet coefficient numbers: 170, 15, 164, 332, 21, 84, 169, 166, 293, 40, 67, 125, 126, 144, 294, 14

Figure 9. Changes in the RMSEP values as the number of selected wavelet coefficients increases.
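The second selection stage, which produces curves such as the one in Figure 9, can be sketched as a floating search driven by a cross-validated RMSEP. The code below is a simplified illustration on synthetic data, assuming four contiguous cross-validation groups and an MLR model with an intercept; the function names `cv_rmsep` and `sffs` and the stopping rule `max_features` are hypothetical and do not reproduce the exact procedure of Figure 3.

```python
import numpy as np

def cv_rmsep(X, y, n_groups=4):
    """Mean RMSEP over contiguous cross-validation groups, using MLR with intercept."""
    folds = np.array_split(np.arange(len(y)), n_groups)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        errors.append(np.sqrt(np.mean((A_test @ beta - y[test_idx]) ** 2)))
    return float(np.mean(errors))

def sffs(X, y, max_features):
    """Sequential forward-floating selection; returns {size: (score, subset)}."""
    selected, best = [], {}
    while len(selected) < max_features:
        # Forward step: add the candidate giving the lowest criterion value.
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [cv_rmsep(X[:, selected + [j]], y) for j in candidates]
        selected = selected + [candidates[int(np.argmin(scores))]]
        k = len(selected)
        if k not in best or min(scores) < best[k][0]:
            best[k] = (min(scores), list(selected))
        # Floating step: remove features while that beats the best smaller subset.
        while len(selected) > 2:
            scores = [cv_rmsep(X[:, [f for f in selected if f != j]], y)
                      for j in selected]
            i = int(np.argmin(scores))
            if scores[i] >= best[len(selected) - 1][0]:
                break
            selected = [f for f in selected if f != selected[i]]
            best[len(selected)] = (scores[i], list(selected))
    return best

# Synthetic example: only columns 3 and 7 carry information about y.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.05 * rng.normal(size=32)

best = sffs(X, y, max_features=4)
best_size = min(best, key=lambda k: best[k][0])   # subset size with the lowest RMSEP
```

Plotting `best[k][0]` against `k` would give a toy analogue of Figure 9, with the minimum marking the subset size to retain.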

These conclusions are shown to be correct by comparison with what is known about the actual process. From the viewpoint of process operation, the smooth increase of the rate of input flow of cooling water to the reactor (F4) and the gradual decrease of the top condenser temperature (T15) during the initial stage are very important if the inner reactor temperature (T1) is to reach the temperature demanded for the main polymerization reaction in a controlled manner, particularly as heat starts to be generated. Because this initial control of the reactor temperature ultimately governs the overall reaction conditions, the gradual transitions of these two process variables

around 100 min have a critical impact on the final values of the quality variables. The broad reconstructed P5 and F14 signals near 300 min can be interpreted as indicating the importance of stable operation in the final stage of the particle growth phase. Pressure control at this stage significantly affects the final particle size of the PVC products because the pressure determines the volume of the products in the reactor. To make the volume of polymerized products as large as possible, the reactor pressure should be smoothly decreased. The slow decrease of the rate of input flow of cooling water to the condenser during the final stage is also important to ensure the desired particle size is obtained. Because the exothermic polymerization reaction is nearly complete at this time, no more heat is generated and cooling must cease in a controlled manner. At the same time,


Figure 10. Signals reconstructed from the 16 selected wavelet coefficients in the time domain. Note that the mean value of each selected wavelet coefficient for the 32 batches was used in this reconstruction.

the particle growth rate slows down and the final values of the quality variables are determined. Since the gradual reduction at this stage of the rate of cooling water flow into the condenser plays a key role in determining the final reactor temperature, this agrees with the broad reconstructed F14 signal around 300 min. The reconstructed signals with narrow width provide us with additional useful information on the control of the process variables. If we examine the sections of the batch trajectories corresponding to the sharp reconstructed T2, F4, R6, and F9 signals around 300 min and the F10 and F14 signals around 220 min, sudden jumps are commonly observed. These instantaneous drops or rises in the process variables contribute significantly to the final values of the quality variables. In particular, the dependence of PVC product volume on chilled water control during the final stage can be identified from the reconstructed F9 and F10 signals. It is known that the instantaneous control of the flow of cooling water to the reactor and condenser during the late stages of a batch run is very important in guiding the polymerization reaction to correct finalization, which is confirmed by the sharp reconstructed F4 and F14 signals. The reconstructed T2 and R6 signals indicate that dropping the reactor cooling water temperature and stopping the agitator action before the end of batch operation stabilize the final properties of the PVC products. Finally, the relative importance of the selected sections of the process trajectories for the final values of the quality variables can be derived from the reconstruction results. If a wavelet coefficient is selected by the proposed method, it means that its variation is related to changes in the quality variables. On the other hand, its absolute value represents the similarity

between a given wavelet function and a specific section of the process trajectories. Thus, the amplitudes of the reconstructed signals can be used to judge the degree of correlation between the corresponding sections of the trajectories and the quality variables. From this viewpoint, we conclude that the section of the P5 trajectory near 300 min has a greater impact by a factor of 500 on the quality variables than the section of the T15 trajectory near 100 min. This interpretation makes sense because industrial operators place even more emphasis on control of the reactor pressure than on control of the condenser temperature in actual operation. In addition, the high amplitude of the reconstructed R6 signal suggests that a timely end to the agitator action crucially influences the determination of the final product quality. This conclusion also agrees with what is known about the process because product viscosity, which influences particle size, is significantly dependent on the speed of the agitator, and various physical properties of the product are stabilized by stopping the agitator action.

5.5. Quality Estimation Using a PLS Model Constructed with the Selected Wavelet Coefficients. The usefulness of the selected wavelet coefficients in quality-estimation models was tested by building a PLS model that relates the wavelet coefficients and the three quality variables and by calculating the RMSEP for the eight test data samples. The performance of the model based on the proposed method was compared to that based on MPLS in terms of both their RMSEP and number of predictor variables (see Table 3). Because MPLS is the most widely used method for quality-estimation modeling in real applications, this comparison tests the applicability of the proposed method as a practical quality-estimation tool.


Figure 11. Predictions for the eight test data samples of (a) the MPLS model and (b) the PLS model.

Table 3. Comparison of the Performances of the Variable-Selection-Based PLS and MPLS Models

model                                     no. of predictor variables   RMSEP (Y1)   RMSEP (Y2)   RMSEP (Y3)
MPLS                                      4966 (time domain)           1.0754       0.95323      1.1469
PLS via the proposed variable selection   16 (wavelet domain)          1.2608       1.3616       0.8512
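The size reductions quoted in the text can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the paper (350 measurements, 15 process variables, 526 and 16 selected coefficients, 4966 MPLS predictors); the variable names are illustrative.

```python
n_time_points, n_process_vars = 350, 15
n_coeffs_total = n_time_points * n_process_vars      # wavelet coefficients after DWT

n_after_stage1 = 526       # coefficients kept by the variance threshold
n_after_stage2 = 16        # coefficients kept by SFFS
n_mpls_predictors = 4966   # time-domain predictors used by the MPLS model

stage1_fraction = n_after_stage1 / n_coeffs_total        # share kept after stage 1
stage2_fraction = n_after_stage2 / n_coeffs_total        # share kept after stage 2
vs_mpls_fraction = n_after_stage2 / n_mpls_predictors    # PLS predictors relative to MPLS
```

These ratios come to roughly 10%, 0.3%, and 0.3%, respectively, which matches the percentages stated in Sections 5.3, 5.5, and 6.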

Figure 11 shows the predictions of the two methods. Intuitively, we can see that the predictions in Figure 11b are closer to the real values than those in Figure 11a, except for the seventh data point in Y1 and the second and sixth data points in Y2. Whereas the predictions of the MPLS treatment vary little with the test data set because of the robustness of MPLS, the predictions of the variable-selection-based PLS treatment track the real values better because only the variables closely correlated with the quality variables are used. This comparison demonstrates the importance of variable selection in building highly predictive quality-estimation models and that the robustness of MPLS cannot provide acceptable prediction accuracy even though its use does avoid serious prediction errors.

As a quantitative comparison, the sums of the RMSEP values for the eight test data samples for the two methods were calculated for each quality variable. Although slightly larger RMSEP values were obtained for Y1 and Y2 because of abnormal prediction error in the three data points (the seventh data point in Y1 and the second and sixth data points in Y2), the overall predictions of the variable-selection-based PLS model built with the selected wavelet coefficients are better. It may be possible to obtain smaller RMSEP values if more rigorous validation is performed with additional test data. However, we can infer from these results that the proposed variable-selection method produces a prediction accuracy that is at least comparable to that of MPLS. Further, the number of predictor variables used in the variable-selection-based PLS quality-estimation model was much smaller than that in the MPLS model. Although only 0.3% of the number of the predictor variables used in the MPLS model were used to build the PLS quality-estimation model, there was no obvious difference in the performance of the two methods.
This result shows that the selected wavelet coefficients capture most of the essential information required to track the behavior of the quality variables. By using a smaller number of predictor variables, we can reduce the computation time required for

building quality-estimation models and the maintenance costs required for modifying or updating existing models. These advantages enhance the possibility of replacing MPLS with the proposed method in practical applications.

6. Conclusions

In this study, we have proposed a two-stage variable-selection method performed in the wavelet domain for extracting quality-related information from process trajectories and for building parsimonious quality-estimation models of batch processes. In the variable-selection stage for the PVC case study, a subset containing only 16 wavelet coefficients corresponding to 0.3% of the original number of coefficients was found to be required. Despite the extremely small size of this subset, the selected wavelet coefficients were found to reveal various facts when the original signals were reconstructed in the time domain. We found that a prediction accuracy comparable to that of MPLS, which uses 4966 variables in the time domain, could be obtained with only 16 wavelet coefficients. This small number of wavelet coefficients has benefits in terms of both computational cost and model maintenance.

Acknowledgment

The authors gratefully acknowledge the support for partial fulfillment of this work from the Korea Institute of Science and Technology, the Korea Science and Engineering Foundation provided through the Advanced Environmental Biotechnology Research Center (R11-2003-006) at Pohang University of Science and Technology, and the Brain Korea 21 project initiated by the Ministry of Education, Korea. This work was also supported by Grant No. (R01-2004-000-10345-0) from the Basic Research Program of the Korea Science & Engineering Foundation.

Notation

b = shift parameter of a wavelet function
C = wavelet coefficient
n = number of decomposition levels in DWT
s = scale parameter of a wavelet function
Vnew,k = most recently added variable in a variable subset, Vk
Vx = xth variable column
Vk = variable subset composed of k selected variables


Greek Letters

[symbol lost] = tolerance for changes in the RMSEP
ψ = wavelet function

Literature Cited

(1) Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Statistical Process Monitoring Based on Dissimilarity of Process Data. AIChE J. 2002, 48 (6), 1231.
(2) Nomikos, P.; MacGregor, J. F. Multi-Way Partial Least Squares in Monitoring Batch Processes. Chemom. Intell. Lab. Syst. 1995, 30, 97.
(3) Wold, S.; Geladi, P.; Esbensen, K.; Öhman, J. Multi-Way Principal Components and PLS-Analysis. J. Chemom. 1987, 1, 41.
(4) Walmsley, A. D. Improved Variable Selection Procedure for Multivariate Linear Regression. Anal. Chim. Acta 1997, 354, 225.
(5) Eklöv, T.; Mårtensson, P.; Lundström, I. Selection of Variables for Interpreting Multivariate Gas Sensor Data. Anal. Chim. Acta 1999, 381, 221.
(6) Höskuldsson, A. Variable and Subset Selection in PLS Regression. Chemom. Intell. Lab. Syst. 2001, 55, 23.
(7) Chu, Y. H.; Qin, S. J.; Han, C. Fault Detection and Operation Mode Identification Based on Pattern Classification with Variable Selection. Ind. Eng. Chem. Res. 2004, 43, 1701.
(8) Marill, T.; Green, D. M. On the Effectiveness of Receptors in Recognition System. IEEE Trans. Inf. Theory 1963, 9, 11.
(9) Whitney, A. W. A Direct Method of Nonparametric Measurement Selection. IEEE Trans. Comput. 1971, 20, 1100.
(10) Siedlecki, W.; Sklansky, J. On Automatic Feature Selection. Int. J. Pattern Recognit. Artif. Intell. 1988, 2, 197.
(11) Siedlecki, W.; Sklansky, J. A Note on Genetic Algorithm for Large-Scale Feature Selection. Pattern Recognit. Lett. 1989, 10, 335.
(12) Pudil, P.; Novovičová, J.; Kittler, J. Floating Search Methods in Feature Selection. Pattern Recognit. Lett. 1994, 15, 1119.
(13) Duchesne, C.; MacGregor, J. F. Multivariate Analysis and Optimization of Process Variable Trajectories for Batch Processes. Chemom. Intell. Lab. Syst. 2000, 51, 125.
(14) Chen, J.; Sheui, R. G. Using Taguchi's Method and Orthogonal Function Approximation to Design Optimal Manipulated Trajectory in Batch Processes. Ind. Eng. Chem. Res. 2002, 41, 2226.
(15) Chu, Y. H.; Lee, Y. H.; Han, C. Improved Quality Estimation and Knowledge Extraction in a Batch Process by Bootstrapping-Based Generalized Variable Selection. Ind. Eng. Chem. Res. 2004, 43, 2680.
(16) Leung, A.; Chau, F.; Gao, J. A Review on Applications of Wavelet Transform Techniques in Chemical Analysis: 1989-1997. Chemom. Intell. Lab. Syst. 1998, 43, 165.
(17) Bakshi, B. R.; Locher, G.; Stephanopoulos, G.; Stephanopoulos, G. Analysis of Operating Data for Evaluation Diagnosis and Control of Batch Operations. J. Process Control 1994, 4, 179.
(18) Bakshi, B. R.; Stephanopoulos, G. Compression of Chemical Process Data by Functional Approximation and Feature Extraction. AIChE J. 1996, 42, 477.
(19) Misra, M.; Qin, S. J.; Kumar, S.; Seemann, D. On-Line Compression and Error Analysis Using Wavelet Technology. AIChE J. 2000, 46, 119.
(20) Roy, M.; Kumar, V. R.; Kulkarni, B. D.; Sanderson, J.; Rhodes, M.; Stappen, M. V. Simple Denoising Algorithm Using Wavelet Transform. AIChE J. 1999, 45, 2461.
(21) Bakshi, B. R. Multiscale PCA with Application to Multivariate Statistical Process Monitoring. AIChE J. 1998, 44, 1596.
(22) Alsberg, B.; Woodward, A.; Winson, M.; Rowland, J.; Kell, D. Variable Selection in Wavelet Regression Models. Anal. Chim. Acta 1998, 368, 29.
(23) Alsberg, B. Parsimonious Multiscale Classification Models. J. Chemom. 2000, 14, 529.
(24) Mallat, S. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674.
(25) Coifman, R.; Wickerhauser, M. Entropy-Based Algorithms for Best Basis Selection. IEEE Trans. Inf. Theory 1992, 38, 713.
(26) Jain, A.; Zongker, D. Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 153.

Received for review November 12, 2006
Revised manuscript received June 4, 2007
Accepted August 10, 2007
IE0614475