Ind. Eng. Chem. Res. 2009, 48, 837–843
837
Two-Dimensional Dynamic Principal Component Analysis with Autodetermined Support Region Yuan Yao, Yinghu Diao,† Ningyun Lu,† Junde Lu, and Furong Gao* Department of Chemical Engineering, Hong Kong UniVersity of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, People’s Republic of China
Dynamics are inherent characteristics of batch processes. In some cases, such dynamics exist not only within a particular batch, but also from batch to batch. In previous work, two-dimensional dynamic principal component analysis (2-D-DPCA) has been developed to monitor 2-D dynamics. Support region determination is a key step in 2-D-DPCA modeling and monitoring of a batch process. A proper support region can ensure modeling accuracy, monitoring efficiency, and reasonable fault diagnosis. In this work, an automatic method for support region determination is developed. This data-based method can be applied on different batch processes without prior process knowledge. Simulation shows that the developed method has good application potentials for both monitoring and fault diagnosis. 1. Introduction Presently, batch processes are widely applied in industry to manufacture high-value-added products. To ensure operation safety and product quality, the multivariate statistical monitoring methods, such as principal component analysis (PCA) and partial least-squares (PLS), have been extended from continuous processes to batch processes for online monitoring.1-5 However, these methods all assume batch independency for the processes. In fact, dynamics are inherent characteristics of batch processes. In some cases, such dynamics exist not only within a particular batch, but also from batch to batch. Causes of batch-wise dynamics, for example, are slow property changing of feed stocks, process characteristics drift, utilizations of run-to-run controllers, effects of slow response variables, and so on. All of these are common in batch processes. To take process dynamics into consideration, several modeling methods have been developed. Batch dynamic principal component analysis (BDPCA)6 captures within-batch dynamic information, while the lifted state space model7 and the method incorporating prior batches information into MPCA model building8 concerns batchwise dynamics. To build both types of dynamic information into a single model simultaneously, a two-dimensional dynamic principal component analysis (2-D-DPCA) model9 has now been developed. This model has a parsimonious 2-D structure that is relatively easy to build and can provide efficient online faults detection. Small changes in correlation or process drifts can both be effectively detected. However, a remaining problem is the determination of the support region (ROS) for such a 2-D-DPCA model. In our previous work, the support region is assumed to be limited in the quarter plane and have a regular shape. This assumption may not always be reasonable for certain batch processes. In this work, the problem of ROS determination for 2-D-DPCA model is presented and resolved. This work is organized as follows. In section 2, the basics of PCA are introduced, followed by a review of 2-D-DPCA in section 3. Then, in section 4, the problem of ROS determination is illustrated. A data-driven method to deter* To whom correspondence should be addressed. Tel: (852)-23587139. Fax: (852)-2358-0054. E-mail:
[email protected]. † Currently with Department of Automation, Nanjing University of Aeronautics and Astronautics.
mine the ROS automatically is presented in section 5. The procedure of 2-D-DPCA method with autodetermined ROS is introduced in section 6. In section 7, simulations are given to compare the monitoring and diagnosis results between 2-DDPCA models with quarter-plane ROS and with autodetermined ROS. Finally, the conclusions are given in section 8 to summarize. 2. Basics of PCA By performing PCA, the original data set is divided into two subspaces. Mathematically, PCA is shown in eq 1, A
X ) TPT )
∑
m
tjpTj +
j)1
∑
tjpTj ) Xˆ + E
(1)
j)A+1
where X(n × m) is the data matrix, n is the number of samples, m is the number of process variables, A is the number of the principal components (PCs) retained in score space, tj(n × 1) is the score vector, pj(m × 1) is loading vector which projects data into score space, T and P are score matrix and loading matrix respectively, and E is the residual matrix. By selecting a proper number of retained PCs, T extracts most systematical variance information, while E contains noises. Loading matrix P reflects the correlation information among process variables. The statistic of squared prediction error (SPE) is calculated from residual space: m
SPE ) eTe )
∑ (x - xˆ )
2
j
j
(2)
j)1
where xj is the measurement value of the process variable, and xˆj is the reconstructed value based on PCA. SPE measures the distance between process measurements and PCA model predictions. The control limits of SPE can be calculated based on the SPE values of normal processes with certain statistical assumptions. If SPE is beyond the control limit, then this means the measurements cannot be described by this PCA model, and a fault is detected. The details of PCA including the algorithms can be found in the book written by Jackson.10
10.1021/ie800825m CCC: $40.75 2009 American Chemical Society Published on Web 12/04/2008
838 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009
3. Review of 2-D-DPCA
4. Problem Description
As stated in the Introduction, 2-D dynamics are common in batch processes. Although 2-D structure has already been widely applied in data filtering, image processing and some other areas, the 2-D-DPCA model is a first attempt to model batch processes for process monitoring within a 2-D frame. It combines lagged regression model structure and PCA method to capture both 2-D dynamics and cross-correlation information among variables. With 2-D dynamics, the current measurements depend not only on lagged measurements in the past time direction of the same batch, but also on lagged measurements in past batches. All of these lagged measurements, which can reflect 2-D dynamic features of batch processes, locate in a region, called support region, region of support (ROS), or prediction region. In our previous work,9 the ROS is assumed to be within a past quarter plane and restricted to a rectangular shape, which means that the current value of a variable xj(i,k) is to be predicted by a region including its past values in time direction xj(i,k - 1), xj(i,k - 2),..., xj(i,k - n), its past values in batch direction xj(i - 1,k), xj(i - 2,k),..., xj(i m,k), and together with the values in the cross direction xj(i - 1,k - 1),..., xj(i - m,k - n), where xj(i,k) is process measurement of variable j at sampling interval k in batch run i (i ) 1,..,I; j ) 1,...,J; k ) 1,..., K). 2-D-DPCA expands the data matrix to include all the lagged measurements in ROS together with current measurements as shown in eq 3, and then performs PCA on such matrix.
To build an accurate 2-D-DPCA model, a key step is to choose a proper ROS. The ROS should be a proper selection of a neighborhood that can provide a good prediction of the current sample, so that all 2-D dynamic information is included. In batch processes, all relationships are causal, so this neighborhood is in the past of the current sample. As shown previously in section 3, the ROS was assumed to have a regular shape in the past quarter plane. On the basis of such an assumption, the quarter-plane ROS is selected on 2-DAIC.11 However, although the quarter-plane-assumption is widely used in image processing and other research areas, it may not be an optimal or proper choice in the particular cases of batch process modeling. In other words, the ROS of 2-D batch dynamics may not locate in the quarter plane. Take injection molding process as an example. The mold temperature is a slow response variable that may cause 2-D process dynamics. It is possible that the temperature values of the future time intervals in the past batches give better prediction of the current sample than the variable values in the quarter plane. This suggests that the ROS of a batch process may be irregular and complicated. Since each batch process could have some special characteristics of its own that lead to different ROS, it is not proper to set a uniform shape for all batch processes. If process characteristics are known, then the ROS can be chosen based on such prior knowledge. However, this prior information is unlikely to be available for most batch processes. And due to the complex properties of batch processes, it would be difficult and time-consuming to analyze each process separately. The existing methods for 2-D model order selection,12,13 which are based on the known shape of ROS, cannot be directly used in our cases. The methods for 1-D time series model structure determination14 also cannot be extended directly to 2-D cases with unknown ROS. So a general data-driven method is necessary to automatically determine a proper ROS, including its shape and orders. Another limitation of the previous ROS determination method is that the order pair is selected based only on 2-D autocorrelations while time-lagged cross-correlation information is not explored. In fact, when 2-D dynamics exist in a multivariate batch process, the current value of a variable may depend not only on its past value in both time and batch direction, but also on the past values of other variables. In view of these, a new ROS determination method is proposed to explore all the information.
[ ]
T Xm+1,n+1 l XTi,k X) l T XI,K
where
(3)
XTi,k ) [xT(i, k), xT(i, k - 1), · · · , xT(i, k - n), xT(i - 1, k), xT(i - 1, k - 1), · · · , xT(i - 1, k - n), ··· xT(i - m, k), xT(i - m, k - 1), · · · , xT(i - m, k - n)] xT(i, k) ) [x1(i, k), · · · , xj(i, k), · · · , xJ(i, k)]. Thus, with a proper number of retained principal components, most 2-D dynamics and cross-correlation information among variables can be extracted simultaneously, while noises are left in the residual space. Therefore, SPE statistic and corresponding control limits can be calculated for process monitoring. Same as most multivariate statistical process monitoring methods, 2-D-DPCA models batch processes based on sufficient historical normal operation data, which cover the whole normal operation region and distribute appropriately. This is a basis to ensure the reasonability of the ROS determination results and the 2-D-DPCA modeling results. As an extension of PCA, 2-D-DPCA focuses on linear correlations among process variables and the lagged variables. If significant nonlinear correlations are known to exist in a process, then additional variables can be created based on the process variables before performing ROS determination and 2-D-DPCA modeling. Thus, the nonliner auto/cross-correlations in process can be transformed into linear correlations among the process variables, the additional variables and their corresponding lagged variables.
5. ROS Autodetermination 5.1. Basic Idea of ROS Autodetermination. It is noteworthy that a proper ROS should include the necessary past data to provide good prediction of variables’ current values. So in a sense, an ROS determination problem is similar to a task of selecting proper independent variables for a model to predict current measurements. All past samples are candidates of such independent variables. The task of ROS determination is to choose a reasonable subgroup of them in a tidy structure to provide good prediction. For model regression, many commercial statistical program packages, which use stepwise regression, have been developed to determine the regression variables according to certain criterion, such as AIC. Several methods have also been developed for variable selection when multicollinearity is present.15-17
Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 839
Figure 1. Illustration of initial ROS and proper ROS.
Figure 3. Faulty trajectories of variable x2 with changed correlation structure (Solid lines: normal trajectories; Dash lines: faulty trajectories).
Figure 4. Monitoring for a fault of correlation structure changing by 2-DDPCA with autodetermined ROS (Solid line: 99% control limit; Dash line: 95% control limit).
Figure 2. Procedures of 2-D-DPCA based monitoring and fault diagnosis with autodetermined ROS.
Enlightened by the variable selection problem, a data-driven method for ROS autodetermination is designed. A past neighborhood of the current sample is chosen as the candidate region of ROS, which can be called initial ROS. All samples in this region can be regarded as candidate independent variables for the prediction model, and the current sample is regarded as the dependent variable to be predicted. The selected initial ROS should be sufficiently large to contain the target proper ROS as a subset, so that the proper ROS can be obtained by eliminating unnecessary independent variables. Therefore, stepwise elimination is iteratively conducted. In each run, a regression model is built to relate the remaining candidate independent variables to the current sample’s value. An index, which is a model evaluation criterion, is calculated for each regression model. Then, one independent variable is eliminated from the candidate region based on the importance of variables calculated in each run. In the end, the best choice of the ROS is determined based on the comparison of the index values calculated during the iteration. 5.2. Initial ROS Selection. For the initial ROS selection, some requirements should be satisfied. As pointed out earlier, the candidate region should be sufficiently large, so that the entire proper ROS should be completely contained. However, it should not be too large, in order to control the computation burden in the latter steps. If some process knowledge is available, then the initial ROS can be selected based on such
prior knowledge. Otherwise, Simple Regression (SR),15,18 which is often used as a preselection method in variable selection of PLS models, can be applied to indicate the initial ROS. The task is started with a past neighborhood that is large enough. Current measurement is regressed with each of the lagged variables individually. Check the slope coefficient of each simple regression. Only if the probability of the associated Student t test for a slope coefficient is less than a significant level, is the corresponding lagged variable kept as a component of the initial ROS. The t test is performed on a t-score defined by the following equation,19 t ) b ⁄ SE
(4)
where b is the slope of a sample regression line, i.e., the regression coefficient, and SE is standard error of this slope which can be calculated using eq 5. SE )
(
n
∑ (y - yˆ )
2
i
i)1
i
)( 1⁄2
⁄ (n - 2)
⁄
n
∑ (x - jx)
)
2
i
i)1
1⁄2
(5)
where yi is the value of the dependent variable which is the current measurement in our case, yˆi is the estimated value from the simple regression, xi is the independent variable which is a lagged variable in our case, jx is the mean of the independent variable, and n is number of observations training the regression model. The P-value of the t-score can be calculated based on the Student t test and compared to the threshold. 5.3. Target Proper ROS Determination. After initial ROS selection, Iterative Stepwise Elimination (ISE)20 based on the
840 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009
importance of the independent variables is performed for the target proper ROS determination. After a regression model between the dependent variable (the current measurement) and the independent variables (the lagged variables in the candidate region) is built, the importance of each lagged variables is quantified as follows:20 z)
|bi|si V
∑ |b |s
in the initial ROS on variable j, nj(V) is the maximum number of lagged time index value in the initial ROS in the Vth batch on variable j, rj(V) is the maximum number of future time index value in the initial ROS on variable j, and j ) 1,2,...,J. 2. On the basis of the chosen candidate region, augment data as follows:
i i
i)1
where bi is the regression coefficient corresponding to the ith independent variable, si is the standard deviation of the ith independent variable, and V is the total number of independent variables in the candidate region. At the beginning of target proper ROS determination, the candidate region is equal to the initial ROS. Then in each elimination cycle, the independent variable with the minimum importance is eliminated. After that, the regression model is recalculated with the remaining independent variables. Since the independent variables are highly correlated, the PLS, PCR or ridge regression model can be built. An index is used as a criterion to evaluate the regression models built in iterations. This index could be AIC, MDL, and other indices used in variable selection. The AIC criterion is proposed by Akaike21 and can be calculated from the equation below, k (7) N where σˆ 2k is the prediction error variance estimated from the regression model, k is the number of independent variable retained in candidate region and built into the regression model, and N is the number of observations. On the basis of the AIC criterion, the smaller the AIC(k), the better the prediction. So ideally, the retained independent variables and k corresponding to the smallest AIC(k) is chosen to be the proper ROS and the number of lagged variables in the ROS. One important issue that should be noted is that AIC sometimes leads to overfitting and results in a larger ROS than the proper one. So if the result of eq 7 is near the smallest value and only decays slowly with the increase of k, the lagged variables retained in the candidate region corresponding to that cycle may be selected as the target proper ROS. One thing that needs to be pointed out is that, as one of the stepwise regression methods, ISE may also have the common shortcomings of stepwise regression. Theoretically, it dose not provide a guarantee of selecting an optimal subset. However, stepwise regression methods are practically useful and widely applied in data mining. Therefore, it is also utilized in our ROS determination research. 5.4. Procedure of ROS Autodetermination. The detailed procedure of ROS autodetermination is given below. 1. In the past half-plane, select a candidate region (initial ROS) that is large enough and includes the proper ROS to be determined. This initial selection can be based on prior process knowledge or SR method as introduced before. Suppose current sample is on time k of batch i, Figure 1 shows the initial ROS on variable xj, which can be an irregular shape in the half-plane including data points of xj(i,k - 1), xj(i,k - 2),..., xj(i,k - nj(i)), xj(i - 1,k + rj(i - 1)), xj(i - 1,k + rj(i - 1) - 1),..., xj(i 1,k), xj(i - 1,k - 1),..., xj(i - 1,k - nj(i - 1)),..., xj(i - mj,k + rj(i - mj)),..., xj(i - mj,k), xj(i - mj,k - 1),..., xj(i - mj,k - nj(i - mj)), where i, j, k are the indices of batch, variable, and time respectively, mj is the maximum number of lagged batch index AIC(k) ) log(σˆ 2k ) + 2
[]
T Xm+1,n+1 l T Xm+1,K-r X) l XTi,k l T XI,K-r
(6)
where
(8)
XTi,k ) [xT1 (i, k), · · · , xTj (i, k), · · · , xTJ (i, k)], xTj (i, k) ) [xj(i, k - 1), · · · , xj(i, k - nj(i)), xj(i - 1, k + rj(i - 1)), xj(i - 1, k + rj(i - 1) - 1), · · · , xj(i - 1, k), xj(i - 1, k - 1), · · · , xj(i - 1, k - nj(i - 1)), ··· xj(i - mj, k + rj(i - mj)), xj(i - mj, k + rj(i - mj) - 1), · · · , xj(i - mj, k), xj(i - mj, k - 1), · · · , xj(i - mj, k - nj(i - mj))], n ) max(n1(i), n1(i - 1), · · · , n1(i - m), · · · , nj(i), nj(i - 1), · · · , nj(i - m), · · · , nJ(i), nJ(i - 1), · · · , nJ(i - m)), r ) max(r1(i - 1), r1(i - 2), · · · , r1(i - m), · · · , rj(i - 1), rj(i - 2), · · · , rj(i - m), · · · , rJ(i - 1), rJ(i - 2), · · · , rJ(i - m)), m ) max(m1, · · · mj, · · · , mJ). 3. Normalize X to zero means and unit variances. 4. Choose a variable xh where h ) 1,..., J and J is the number of process variables, and augment data into a matrix as follows:
[ ]
xh(m + 1, n + 1) l xh(m + 1, K - r) Yh ) l xh(i, k) l xh(I, K - r)
(9)
5. Normalize Yh to zero means and unit variances. 6. Take X as the matrix of independent variables, Yh as the matrix of dependent variable, and build a regression model between X and Yh. 7. Calculate and store the evaluation index value to assess this regression model. The index can be AIC or other indices that are used for model evaluation, as introduced before. 8. Calculate the importance of each independent variable as eq 6. 9. Eliminate a column in X that corresponds to the smallest importance value. 10. Return to step 6 until all columns in X have been eliminated. 11. Compare all the evaluation index values calculated before and determine the one indicating the most proper model. 12. Check the X that corresponds to the best model determined in step 11. The remaining columns of X give the most proper support region of xh. The independent variables retained in X are the lagged variables included in the determined
Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 841
proper ROS. The proper ROS on xj is also shown in Figure 1. Similar to initial ROS, the proper ROS also may have irregular shape in the past half-plane, which includes xj(i,k - 1), xj(i,k 2),..., xj(i,k - qj(i)), xj(i - 1,k + fj(i - 1)), xj(i - 1,k + fj(i 1) - 1),..., xj(i - 1,k), xj(i - 1,k - 1),..., xj(i - 1,k - qj(i 1)),..., xj(i - pj,k + fj(i - pj)),..., xj(i - pj,k), xj(i - pj,k - 1), xj(i - pj,k - qj(i - pj)), where i, j, k are the indices of batch, time, and variable respectively, pj is the maximum number of lagged batch index in the proper ROS on variable j, qj(V) is the maximum number of lagged time index in the proper ROS in the Vth batch on variable j, fj(V) is the maximum number of future time index in the proper ROS on variable j, and j ) 1,2,...,J. 13. Return to step 4. Choose another xh, and find its support region. When every variable’s support region is determined, the combination of them is the complete proper ROS to be used in 2-D-DPCA model building. 6. 2-D-DPCA with Autodetermined ROS After the proper ROS is determined, the data matrix can be augmented as a combination of current sample and all lagged variables in the proper ROS.
[]
T X˜p+1,q+1 l ˜) X˜Ti,k X l T X˜I,K-f
where
(10)
X˜Ti,k ) [x˜T1 (i, k), · · · , x˜Tj (i, k), · · · , x˜TJ (i, k)], x˜Tj (i, k) ) [xj(i, k), xj(i, k - 1), · · · , xj(i, k - qj(i)), xj(i - 1, k + fj(i - 1)), xj(i - 1, k + fj(i - 1) - 1), · · · , xj(i - 1, k), xj(i - 1, k - 1), · · · , xj(i - 1, k - qj(i - 1)), ··· xj(i - pj, k + fj(i - pj)), xj(i - pj, k + fj(i - pj) - 1), · · · , xj(i - pj, k), xj(i - pj, k - 1), · · · , xj(i - pj, k - qj(i - pj))], q ) max(q1(i), q1(i - 1), · · · , q1(i - p), · · · , qj(i), qj(i - 1), · · · , q1(i - m), · · · , qJ(i), qJ(i - 1), · · · , qJ(i - m)), f ) max(f1(i - 1), f1(i - 2), · · · , f1(i - m), · · · , fj(i - 1), fj(i - 2), · · · , fj(i - m), · · · , fJ(i - 1), fJ(i - 2), · · · , fJ(i - m)), p ) max(p1, · · · , pj, · · · , pJ). The columns of this expanded matrix X˜ consist of all current measurements and the lagged measurements in the whole proper ROS. Then, PCA can be performed on this matrix. The SPE statistic and corresponding control limits can be calculated based on residuals and used in online monitoring when new batch data are coming. Figure 2 shows the procedures of 2-D-DPCA based monitoring and fault diagnosis with autodetermined ROS. 7. Simulation Results In this section, monitoring and diagnosis efficiency of the 2-D-DPCA method with autodetermined support region and previous 2-D-DPCA method is compared with a batch process with 2-D dynamics whose ROS is not within a regular quarter plane. The process model is assumed to be as follows:
x1(i, k) ) 0.5 * x1(i - 1, k + 1) + 0.8 * x1(i, k - 1) 0.3 * x1(i - 1, k) + w1 x2(i, k) ) 0.44 * x2(i - 1, k + 1) + 0.67 * x2(i, k - 1) 0.11 * x2(i - 1, k) + w2 x3(i, k) ) 0.4 * x3(i, k - 1) + 0.25 * x1(i, k) + 0.35 * x2(i, k) + w3 x4(i, k) ) 0.8 * x4(i, k - 1) + 0.53 * x1(i, k) - 0.33 * x2(i, k) + w4 (11) where i is the batch index; k is the time index; x1 and x2 are two independent signals with dynamics in both time and batch directions with different ROS; x3 and x4 are functions of x1, x2 and their own values at one step before; wj(j ) 1,2,3,4) are Gaussian noises with variance 0.01. 100 batches data are generated and there are 200 samples in each batch. The ROS is determined with the proposed autodetermination method following the steps described in section 5.4. The ROS for variable x1 is determined as x1(i,k - 1), x1(i 1,k) and x1(i - 1,k + 1), and variable x2’s ROS includes x2(i,k - 1), x2(i - 1,k) and x2(i - 1,k + 1), while x1(i,k 1), x1(i - 1,k + 1), x2(i,k - 1), x2(i - 1,k + 1), x3(i,k - 1) for x3 and x1(i,k - 1), x1(i - 1,k), x1(i - 1,k + 1), x2(i,k 1), x2(i - 1,k + 1), x4(i,k - 1) for x4. The complete proper ROS includes x1(i,k - 1), x1(i - 1,k), x1(i - 1,k + 1), x2(i,k - 1), x2(i - 1,k), x2(i - 1,k + 1), x3(i,k - 1) and x4(i,k - 1), which is agree with the process. The 2-D AIC method is also used to do the same job based on the quarter-plane assumption of the ROS. Then, the quarter-plane-ROS is made up of x1(i,k - 1), x1(i - 1,k - 1), x1(i - 1,k), x2(i,k - 1), x2(i - 1,k - 1), x2(i - 1,k), x3(i,k - 1), x3(i - 1,k - 1), x3(i - 1,k), x4(i,k - 2), x4(i,k - 1), x4(i - 1,k - 2), x4(i - 1,k 1) and x4(i - 1,k). Then, two 2-D-DPCA models are developed and compared based on these two different choices of ROS. The numbers of PCs retained in the model are determined with the crossvalidation method.22 Both models can explain more than 99% of variations. And there is no significant correlation but only noise retained in residuals, since retained PCs capture most dynamics. With the 2-D-DPCA model of normal operation, the control limits of SPE statistics can then be calculated based on a weighted χ2 distribution assumption and be used to monitor future batches. To check the models’ ability of monitoring and diagnosis, a change in variable correlation structure is generated to simulate the fault. From batch 61, variable x2 is formulated as follows: x2(i, k) ) 0.1 * x2(i - 1, k + 1) + 0.67 * x2(i, k - 1) 0.05 * x2(i - 1, k) + w2 (12) From Figure 3, it can be found that this change is quite insignificant, although there are some differences between the parameters of the two models. The monitoring results are shown in Figures 4 and 5, respectively for autodetermined ROS and quarter-plane ROS. On the control plot of 2-D-DPCA with autodetermined ROS, there are 14.14% of points outside 99% control limit within batch 61. Conversely, on the control plot of 2-D-DPCA with quarter-plane ROS, there are 4.55% of points outside 99% control limit within batch 61. We can see, although the fault is insignificant, both 2-D-DPCA methods can detect the fault from batch 61 within which batch the fault begins. However, the model based on autodetermined ROS does the detection
842 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009
is associated with variable x2, while the one with quarter plane cannot tell the true reason of the fault. Summarily, the incorrect structure of ROS can give distorted correlation information among process variables. This not only increases the chances of miss alarm but also affects diagnosis results. The ROS autodetermination method for 2-D-DPCA can determine the ROS more reasonably and solve such problems quite well. 8. Conclusions
Figure 5. Monitoring for a fault of correlation structure changing by 2-DDPCA with quarter-plane ROS (Solid line: 99% control limit; Dash line: 95% control limit).
2-D-DPCA is a modeling method that can capture both time-wise and batch-wise dynamics of batch processes. A proper way of determining the support region (ROS) has shown to be important to guarantee monitoring efficiency and diagnosis accuracy. The idea of variable selection has been adopted in this work to solve the ROS determination problem nicely. The proposed ROS autodetermination method is data-driven, and has good potential to be applied on different batch processes, since no prior process knowledge is needed. Simulation results show the benefits of 2-D-DPCA with the proposed ROS determination method in both fault detection and diagnosis. Acknowledgment This work was supported in part by Hong Kong Research Grant Council, under project number 613107. Literature Cited
Figure 6. SPE statistic contribution plot of Batch 61 based on 2-D-DPCA with autodetermined ROS (Solid line: 99% control limit; Dash line: 95% control limit).
Figure 7. SPE statistic contribution plot of Batch 61 based on 2-D-DPCA with quarter-plane ROS (Solid line: 99% control limit; Dash line: 95% control limit).
job much better. Furthermore, the difference in fault diagnosis results indicates another important benefit of the 2-D-DPCA method with autodetermined ROS. The SPE contribution plots for current variables (x1(i,k), x2(i,k), x3(i,k)and x4(i,k)) in batch 61 are drawn in Figures 6 and 7, respectively, for the method with autodetermined ROS and quarter-plane ROS, based on the Westerhuis method.23 The 2-D-DPCA model with autodetermined ROS shows clearly and correctly that the fault
(1) Nomikos, P.; MacGregor, J. F. Monitoring of batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361. (2) Nomikos, P.; MacGregor, J. F. Multiway partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97. (3) Wold, S.; Kettaneh, N.; Tjessem, K. Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection. J. Chemom. 1996, 10, 463. (4) Ra¨nnar, S.; MacGregor, J. F.; Wold, S. Adaptive batch monitoring suing hierarchical PCA. Chemom. Intell. Lab. Syst. 1998, 41, 73. (5) Lu, N.; Gao, F.; Wang, F. Sub-PCA modeling and on-line monitoring strategy for batch processes. AIChE J. 2004, 50, 255. (6) Chen, J.; Liu, K. C. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 57, 63. (7) Lee, J. H.; Dorsey, A. W. Monitoring of batch processes through state-space models. AIChE J. 2004, 50, 1198. (8) Flores-Cerrillo, J.; MacGregor, J. F. Multivariate monitoring of batch processes using batch-to-batch information. AIChE J. 2004, 50, 1219. (9) Lu, N.; Yao, Y.; Gao, F. Two-dimensional dynamic PCA for batch process monitoring. AIChE J. 2005, 51, 3300. (10) Jackson, J. E. A User’S Guide to Principal Components; Wiley: New York, 1991. (11) Aksasse, B.; Radouane, L. Two-dimensional autoregressive (2-D-AR) model order estimation. IEEE Trans. Signal Proc. 1999, 47, 2072. (12) Etchison, T. L. Model Identification and Selection Techniques for Stationary ARMA(p1,q1) × ARMA(p2,q2) Processes, PhD thesis, Department of Statistics, North Carolina State University, Raleig, 1993. (13) Etchison, T. L.; Pantula, S. G.; Brownie, C. Partial autocorrelation function for spatial processes. Stat. Prob. Lett. 1994, 21, 9. (14) Box, G. E. P.; Jenkins, G. M.; Reinsel, G. C. Time Series Analysis: Forecasting and Control, 3rd ed.; Prentice Hall: Englewood Cliffs, N. J., 1994. (15) Gauchi, J. P.; Chagnon, P. Comparison of selection methods of explanatory variables in PLS regression with application to manufacturing process data. Chemom. Intell. Lab. Syst. 2001, 58, 179. (16) Lazraq, A.; Cle´roux, R.; Gauchi, J. P. Selecting both latent and explanatory variables in the PLS1 regression model. Chemom. Intell. Lab. Syst. 2003, 66, 117. (17) Chong, I. G.; Jun, C. H. Performance of some variable selection methods when multicollinearity is present. Chemom. Intell. Lab. Syst. 2005, 78, 103.
Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 843 (18) Gauchi, J. P. Utilisation dela re´gression PLS pour l’analyse des plans d’expe´riences en chimie de formulation. ReV. Stat. Appl. 1995, XLIII, 65. (19) Kleinbaum, D. G.; Kupper, L. L.; Muller, K. E.; Nizam, A. Applied Regression Analysis and Other MultiVariable Methods, 3rd ed.; Duxbury Press: Pacific Grove, CA, 1997. (20) Boggia, R.; Forina, M.; Fossa, P.; Mosti, L. Chemometrics study and validation strategies in the structure-activity relationships of a new class of cardiotonic agents. Quant. Struct.-Act. Relatsh. (QSAR) 1997, 16, 201. (21) Akaike, H. A new look at the statistical model identification. IEEE Trans. Automatic Control 1974, 19, 716.
(22) Wold, S. Cross-validatory estimation of the number of components in Factor and principal components models. Technometrics 1978, 20, 397. (23) Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Generalized contribution plots in multivariate statistical process monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95.
ReceiVed for reView May 22, 2008 ReVised manuscript receiVed October 10, 2008 Accepted October 17, 2008 IE800825M