7858
Ind. Eng. Chem. Res. 2010, 49, 7858–7869
Multivariate Statistical Process Monitoring Based on Statistics Pattern Analysis Jin Wang*,† and Q. Peter He‡ Department of Chemical Engineering, Auburn UniVersity, Auburn, Alabama 36849, and Department of Chemical Engineering, Tuskegee UniVersity, Tuskegee, Alabama 36088
In this work, a new multivariate method to monitor continuous processes is developed based on the statistics pattern analysis (SPA) framework. The SPA framework was proposed recently to address some challenges associated with batch process monitoring, such as unsynchronized batch trajectories and multimodal distribution. The major difference between the principal component analysis (PCA) based and SPA-based fault detection methods is that PCA monitors process variables while SPA monitors the statistics of process variables. In other words, PCA examines the variance-covariance of the process variables to perform fault detection while SPA examines the variance-covariance of the process variable statistics (e.g., mean, variance, autocorrelation, cross-correlation, etc.) to perform fault detection. In this paper, a window-based SPA method is proposed to address the challenges associated with continuous processes such as nonlinear process dynamics. First, the details of the window-based SPA method are presented; then the basic properties of the SPA method for fault detection are discussed and illustrated using a simple nonlinear example. Finally, the potential of the windowbased SPA method in monitoring continuous processes is explored using two case studies (a 2 × 2 linear dynamic process and the challenging Tennessee Eastman process). The performance of the window-based SPA method is compared with the benchmark PCA and DPCA methods. The monitoring results clearly demonstrate the superiority of the proposed method. 1. Introduction The increasing demand for safer and more reliable systems in modern process operations leads to the rapid development of process monitoring techniques. By promptly detecting process upsets, equipment malfunctions, and other special events, online process monitoring not only plays an important role in ensuring process safety, but also improves process efficiency and product quality. With a large number of variables measured and stored automatically by distributed control systems (DCS), multivariate statistical process monitoring approaches have been developed and successfully applied to monitor various industrial processes. Specifically, the use of principal component analysis (PCA) for process monitoring has gained wide application in the chemical and petrochemical industries.1-4 PCA projects process data onto a lower-dimensional space that contains the most variance of the original data and accounts for correlations among different variables. Therefore, it can easily handle high dimensional, noisy, and highly correlated data generated from chemical processes, and provide superior performance compared to univariate statistical monitoring methods such as Shewhart, CUSUM, and EWMA charts. In addition, the PCA-based process monitoring methods are attractive because they only require a good historical data set of normal operation, which are easily available for the computer-controlled industrial processes. Although PCA-based monitoring methods have been extremely successful in many applications, there are cases where they do not perform well. First, PCA is a second-order method, which only considers the mean and variance-covariance of the data. Therefore, PCA lacks the capability of providing higherorder representations for non-Gaussian data, which is often the case for industrial data.5,6 In addition, the control limits of Hotelling’s T2 and the squared prediction error (SPE) charts * To whom correspondence should be addressed. Tel.: (334) 8442020. Fax: (334) 844-2063. E-mail: wang@auburn.edu. † Auburn University. ‡ Tuskegee University.
are developed based on the assumption that the latent variables follow a multivariate Gaussian distribution. Therefore, when the latent variables are non-Gaussian distributed due to process nonlinearity or other reasons, using Hotelling’s T2 and SPE may be misleading.5,7 When the underlying assumptions of the PCA method are not satisfied by the industrial process data, the monitoring performance of PCA is less satisfactory. To obtain better monitoring performance for the processes where the fundamental assumptions of PCA are violated, different approaches have been proposed. Luo et al.8 apply discrete wavelet analysis (DWA) to decompose the process measurements and then apply PCA to the wavelet coefficients at different detail levels to monitor the process; Kano et al.9 define a dissimilarity index to monitor the distribution of the process time series data and to detect changes of the correlations among process variables that cannot be detected by Hotelling’s T2 and the SPE charts; more recently, several multivariate monitoring methods based on independent component analysis (ICA) have been proposed where ICA decomposes the process data into linear combinations of statistically independent components.5,10-13 Among these approaches, Kano’s approach is a second-order method similar to PCA, while DW- and ICAbased fault detection methods involve higher-order statistics implicitly. In this work, we develop a new fault detection method to monitor continuous processes based on the statistics pattern analysis (SPA) framework that we proposed recently.14 In He et al.14 the SPA framework was proposed to address the challenges associated with monitoring batch processes, especially semiconductor processes. Two specific challenges are unsynchronized batch trajectories and multimodal distribution, which require substantial data preprocessing in order to apply the traditional PCA-based monitoring methods. It was shown that the proposed SPA method for batch processes not only eliminates the data preprocessing steps that are difficult to automate, such as trajectory shift and alignment, but also provides superior monitoring performance compared to the
10.1021/ie901911p 2010 American Chemical Society Published on Web 03/04/2010
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
7859
Figure 1. Schematic of the window-based SPA method for monitoring continuous processes: (a) original process data; (b) computed statistics patterns; (c) fault detection. A block or a window of process variables shaded in (a) is used to generate an SP shaded in (b), which is then used to generate a point of dissimilarity measure shaded in (c) to perform fault detection.
tradition PCA method. Compared to the semiconductor processes which typically exhibit weak dynamics, continuous processes present different challenges for process monitoring, such as strong process dynamics and nonlinearity that makes the traditional PCA-based method less effective. In this work, we explore the potential of the SPA framework in monitoring continuous processes by proposing a window-based approach based on the SPA framework. The major difference between the traditional PCA-based and the proposed SPA-based fault detection methods is that PCA monitors the process variables while SPA monitors various statistics of the process variables. In other words, in PCA singular value decomposition (SVD) is applied to the original process variables to build the model for normal operation, and the new measurements of the process variables are projected onto the PCA model to perform fault detection; while in SPA, SVD is applied to various statistics of the process variables to build the model, and the statistics calculated using the new measurements are projected onto the model to perform fault detection. In this way, different statistics that capture the different characteristics of the process can be selected to build the model for normal process operation, and various higherorder statistics can be utilized explicitly. In this work, we show that the SPA-based fault detection method provides superior performance compared to the traditional PCA-based methods. The remainder of the paper is organized as follows. In section 2, the traditional PCA-based monitoring method is briefly reviewed. Section 3 presents the SPA framework and the details of the proposed process monitoring method. In addition, a simple nonlinear process is used to illustrate the advantage of the proposed method. In section 4, two simulation examples are used to demonstrate the performance of the proposed method: a 2 × 2 linear dynamic system and the Tennessee Eastman process (TEP). Conclusions are given in section 5. 2. Principal Component Analysis (PCA) PCA and its application in process monitoring have been studied extensively, and a brief review is given here. Let X∈R m×n denote the raw data matrix with n samples (rows) and m variables (columns). X is first scaled to zero mean for covariance-based PCA and further to unit variance for correlation-based PCA. By either the NIPALS15 or the SVD algorithm, the scaled matrix X is decomposed as follows: ˜ ) TPT + T˜P˜T ) [TT˜][PP˜]T X ) TPT + X
(1)
where T∈R n×l and P∈R m×l are the score and loading matrices, respectively. The PCA projection reduces the original set of m variables to l principal components (PCs). For fault detection on a new sample vector x, Hotelling’s T2 and the SPE charts are commonly used.
Hotelling’s T2 is a measure of the variation in principal component subspace (PCS): T2 ) xTPΛ-1PTx
(2)
where Λ is the diagonal matrix of the l largest eigenvalues of XXT. The T2 statistic forms a hyperellipsoid, which represents the joint limits of variations that can be explained by a set of common causes. For a given significance level R, the process is considered normal if T2 e TR2 where the upper control limit TR2 can be obtained using the F-distribution.16 The SPE statistic indicates how well each sample conforms to the model, measured by the projection of the sample vector onto the residual subspace (RS): SPE ) |x˜ | 2 ) |(I - PPT)x| 2
(3)
The process is considered normal if SPE e δ2R, where δ2R denotes the upper control limit for SPE with a significance level R. δR2 can be computed by assuming that the process data follow a normal distribution.17 Note that the thresholds of SPE and T2 can also be determined empirically using the validation data,18,19 which is often the case in the industrial applications. 3. Statistics Pattern Analysis (SPA) In our recent work,14 the SPA framework was proposed to monitor batch processes. It is shown by He et al.14 that SPA not only eliminates the need for various data preprocessing steps, such as trajectory shift and alignment, but also delivers superior performance compared to the multiway PCA method1,20 and the FD-kNN method (fault detection method based on k-nearestneighbor rule).21 In this work, we extend the SPA framework to monitor continuous processes. The basic idea of the SPAbased process monitoring is that the process statistics under abnormal conditions would show some deviation from the distribution of the process statistics under normal operation. Therefore, in the SPA framework, the process behavior is characterized by different statistics of the process variables, instead of by the process variables themselves. In other words, the SPA-based fault detection method monitors the variancecovariance structure of the process statistics, instead of the variance-covariance structure of the process variables. As shown in Figure 1, two steps are involved in the SPAbased monitoring for continuous processes: statistics pattern (SP) generation and dissimilarity quantification. For a continuous process, an SP is a collection of various statistics calculated from a window (or a segment) of the process measurements. These statistics capture the characteristics of each individual variable (such as mean, variance, and skewness), the interactions among different variables (such as correlation), as well as process dynamics (such as autocorrelation and cross-correlation). Note that, for different processes, different statistics can be
7860
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
selected to capture the dominant process characteristics such as dynamics and nonlinearity. After the SPs are computed from the training data, the dissimilarities among the training SPs are quantified to determine an upper control limit of the detection index. In this work, we apply PCA to quantify the dissimilarities among the training SPs and define two detection indices similar to Hotelling’s T2 and SPE. When a single or a block of new measurements becomes available for fault detection, the window is shift forward by one or multiple samples and a new SP is computed; then its dissimilarity to the training SPs is quantified and compared with the threshold to perform fault detection. 3.1. Statistics Pattern Generation. We use Xk to denote a window of process measurements, as shown below: Xk ) [x1 x2 · · · xm ] ) x1(k - w + 1) x2(k - w + 1) · · · xm(k - w + 1) x1(k - w + 2) x2(k - w + 2) · · · xm(k - w + 2) (4) · ·· l l l x2(k) x1(k) · · · xm(k)
[
]
where w is the window width and k is the time index of the last, i.e., the most recent, measurement in the window. Generally speaking, an SP consists of three groups of process statistics, i.e. S ≡ [µ|Σ|Ξ]
(5)
µ in eq 5 denotes the first order statistics, i.e., variable means, whose elements are calculated from the data contained in the window: µi )
1 w
w-1
∑ x (k - l) i
l)0
Σ in eq 5 denotes the second-order statistics, which include variance (Vi), correlation (ri), autocorrelation (rid), and crosscorrelation (ri,d j) of different variables. Vi )
1 w
w-1
∑ [x (k - l) - µ ]
2
i
i
l)0
w-1
1 ri,j ) w
∑ [x (k - l) - µ ][x (k - l) - µ ] i
i
j
j
l)0
√ViVj w-1
rdi
1 ) w-d
∑ [x (k - l) - µ ][x (k + d - l) - µ ] i
i
i
i
l)d
Vi w-1
rdi,j
1 ) w-d
∑ [x (k - l) - µ ][x (k + d - l) - µ ] i
i
j
j
l)d
√ViVj
where d denotes the time lag between the variables. Ξ in eq 5 denotes the higher-order statistics, including skewness (γi), kurtosis (κi), and other higher-order cumulants. In this work, we only consider skewness and kurtosis.
γi )
κi )
(
(
1 w 1 w
1 w 1 w
w-1
∑ [x (k - l) - µ ]
3
i
i
l)0 w-1
∑
)
3/2
[xi(k - l) - µi]2
l)0
w-1
∑ [x (k - l) - µ ]
4
i
i
l)0 w-1
∑ [x (k - l) - µ ]
2
i
l)0
i
)
2
-3
Higher-order statistics measure the extent of nonlinearity and quantify the non-Gaussianity of the probability distribution of the process variables.22 Specifically, skewness measures the asymmetry and kurtosis measures the “peakedness” of the process variable distribution. Note that, for a normal distribution, its skewness and kurtosis are both zero. With different statistics computed, the final SP for a window of measurements is obtained by stacking all selected statistics into a row vector, and the SPs obtained from different windows are augmented together to build the training matrix, as shown in Figure 1b. 3.2. Dissimilarity Quantification and Fault Detection. Once the SPs are obtained for normal training data, the next step is to quantify the dissimilarities among them and determine an upper control limit for the normal operation. Dissimilarities between different objects are usually quantified by distance- or angle-based metrics, which evaluate certain distances or angles between different objects. Most fault detection methods, including the PCA-based methods, utilize distance-based metrics. In this work, we use PCA to assess the dissimilarity among SPs obtained from normal operation data. In other words, we perform PCA on the SP training matrix (i.e., the training SPs) to determine the upper control limits. To distinguish the SPAbased fault detection indices from the traditional PCA-based fault detection indices, we use Dp and Dr to denote the T2 and SPE in the SPA framework. The process is considered normal if the dissimilarity indices are below the thresholds, i.e., Dp e TR2 and Dr e δR2 , where TR2 and δR2 denote the upper control limits for dissimilarity index in PCS and RS with a significance level R. It is worth noting that PCA is just one way to determine the similarities or dissimilarities among different samples; other methods can be implemented to obtain distance-based or anglebased similarity indices.23 Similar to the standard PCA method, the SPs obtained from the training and testing samples are first normalized or scaled to zero mean and unit variance before the SVD is performed. In addition, δR2 and TR2 in the SPA framework can be obtained in two ways. One way is to compute the thresholds based on an assumed Gaussian distribution, and the analytical expressions for δR2 and TR2 are available in refs 4 and 17. The other way is the empirical method based on the calibration or validation data under normal operation conditions.20,24 For example, a 95% confidence upper control limit can be determined as the Dr or Dp value below which 95% of the calibration samples are located. As the theoretical limits (δR2 , TR2 ) are influenced by the choice of the number of PCs, they may not provide fair comparison among different methods. Therefore, in this work, we use the empirical method to determine the upper control limits so that different methods can be compared using the thresholds obtained based on the same confidence level. [Remark 1] In the SPA framework, PCA is applied to quantify the dissimilarity among the SPs of different monitoring
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
windows. Therefore, all the fundamental assumptions of PCA apply in the SPA framework, including the independent samples and Gaussian distributions. Because the calculation of different statistics makes use of all measurements in the window, if the two adjacent windows have large overlap, the SPs calculated from the two windows would be highly correlated, which would violate the fundamental assumption of independent samples. This limitation can be addressed through shifting the window forward by the window width, so that the two adjacent windows do not overlap. The cost associated with this strategy is that a large amount of data are needed for model building, as the number of the obtained SPs is only 1/w of the number of the original training samples. However, once the model is built, the window can be moved forward by one sample each time, instead of moving by w samples, to reduce the detection delay. [Remark 2] A statistic is simply the mean of a variable or a function (such as cross-correlation) computed based on a group of samples. If the samples are drawn independently, the central limit theorem (CLT) states that the statistic follows a Gaussian distribution asymptotically, regardless of the original variable’s distribution. As different samples are assumed to be independent and different windows do not overlap, CLT applies for the nonoverlapping window approach. Therefore, the training SPs follow a multivariate Gaussian distribution even if the original variables are non-Gaussian distributed, which is one of the reasons that the SPA method provides superior fault detection performance than PCA when monitoring batch processes. This property is discussed and verified in detail in He et al.,14 where SPA is applied to monitor semiconductor processes. In addition, due to the averaging effect in computing the statistics, the variances of the statistics are significantly reduced compared to those of the original process variables. This makes the SPA model more sensitive to the faults with small magnitude as illustrated in section 4.1.1. [Remark 3] Compared to the traditional PCA method, which only examines the variance-covariance of the process variables, the SPA method is more versatile as it makes use of various process statistics to extract process information, including the higher-order statistics which capture the process nonlinearity and non-Gaussianity. The computation load of the SPA method is higher than that of the traditional PCA method due to the calculation of various statistics. However, because the calculation of various statistics is not computationally intensive, the increase in computation load should not be an issue. 3.3. Variable Selection for the Statistics Patterns. Similar to the traditional PCA method, variable selection plays an important role in the detection performance of the SPA-based method. Here we discuss the general rules that we use to select the statistics to be included in the SPs. Note that, as this work is our initial attempt in extending the SPA framework to continuous processes, some of the rules are rather ad hoc. • Mean and Variance. The number of the means and variances is the same as the number of variables, i.e., m. Due to their general importance in characterizing the process, the means and variances of all variables are included. • Correlation. The number of the unique correlation coefficients is [m(m - 1)]/2 due to the symmetry of the correlation matrix. To reduce the number of statistics, we only include the significant ones in an SP. In this work, a correlation coefficient ri, j is selected only if |ri, j| > 0.3 for more than 70% of the training SPs. • Autocorrelation and Cross-Correlation. The numbers of the autocorrelation and cross-correlation coefficients are mdm and m(m - 1)dm, respectively, where dm is the maximum time
7861
Figure 2. Scatter plot of the process variables for the nonlinear example.
lag. Again, we only include the significant ones to reduce the number of statistics to be monitored. In this work, we require |rid| > 0.5 or |ri,d j| > 0.5 for more than 90% of the training SPs. • Skewness and Kurtosis. The number of the skewness and kurtosis statistics are the same as the number of variables. In this work, we include the skewness and kurtosis of all variables. • Other Higher-Order Statistics. No other higher-order statistics are considered in this work. However, if considered, they can be selectively included following similar rules for the second-order statistics. Note that all the selection criteria mentioned above can be adjusted based on the process. 3.4. Simple Illustrative Example. One advantage of the SPA-based method is that, by monitoring the statistics of the process variables, the effect of the process nonlinearity can be significantly reduced. This is because the SPs are asymptotically Gaussian distributed regardless of the distribution of the process variables. Therefore, when PCA is applied to the training SPs, the normal operation can be better captured, which enables better fault detection performance. This is illustrated using a simple static nonlinear example.21 The process has two variables with the following nonlinear relationship: x2 ) x12 + e
(6)
where x1 is a random variable that follows a uniform distribution between [0.5, 0.5], and e is a random variable that follows a normal distribution with variance 0.02. A total of 15 000 normal samples is simulated, with 10 000 samples for training and 5000 samples for validation. In addition, two groups of testing samples (faults A and B) are simulated, each group with 5000 samples where the first 50 samples are normal and the rest of the samples are faulty. Figure 2 shows the scatter plot of the normal and faulty samples. To avoid clutter, not all samples’ points are plotted in Figures 3 and 4. Both PCA and SPA are applied to detect the simulated faults. For the PCA-based fault detection, two PCs are selected as one PC only captures 51% of the total variance. As a result, only the T2 index is applicable for this example. In addition, the principal component subspace (PCS) is simply the rotation of the scaled original variable space. Figure 3a is the scatter plot of the scores of the normal and faulty samples in the PCS for the PCA method, together with the threshold of the T2 index corresponding to the 99% confidence level. Figure 3a shows
7862
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
Figure 3. (a) Score plot of the normal and faulty samples in the PCS for the PCA method. (b) The T2 indices fail to detect fault A (top) and fault B (bottom). The small variances in the T2 index for fault A are due to the fact that the scores of fault A contour with the T2 threshold.
variables extracted from the SPs follow a multivariate Gaussian distribution, and there is no nonlinear relationship among latent variables. In addition, both faults are clearly distinguished from the normal samples and are located outside the normal region defined by the Dp threshold. The SPA fault detection results are shown in Figure 5, which confirm that the SPA method can detect both faults in both the principal and the residual subspaces. 4. Case Studies
Figure 4. Score plot of the SPs in the SPA method for the nonlinear example.
that both faults A and B are located within the normal operation region determined by the PCA model, and the T2 index will not be able to detect them. Figure 3b shows the fault detection results of PCA for both faults, and it confirms that the T2 index fails to detect both faults because of the nonlinear relationship between the process variables. For the SPA method, we take the nonoverlapping window approach, and the window width is set to be 50 (i.e., w ) 50). After variable selection, each SP consists of eight statistics which include the mean, variance, skewness, and kurtosis of both variables. No autocorrelation or cross-correlation is selected based on the criteria discussed in section 3.3, which reflects the fact that the process is static. In addition, as the correlation between the two variables varies a lot due to the nonlinear relationship, the correlation between x1 and x2 is not significant enough to be included in the SPs, either. With eight variables included in the SPs, four PCs are selected through crossvalidation to build the model, which captures about 70% of the total process variance. Figure 4 is the score plot of the normal and faulty samples projected onto the first two PCs of the training SPs, together with the threshold of Dp at the 99% confidence level. From Figure 4 we observe that the SPs satisfy the fundamental assumptions of PCA very well; i.e., the latent
In this section, the performance of the SPA-based fault detection method is compared with the traditional PCA and DPCA methods using two case studies. One case study is a simple 2 × 2 dynamic system, and the other is the simulated Tennessee Eastman process (TEP), which is closer to an actual chemical plant. 4.1. The 2 × 2 System. In this section, we use a simple dynamic process to test the performance of the SPA method for fault detection. This example was first introduced in ref 2 and later used by others9,25,26 to compare different fault detection methods. The state-space model of the system is given by x(k + 1) )
[
]
[
]
0.118 -0.191 1.0 2.0 x(k) + u°(k) 0.847 0.264 3.0 -4.0 (7) y(k) ) x(k) + o(k)
u°(k) is the correlated input u°(k) )
[
]
0.811 -0.226 u°(k - 1) + 0.477 0.415 0.193 0.689 w(k - 1) -0.320 -0.749
[
]
(8)
and w(k) is a random noise with zero mean and unit variance. The measured input is u(k) ) u°(k) + v(k)
(9)
The output and input measurement noises (o(k) and v(k)) are uncorrelated random noises with 0 mean and variance 0.1. Both u and y are measured and used for monitoring, while w and x are not.
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
7863
Figure 5. Fault detection performance of the SPA method. (a) Fault A and (b) fault B. For each fault, the top figure is the Dp index and the bottom figure is the Dr index. Table 1. Monitoring Settings of Different Fault Detection Methods for the 2 × 2 System methods
variables
PCs
lags
window width
window shifting step
PCA DPCA SPA
4 12 52
3 5 14
2 2
150
50
To compare the performances of the different monitoring methods, three different types of faults are investigated: mean shifts in the first element of w, denoted by w1; a change in the system matrix A, and a change in the system matrix B. For all three methods, 10 000 normal samples are used to build the model and another 5000 samples are used for validation. Each fault data set consists of 10 000 samples, with the fault introduced after 500 samples. The settings of the different methods are listed in Table 1. For PCA, four variables (u1, u2, y1, and y2) are used to build the model and three PCs are selected based on cross-validation. For DPCA, the maximum lag is 2; therefore 12 variables are used to build the model, and the number of PCs is set to five based on cross-validation. For SPA, the window width is 150 and the maximum lag for autocorrelation and cross-correlation is 2. After variable selection, each SP consists of 52 statistics. Among them, 36 second-order statistics are selected to capture the process dynamics and the interactions among different variables. Fourteen PCs are selected to build the model based on cross-validation. In order to test how well the SPA method works for the overlapped window approach, in this case study each time the window is shifted forward by 50 samples. In other words, two-thirds of the samples in two adjacent windows are the same, while the other third of the samples are different. For all three methods, the thresholds of different indices are determined empirically through the use of the validation data, and the confidence level is set to 99%, which corresponds to 1% false alarm. The detection rates of different methods for all faults are summarized in Table 2, with details discussed below. In addition, the false alarm rates of all methods for an independent normal operating data set are listed in Table 3. The results indicate that the thresholds determined empirically using validation data (with 99% confidence level) are consistent. 4.1.1. Fault 1: Mean Shift in w1. Two faults with different magnitudes are studied. In fault 1A, the mean of w1 is changed from 0 to 2, and for fault 1B, the change of the mean is from
Table 2. Fault Detection Rates (percentage) of PCA, DPCA, and SPA for the 2 × 2 System PCA
DPCA
SPA
fault
T2
SPE
T2
SPE
Dp
Dr
1A 1B 2 3
7.3 37.9 1.9 2.2
1.5 12.3 1.0 3.9
8.3 50.5 1.8 2.1
2.2 8.6 3.4 14.0
99.5 99.5 90.5 99.5
99.5 99.5 96.0 99.5
Table 3. False Alarm Rates (percentage) of PCA, DPCA, and SPA for the 2 × 2 System PCA 2
DPCA 2
index
T
SPE
T
training data normal testing data
1.1 1.0
1.1 0.9
0.9 0.9
SPA
SPE
Dp
Dr
0.9 0.9
0.8 1.0
0.5 0.5
0 to 4. Note that these faults are not as easy to detect as they might appear, because w is a random noise and is not measured directly. From Table 2 we see that, for fault 1A, both PCA and DPCA have difficulties in detecting the fault, with detection rates lower than 10%. This is mainly due to the relatively small change in the unmeasured variable w1. For fault 1B, with the increased fault magnitude, the detection rates of PCA and DPCA both increase noticeably. However, about half of the faulty samples are still not detected. On the other hand, the SPA method was able to detect both faults successfully, with detection rates of almost 100%. As the SPA method is a window-based approach, there are detection delays associated with it. In this case study, as the window is shifted by 50 samples each time, the detection delays were 50 samples for both faults. Note that it is solely the detection delay that accounts for the 0.05% missed detection; therefore, the detection rates of SPA in Table 2 can be further improved if each time the monitoring window is shifted forward by one sample. Figures 6 and 7 compare the detection performances of the DPCA and SPA methods for both fault 1A and fault 1B. These figures clearly demonstrate that the SPA method is able to detect subtle faults that are difficult to detect by the PCA and DPCA methods. Note that, in order to show the details, only segments of the process data are plotted for DPCA. 4.1.2. Fault 2: Change in the System Matrix A. In this fault, the eigenstructure of the system, i.e., the location of
7864
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
Figure 6. Detection of fault 1A: (a) DPCA T2 (top) and SPE (bottom); (b) SPA Dp (top) and Dr (bottom).
Figure 7. Detection of fault 1B: (a) DPCA T2 (top) and SPE (bottom); (b) SPA Dp (top) and Dr (bottom).
the system poles, is changed as in.26 The system matrix A is changed from [0.118 -0.191; 0.847 0.264 ] to [0.21 -0.09; 0.44 0.36 ]. Figure 8a plots the original and new locations of the system poles. Such faults in eigenstructure changes may not be very relevant in chemical processes, but they are very important in monitoring mechanical structures subject to vibrations, such as buildings subject to wind or earthquake and turbines subject to steam turbulence.27-29 It has been noticed that eigenstructure changes do not necessarily result in changes in the means and variances of the input and output variables; therefore they are very difficult to be detected by the PCA T2 or SPE index.26,27 Figure 8b plots part of the process data where the fault was introduced at sample 501. From Figure 8 we observe that, despite the change in the system eigenstructure, neither the mean nor the variance of the measured variables shows noticeable changes. As PCA is a second-order method which only considers the mean and variance of the process data, it is expected that it would be difficult if not impossible for the PCA T2 or SPE index to detect such faults in eigenstructure change. From Table 2 we observe that the detection rates from the PCA and DPCA methods are very close to 1%. From Table 3 the false alarm rates are close to 1% for normal operating data
as well. As the thresholds for the detection indices were determined based on the 99% confidence level, the false alarm rate indicates that the threshold determined in this way is reasonable and consistent. In addition, detection rates close to 1% cannot reject the null hypothesis that there is no fault occurring in the process. For the SPA method, the fault detection rates are over 90% (90.5% from Dp and 96% from Dr) with about 1% false alarm rates, indicating that the SPA method can detect such eigenstructure changes. The detection performances of the DPCA and SPA methods are shown in Figure 9. 4.1.3. Fault 3: Change in the System Matrix B. In this fault, the system matrix B is changed. The parameter from u1 to x2, i.e., B21, is changed from 3.0 to 5.0. Again, no obvious changes were observed in the process data after the fault was introduced. Figure 10 compares the detection performances of the DPCA and SPA methods. Similar to the case of fault 2, SPA was able to detect the fault successfully with minimum delay. 4.1.4. Summary. This case study shows that, by monitoring the SPs, the SPA method is more sensitive compared to the traditional PCA-based methods, and is capable of detecting faults with small magnitude without increasing the false alarm rate.
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
7865
Figure 8. (a) Locations of system poles: original (+) and new (×). (b) Time series plot of the process variables (dash-dotted lines indicate when the fault was introduced).
Figure 9. Detection of fault 2: (a) DPCA T2 (top) and SPE (bottom); (b) SPA Dp (top) and Dr (bottom).
Also, by involving higher-order statistics, the SPA method can detect faults that cannot be detected by PCA, such as changes in the system eigenstructure. In addition, the case study indicates that the performance of the SPA method may not be highly sensitive to the correlated SPs obtained in the overlapping window approach. It is worth noting that, as the SPA for continuous process is a window-based approach, the monitoring performance will be affected by the window width, and detection delay may be introduced. However, our experience shows that delay is not always associated with the SPA approach, and there is no simple correlation between the window width and monitoring performance. This may be due to the fact that with a wider window, the computed statistics will have smaller variance which results in a tighter model and may improve the monitoring performance; at the same time, the contributions from the new measurements will be reduced which may degrade the monitoring performance. One way to minimize detection delay without sacrificing detection performance is to build the SPA model using nonoverlapping windows of SPs while moving the window one
sample at a time during the detection phase. A thorough investigation on the effect of window width is currently underway. 4.2. The Tennessee Eastman Process. The Tennessee Eastman process simulator has been widely used by the process monitoring community as a realistic example to compare various approaches.2,24,30 The simulator was developed by Downs and Vogel31 originally. The process consists of five major unit operations: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor, and a product stripper. Four reactants A, C, D, and E plus the inert B are fed to the reactor to generate products G and H, as well as byproduct F through two exothermic reactions. The diagram of the process is given in Figure 11. In addition, the plantwide control structure recommended by Lyman and Georgakis32 was implemented to generate the closed-loop process data.24 The simulator can simulate normal process operation together with 21 faulty conditions, and the process data are collected at a sampling interval of 3 min. The data were generated by Chiang et al.24 and can be downloaded from http://brahms.scs.uiuc.edu. The process data include 11 manipulated variables, 22 continu-
7866
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
Figure 10. Detection of fault 3: (a) DPCA T2 (top) and SPE (bottom); (b) SPA Dp (top) and Dr (bottom).
Figure 11. Process layout of the Tennessee Eastman process.
ous process measurements, and 19 composition measurements which are sampled less frequently. The complete list of variables is given in Table 4. In this work, we consider two monitoring schemes. In the first scheme, similar to that of Lee et al.,5 22 continuous process measurements and 11 manipulated variables are used to monitor the process, while in the second scheme, all 52 variables are used for monitoring. The information on the programmed faults are listed in Table 5. For all three methods, 960 normal samples are used to build the model, another 480 normal samples are used for validation, and each fault data set contains 960 samples with the fault introduced after sample 160. The settings of different methods are listed in Table 6. Note that, for each method, the same numbers of PCs are chosen for
two monitoring schemes, as they showed the optimal performance. For PCA, nine PCs are selected through cross-validation; for DPCA, the maximum lag is 2, as suggested by refs 24 and 26, and the number of PCs is 20; for SPA, the maximum lag is 2, similar to DPCA, and a total 265 statistics for scheme 1 and 380 statistics for scheme 2 are selected to construct the SP. Due to the limited amount of the training data, the window width is set to 50, and each time the window is shifted forward by one sample only. Therefore, any two adjacent windows are largely the same, with only one sample different from each other. Such highly overlapped windows will result in SPs that are highly correlated, which violates the fundamental assumption on independent samples and may deteriorate the performance of the SPA method. The thresholds of different detection indices
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010 Table 4. Monitored Variables in the Tennessee Eastman Process no.
variable description
no.
4 5 6 7 8 9 10
total feed (stream 4) recycle flow (stream 8) reactor feed rate (stream 6) reactor pressure reactor level reactor temperature purge rate (stream 9)
11 product separator temperature
12 product separator level 13 product separator pressure 14 product separator underflow (stream 10) 15 stripper level 16 stripper pressure 17 stripper underflow (stream 11) 18 stripper temperature 19 stripper steam flow 20 compressor work 21 reactor cooling water outlet temperature 22 separator cooling water outlet temperature
Manipulated Variables 23 D feed flow valve (stream 2)
25 26 27 28
29 separator pot liquid flow valve (stream 10) E feed flow valve (stream 3) 30 stripper liquid product flow valve (stream 11) A feed flow valve (stream 1) 31 stripper steam valve total feed flow valve (stream 4) 32 reactor cooling water flow compressor recycle valve 33 condenser cooling water flow purge valve (stream 9)
34 35 36 37 38 39 40 41 42 43
component component component component component component component component component component
24
Composition Measurements A (stream 6) B (stream 6) C (stream 6) D (stream 6) E (stream 6) F (stream 6) A (stream 9) B (stream 9) C (stream 9) D (stream 9)
44 45 46 47 48 49 50 51 52
component component component component component component component component component
E (stream 9) F (stream 9) G (stream 9) H (stream 9) D (stream 11) E (stream 11) F (stream 11) G (stream 11) H (stream 11)
Table 5. Process Faults for the Tennessee Eastman Process no.
description
1
A/C feed ratio, B composition constant (stream 4) B composition, A/C feed ratio constant (stream 4) D feed temperature (stream 2) reactor cooling water inlet temperature condenser cooling water inlet temperature A feed loss (stream 1) C header pressure loss-reduced availability (stream 4) A, B, C feed composition (stream 4) D feed temperature (stream 2) C feed temperature (stream 4) reactor cooling water inlet temperature condenser cooling water inlet temperature reaction kinetics reactor cooling water valve condenser cooling water valve unknown unknown unknown unknown unknown The valve for stream 4 was fixed at the steady state position.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Table 6. Monitoring Settings of Different Fault Detection Methods
variable description
Process Measurements: 1 A feed (stream 1) 2 D feed (stream 2) 3 E feed (stream 3)
type step step step step step step step
7867
methods
variables
PCs
lags
window width
window shifting step
PCA DPCA SPA
33 or 52 99 or 156 265 or 380
9 20 6
2 2
50
1
Table 7. Fault Detection Rates (percentage) of PCA, DPCA, and SPA for the TEP (with 33 Variables) PCA
DPCA
SPA
fault
T2
SPE
T2
SPE
Dp
Dr
1 2 4 5 6 7 8 10 11 12 13 14 16 17 18 19 20 21
99.1 98.6 14.6 27.6 99.4 48.9 97.1 40.3 31.3 97.9 94.1 85.5 24.4 77.5 90.0 3.0 41.5 38.1
99.8 96.3 100.0 21.6 100.0 100.0 91.4 24.3 76.5 92.3 95.3 100.0 24.3 94.3 90.1 40.0 49.9 48.6
99.5 98.9 17.7 30.6 99.6 63.6 97.7 39.0 36.9 99.5 94.4 98.5 19.1 81.2 89.8 4.4 40.2 41.5
100.0 97.6 100.0 21.8 100.0 100.0 95.6 31.1 89.0 93.7 95.6 100.0 31.5 97.0 90.0 72.3 55.6 50.6
99.0 97.3 99.9 98.0 100.0 99.8 100.0 79.5 99.1 100.0 92.3 99.8 76.9 98.6 100.0 95.9 93.5 99.9
98.8 97.8 99.9 99.8 99.9 99.9 96.9 73.9 99.1 99.6 93.5 99.8 79.7 96.9 89.3 96.3 89.0 90.8
Table 8. Fault Detection Rates (percentage) of PCA, DPCA, and SPA for the TEP (with 52 Variables) PCA
DPCA
SPA
fault
T2
SPE
T2
SPE
Dp
Dr
1 2 4 5 6 7 8 10 11 12 13 14 16 17 18 19 20 21
99.1 98.6 16.1 27.9 99.0 44.4 97.1 39.1 31.8 97.9 94.1 84.4 21.5 78.3 89.6 2.9 37.8 37.0
99.8 98.5 98.1 27.5 100.0 100.0 97.1 24.5 70.5 96.6 95.4 100.0 19.1 92.0 90.3 23.8 45.6 45.1
99.5 98.9 14.6 28.1 99.2 78.7 97.6 39.0 30.2 99.0 94.5 94.9 19.9 79.3 89.3 3.3 37.6 41.5
99.5 98.6 100.0 27.1 100.0 100.0 97.9 23.3 82.4 97.1 95.6 100.0 19.8 95.1 90.3 31.0 51.1 42.9
98.5 97.6 100 98.0 100.0 99.6 100.0 80.2 99.0 100.0 95.9 99.5 85.2 99.4 99.6 89.3 90.6 99.1
98.6 97.8 99.8 99.6 99.8 99.8 96.3 71.9 99.0 99.5 93.5 99.6 67.6 96.8 89.2 94.6 88.5 89.8
random variation random variation random variation random variation random variation slow drift sticking sticking
constant position
are determined empirically through the use of the validation data, with a confidence level of 99%.
The detection rates of the three methods for all faults are listed in Tables 7 and 8. Comparing the performances of the two monitoring schemes, we observe that adding 19 composition measurements only changes the process monitoring performance slightly for all three methods. A possible reason for such small changes in the monitoring performance is that most of the process variation has been captured by the manipulated and process variables. Therefore, adding composition measurements only provides limited additional information with the price of additional noise to the model. In addition, faults 3, 9, and 15 have been suggested to be difficult to detect,5,9,24,26 which is also confirmed in this study. Therefore, they are not considered here. For the rest of the 18 faults, PCA and DPCA have difficulties in consistently detecting six faults (faults 5, 10, 16, 19, 20, and 21), with detection rates less than 50% in most of the cases. On the other hand, the SPA method was able to detect all 18 faults
7868
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
Figure 12. Detection performance of DPCA and SPA for fault 5 ((a) DPCA; (b) SPA), fault 10 ((c) DPCA; (d) SPA), and fault 19 ((e) DPCA; (f) SPA). The dash-dotted lines indicate the fault onsets.
consistently, with detection rates higher than 75%. In particular, for faults 5, 10, 16, 19, 20, and 21, the detection rates of the SPA method are 2-3 times higher than those of PCA and DPCA. The monitoring results for faults 5, 10, and
19 from DPCA and SPA are shown in Figure 12. These results clearly show that the SPA method performs better than the traditional PCA and DPCA methods, even with the limited training data and the highly correlated training SPs.
Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010
5. Conclusions In this work, a new fault detection method for continuous processes is developed based on the SPA framework. In the proposed approach, instead of monitoring the process variables, the statistics patterns, which consists of various process statistics, are monitored to perform fault detection. In this way, various statistics, including higher-order statistics, can be used to capture the process characteristics that cannot be captured by PCA such as nonlinearity and non-Gaussianity. With additional information other than variance-covariance structure extracted from the process data, the SPA method is able to detect faults that are difficult or cannot be detected by the traditional PCA and DPCA methods. To examine the performance of the proposed SPA method, it is applied to monitor a static nonlinear system, a dynamic linear system, and the Tennessee Eastman process, and compared with the traditional PCA and DPCA methods. These examples demonstrate that the SPA method detects various faults more efficiently than the PCA and DPCA methods. In particular, it is able to handle nonlinear processes, to detect changes in the system eigenstructure, and to detect subtle changes in various dynamic systems. This work highlights the promise of the SPA framework for process monitoring, especially for industrial processes where the nonlinearity is prevalent. However, many properties of the SPA method are still unknown and worth further investigation. For example, knowledge on how process nonlinearity is captured by different statistics will help us construct SPs rationally and optimally, and how the window width and window overlapping affect the performance of the SPA method are still not fully understood. In addition, the research on how to extend the SPA framework to fault diagnosis, as well as to enhance the detection performance by applying weightings to different statistics, is currently underway. Acknowledgment The authors thank NSF for financial support under Grants CTS-0853983 (J.W.) and CTS-0853748 (Q.P.H.). Literature Cited (1) Nomikos, P.; MacGregor, J. Monitoring of batch processes using multiway principal component. AIChE J. 1994, 40, 1361–1375. (2) Ku, W.; Storer, R.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179. (3) Kourti, T.; Lee, J.; MacGregor, J. Experiences with industrial applications of projection methods for multivariate statistical process control. Comput. Chem. Eng. 1996, 20, S745–S750. (4) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, 480–502. (5) Lee, J.; Qin, S. J.; Lee, I.-B. Fault detection and diagnosis based on modified independent component analysis. AIChE J. 2006, 52, 3501–3514. (6) Kermit, M.; Tomic, O. Independent component analysis applied on gas sensor array measurement data. IEEE Sens. J. 2003, 3, 218–228. (7) Martin, E.; Morris, A. Non-parametric Confidence Bounds for Process Performance Monitoring Charts. J. Process Control 1996, 6, 349– 358. (8) Luo, R.; Misra, M.; Himmelblau, D. M. Sensor fault detection via multiscale analysis and dynamic PCA. Ind. Eng. Chem. Res. 1999, 38, 1489– 1495. (9) Kano, M.; Nagao, K.; Hasebe, S.; Hashimoto, I.; Ohno, H. Statistical process monitoring based on dissimilarity of process data. AIChE J. 2002, 48, 1231–1240.
7869
(10) Kano, M.; Tanaka, S.; Hasebe, S.; Hashimoto, I.; Ohno, H. Monitoring independent components for fault detection. AIChE J. 2003, 49, 969–976. (11) Lee, J. M.; Yoo, C. K.; Lee, I. B. New monitoring technique with ICA algorithm in wastewater treatment process. Water Sci. Technol. 2003, 47, 49–56. (12) Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Evolution of multivariate statistical process control: application of independent component analysisand and external analysis. Comput. Chem. Eng. 2004, 28, 1157– 1166. (13) Lee, J. M.; Yoo, C. K.; Lee, I. B. Statistical process monitoring with independent component analysis. J. Process Control 2004, 14, 467– 485. (14) He, Q. P.; Wang, J. Statistics Pattern AnalysissA New Process Monitoring Framework and Its Application to Semiconductor Batch Processes. Accepted for publication in AIChE J. (15) Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52. (16) Jackson, J. E. A User’s Guide to Principal Components; WileyInterscience: New York, 1991. (17) Jackson, J. E.; Mudholkar, G. Control procedures for residuals associated with principal component analysis. Technometrics 1979, 21, 341– 349. (18) He, Q. P.; Wang, J.; Qin, S. J. A New Fault Diagnosis Method Using Fault Directions in Fisher Discriminant Analysis. AIChE J. 2005, 51, 555–571. (19) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer: London, 2001. (20) Wise, B. M.; Gallagher, N. B.; Butler, S. W.; White, D.; Barna, G. G. A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process. J. Chemom. 1999, 13, 379– 396. (21) He, Q. P.; Wang, J. Fault detection using k-nearest-neighbor rule for semiconductor manufacturing processes. IEEE Trans. Semicond. Manuf. 2007, 20, 345–354. (22) Choudhury, M.; Shah, S. L.; Thornhill, N. F. Diagnosis of Poor Control Loop Performance Using Higher Order Statistics. Automatica 2004, 40, 1719–1728. (23) Singhal, A.; Seborg, D. E. Pattern matching in multivariate time series databases using a moving-window approach. Ind. Eng. Chem. Res. 2002, 41, 3822–3838. (24) Russell, E. L.; Chiang, L. H.; Braatz, R. D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. 2000, 51, 81–93. (25) Kano, M.; Nagao, K.; Hasebe, S.; Hashimoto, I.; Ohno, H.; Strauss, R.; Bakshi, B. R. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Comput. Chem. Eng. 2002, 26, 161–174. (26) Wang, J. Ph.D. Thesis, The University of Texas at Austin, Austin, 2004. (27) Basseville, M.; Abdelghani, M.; Benveniste, A. Subspace-based fault detection algorithms for vibration monitoring. Automatica 2000, 36, 101–109. (28) Yan, A.; Kerschen, G.; Boe, P. D.; Golinval, J. Structural damage diagnosis under varying environmental conditions-Part I: A linear analysis. Mech. Syst. Signal Process. 2005, 19, 847–864. (29) Fassois, S.; Sakellariou, J. Time-series methods for fault detection and identification in vibrating structures. Philos. Trans. A: Math. Phys. Eng. Sci. 2007, 365, 411–448. (30) Kano, M.; Ohno, H.; Hasebe, S.; Hashimoto, I. A new statistical process monitoring method using principal component analysis. Comput. Chem. Eng. 2001, 25, 1103–1113. (31) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255. (32) Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Comput. Chem. Eng. 1995, 19, 321–331.
ReceiVed for reView December 3, 2009 ReVised manuscript receiVed February 11, 2010 Accepted February 11, 2010 IE901911P