ARTICLE pubs.acs.org/IECR
Transition Process Modeling and Monitoring Based on Dynamic Ensemble Clustering and Multiclass Support Vector Data Description
Zhibo Zhu,†,‡ Zhihuan Song,† and Ahmet Palazoglu*,‡
† State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Zhejiang University, Hangzhou, Zhejiang 310027, People’s Republic of China
‡ Department of Chemical Engineering and Materials Science, University of California, Davis, One Shields Avenue, Davis, California 95616, United States
ABSTRACT: Monitoring and management of process transitions is a critical activity in chemical plants due to increased potential for abnormal operations. This activity is often hampered by the lack of a proper approach to label the transition states. In this paper, we present a systematic framework that constructs process transition states, thus facilitating their monitoring for faulty operations. To address the nonstationary and non-Gaussian characteristics of the time series data collected during the transition process, an ensemble clustering method based on dynamic k-principal component analysis-independent component analysis (k-ICA-PCA) models is proposed to enable labeling of transitions. Next, we combine a PCA-based dimension reduction with a pattern classification strategy based on multiclass support vector data description (SVDD) to achieve transition process monitoring. The Tennessee Eastman (TE) benchmark process is used as a case study to evaluate the performance of the proposed framework.
1. INTRODUCTION
Multivariate statistical process control (MSPC) methods such as principal component analysis (PCA), independent component analysis (ICA), and partial least-squares (PLS) have received great attention and have been widely used in continuous, batch, dynamic, and plant-wide processes in the last few decades.1–9 However, MSPC approaches assume that the process operates in a single desired state. Yet due to the changes in product specifications and market demands, chemical processes often operate at different steady-states. Process transition occurs when a steady-state operation is switched to another. Some common transitions include startups, shutdowns, and grade changes.10 To address monitoring of processes with multiple operating modes, a number of methods and techniques have been proposed with varying success. Li et al.11 and Wang et al.12 presented adaptive PCA and PLS methods, respectively. As an alternative, approaches based on model matching were also introduced.13–15 However, such studies largely focused on steady-state operations monitoring and are not directly applicable during process transitions.16
Chemical processes usually operate in a variety of modes where some are the steady-states, while others are process transitions which include mode changes, startups, shutdowns, and others.17 A large number of accidents happen during such transitions, and transition monitoring requires extensive planning and operator involvement.18 The papers by Bhagwat et al.10 and Srinivasan et al.19 address this issue, but their approaches are either model-based or too complex to be applied in real chemical processes. Zhao et al.20 proposed a soft transition model based on user-specified parameters; however, choosing such parameters is quite subjective. Our goal is to detect and isolate whether a known and desired process transition is being executed normally. In practice, perfect process knowledge is often lacking to label the process transitions using historical data. Therefore, our approach first calls for a clustering algorithm for pattern construction. To address the nonstationary and non-Gaussian characteristics of the time series data collected during the transition process, this clustering method is based on dynamic k-principal component analysis-independent component analysis (k-ICA-PCA) models to enable labeling of transitions. The labeling procedure is facilitated by three different visualization tools.
For multimode process modeling and monitoring, we have previously proposed a k-ICA-PCA based bilayer ensemble clustering method for multimode construction, adjoined with multi-ICA-PCA based fault detection.21 In this paper, we focus on transition process modeling and monitoring and propose a novel dynamic ensemble clustering and multiclass classification approach to accomplish this aim. The novel dynamic ensemble clustering is based on dynamic k-ICA-PCA models. Duchesne et al.22 proposed a monitoring approach based on PCA and PLS by assessing the performance of the startup readiness and production readiness for the new grade in transitions. We exploit this concept to determine, within a pattern recognition framework, whether a transition has a good start and a good end. This framework utilizes PCA as a dimension reduction tool and the multiclass support vector data description (SVDD) for pattern classification to complete the transition process monitoring methodology.
Received: August 11, 2011
Revised: November 2, 2011
Accepted: November 7, 2011
Published: November 07, 2011
dx.doi.org/10.1021/ie201792r | Ind. Eng. Chem. Res. 2011, 50, 13969–13983
Industrial & Engineering Chemistry Research
Figure 1. Moving window strategy.
The rest of the paper is organized as follows. An overview of the proposed approach is given in the next section. In Section 3, we propose a novel dynamic ensemble clustering algorithm, which is used for transition pattern construction. Multiclass SVDD-based transition process monitoring is presented in Section 4. In Section 5, we demonstrate the proposed modeling and monitoring framework using the Tennessee Eastman (TE) benchmark process. Conclusions and future research work are presented in Section 6.
2. OVERVIEW OF THE PROPOSED APPROACH
Naturally, the monitoring performance is closely related to the classification quality of the time series data set. Traditional data classification/labeling approaches such as the well-known k-means clustering23 are usually intended for independent samples and result in poor discrimination performance for autocorrelated data. As addressed in our previous work,24 the k-PCA models clustering algorithm can deal with autocorrelated and cyclic process data. However, transition processes exhibit nonstationary and non-Gaussian characteristics. Ge and Song proposed an ICA-PCA based fault detection technique8 but assumed that the process has only one normal operating mode at steady-state. Here, to improve the performance of the k-PCA models clustering, a novel ensemble clustering algorithm based on dynamic k-ICA-PCA models is proposed. In this method, nonstationarity is handled by introducing lagged variables into the time series data matrix, while the non-Gaussianity is addressed by applying ICA-PCA in a two-step feature extraction process.
After constructing the transition patterns, the start and end portions of every transition are well-defined. Then a PCA-based dimension reduction and multiclass SVDD-based pattern classification are presented to facilitate transition process monitoring. PCA can transform the noisy, high dimensional data set into a lower dimensional subspace.25 As a one-class classification algorithm, SVDD only needs information on the target class for modeling.26,27 In this paper, the original SVDD algorithm is expanded to cover multiclass classification problems.
Outliers have a strong influence on projection methods and clustering algorithms. In this paper, we focus on the transition modeling and monitoring with an assumed good quality data set, and do not consider this issue explicitly. We assume that the outliers and bad data points would be eliminated during a data preprocessing step.
Moreover, the data collected during a transition process would also often have missing points, data alignment issues, and run-to-run variations. Missing data and alignment issues are usually handled during data preprocessing.
Missing data would indicate that the modeling data matrix is incomplete, and an adaptive or a step-by-step strategy is traditionally used. Data alignment, in the context of batch operations, implies that each batch may have a different duration. To synchronize the batches, dynamic time warping16 is often applied. As the amount of off-grade material and safety conditions vary during each transition, one can argue that there are some transitions between modes that may be more desirable than others. Therefore, it would be logical and reasonable to construct these transition process monitoring models based on the best transition data sets22 or expected transitional trajectories. In this paper, however, to maintain simplicity, these issues are omitted, and more detailed descriptions can be found in the relevant articles.
3. TRANSITION PATTERN CONSTRUCTION BASED ON DYNAMIC ENSEMBLE CLUSTERING
Before the dynamic ensemble clustering algorithm is introduced, the moving window strategy is presented first. The diagram of this strategy is shown in Figure 1. P represents the size of the moving window, and L is the window moving rate. P captures the time scale of the process mode, while L determines the extent of overlap between different windows. The selection of P and L is an important decision and requires good insight into the dynamic features of the process. As expected, there is a compromise between the computational load and the accuracy of the pattern matching.28 A smaller window size and moving rate can be much more accurate; however, that selection would be computationally expensive and can lead to overly sensitive results. Meanwhile, by limiting the total number of cluster members, a larger window size and moving rate can yield a significant reduction in the computational load but would lead to a coarse and even poor pattern construction result and may miss informative events happening at smaller time scales.
Due to the autocorrelated nature of time series data, Ku et al.1 proposed the dynamic PCA (DPCA), where one introduces lagged variables into the modeling data matrix. Since the dynamic data matrix contains variables at different time lags, the DPCA model can capture dynamic features of the data set. ICA is an alternative algorithm that searches for non-Gaussian data features and leads to a set of statistically independent components. In this paper, we take advantage of PCA and ICA and use a two-step feature extraction approach on the dynamic data matrix. The non-Gaussian information is extracted by ICA first, and then the remaining Gaussian features are modeled by PCA. The dynamic k-ICA-PCA models ensemble clustering approach is presented next.
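The moving window strategy described above can be illustrated with a short sketch. It assumes that windows start every L samples and that trailing samples which do not fill a complete window are dropped; the function name `moving_windows` and the toy data are illustrative only and are not taken from the original work.

```python
import numpy as np

def moving_windows(X, P, L):
    """Split an (m, n) time-series matrix into overlapping windows.

    Window l covers rows l*L .. l*L + P - 1 (0-based). Trailing samples
    that do not fill a complete window are dropped (an assumption of
    this sketch, not a statement about the original implementation).
    """
    X = np.asarray(X)
    m = X.shape[0]
    if P > m:
        raise ValueError("window size P exceeds the number of samples")
    starts = np.arange(0, m - P + 1, L)
    return [X[s:s + P] for s in starts]

# Toy example: 20 samples of 3 variables, window size 8, moving rate 4.
X = np.arange(60, dtype=float).reshape(20, 3)
windows = moving_windows(X, P=8, L=4)
print(len(windows), windows[0].shape)
```

A smaller L increases the overlap between consecutive windows and therefore the number of windows to be clustered, which is the computational trade-off noted above.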
Figure 2. Flowchart of transition process modeling and monitoring.
Figure 3. The Tennessee Eastman Process flowsheet.32
Table 1. Monitoring Variables in the TE Process
no. | measured variable
1 | A feed
2 | D feed
3 | E feed
4 | A and C feed
5 | recycle flow
6 | reactor feed rate
7 | reactor temperature
8 | purge rate
9 | product separator temperature
10 | product separator pressure
11 | product separator underflow
12 | stripper pressure
13 | stripper temperature
14 | stripper steam flow
15 | reactor cooling water outlet temperature
16 | separator cooling water outlet temperature

3.1. Dynamic k-ICA-PCA Models Ensemble Clustering. Assume a normalized data matrix $X_i \in \mathbb{R}^{m \times n}$, where m is the number of samples, and n is the number of monitored variables. To capture the dynamic features, $X_i$ is expanded by concatenating K lagged copies of variables to form the dynamic data matrix $X_i^D$

$$X_i^D = [X_i,\; qX_i,\; q^2 X_i,\; \ldots,\; q^K X_i] \tag{1}$$

where q is the backshift operator by which every entry of $X_i$ shifts backward by one sampling interval in time. Thus, we have $X_i^D \in \mathbb{R}^{m \times n_D}$, with $n_D = n(K+1)$ columns. First, the dynamic data matrix is decomposed by ICA as29,30

$$(X_i^D)^T = A \cdot S + \tilde{E} \tag{2}$$

where $A \in \mathbb{R}^{n_D \times p}$ is the mixing matrix, p ($p \le n_D$) represents the number of independent components, $S \in \mathbb{R}^{p \times m}$ is the independent component matrix, and $\tilde{E} \in \mathbb{R}^{n_D \times m}$ is the residual matrix of the ICA model. To estimate A and S, we find the linear transformation that makes the rows of the reconstructed matrix $\hat{S}$ become as independent of each other as possible. Suppose W is the demixing matrix, then $\hat{S}$ is expressed as

$$\hat{S} = W (X_i^D)^T \tag{3}$$

After estimating A, its inverse yields W. The detailed description of ICA can be found in Hyvärinen and Oja29 and Lee et al.30 PCA is applied in the second step to extract features of the Gaussian part. Consider $\tilde{E}$ as the input to the PCA model. Suppose that the first q PCs are determined to build the model; under the structure of PCA,31 $\tilde{E}$ is given as

$$\tilde{E}^T = \sum_{i=1}^{q} t_i \cdot p_i^T + \hat{E} \tag{4}$$

where $t_i$ are the score vectors, $p_i$ are the loading vectors, and $\hat{E}$ is the PCA residual matrix.31
The dynamic ensemble clustering algorithm attempts to optimize an objective function which describes the process data (as shown in Figure 1, time series data are captured in the form of a window) partitioning in an iterative manner. Initialization is a significant step for all iterative methods, since poor initialization often leads to suboptimal solutions. To obtain a globally optimal clustering result, as developed in our previous work,24 an ensemble strategy is adopted. There are two aspects to consider. First, an ensemble of randomly initialized windows is generated for each k, where k is the number of clusters. Next is aggregating the solutions into a single, reproducible solution after obtaining the clustering results with different k’s. Since there is usually no perfect prior knowledge of the steady-states or the distinct modes of a chemical process, aggregating all the clustering results obtained by a range of k’s would help us create a reasonable solution set that would include all possible pattern classifications.
For each iteration, a dynamic data matrix $X_s^D$ is obtained for each cluster s. Samples in $X_s^D$ comprise the windows currently assigned to the cluster s. The PCA residual matrix $\hat{E}_s$ is calculated using eqs 2-4. Subsequently, a cluster prototype ICA-PCA model is obtained. Any new data matrix $X_{new}^D \in \mathbb{R}^{P \times n_D}$ can be projected onto the s dynamic ICA-PCA clustering model to estimate the model residual. Hereby, a measurement of the degree of model fitness $S_f$ is presented

$$S_f^s = \sum_{i=1}^{P} \sum_{j=1}^{n_D} \hat{E}_{s,ij}^2 \tag{5}$$

For a data matrix of window l, $X^{(l)}$, the residuals of the s dynamic ICA-PCA models can be calculated by eq 5. Suppose that M is the total number of windows, then the objective function describing process data partitioning is expressed as

$$J = \sum_{l=1}^{M} \min_s S_f^{(ls)} \tag{6}$$
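As a rough illustration of eqs 1, 5, and 6, the sketch below builds a lagged data matrix and performs the window-to-cluster assignment from a table of fitness values. It is a minimal sketch under stated assumptions: rows without a full K-step history are dropped (so the result has m − K rows rather than m), the ICA and PCA fitting steps of eqs 2-4 are not implemented, and the function names `lagged_matrix`, `fitness`, and `assign_windows` are hypothetical.

```python
import numpy as np

def lagged_matrix(X, K):
    """Eq 1: [X, qX, ..., q^K X] for an (m, n) matrix X.

    Assumption of this sketch: the first K rows, which lack a full
    history, are dropped, so the result has m - K rows and n*(K+1) columns.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    return np.hstack([X[K - j:m - j] for j in range(K + 1)])

def fitness(E_hat):
    """Eq 5: sum of squared PCA residuals of one window under one model."""
    return float(np.sum(np.asarray(E_hat, dtype=float) ** 2))

def assign_windows(S):
    """Eq 6: S[l, s] is the fitness of window l under cluster model s.

    Returns the best-fitting cluster for every window and the objective J.
    """
    S = np.asarray(S, dtype=float)
    labels = S.argmin(axis=1)
    J = float(S.min(axis=1).sum())
    return labels, J

# Toy example: 5 samples of 2 variables, K = 2 lags.
X = np.arange(10).reshape(5, 2)
XD = lagged_matrix(X, K=2)
print(XD.shape)

# Hypothetical fitness values for 3 windows and 2 cluster models.
labels, J = assign_windows([[1.0, 3.0], [4.0, 2.0], [0.5, 0.7]])
print(labels, J)
```

In an iterative scheme of this kind, the cluster models would be refitted on the reassigned windows and the assignment repeated until the labels stop changing.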
When no further reassignments can be made, the iterative algorithm will converge to the final clustering membership.
3.2. Visual Tools for Ensemble Clustering Solution. As discussed in previous sections, different cluster numbers and clustering runs are to be aggregated into a single, reproducible solution. The procedure for ensemble clustering solutions is similar to our previous research work, and while we briefly describe it here for completeness, more details can be found in refs 21 and 24. The ensemble of solutions e is given as

$$e = (k_d - 1)\, n_t \tag{7}$$

where $k_d$ is the maximum cluster number k, and $n_t$ is the number of clustering runs. For each k from 2 up to $k_d$, the aggregated distance matrices can be calculated. To determine $k_d$, a discrepancy mean squared error based statistic $\Delta MSE(k_d)$ is presented

$$\Delta MSE(k_d) = \sum \{ D^a_{ij}(k_d + 1) - D^a_{ij}(k_d) \}^2 \tag{8}$$

Table 2. Three Operational Modes of Base Case in the TE Process
mode | description | samples
1 | set reactor level as 75%, reactor pressure as 2705 kPa and temperature set-point as 120.4 °C | 1–300
2 | reactor level from 75% to 65%, reactor pressure from 2705 kPa to 2750 kPa and reactor temperature set-point from 120.4 to 127.4 °C | 301–600
3 | reactor pressure from 2750 to 2725 kPa and reactor temperature set-point from 127.4 to 130.6 °C | 601–900

Figure 4. k-PCA models clustering solutions: (a) ΔMSE; (b) window similarity; (c) the dendrogram; and (d) membership probabilities.

partitioning at a desired resolution is then determined for final classification.
3 Membership probabilities: This graph is used to obtain the cluster label of each sample (window). There are some overlaps in the windows due to the moving window strategy, which may lead to multiple cluster assignments. As can be seen from Figure 1, $PL^{-1}$ indicates the number of windows to which each sample is assigned. Suppose $n_{is}^{times}$ is the number of times sample i is assigned to cluster s; the membership probability of sample i assigned to cluster s, $P_{is}$, can then be defined as
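The idea of membership probabilities for overlapping windows can be sketched as follows. This is only one plausible reading, stated as an assumption: each sample's probability for cluster s is taken as the fraction of the windows covering that sample which were assigned to cluster s. The function name `membership_probabilities` and the 0-based window convention (window l covers samples l·L to l·L + P − 1) are illustrative choices, not taken from the original work.

```python
import numpy as np

def membership_probabilities(window_labels, m, P, L, k):
    """Per-sample cluster membership from overlapping window labels.

    window_labels[l] is the cluster index of window l, which is assumed
    to cover samples l*L .. l*L + P - 1 (0-based). The probability is the
    share of covering windows assigned to each cluster (an assumption of
    this sketch).
    """
    counts = np.zeros((m, k))
    for l, s in enumerate(window_labels):
        start = l * L
        counts[start:start + P, s] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Samples covered by no window keep probability 0 for every cluster.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Toy example: 12 samples, windows of size 6 moving by 3, two clusters.
probs = membership_probabilities([0, 0, 1], m=12, P=6, L=3, k=2)
print(probs[7])
```

Samples that sit where windows of different clusters overlap receive split probabilities, which is how transition regions between two modes show up in such a graph.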
$$\max_{\alpha_i}\ \sum_i \alpha_i (t_{i,i} \cdot t_{i,i}) - \sum_{i,j} \alpha_i \alpha_j (t_{i,i} \cdot t_{i,j}) \quad \text{s.t.} \quad 0 \le \alpha_i \le C_i,\ \sum_i \alpha_i = 1 \tag{15}$$

The support vectors are those objects with coefficients $0 < \alpha_i \le C_i$. The radius $r_i$ is obtained by calculating the distance from any of the support vectors to the center. A test object $z_i$ is accepted as being within the sphere when its distance to the center is smaller than the radius. Inner products of objects $(t_{i,i} \cdot t_{i,j})$ can be replaced by the kernel function $K(t_{i,i}, t_{i,j})$, if the kernel function satisfies Mercer’s theorem. Therefore, $z_i$ is accepted when

$$K(z_i, z_i) - 2 \sum_i \alpha_i K(z_i, t_{i,i}) + \sum_{i,j} \alpha_i \alpha_j K(t_{i,i}, t_{i,j}) \le r_i^2 \tag{16}$$

Multiclass SVDD modeling is terminated after obtaining the 2t SVDD model parameter sets $PMS_i$. Suppose $f_{SVDD}(\cdot)$ is the SVDD modeling function; $PMS_i$ can be represented as

$$PMS_i = f_{SVDD}(T_i) \tag{17}$$

The parameter sets $PMS_i$ contain the number of support vectors, the transforming vectors, the radii, the centers, and so on for all the patterns.
4.2. Online Monitoring. Since our focus in this paper is on transition processes, our approach is used for online monitoring of in-transition samples but not for monitoring the steady-state samples corresponding to different operating modes. For a possible in-transition sample $x_{new}$, first the score vector is obtained by projecting $x_{new}$ onto the 2t PCA models

$$t_{new,i} = x_{new} \cdot P_i \tag{18}$$

Then, $t_{new,i}$ is projected onto the corresponding SVDD model. Suppose that $W_i$ is the transforming vector of the SVDD model i; the distance from $t_{new,i}$ to the corresponding center of the sphere in pattern i is calculated as

$$d_i = t_{new,i} \cdot W_i \tag{19}$$
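The acceptance test of eq 16, combined with the PCA projection of eq 18, can be sketched as below. This is a minimal illustration under stated assumptions: the coefficients α are taken as given (the quadratic program of eq 15 is not solved here), the distance is evaluated directly from eq 16 rather than through the transforming vector of eq 19, and the model dictionary keys `P`, `T`, `alpha`, and `r2` as well as all function names are hypothetical.

```python
import numpy as np

def linear_kernel(a, b):
    """Plain inner product, useful for checking the formula by hand."""
    return float(np.dot(a, b))

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel exp(-||a - b||^2 / sigma^2) (width convention assumed)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / sigma ** 2))

def svdd_distance2(z, T, alpha, kernel):
    """Left-hand side of eq 16: squared kernel distance from z to the center."""
    k_zz = kernel(z, z)
    k_zt = np.array([kernel(z, t) for t in T])
    K_tt = np.array([[kernel(a, b) for b in T] for a in T])
    return float(k_zz - 2.0 * alpha @ k_zt + alpha @ K_tt @ alpha)

def accepting_patterns(x_new, models, kernel):
    """Indices of the patterns whose SVDD sphere contains x_new."""
    hits = []
    for i, mdl in enumerate(models):
        t_new = x_new @ mdl["P"]  # eq 18: project onto the PCA model of pattern i
        if svdd_distance2(t_new, mdl["T"], mdl["alpha"], kernel) <= mdl["r2"]:
            hits.append(i)
    return hits

# Toy pattern: two training scores with equal weights, identity PCA loadings.
T = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = np.array([0.5, 0.5])
model = {"P": np.eye(2), "T": T, "alpha": alpha,
         "r2": svdd_distance2(T[0], T, alpha, linear_kernel)}
print(accepting_patterns(np.array([1.0, 0.5]), [model], linear_kernel))
print(accepting_patterns(np.array([3.0, 0.0]), [model], linear_kernel))
```

With the linear kernel the center is simply the weighted mean of the training scores, so the toy case can be verified by hand; swapping in `gaussian_kernel` yields the flexible boundaries typical of SVDD.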
Figure 7. First two features of the transition patterns based on PCA and multiclass SVDD: (a) start of transition 1; (b) end of transition 1; (c) start of transition 2; and (d) end of transition 2.
As before, an object sample is accepted as within the sphere if its distance to the center is smaller than the radius. The detection and isolation rules for transition process monitoring are as follows:
Rule 1 For the new sample $x_{new,i}$ which belongs to a pattern i, if it is within the sphere of SVDD model i, it is successfully detected.
Rule 2.1 For the new sample $x_{new,i}$ which belongs to a pattern i, if and only if it is within the sphere of model i, it is successfully isolated.
Rule 2.2 Because the clustering algorithm yields soft boundaries, there are overlapping modeling samples in the start and end portions of each transition mode. If the new sample $x_{new,i}$, which belongs to an overlapping domain, is within the spheres of the corresponding two modes, it is also successfully isolated.
To evaluate the performance of the transition process monitoring method, three different evaluation statistics are used:
1 Detection rate (including missed and false alarm rates): percent of missed- and falsely detected data in the transition process samples.
2 Detection delay: number of samples detected after a delay.
3 Isolation rate: percent of transition process samples that are well isolated.
The flowchart of the transition process modeling and monitoring is shown in Figure 2. The number of clusters k is determined by the dendrogram of the dynamic ensemble clustering result. For each cluster, the monitoring model is built with PCA-based feature extraction followed by SVDD-based classification.
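The rate-type evaluation statistics can be sketched as follows. The exact formulas are not spelled out in the text, so the definitions here are assumptions: the missed alarm rate is the share of samples belonging to a pattern that its own model rejects, the false alarm rate is the share of samples outside the pattern that the model accepts, and the isolation rate is the share of belonging samples accepted by their own model and by no non-corresponding model (models covered by Rule 2.2 should be excluded from `accepted_other`). The function name `monitoring_rates` is hypothetical.

```python
import numpy as np

def monitoring_rates(belongs, accepted_own, accepted_other):
    """Return (missed alarm, false alarm, isolation) rates in percent.

    belongs[i]        : sample i truly belongs to the pattern
    accepted_own[i]   : sample i lies inside the pattern's own SVDD sphere
    accepted_other[i] : sample i lies inside some non-corresponding sphere
    All definitions are assumptions of this sketch.
    """
    belongs = np.asarray(belongs, dtype=bool)
    own = np.asarray(accepted_own, dtype=bool)
    other = np.asarray(accepted_other, dtype=bool)
    if not belongs.any() or belongs.all():
        raise ValueError("need samples both inside and outside the pattern")
    missed = 100.0 * np.mean(~own[belongs])
    false = 100.0 * np.mean(own[~belongs])
    isolated = 100.0 * np.mean(own[belongs] & ~other[belongs])
    return float(missed), float(false), float(isolated)

# Toy example with 8 samples, the first 4 belonging to the pattern.
rates = monitoring_rates(
    belongs=[1, 1, 1, 1, 0, 0, 0, 0],
    accepted_own=[1, 1, 1, 0, 1, 0, 0, 0],
    accepted_other=[0, 1, 0, 0, 0, 0, 1, 1],
)
print(rates)
```

Under these definitions the isolation rate can be lower than 100% minus the missed alarm rate whenever a correctly detected sample is also accepted by a non-corresponding model.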
Table 3. Simulation Cases of TE Process for Transition Process Monitoring
no. | case study | description
1 | Hours 0–20 mode 1, 20–40 mode 2, 40–60 mode 3, and a step fault (A/C feed ratio, B composition constant) occurs in hour 41. | good start and end of transition 1, good start and poor end of transition 2
2 | Hours 0–20 mode 1, 20–40 mode 2, and a step fault (A/C feed ratio, B composition constant) occurs in hour 23. | good start and poor end of transition 1
3 | Hours 0–20 mode 1, 20–40 new mode (set reactor level from 75% to 55%, reactor pressure from 2705 to 2775 kPa). | poor start and end of transition 1
Figure 8. Monitoring result of testing data 280–430 in study case 1: (a) models of start (top) and end (bottom) of transition 1; (b) models of start (top) and end (bottom) of transition 2. Red circle: sample, and black number: index.
5. CASE STUDY
5.1. Process Description and Simulation Design. The Tennessee Eastman (TE) Industrial Challenge Problem was created by the Eastman Chemical Company.32 As can be seen from Figure 3, there are five major unit operations in the process: a reactor, a product condenser, a product stripper, a recycle compressor, and a vapor liquid separator. The TE process has 12 manipulated variables and 41 measured variables. More detailed descriptions of the process can be found in Chiang et al.33 As listed in Table 1,34 sixteen variables are recorded and used for modeling and monitoring. The decentralized control system of the TE process proposed by Ricker35,36 is applied here, since it has the merit of less variability in product quality and rate and long periods of on-spec operation without the feedback of composition measurements. As a benchmark simulation of a chemical process, it is widely used for evaluating the performance of process monitoring.34,37–40 Process transition usually occurs at the change from one steady-state to another. Multimode operations can be well designed in the TE benchmark process, while typical transitions exist in these grade changes. Therefore, we can use it to evaluate the performance of the proposed framework. As shown in Table 2, in order to simulate the transition processes in the TE process, three different process modes of the base case are used.
5.2. Results and Discussions. Before comparing the performance of the proposed clustering algorithm with k-PCA models clustering,24 some parameters should be specified first. For both algorithms, we set the moving window size P as 60, the window moving rate L as 8, the number of runs nru as 50, and the maximum iteration number niter as 100. For k-PCA models clustering, we set the number of PCs q as 6; for the proposed clustering algorithm, we set the number of lagged copies of variables K as 2, the number of ICs p as 12, and the number of PCs q as 6. The k-PCA models clustering solutions including the ΔMSE, window similarity, dendrogram, and membership probability graphs are shown in Figure 4. As can be seen from Figure 4 (a), as kd increases, the ΔMSE converges to zero; hence, according to eq 8, it is reasonable to set the number of clusters kd as 6. In the chromaticity plot of Figure 4 (b), a high level of window similarity is expressed through a similar coloring. The clustering performance of k-PCA models is poor, since four clusters would be roughly observed, and the middle cluster appears to contain two subclusters. The dendrogram and membership probabilities are shown in Figure 4 (c) and (d).
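As a rough check on these settings, assume that windows start every L samples and that only complete windows are kept (an assumption; the windowing convention is not spelled out in the text). The 900 samples of Table 2 then give

$$M = \left\lfloor \frac{900 - P}{L} \right\rfloor + 1 = \left\lfloor \frac{900 - 60}{8} \right\rfloor + 1 = 106$$

windows. Likewise, if the 50 runs set here play the role of $n_t$ in eq 7, the ensemble for $k_d = 6$ would contain $e = (6 - 1) \times 50 = 250$ clustering solutions.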
There are a total of 106 windows, and we can find that windows 1–31 and 70–106 are incorrectly assigned to the same class. The performance will not improve if we oversegment these windows into four clusters. Therefore, we conclude that k-PCA models clustering fails to distinguish these patterns. Correspondingly, the ΔMSE, window similarity, dendrogram, and membership probability graphs for the proposed dynamic ensemble clustering algorithm are shown in Figure 5. According to Figure 5 (a), kd can be set as 6 as before. In Figure 5 (b), three clusters can be easily distinguished. These are quite distinct and clear based on chromaticity. From Figure 5 (c) and (d), the aggregated distances between different windows show that the three clusters are well-defined. Samples 1–366 (windows 1–39) belong to mode 1, samples 315–598 (windows 40–68) belong to mode 2, and samples 547–900 (windows 69–105) belong to mode 3.
With three operational modes successfully clustered, process transition samples can now be labeled and collected for modeling. As seen in Figure 5 (b), the transitions are identified with circles between two adjacent modes. To obtain the samples belonging to the process transitions, we propose a procedure based on the dendrogram in Figure 5 (c). To find the start and end of a process transition in each mode, we relabel the dendrogram of the proposed dynamic ensemble clustering solution, and the result with process transitions is depicted in Figure 6. It is reasonable that the start of a transition will be similar to the previous mode, while the end of the transition will more likely resemble the subsequent mode. Based on Figure 6, the clustering results are listed as follows: samples 283–366 (windows 36–39) and samples 315–430 (windows 40–47) belong to the start and end of transition 1. Meanwhile, samples 515–598 (windows 65–68) and samples 547–670 (windows 69–77) belong to the start and end of transition 2, respectively.
Next, transition process models are constructed based on the samples of all the start and end of transitions using SVDD. Each SVDD model will represent a start or an end of a process transition. First, PCA is used to reduce data dimensionality. The number of principal components is chosen so that 85% of the variance is explained. We also set the missed isolation rate as 5%, and the kernel function parameter in SVDD is tuned to meet the requirement that the number of false isolation samples should be approximately the same as that of the support vectors. The first two features of the four transition patterns based on PCA and multiclass SVDD are shown in Figure 7. The blue points represent the modeling samples, while the black curve denotes the SVDD control limit.
To evaluate the performance of the multiclass SVDD-based transition process models, three case studies are selected as listed in Table 3. These include normal mode changes; a fault and a new mode are designed to test whether the proposed model can effectively detect and isolate the good/poor start and end of all transitions. As described previously in Section 4.2, three measurements including detection rate, detection delay, and isolation rate are used. For the following discussion, we choose a time slice of each transition process for performance evaluation and divide each case into further slices.

Figure 9. Monitoring result of testing data 515–600 in study case 1: (a) models of start (top) and end (bottom) of transition 1; (b) models of start (top) and end (bottom) of transition 2. Red circle: sample, and black number: index.
Figure 10. Monitoring result of testing data 601–670 in study case 1: (a) models of start (top) and end (bottom) of transition 1; (b) models of start (top) and end (bottom) of transition 2. Red circle: sample, and black number: index.

Case Study 1. Here, three sections of testing data are monitored, samples 280–430 to evaluate transition 1, and
samples 515–600 and 601–670 to evaluate transition 2 when the fault occurs. The monitoring result of testing data 280–430 is shown in Figure 8. Since transition 1 occurs at sample 283, as can be seen from the top of Figure 8(a), samples 280–366 are well scattered within the SVDD control limit of the start of transition 1 model, while samples 367–430 are all beyond the circle. The start of transition 1 model is closely related to the features of mode 1. It is reasonable and logical that samples 280–282 are also within the circle, even if these three samples do not belong to the process transition. Meanwhile, samples 315–430 form the end portion of transition 1; according to the bottom of Figure 8 (a), which shows the model of the end of transition 1, these samples are successfully detected with few missed and falsely detected samples. Moreover, from Figure 8(b), we can also observe that the performance of this portion is acceptable since a quite small percentage of samples are falsely isolated by the model of the end of transition 2. Similarly, the monitoring results of the other two portions are shown in Figures 9 and 10. To quantify the performance of the proposed transition process monitoring algorithm, the monitoring performance of Case Study 1, including the detection rate, detection delay, and isolation rate, is listed in Table 4. The model of the start of transition 1 has 14.29% and 2.24% missed and false alarm detection rates. Meanwhile, it has no detection delay and successfully isolates 85.71% of the process transition samples. The detection performance of the end of transition 2 model is a little worse than that of the start of transition 1 model; however, its isolation rate reaches 86.49%. Therefore, this case study shows that the proposed monitoring approach can successfully monitor the good start and end of transition 1 and can also monitor well the good start and poor end of transition 2.

Table 4. Monitoring Performance of Case Study 1
model | missed alarm rate | false alarm rate | detection delay | isolation rate
start of transition 1 | 14.29% | 2.24% | 0 | 85.71%
end of transition 1 | 7.76% | 5.24% | 1 | 66.38%
start of transition 2 | 7.14% | 8.52% | 0 | 83.33%
end of transition 2 | 10.81% | 14.16% | 0 (transition mode)/4 (fault) | 86.49%

Figure 11. Monitoring result of testing data 280–360 in study case 2: (a) models of start (top) and end (bottom) of transition 1; (b) models of start (top) and end (bottom) of transition 2. Red circle: sample, and black number: index.

Case Study 2. Two portions of the testing data are collected for monitoring in this case study: samples 280–360 to evaluate the situation before the fault happens, and samples 361–430 to evaluate the situation of transition 1 after the fault occurs. In other words, this case is designed to focus specifically on transition 1. The monitoring results of testing data 280–360 and 361–430 in this case study are shown in Figures 11 and 12, respectively.
Samples 280–360. As described previously, the start of transition 1 occurs at sample 283, while in this case, the fault occurs at hour 23 (sample 361). Hence, the samples of the first portion should be detected by the start of transition 1 model. As can be seen from Figure 11 (a), almost all of the samples are within the SVDD control limit.
Since some data overlap between the start and end of transition 1 models, the end of transition 1 model can also help us monitor the process to a certain extent. All the samples are collected from transition 1, as we observe in Figure 11(b), and these samples are all beyond the control limit.
Samples 361–430. As seen in Figure 12, the models can monitor the situation fairly well when a fault occurs during the transition. Figure 12(a) shows that the end of transition 1 model can detect the fault with a little delay. Meanwhile, there are some false detection samples by the end of transition 2 model. The monitoring performance of Case Study 2, including detection rate, detection delay, and isolation rate, is given in Table 5. The isolation performance of the start and end of transition 1 models is quite satisfactory, as the isolation rates reach 85.90% and 86.96%, respectively. We note that the detection performance of these models is not as good as the performance observed in Case Study 1; however, we consider these missed and false alarm rates as still being acceptable since the fault occurs during the process transition, which makes this case much more complex and challenging.
Case Study 3. To check whether the proposed monitoring approach can detect and isolate the fault which occurs during the process transition, two different case studies are proposed.
Figure 12. Monitoring result of testing data 361–430 in Case Study 2: (a) models of start (top) and end (bottom) of transition 1; (b) models of start (top) and end (bottom) of transition 2. Red circle: sample, and black number: index.
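The rates reported for the case studies can be illustrated with a minimal sketch. The definitions used here are assumptions, since this section does not state them: the missed alarm rate is taken as the fraction of samples belonging to a model's state that fall outside its control limit, the false alarm rate as the fraction of samples from other states that fall inside it, and the detection delay as the number of samples between the first sample of the state and the first one the model accepts. The isolation rate is taken as the complement of the missed alarm rate, which agrees with the reported pairs (for example, 14.29% and 85.71%, or 13.04% and 86.96%). The function name `monitoring_metrics` and the labels are hypothetical, and the data are made up.

```python
def monitoring_metrics(in_class, accepted):
    """Compute illustrative monitoring rates for one SVDD model.

    in_class[i] -- True if sample i truly belongs to the model's state
    accepted[i] -- True if sample i falls within the model's control limit
    """
    if len(in_class) != len(accepted):
        raise ValueError("in_class and accepted must have the same length")

    n_in = sum(1 for c in in_class if c)
    n_out = len(in_class) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("need at least one in-class and one out-of-class sample")

    # Samples of this state that the model rejects (assumed 'missed alarms').
    missed = sum(1 for c, a in zip(in_class, accepted) if c and not a)
    # Samples of other states that the model accepts (assumed 'false alarms').
    false = sum(1 for c, a in zip(in_class, accepted) if not c and a)

    # Assumed delay definition: samples elapsed between the first in-class
    # sample and the first in-class sample that the model accepts.
    first_in = next(i for i, c in enumerate(in_class) if c)
    first_hit = next(
        (i for i, (c, a) in enumerate(zip(in_class, accepted)) if c and a),
        None,
    )
    delay = None if first_hit is None else first_hit - first_in

    return {
        "missed_alarm_rate": missed / n_in,
        "false_alarm_rate": false / n_out,
        "isolation_rate": (n_in - missed) / n_in,
        "detection_delay": delay,
    }


# Made-up labels for 20 samples; these are not data from the paper.
in_class = [False] * 3 + [True] * 10 + [False] * 7
accepted = list(in_class)
accepted[3] = False   # first in-class sample is missed -> delay of 1 sample
accepted[8] = False   # a second missed sample
accepted[15] = True   # one out-of-class sample is wrongly accepted

metrics = monitoring_metrics(in_class, accepted)
for name, value in metrics.items():
    print(name, value)
```

Under these assumed definitions, the missed alarm and isolation rates of a model always sum to 100%, so only one of the two carries independent information.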
Finally, we discuss the monitoring result for the situation in which a new mode is entered during a process transition. The portion of testing data 280–430 is collected to evaluate the monitoring performance. The monitoring result is shown in Figure 13. Since the process enters a new mode at hour 20, samples 301–430 should be outside all four SVDD control limits. Only samples 280–300 should be within the control limit of the start of transition 1 model. As can be seen from Figure 13, the new mode can be well isolated by the proposed four SVDD models. The monitoring performances of these four models are quite satisfactory, especially for the models of the end of transition 1 and the start and end of transition 2. Under the start of transition 1 model, the missed and false alarm rates are 11.11% and 6.77%, respectively. There is only one sample associated with the detection delay, and the isolation rate reaches 88.89%. This case study also demonstrates that the proposed method is somewhat limited, since it cannot distinguish between a fault and a new mode: the new mode is treated simply as an abnormal process operation. To address this shortcoming, one needs to incorporate process knowledge and update the monitoring models as appropriate. These issues are among the current challenges for data-driven methodologies.
To summarize, across these three case studies, which cover a normal mode change, the occurrence of a fault, and the occurrence of a new mode, we observe that the proposed PCA and multiclass SVDD approach provides an effective tool for transition process monitoring.

Table 5. Monitoring Performance of Case Study 2

| model | missed alarm rate | false alarm rate | detection delay | isolation rate |
|---|---|---|---|---|
| start of transition 1 | 14.10% | 10.96% | 0 (transition mode)/6 (fault) | 85.90% |
| end of transition 1 | 13.04% | 19.05% | 1 (transition mode)/2 (fault) | 86.96% |
| start of transition 2 | / | 0 | / | / |
| end of transition 2 | / | 5.30% | / | / |

Figure 13. Monitoring result of testing data 280–430 in Case Study 3: (a) models of start (top) and end (bottom) of transition 1; (b) models of start (top) and end (bottom) of transition 2. Red circle: sample, and black number: index.
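The isolation logic discussed in Case Study 3 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: each of the four SVDD models is assumed to supply a squared distance from a sample to its hypersphere center and a squared radius that serves as the control limit. The function names `isolate_sample` and `describe`, and all numerical values, are hypothetical.

```python
def isolate_sample(dist_sq, radius_sq):
    """Return the names of all models whose control limit contains the sample.

    dist_sq[name]   -- squared distance of the sample to the center of `name`
    radius_sq[name] -- squared radius (control limit) of model `name`
    """
    return [name for name in radius_sq if dist_sq[name] <= radius_sq[name]]


def describe(accepting_models):
    """Turn the list of accepting models into a monitoring decision."""
    if not accepting_models:
        # Outside every control limit: the models alone cannot tell whether
        # this is a fault or a new operating mode.
        return "abnormal (fault or new mode)"
    # More than one name may appear when neighboring models overlap.
    return " / ".join(accepting_models)


# Hypothetical control limits and distances; not values from the paper.
radius_sq = {
    "start of transition 1": 0.50,
    "end of transition 1": 0.45,
    "start of transition 2": 0.60,
    "end of transition 2": 0.55,
}
sample_in_start_t1 = {
    "start of transition 1": 0.30,
    "end of transition 1": 0.70,
    "start of transition 2": 0.95,
    "end of transition 2": 0.90,
}
sample_outside_all = {
    "start of transition 1": 0.80,
    "end of transition 1": 0.75,
    "start of transition 2": 0.98,
    "end of transition 2": 0.91,
}

print(describe(isolate_sample(sample_in_start_t1, radius_sq)))
print(describe(isolate_sample(sample_outside_all, radius_sq)))
```

The second sample is rejected by every model, so the sketch can only report it as abnormal. Separating a fault from a new mode would need additional process knowledge or an updated set of models, as noted in the text.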
6. CONCLUSIONS We propose a systematic framework that combines a novel clustering method based on dynamic k-principal component analysis–independent component analysis (k-ICA-PCA) models for process transition construction with transition process monitoring based on multiclass support vector data description (SVDD). To evaluate the performance, the Tennessee Eastman (TE) process is used as a benchmark case study. Specifically, the proposed dynamic ensemble clustering is shown to have superior performance compared to the previously reported k-PCA models clustering approach. In this paper, we mainly focus on process monitoring. However, high product quality is also an important issue that should be considered. Therefore, incorporating the aim of optimal operation into the proposed transition process modeling and monitoring framework is a possible direction for future research.
AUTHOR INFORMATION Corresponding Author
*E-mail:
[email protected].
ACKNOWLEDGMENT This work was financially supported by the National Natural Science Foundation of China (No. 60974056) and the National High Technology Research and Development Program of China (No. 2009AA04Z154).