
Ind. Eng. Chem. Res. 2008, 47, 8250–8262

Fault Detection and Classification for a Process with Multiple Production Grades

Jialin Liu*

Department of Information Management, Fortune Institute of Technology, 1-10, Nwongchang Road, Neighborhood 28, Lyouciyou Village, Daliao Township, Kaohsiung County, Taiwan, Republic of China

In practice, an industrial polyethylene process produces various products and often develops new production grades to meet market demand. The process therefore has not only multiple operating conditions but also time-varying characteristics. In addition, the process measurements are inevitably redundant and noisy. On-line classification of the operating conditions in such an industrial process is a challenging problem. In this paper, principal component analysis (PCA) is applied to the reference data set to reduce the dimensionality of the variables and eliminate the collinearities among process measurements. Since outliers are inevitable in a real plant data set, and they significantly stretch the cluster centers and covariances toward an unreliable solution, the distance-based fuzzy c-means (DFCM) algorithm is proposed. A boundary distance is derived for each cluster to identify outliers, which should be discarded from the reference data set. Before the on-line classification, the statistics Q and T2 of new data have to be evaluated. If either statistic is out of its control limits, the new data do not belong to the PCA subspace and should be collected for the next model update. In this paper, blockwise recursive formulas for updating the means and covariance matrix are derived. By utilizing the updated means and covariance, a new PCA subspace that accounts for all events is derived recursively. In addition, by rotating and shifting the coordinates of the PCA subspace, the cluster parameters on the new subspace can be directly transferred from the previous one. The proposed method was successfully applied to monitor a polyethylene process with multiple production grades and time-varying characteristics.
1. Introduction

In order to produce a variety of production grades in a single manufacturing process, the operating conditions, such as the set points of reactor temperature and pressure, the compositions of catalysts, etc., have to be adjusted to meet the production specifications. In that case, a process monitoring model has to be capable of identifying the normal operating conditions of the different production grades. In addition, the operating conditions are also adjusted to compensate for unwanted disturbances, such as the aging of equipment, catalyst degradation, and impurities in raw materials. Consequently, the process monitoring model must not only detect anomalies early for each production grade, but also be adaptable with new event data to deal with the time-varying characteristics of a real plant. Fuzzy clustering is one of the artificial intelligence tools for pattern recognition. It has also been applied to resolve process monitoring issues.1,2 Fuzzy rules can be generated by using a fuzzy clustering method such as the fuzzy c-means (FCM) algorithm. In the on-line monitoring phase, the operating status can be identified by examining the memberships of the new data. In order to monitor a time-varying process, an adaptive FCM algorithm1 has been proposed and applied to an anaerobic digestion process. In that example, there were only two variables and four different operating conditions. In the modeling phase, there were neither convergence problems nor suboptimal solutions when the FCM algorithm was applied to the reference data set. Generally, however, a real plant process consists of high-dimensional variables with collinearities among different variables. When applying FCM to real plant data, the convergence issue has to be of concern. Teppola et al.2 applied principal component analysis (PCA) on an activated sludge * Tel.: 886-7-7889888, ext 6122. Fax: 886-7-7889777. E-mail: jialin@center.fotech.edu.tw.

wastewater treatment plant data set with 20 variables to reduce the variable dimensions before generating fuzzy rules. In their approach, FCM was applied to the score vectors instead of the original variables. Since the dimensionality of the process variables was reduced and the collinearities among variables were eliminated, the convergence issue of the FCM could be effectively alleviated. However, a static PCA model was used in their application, in which there were only two different operating conditions. It was not discussed how the fuzzy rules were generated when new events emerged. Chu et al.3 provided a different approach for reducing high-dimensional variables. The variables were selected by minimizing the total entropy of the training data to ensure superior generalization performance in classification. The support vector machine (SVM) was utilized with the selected variables as a classification tool to detect abnormal events in a process with multiple operational modes. However, the SVM classifiers should be adjusted when a new operational mode is in operation; this was not addressed in their approach. Principal component analysis (PCA) is a popular method of compressing high-dimensional inputs into a few orthonormal variables. Past researchers4,5 have constructed PCA subspaces from "healthy" data for the purposes of fault detection and identification. Qin6 unified several fault detection indices and provided a comprehensive review of statistical process monitoring methods for fault detection, identification, and reconstruction. However, when a PCA-based method is applied to monitor a nonstationary process in which the means, variances, and covariances of the variables are time-varying, the PCA model has to be adapted with the new event data. Li et al.7 proposed two recursive PCA algorithms based on rank-one modification and Lanczos tridiagonalization.
The former needed to calculate all eigenpairs from the updated correlation matrix, and the latter had to perform a reorthogonalization procedure on the computed Lanczos vectors. Choi et al.8 proposed to perform a singular

10.1021/ie0710014 CCC: $40.75  2008 American Chemical Society Published on Web 10/04/2008


value decomposition (SVD) on the current correlation matrix to obtain a new PCA model. Actually, they performed the SVD procedure twice for a single update. In this paper, the recursive formulas for updating means, variances, and covariances are derived. A new PCA model can be obtained by performing the SVD procedure once on the updated covariance matrix. Since FCM uses the constraint that the memberships of an observation in all clusters must sum to 1, the clustering results, including cluster centers and covariances, can be stretched to an infeasible solution by outliers, which are inevitable in a real plant data set. Krishnapuram and Keller9 proposed a possibilistic c-means (PCM) that used a membership function that was inversely proportional to the dissimilarity between observations and the centers of clusters. Since they relaxed the constraints on the membership functions, the objective function had to be reformulated with a resolution parameter in order to avoid a trivial solution. Therefore, the interpretation of memberships in FCM as degrees of sharing was replaced by the possibilities of the observations belonging to the groups. The outliers can be detected through the sum of memberships over all clusters in PCM. However, they introduced a resolution parameter, which was estimated from the dissimilarities among clusters and the fuzzifier. In the fuzzy clustering category, the fuzzifier is an empirical value, and in most cases it is set to 2. Krishnapuram and Keller10 reported that the PCM clustering results were sensitive to the fuzzifier. A proper fuzzifier has to be determined before clustering the reference data using PCM in order to achieve a feasible solution. In this paper, a distance-based FCM algorithm has been proposed. A boundary distance is derived for each cluster, and an outlier can be identified when its distances to all cluster centers are greater than the respective boundary distances.
The outlier should be discarded from the reference data set. The proposed monitoring method is divided into three phases: (1) off-line modeling phase, (2) on-line monitoring and updating phase, and (3) off-line updating phase. In the modeling phase, the PCA is applied to the reference data set to reduce the dimensions of the measurement variables, whereas the collinearities among variables also are eliminated. After that, the distance-based FCM algorithm is applied to the score vectors to extract the fuzzy rules for the quasi-steady-state operating conditions. In the on-line monitoring phase, the statistic Q and T2 of new data are examined with their control limits. New event data are collected for the next model update, when any one of the statistics is out of its control limits. If the new data belong to the PCA subspace, the distances of score vectors to all cluster centers are compared with the respective cluster boundary distances. If any one of the distances to the cluster centers is less than the respective cluster boundaries, the operating conditions can be identified through the memberships of the score vectors, whereas the cluster centers and covariances are updated by using the adaptive FCM.1 On the other hand, if all of the distances are greater than the respective boundary distances, the data points, which do not belong to any groups, should be collected for the next model update. In the off-line updating phase, the PCA subspace is adapted by using the proposed method with the new event data in order to account for all events. After that, the cluster parameters of the known groups have to be transferred to the new subspace. Through rotating and shifting coordinates of the PCA subspace, the cluster centers, covariances, and boundary distances can be directly transferred to the new subspace.11 The rest of this paper is organized as follows. Section 2 briefly describes the PCA method and derives the recursive formulas

of the means, standard deviations, and covariances with additional data sets, whereas the adaptive PCA method is proposed. The FCM, adaptive FCM, and distance-based FCM (DFCM) algorithms are presented in section 3. The method of transferring the cluster parameters to the new subspace is also detailed in that section. Section 4 summarizes the overall strategy for monitoring a process with multiple operating conditions using the proposed method. In section 5, the advantages of the proposed method are assessed through a simulation example. In addition, a polyethylene process with multiple production grades is used to demonstrate the effectiveness of the proposed monitoring method. Finally, the conclusions are given in section 6.

2. Feature Extracting

2.1. Principal Component Analysis. Consider the data matrix W ∈ R^(m×n) with m rows of observations and n columns of variables. Each column is normalized to zero mean and unit variance: X = (W − 1W̄)S^(−1), where W̄ is the mean vector, 1 is a column vector in which all elements are 1, and S is a diagonal matrix of standard deviations. The eigenvectors (P) of the covariance matrix can be obtained from the normalized data set. The score vectors are the projections of the data matrix X onto each eigenvector. The data matrix X can be decomposed as

X = t1p1^T + t2p2^T + ... + tkpk^T + t(k+1)p(k+1)^T + ... + tnpn^T = X̂ + E    (1)

where X̂ is the projection of the data matrix X onto the subspace formed by the first k eigenvectors and E is the remainder of X that is orthogonal to the subspace. The statistic Q is defined in order to examine whether the new data belong to the PCA subspace:

Q = ee^T = (x − x̂)(x − x̂)^T = x(I − PkPk^T)x^T    (2)

The loading vectors Pk ∈ R^(n×k) are the first k eigenvectors of the covariance matrix. The statistic Q is a measure of the approximation error of the new data with respect to the PCA subspace. The confidence limit of Q is defined as follows:

Q_α = Θ1 [ cα(2Θ2h0^2)^(1/2)/Θ1 + 1 + Θ2h0(h0 − 1)/Θ1^2 ]^(1/h0)

h0 = 1 − 2Θ1Θ3/(3Θ2^2),    Θi = Σ(j=k+1 to n) λj^i,    i = 1, 2, 3    (3)
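As a concreteness check, eqs 2 and 3 can be evaluated as follows. This is a minimal Python sketch, not the paper's code; the function and variable names are our own.

```python
import numpy as np
from scipy import stats

def q_statistic(x, P_k):
    """Squared prediction error of eq 2: Q = x (I - Pk Pk^T) x^T."""
    r = x - (x @ P_k) @ P_k.T          # residual orthogonal to the subspace
    return float(r @ r)

def q_limit(eigvals, k, alpha=0.01):
    """Confidence limit of Q (eq 3); eigvals are sorted in descending
    order and the first k components are retained."""
    lam = eigvals[k:]                   # discarded eigenvalues lambda_{k+1..n}
    th1, th2, th3 = (float(np.sum(lam**i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2**2)
    c_a = stats.norm.ppf(1.0 - alpha)   # upper (1 - alpha) normal deviate
    term = (c_a * np.sqrt(2.0 * th2 * h0**2) / th1
            + 1.0
            + th2 * h0 * (h0 - 1.0) / th1**2)
    return th1 * term**(1.0 / h0)
```

The T2 statistic of eq 4 is computed analogously from the retained scores and eigenvalues, with the F-distribution limit of eq 5.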

The percentile α is the probability of type I errors in hypothesis testing, and cα is the standard normal deviate corresponding to the upper (1 − α) percentile. Another measure of the difference between new data and the PCA subspace is the statistic T2:

T2 = xPkΛ^(−1)Pk^T x^T    (4)

The diagonal matrix Λ contains the first k eigenvalues, Λ = diag[λ1 λ2 ... λk]. The confidence limit is defined as

T2_α = k(m − 1)/(m − k) F(k, m−1, α)    (5)

where F(k, m−1, α) is an F distribution with degrees of freedom k and m − 1. The new data belong to the PCA subspace within the (1 − α) confidence limits only when Q < Q_α and T2 < T2_α.

2.2. Adaptive PCA. Assume the data of new events consist of m′ rows of observations, with the data matrix denoted as W′ ∈ R^(m′×n). The mean vector (W̄′) and the diagonal matrix of standard


deviations (S′) of the new event data have to be prepared for normalizing the data matrix, X′ = (W′ − 1W̄′)S′^(−1), to zero means and unit variances. The covariance matrix of the new data set can be obtained from the normalized data matrix: Σ′ = X′^T X′/(m′ − 1). The mean vector and the standard deviations of the combined reference and new data can be derived as follows:

W̄* = (m/m*)W̄ + (m′/m*)W̄′    (6)

σi* = { [ (m − 1)σi^2 + m w̄i^2 + (m′ − 1)σi′^2 + m′ w̄i′^2 − m* w̄i*^2 ] / (m* − 1) }^(1/2)

S* = diag[σ1* σ2* ... σn*]    (7)

where m* is the total number of observations in the combined data set, i.e., m* = m + m′. Based on the updated means and standard deviations, the covariance matrix of the combined data set can be updated blockwise in the same manner (eq 11).
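The blockwise update of eqs 6 and 7 can be sketched as follows. Since the paper's eq 11 for the combined covariance is not reproduced in this excerpt, the standard identity for pooling the scatter of two blocks about the combined mean is used in its place; all names in this illustrative fragment are our own.

```python
import numpy as np

def combine_blocks(mean1, std1, cov1, m1, mean2, std2, cov2, m2):
    """Blockwise update of the means and standard deviations (eqs 6-7)
    and of the covariance of the autoscaled combined data."""
    m = m1 + m2
    mean = (m1 * mean1 + m2 * mean2) / m                       # eq 6
    var = ((m1 - 1) * std1**2 + m1 * mean1**2
           + (m2 - 1) * std2**2 + m2 * mean2**2
           - m * mean**2) / (m - 1)                            # eq 7
    std = np.sqrt(var)
    # pool the raw scatter of both blocks around the combined mean
    S1, S2 = np.diag(std1), np.diag(std2)
    d1, d2 = mean1 - mean, mean2 - mean
    scatter = ((m1 - 1) * S1 @ cov1 @ S1 + m1 * np.outer(d1, d1)
               + (m2 - 1) * S2 @ cov2 @ S2 + m2 * np.outer(d2, d2))
    Sinv = np.diag(1.0 / std)
    cov = Sinv @ (scatter / (m - 1)) @ Sinv
    return mean, std, cov
```

The result is exact: combining the block statistics reproduces the mean, standard deviation, and autoscaled covariance of the concatenated data set.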

subject to the constraints uij ∈ [0, 1], Σ(i=1 to c) uij = 1 for 1 ≤ j ≤ m, and 0 < Σ(j=1 to m) uij < m. If the change in the memberships between iterations exceeds the tolerance ε, set k = k + 1 and
go back to step 2.

3.3. Adaptive Distance-Based FCM. In order to detect a process with time-varying characteristics departing from normal operating conditions, Marsili-Libelli and Müller1 enhanced the FCM with an adaptive capability. After the initial training phase, the reference data set with m observations was classified into c groups according to their memberships. The centers of the clusters were updated when a new observation became available:

μi^(m+1) = Σ(j=1 to m+1) uij^q xj / Σ(j=1 to m+1) uij^q = [ Σ(j=1 to m) uij^q xj + u(i,m+1)^q x(m+1) ] / [ Σ(j=1 to m) uij^q + u(i,m+1)^q ]    (21)

where u(i,m+1) is the membership of the new data point, computed from the unchanged prototypes, and the xj are the raw data without compression of the variable dimensions. Since Σ(j=1 to m) uij^q xj and Σ(j=1 to m) uij^q can be obtained recursively, the computational requirement of the adaptation is moderate. However, one drawback of this approach is that the prototypes will be misled when an outlier appears. In addition, the FCM will encounter difficulties when the training data are from a high-dimensional system. Teppola et al.2 applied PCA to the training data set before fuzzy clustering; the score vectors were used instead of the raw data. In their approach, the complexity of the FCM was relieved for a large-scale system. However,
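The recursive form of eq 21 amounts to carrying two running sums per cluster. A minimal sketch (illustrative names, not the authors' code):

```python
import numpy as np

def update_center(sum_ux, sum_u, x_new, u_new, q=2.0):
    """One recursive step of eq 21. sum_ux and sum_u carry
    sum_j u_ij^q x_j and sum_j u_ij^q over the previous m samples."""
    sum_ux = sum_ux + (u_new**q) * x_new
    sum_u = sum_u + u_new**q
    return sum_ux / sum_u, sum_ux, sum_u
```

One call per new observation reproduces the batch center exactly, which is what keeps the computational requirement of the adaptation moderate.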

another drawback was found, in that the PCA subspace, which was constructed from the training data set, could not account for the data of new events. Therefore, before updating the prototypes, the on-line data have to be verified against the control limits of the statistics Q and T2. Furthermore, outliers have to be excluded from the updating phase by examining the distances to the cluster centers. When the on-line data belong to the known event clusters, the covariance matrices of the clusters are also updated:

Σi^(m+1) = [ Σ(j=1 to m) uij^q (tj − μi)^T (tj − μi) + u(i,m+1)^q (t(m+1) − μi)^T (t(m+1) − μi) ] / [ Σ(j=1 to m) uij^q + u(i,m+1)^q ]    (22)

where Σ(j=1 to m) uij^q (tj − μi)^T (tj − μi) can be obtained recursively. In this paper, in addition to the centers and covariance matrices, the boundary distances of the clusters are also adapted:

Db,i^(m+1) ≡ Dave,i^(m+1) + 3Dstd,i^(m+1)

Dave,i^(m+1) = [ Σ(j=1 to m) uij^q Dave,i + u(i,m+1)^q D(i,m+1)^2 ] / [ Σ(j=1 to m) uij^q + u(i,m+1)^q ]

Dstd,i^(m+1) = { [ (Σ(j=1 to m) uij^q − 1) Dstd,i^2 + Σ(j=1 to m) uij^q (Dave,i − Dave,i|m+1)^2 + u(i,m+1)^q (D(i,m+1)^2 − Dave,i|m+1)^2 ] / [ Σ(j=1 to m) uij^q + u(i,m+1)^q ] }^(0.5)    (23)
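Equations 21-23 can be combined into a single adaptive step per cluster. The following Python sketch keeps the running sums in a dictionary; the field names are our own, and the scatter is accumulated about the running center, a simplification of the fixed-center form of eq 22.

```python
import numpy as np

def update_cluster(state, t_new, u_new, q=2.0):
    """One adaptive step for a single cluster (eqs 21-23).
    state holds the running sums sum_u, sum_ux, sum_scatter,
    plus d_ave, d_std, and cov_inv from the previous step."""
    uq = u_new**q
    mu = state["sum_ux"] / state["sum_u"]            # current center (eq 21)
    d = t_new - mu
    d2 = float(d @ state["cov_inv"] @ d)             # Mahalanobis distance D^2
    denom = state["sum_u"] + uq
    state["sum_ux"] = state["sum_ux"] + uq * t_new
    state["sum_scatter"] = state["sum_scatter"] + uq * np.outer(d, d)
    state["cov"] = state["sum_scatter"] / denom      # eq 22
    d_ave = (state["sum_u"] * state["d_ave"] + uq * d2) / denom
    d_var = ((state["sum_u"] - 1.0) * state["d_std"]**2
             + state["sum_u"] * (state["d_ave"] - d_ave)**2
             + uq * (d2 - d_ave)**2) / denom         # eq 23
    state["d_ave"], state["d_std"] = d_ave, float(np.sqrt(d_var))
    state["sum_u"] = denom
    state["boundary"] = d_ave + 3.0 * state["d_std"]  # Db = Dave + 3 Dstd
    return state
```

In an on-line setting this step is applied only to samples that pass the outlier check against the current boundary distance.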

where D(i,m+1)^2 is the Mahalanobis distance from the new data point to the ith prototype, and Dave,i|m+1 and Dstd,i|m+1 are, respectively, the updated average and standard deviation of the distances for the data belonging to the ith cluster. On the other hand, if the PCA subspace cannot account for the on-line data, the data should be collected for the off-line model updating phase. When the subspace has been adapted by using the algorithm in section 2.2, the trained data can be described by using both subspaces:

W = TP^T S + 1W̄ = T*P*^T S* + 1W̄*    (24)

The score vectors on the new subspace can be obtained by

T* = TC + 1ΔW̄S*^(−1)P*    (25)

where C ≡ P^T SS*^(−1)P* and ΔW̄ ≡ W̄ − W̄*. Equation 25 shows that the data points on the previous subspace can be adapted to the new one through a coordinate rotation (C) and shift (1ΔW̄S*^(−1)P*). Therefore, the center of the ith cluster can be transferred to the new subspace according to the following form:

μi* = μiCk + 1ΔW̄S*^(−1)P*    (26)

Since the previous subspace only retained the first k PCs, the coordinate rotation term is expressed as Ck = Pk^T SS*^(−1)P*. The covariance matrix of the ith cluster on the new subspace is written as

Σi* = (Ti* − 1μi*)^T (Ti* − 1μi*)/(mi − 1)    (27)

where mi is the number of data points belonging to the ith cluster. From eqs 25 and 26, the updated covariance matrix can be obtained from the previous subspace:

Σi* = Ck^T ΣiCk + C(n−k)^T Σ̃iC(n−k)    (28)
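The coordinate rotation and shift of eqs 25, 26, and 28 translate directly into matrix products. A minimal sketch, with names of our own choosing (P_res denotes the residual eigenvectors P(n−k)):

```python
import numpy as np

def transfer_cluster(mu, Sigma, Sigma_tilde, P_k, P_res, S, P_star, S_star, dW):
    """Transfer a cluster center (eq 26) and covariance (eq 28) onto the
    updated subspace. dW is the mean shift W_bar - W_bar_star."""
    Sinv_star = np.linalg.inv(S_star)
    C_k = P_k.T @ S @ Sinv_star @ P_star      # rotation of the retained PCs
    C_res = P_res.T @ S @ Sinv_star @ P_star  # rotation of the residual PCs
    shift = dW @ Sinv_star @ P_star           # coordinate shift
    mu_star = mu @ C_k + shift                # eq 26
    Sigma_star = C_k.T @ Sigma @ C_k + C_res.T @ Sigma_tilde @ C_res  # eq 28
    return mu_star, Sigma_star
```

When the subspace is unchanged (P* = P, S* = S, ΔW̄ = 0), the transfer reduces to embedding the old cluster parameters in the new coordinates, which is a convenient sanity check.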


Figure 2. Time-series plots of the measured variables x1 and x4.

The updated boundary distance of the ith cluster can be derived by transferring the data points and covariance used in eq 30 onto the new subspace:

Db,i* = ||Σi*||^(1/k*) ||Σi||^(−1/k) Db,i    (31)

where Σi*^(−1) ≈ (Ck,k*^T ΣiCk,k*)^(−1) is used.

4. Proposed Approach

Figure 1. Flow diagram of the proposed approach.

where Σi is the covariance matrix of the ith cluster on the previous subspace. The covariance matrix of the (k + 1)th to nth score vectors for the ith cluster is defined as Σ̃i ≡ T(i,n−k)^T T(i,n−k)/(mi − 1), whose coordinate rotation term is C(n−k) = P(n−k)^T SS*^(−1)P*. If the first k* PCs are retained on the new subspace, the recursive formulas are rewritten as follows:

μi* = μiCk,k* + 1ΔW̄S*^(−1)P*k*    (29a)

Σi* = Ck,k*^T ΣiCk,k* + C(n−k,k*)^T Σ̃iC(n−k,k*)    (29b)

Σ̃i* = C(k,n−k*)^T ΣiC(k,n−k*) + C(n−k,n−k*)^T Σ̃iC(n−k,n−k*)    (29c)

Ck,k* ≡ Pk^T SS*^(−1)P*k*,    C(n−k,k*) ≡ P(n−k)^T SS*^(−1)P*k*,
C(k,n−k*) ≡ Pk^T SS*^(−1)P*n−k*,    C(n−k,n−k*) ≡ P(n−k)^T SS*^(−1)P*n−k*

where P*k* are the first k* eigenvectors, which span the new subspace. The boundary distance of the ith cluster is given by

Db,i = ||Σi||^(1/k) (tb − μi)Σi^(−1)(tb − μi)^T    (30)

where tb is an arbitrary data point on the previous subspace. The boundary distance on the new subspace can be obtained by transferring the data points and covariance that were used in eq 30, which yields eq 31.

The proposed approach consists of three phases to monitor a process with multiple operating conditions: (1) the off-line modeling phase, (2) the on-line monitoring and updating phase, and (3) the off-line updating phase, as Figure 1 shows. In the off-line modeling phase, a distance-based FCM algorithm is proposed in this paper. Outliers can be gradually discarded during the iteration steps. Therefore, the geometric shapes of the clusters are capable of reflecting the actual data gatherings. The cluster boundaries are also applied to the on-line data while adapting the cluster parameters. If any one of the cluster boundaries is less than the distance of the scores to the corresponding cluster center, the data point should be treated as an outlier and trimmed in the on-line updating phase. In addition, a recursive PCA algorithm is applied in this paper. The PCA subspace is adapted using the new event data in order to account for both the known and the new events. After that, the cluster parameters on the previous subspace have to be transferred to the new one. A concept of rotating and shifting the coordinates of the subspace is applied to transfer the cluster parameters. Therefore, only the new event data have to be clustered on the new subspace. The overall strategy for monitoring a process with multiple operating conditions using the proposed method is summarized as follows:

(1) Off-line modeling phase:

1. Obtain the initial values of the sample mean W̄ and the standard deviation S from the training data set, normalize the training data matrix to zero means and unit variances, and calculate the initial covariance matrix Σ.

2. Perform SVD on Σ to obtain the loadings P. The number of principal components (PCs) is determined by utilizing the

Table 1. Six Types of Set-Point Changes Introduced into the Simulation Data

state no. | time points | x1 ~ x3 | x4 ~ x6
S0 | t ≤ 200 | xi(t) = xi*(t) | xi(t) = xi*(t)
S1 | t > 200 | | xi(t) = xi*(t) + 5(1 − e^(−(t−200)/10))
S2 | t > 400 | xi(t) = xi*(t) − 5(1 − e^(−(t−400)/10)) | xi(t) = xi*(t) + 5(1 − e^(−(t−400)/10))
S3 | t > 600 | | xi(t) = xi*(t) − 5(1 − e^(−(t−600)/10))
S2 | t > 800 | | xi(t) = xi*(t) + 5(1 − e^(−(t−800)/10))
S1 | t > 1000 | xi(t) = xi*(t) + 5(1 − e^(−(t−1000)/10)) | xi(t) = xi*(t) − 5(1 − e^(−(t−1000)/10))

(blank cells: the set points are unchanged from the previous state)
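The data-generating procedure of this example (x* = Ξu plus the first-order set-point shifts of Table 1) can be sketched as follows. This is an illustrative Python fragment, not the authors' code; the names are our own, only the S0 → S1 training changeover is generated, and the choice of the shifted variables follows one reading of the table.

```python
import numpy as np

rng = np.random.default_rng(0)
Xi = rng.normal(size=(30, 10))          # transformation matrix, Xi_ij ~ N(0, 1)

def sample(t, shifts):
    """One observation x(t): x* = Xi u with u_i ~ N(0, 0.1), plus the
    first-order set-point changes of Table 1 (time constant 10).
    `shifts` maps a variable index to a (sign, onset-time) pair."""
    u = rng.normal(scale=np.sqrt(0.1), size=10)
    x = Xi @ u
    for i, (sign, t0) in shifts.items():
        if t > t0:
            x[i] += sign * 5.0 * (1.0 - np.exp(-(t - t0) / 10.0))
    return x

# training set: 400 samples with the S0 -> S1 changeover at t = 201,
# shifting x4-x6 (0-based indices 3-5) from 0 to +5
shifts = {i: (+1.0, 200) for i in range(3, 6)}
train = np.vstack([sample(t, shifts) for t in range(1, 401)])
```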


Figure 3. Cluster boundaries from FCM and the proposed method (DFCM).

cross-validation procedure. Calculate the control limits of the statistics Q and T2 from the initial PCA subspace and store all of the loadings P and scores T.

3. The number of clusters c is determined by evaluating the partition density index from multiple runs of FCM-GK with various numbers of clusters.

4. Perform DFCM on the score vectors of the PCA subspace to iterate the cluster parameters, including the centers μi, covariances Σi, and boundary distances Db,i for each cluster. After the DFCM iterations converge, store the cluster parameters and the covariance matrix Σ̃i that is orthogonal to the subspace.

(2) On-line monitoring and updating phase:

1. Normalize the on-line data by using W̄ and S and project the scaled data onto the PCA subspace. The statistics Q and T2 of the new data are compared with their control limits to verify whether the on-line data belong to the subspace.

2. If T2 < T2_α and Q < Q_α, go to step 3; otherwise, go to step 6.

3. Calculate the Mahalanobis distances of the new sample to each cluster center. If any one of the Mahalanobis distances is less than the corresponding cluster boundary distance, go to step 4; otherwise, go to step 6.

4. Report the current operating state according to the membership values.

Figure 4. Process monitoring charts for the first test data set: (a) statistics Q and T2 for the training and test data sets by using the PCA from the training data set; (b) statistics Q and T2 for the training and test data sets by using the adaptive PCA.


(3) Off-line updating phase:

1. Calculate the values of the sample mean W̄′ and the standard deviation S′ based on the new event data, normalizing the new data to zero means and unit variances. Obtain the covariance matrix Σ′ based on the normalized data set.

2. Update the means, standard deviations, and covariance matrix by utilizing the additional information from the previous step and eqs 6, 7, and 11, respectively.

3. Calculate the eigenvalues and eigenvectors by performing SVD on the updated covariance matrix. The number of PCs on the new subspace (k*) is determined by comparing the captured variances, which can be evaluated from the eigenvalues, with the value from the off-line modeling phase; the captured variances should approximately match those captured by the previous PCA model.

4. Transfer the cluster parameters to the new subspace with eqs 29a, 29b, 29c, and 31, respectively.

5. Illustrative Examples

Figure 5. Transferred clusters and new clusters for the first test data set: (a) the first two score plots; (b) the first and third score plots.

5. Update the cluster parameters μi, Σi, and Db,i based on the new data and eqs 21, 22, and 23, respectively.

6. Clarify the root causes of the new events and store the new event data for the off-line updating phase.
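The decision logic of the on-line phase (steps 1-6 above) can be condensed into one routine. The following Python sketch is illustrative, with a flat dictionary standing in for the stored model parameters; all field names are our own.

```python
import numpy as np

def classify_sample(w, model):
    """On-line decision for one sample w: new event, known cluster,
    or outlier, following steps 1-6 of the monitoring phase."""
    x = (w - model["mean"]) / model["std"]           # step 1: autoscale
    t = x @ model["P_k"]                             # project onto the subspace
    r = x - t @ model["P_k"].T
    Q = float(r @ r)                                 # eq 2
    T2 = float(t @ np.diag(1.0 / model["eigvals"]) @ t)  # eq 4
    if Q > model["Q_lim"] or T2 > model["T2_lim"]:   # step 2 -> step 6
        return "new event"
    for i, c in enumerate(model["clusters"]):        # step 3
        d = t - c["center"]
        D2 = float(d @ np.linalg.inv(c["cov"]) @ d)  # Mahalanobis distance
        if D2 < c["boundary"]:                       # step 4
            return "cluster %d" % i
    return "outlier"                                 # step 6
```

Samples classified as "new event" or "outlier" are set aside for the off-line updating phase rather than being used to adapt the clusters.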

Figure 6. Process monitoring charts for the second test data set.

5.1. Simulation Example. A mathematical example is given to illustrate the proposed method for monitoring a multivariate process with multiple operating conditions. This example is slightly modified from that of Choi et al.8 in order to reflect the changeovers among the operating states. The measured variables were generated using x* = Ξu, where u = [u1, u2, ..., u10]^T is a Gaussian random vector in which ui ~ N(0, 0.1). The transformation matrix is Ξ ∈ R^(30×10), in which the elements were drawn from a normal distribution, i.e., Ξij ~ N(0, 1). Four hundred samples were generated for the training data set, in which a grade changeover from state S0 to S1 began at time point t = 201. The set points of the affected measured variables were changed from 0 to 5 with first-order dynamics and a time constant of 10. Table 1 summarizes the set-point changes for each operating state, in which the variables x1 to x6 were changed and the others were kept at 0 with random noise. The data of the first test data set were generated after the training data set, beginning at time point t = 401, at which the process transitioned to state S2. Another transition in the first test data set began at time point t = 601, from S2 to S3. Four hundred samples were generated in the first test data set. The data of the second test data set


Figure 7. Classification of the second test data set from time points t ) 801-1000: (a) simulated data for x1 and x4; (b) membership values from FCM; (c) membership values from DFCM.

were generated from time point t = 801. First, the process transitioned from S3 to S2; then, it transitioned from S2 to S1 after time point t = 1001. Four hundred samples were generated in the second test data set. Figure 2 shows the trends of x1 and x4 during the data generation procedure. The 400 training samples were used with the cross-validation procedure to determine the initial PCA subspace, in which the number of principal components (PCs) was 8 and the captured variance was 92%. In the training data set, there were 397 samples within the 99% confidence limits, as Figure 4a shows. The score vectors were clustered using FCM-GK and the proposed method (DFCM), which identified two clusters in the training data set. The first two scores and the cluster boundaries are plotted in Figure 3, in which the clusters C1 and C2 correspond to states S0 and S1, respectively. Since all scores within the PCA subspace were used for clustering in the FCM-GK algorithm, the geometric shape of C2 was stretched by the scores of the changeover. In contrast, the outliers were trimmed gradually during the iteration steps of the DFCM, resulting in cluster boundaries capable of reflecting the actual data gathering in the training data set. The initial PCA subspace was applied to the first test data set, and the statistics Q and T2 are plotted in Figure 4a, in which the solid and dashed lines are the 99% and 95% control limits, respectively. Since the simulated data in this data set were generated as the state transitioned from S1 to S2 and then S3, the observations of the first test data set can be expected to be identified as abnormal by the initial PCA subspace. Figure 4a shows that the statistic Q of the first test data set gradually increased after time point 400. The blockwise adaptive algorithm was applied to the first test data set to adapt the initial PCA subspace.
The result is shown in Figure 4b, in which the statistics Q and T2 are almost entirely under their control limits. This shows that the adaptive PCA subspace is capable of explaining the data of both the training and the first test data sets. Only the new data need to be clustered on the new PCA subspace, as the known cluster parameters, including the cluster centers, covariances, and boundaries, are transferred to the new subspace through the proposed method. In order to illustrate

Figure 8. Blindly updating the clusters by using the transition data from S3 to S2.

that the training data were still covered by the transferred clusters, the first three scores of the training data set on the new subspace are plotted in Figure 5 as white points. In Figure 5, the clusters C1 and C2 were transferred from the previous subspace. Since the PCA subspace was adapted using the new data, which rotates and shifts the coordinates of the subspace, the known clusters overlap in the plot of the first two scores, as Figure 5a shows. However, Figure 5b shows that they are well separated along the third score vector. The scores of the new data were clustered by using DFCM. The cluster boundaries labeled C3 and C4 correspond to states S2 and S3, respectively, and the scores of the first test data set are plotted as black points in Figure 5. The transition data from S1 to S2 were excluded, as Figure 5a shows by the black points between clusters C2 and C3. Similarly, the black points between clusters C3 and C4 in Figure 5b show that the changeover data from S2 to S3 were trimmed. The second test data set was examined by utilizing the adaptive PCA subspace. Since these data were generated in the steady states of S2 and then S1 chronologically, it can be expected that those data belong to the adaptive PCA


Figure 9. Classification of the second test data set from time points t ) 1001-1200: (a) simulated data for x1 and x4; (b) membership values from FCM; (c) membership values from DFCM.

subspace. Figure 6 shows that almost all of the statistics Q and T2 of the second test data set, for time points t > 800, are within their control limits. In the on-line updating phase, the cluster parameters were adapted only when the scores of the new data were not outliers. Figure 7a displays the trends of x1 and x4 for time points between 801 and 1000, in which the transition lasts until around time point 825. In the traditional FCM algorithm, there always exist memberships measuring how the data are shared among all clusters. Consequently, the memberships oscillate unstably during the transition, as Figure 7b shows. If the cluster parameters were blindly updated by using the transition data, the geometric shapes of the clusters would be unreasonably stretched. Figure 8 shows that the C4 cluster boundaries were pulled during the changeover, resulting in the unstable memberships after the transition in Figure 7b. In contrast, when the outliers were identified and discarded before updating the cluster parameters, the memberships of the data in state S2 were more stable, as Figure 7c shows. Figure 9a shows that x1 and x4 transitioned from state S2 to S1 in the second test data set, for time points between 1001 and 1200. Similar to the previous discussion, if the transition data were taken into account, the memberships oscillated significantly during the steady state, as Figure 9b shows. Figure 10 shows that the cluster boundaries of state S2 were also twisted by the transition data. When the clusters were updated using only the inliers, which can be recognized through the cluster boundary distances, the memberships of the steady-state data in S1 were also steady, as Figure 9c shows. The second test data set demonstrates that, although new data may belong to the PCA subspace during on-line monitoring, outliers may still exist in a process with multiple operating conditions.
If the cluster parameters are blindly updated, the geometric shapes of the clusters will be twisted, resulting in an unreliable on-line classification. In this paper, the cluster boundaries are proposed to identify the outliers in the on-line updating phase, which effectively prevents the outliers from misleading the direction of the cluster updates.

5.2. Industrial Application. The proposed method is applied to monitor a polyethylene plant located in Kaohsiung, Taiwan. The process flow diagram is shown in Figure 11; a detailed description can be found in previous work.11 In this paper, 14 measurement variables, which are listed in Table 2, were chosen according to the plant expert's advice. The reference data were collected every 5 min for 5 days. There were 1440 observations in the data set, covering three different production grades, labeled G1, G2, and G3.

Figure 10. Blindly updating the clusters by using the transition data from S2 to S1.

Figure 11. Process flow diagram of the polyethylene autoclave process.

Ind. Eng. Chem. Res., Vol. 47, No. 21, 2008

Table 2. Variable Descriptions in the Polyethylene Process

abbreviation   description
FI-001         high-purity ethylene flow rate
FIC-001        modifier (M1) flow rate
FIC-002        modifier (M2) flow rate
FI-002         reactor feeding flow rate
TIC-001        reactor feeding temperature
TIC-002        reactor top temperature
TIC-003        reactor middle temperature
TIC-004        reactor bottom temperature
PIC-001        reactor pressure
FI-003         initiator flow rate for regulating TIC-002
FI-004         initiator flow rate for regulating TIC-003
FI-005         initiator flow rate for regulating TIC-004
FIC-003        low-pressure recycle flow rate
FIC-004        purge flow rate

Figure 12 shows the measured values of the reactor top temperature and pressure, in arbitrary units for confidentiality. In production grades G1 and G2, the specifications of the melting index (MI), the major quality variable, are the same; the two grades differ only in the specified concentration of the copolymer in the final products. Therefore, the reactor temperature profile and pressure were regulated in the same ranges, as Figure 12 shows, while the concentrations of the modifiers were maintained at different levels. There were 1366 observations that could be explained by the PCA subspace within the 99% confidence limits, with four PCs capturing 93% of the variance. The partition density index was used to evaluate the optimal number of clusters, as Figure 13 shows; it indicates that the training data within the PCA subspace should be clustered into four groups. The clustering results, projected onto the first two score vectors, are shown in Figure 14, in which the data of grades G1 and G2 are clustered as C1 and C2, respectively, and the data of grade G3 are chronologically shared between clusters C3 and C4. It should be noted that clusters C3 and C4 could be well separated along the third score vector. In Figure 14, the gray and black lines represent the boundary distances of the clusters obtained with FCM-GK and DFCM, respectively. It is obvious that the shapes of the FCM-GK clusters were stretched by the outliers from the grade changeover; the situation is significantly improved by the proposed method. The outliers are symbolized as white points in Figure 14. Naturally, there were different recipes for each production grade, so it is reasonable that the data points of grades G1 and G2 were clustered into C1 and C2, respectively. However, grade G3 was shared by clusters C3 and C4. This indicates that there existed different operating conditions, of which the operators might not have been aware, within the same production grade.

Figure 12. Measured values of the reactor temperature and pressure in the training data set.

Figure 13. Determination of the optimal number of clusters by using the partition density index.

Figure 14. Comparisons of the clustering results by using the FCM and the proposed method.

The data following the reference data set were collected over 1 day. Figure 15 shows the Q and T2 statistics of the test data set, in which the solid and dashed lines represent the 99% and 95% confidence limits, respectively; the Q chart suggests that new events emerged after the 60th observation. The data within the PCA subspace were classified into cluster C4 according to their membership values, and the centers, covariances, and boundary distances of cluster C4 were finely adapted by using this additional information. After that, the PCA subspace was accommodated to the new event data by using the proposed adaptive PCA method. The adaptive PCA captured 92% of the variance with four PCs; the corresponding Q and T2 statistics for the test data are shown in Figure 16, where almost all of the statistics are under their control limits. This shows that the proposed method is capable of updating the subspace to accommodate new events without recalculating the trained data.
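The screening of the test data in Figure 15 uses the two standard PCA monitoring statistics: Q is the squared prediction error off the subspace and T2 is the Mahalanobis distance within it. A minimal sketch (variable names are illustrative; the control limits follow the usual Jackson-Mudholkar and F-distribution formulas and are omitted here):

```python
import numpy as np

def fit_pca(X, k):
    """PCA on standardized data X (rows = observations): return the
    loading matrix P (n x k) and the k retained eigenvalues."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order], eigvals[order]

def q_t2(x, P, eigvals):
    """Q (squared prediction error) and Hotelling's T^2 for one sample."""
    t = P.T @ x               # scores on the subspace
    residual = x - P @ t      # part of x the subspace cannot explain
    Q = residual @ residual
    T2 = np.sum(t**2 / eigvals)
    return Q, T2
```

A sample with either statistic above its control limit is treated as a new event and stored for the next blockwise model update.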


Figure 15. Q and T2 statistics for the test data by using the PCA from the reference data set.

Figure 16. Q and T2 statistics for the test data by using the adaptive PCA model.
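The adaptive PCA model of Figure 16 is rebuilt from stored first and second moments rather than from the raw training data. The combination step can be sketched with the standard pooled-moment identities (the paper's recursive formulas additionally rescale by the updated standard deviations, which is omitted here):

```python
import numpy as np

def blockwise_moments(mean, cov, m, X_new):
    """Combine the stored mean/covariance of m old samples with a new
    block X_new of m' samples, without revisiting the old raw data.
    Covariances here are population (divide-by-m) covariances."""
    m2 = len(X_new)
    mean2 = X_new.mean(axis=0)
    cov2 = np.cov(X_new, rowvar=False, bias=True)
    m_star = m + m2
    mean_star = (m * mean + m2 * mean2) / m_star
    d1 = (mean - mean_star)[:, None]
    d2 = (mean2 - mean_star)[:, None]
    cov_star = (m * (cov + d1 @ d1.T) + m2 * (cov2 + d2 @ d2.T)) / m_star
    return mean_star, cov_star, m_star
```

An eigendecomposition of cov_star then yields the updated loading matrix, so the new subspace accounts for both the old and the new events.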

Only the new event data need to be clustered on the new subspace; they are labeled C5 in Figure 17, in which the solid line represents the boundary distance of the group. The parameters of the known groups could be directly transferred from the previous subspace by using the proposed method. Figure 17 shows the reference data and the cluster boundary distances transferred from the previous subspace, symbolized with gray points and lines. Figures 16 and 17 demonstrate that the features of known events, which had been extracted from the historical data, could be adapted by using the new event data. The proposed algorithm can therefore effectively monitor a process with multiple operating conditions and time-varying characteristics. In the end, production grade G3 was shared by clusters C3, C4, and C5.

Figure 17. Clustering results of the new event data and known groups.

Figure 18a shows the melting index analyzed in the laboratory (Lab MI) and its specification limits during the production of grades G2 and G3. The MI specification transited from low to high during the grade changeover, whereas the operating conditions, such as the reactor pressure and temperature, were adjusted to bring the MI to the target. The most significant factor for adjusting the MI from low to high is the modifier concentration of the reactor inlet gas. Figure 18b shows that the modifier flow rate (FIC-001) was regulated at a higher level during the initial stage of grade G3, in which the data points were clustered as C3. Since the reaction conversion was only around 10-15% and most of the unreacted gas was recycled continuously in this process, the operators gradually reduced the modifier flow rate to a relatively stable level, at which the sampled data were clustered as C4 and C5. In the C4 cluster, two measurements of MI were outside the specifications, as Figure 18a shows, and the cluster was identified as an abnormal operating condition. Figure 19 shows the variations of the reactor pressure and temperature and their specification limits during the production of grades G2 and G3. In the C4 cluster, the operators tried to adjust the reactor pressure to regulate the MI. This caused the pressure to oscillate strongly between the specification limits, as Figure 19a shows, and led to an unstable operating state. The situation was eliminated when the operators increased the set point of the reactor temperature, as Figure 19b shows. It is interesting to note that the reactor temperature was usually set at the low specification limit in order to avoid a runaway reaction. It has been suggested that the reactor temperature should be increased if the C4 cluster is encountered.

Figure 18. Comparisons of the laboratory measured MI (a) and modifier flow rate (b) during the operations of grades G2 and G3.

Figure 19. Variations of the reactor pressure (a) and temperature (b) in the different clusters.
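The transfer of the known clusters onto the updated subspace described earlier (Figure 17) is, at heart, an affine change of score coordinates. Assuming the new scores relate to the old ones as t* = Cᵀt + b, with C the rotation operator between the subspaces and b a shift (the paper derives these from the old and new loadings and means), Gaussian cluster parameters map exactly, and for an invertible C the Mahalanobis distances, hence the boundary distances, are preserved:

```python
import numpy as np

def transfer_cluster(mu, cov, C, b):
    """Map a cluster's center and covariance through the affine score
    transform t* = C.T @ t + b (C and b are assumed to be known from
    the old and new PCA models)."""
    mu_new = C.T @ mu + b
    cov_new = C.T @ cov @ C
    return mu_new, cov_new
```

Because the map is exact for means and covariances, no raw scores need to be reprojected when the subspace is updated.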

6. Conclusions

In this paper, an outlier rejection clustering algorithm has been proposed to trim the plant data from grade changeovers and reach more feasible clustering results. The tuning parameters in the proposed method, the cluster boundary distances, are more intuitive than the resolution parameter in the possibilistic c-means approach, and they are easily updated when the subspace is adapted. In addition, blockwise recursive formulas for updating the PCA subspace have been derived. The means and covariances are stored for the next subspace update, so that the data of new events can be accounted for without recalculating the trained data. Once the subspace has been adapted, the known event clusters must be transferred to the new subspace; the corresponding update of the cluster parameters, which include the centers, covariances, and boundary distances, has also been proposed in this paper. The proposed method has been demonstrated on real plant data from a polyethylene process with multiple production grades. The results show that the challenges of process monitoring, such as outliers and time-varying characteristics, can be effectively dealt with.

Acknowledgment

This work is supported by the National Science Council, Republic of China, under Grant NSC-96-2221-E-268-004.


Nomenclature

A = ratio of the standard deviations of the training data set to the combined data set
A′ = ratio of the standard deviations of the new event data set to the combined data set
C = coordinate rotating operator for the updated subspace
c = number of clusters
Db,i = boundary distance of the ith cluster
Db,i* = updated boundary distance of the ith cluster on the new subspace
Dave,i = average of the Mahalanobis distances of the data points belonging to the ith cluster
Dstd,i = standard deviation of the Mahalanobis distances of the data points belonging to the ith cluster
D2ij,Σi = Mahalanobis distance of the jth data point to the ith cluster
E = residual part of the data matrix
k = number of principal components
k* = number of principal components on the new subspace
m = number of observations in the training data set
m′ = number of observations in the new event data set
m* = total number of observations
mret = number of retained data points in the DFCM algorithm
n = number of variables
P = loading matrix of the PCA subspace
P* = loading matrix of the new subspace
pi = ith loading vector
Qα = control limit of the statistic Q with (1 - α) confidence
q = fuzzifier
S = diagonal matrix of standard deviations
S′ = diagonal matrix of standard deviations for the new event data
S* = updated standard deviation matrix by using S and S′
Si = sum of memberships of the ith cluster
T2α = control limit of the statistic T2 with (1 - α) confidence
T = score matrix of the training data set
T* = score matrix of the training data set on the new subspace
ti = ith score vector
uij = membership value of the jth data point sharing with the ith cluster
Vi = fuzzy hypervolume of the ith cluster
W = raw data matrix of the training data set
W′ = new event data matrix
W̄ = mean vector of the training data
W̄′ = mean vector of the new event data
W̄* = updated mean vector by using W̄ and W̄′
X = normalized data matrix
X′ = normalized data matrix of the new event data
X* = normalized data matrix combining X and X′
X̂ = systematic part of the data matrix X

Greek Symbols

Λ = diagonal matrix of the eigenvalues of Σ
Λ* = diagonal matrix of the eigenvalues of Σ*
Σ = covariance matrix of the training data
Σ′ = covariance matrix of the new event data
Σ* = updated covariance matrix by using Σ and Σ′
Σi = covariance matrix of the ith cluster
Σ̃i = covariance matrix of the ith cluster in which the scores are orthogonal to the subspace
σi = standard deviation of variable i of the training data
σi′ = standard deviation of variable i of the new event data
σi* = adaptive standard deviation of variable i by using σi and σi′
µi = ith cluster center
Literature Cited

(1) Marsili-Libelli, S.; Müller, A. Adaptive Fuzzy Pattern Recognition in the Anaerobic Digestion Process. Pattern Recogn. Lett. 1996, 17, 651.
(2) Teppola, P.; Mujunen, S.; Minkkinen, P. Adaptive Fuzzy c-Means Clustering in Process Monitoring. Chemom. Intell. Lab. Syst. 1999, 45, 23.
(3) Chu, Y.; Qin, S. J.; Han, C. Fault Detection and Operation Mode Identification Based on Pattern Classification with Variable Selection. Ind. Eng. Chem. Res. 2004, 43, 1701.
(4) Wise, B. M.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6, 329.
(5) Kourti, T.; Lee, J.; MacGregor, J. F. Experiences with Industrial Applications of Projection Methods for Multivariate Statistical Process Control. Comput. Chem. Eng. 1996, 20, S745.
(6) Qin, S. J. Statistical Process Monitoring: Basics and Beyond. J. Chemom. 2003, 17, 480.
(7) Li, W.; Yue, H. H.; Valle-Cervantes, S.; Qin, S. J. Recursive PCA for Adaptive Process Monitoring. J. Process Control 2000, 10, 471.
(8) Choi, S. W.; Martin, E. B.; Morris, A. J.; Lee, I. Adaptive Multivariate Statistical Process Control for Monitoring Time-Varying Processes. Ind. Eng. Chem. Res. 2006, 45, 3108.
(9) Krishnapuram, R.; Keller, J. M. A Possibilistic Approach to Clustering. IEEE Trans. Fuzzy Syst. 1993, 1, 98.
(10) Krishnapuram, R.; Keller, J. M. The Possibilistic c-Means Algorithm: Insights and Recommendations. IEEE Trans. Fuzzy Syst. 1996, 4, 385.
(11) Liu, J. Process Monitoring Using Bayesian Classification on PCA Subspace. Ind. Eng. Chem. Res. 2004, 43, 7815.
(12) Gustafson, D.; Kessel, W. Fuzzy Clustering with a Fuzzy Covariance Matrix. In Proceedings of the IEEE Conference on Decision and Control, San Diego, CA, USA; Institute of Electrical and Electronics Engineers: New York, 1979; p 761.
(13) Theodoridis, S.; Koutroumbas, K. Pattern Recognition; Academic Press: New York, 1999.

Received for review July 23, 2007
Revised manuscript received January 31, 2008
Accepted August 27, 2008

IE0710014