
Ind. Eng. Chem. Res. 2005, 44, 2316-2327

Fault Detection Based on a Maximum-Likelihood Principal Component Analysis (PCA) Mixture Sang Wook Choi, Elaine B. Martin,* and A. Julian Morris Centre for Process Analytics and Control Technology, School of Chemical Engineering and Advanced Materials, University of Newcastle upon Tyne, NE1 7RU, United Kingdom

In-Beum Lee Department of Chemical Engineering, Pohang University of Science and Technology, San 31 Hyoja Dong, Pohang, 790-784, Korea

Multivariate statistical process control (MSPC) that is based on principal component analysis (PCA) has been applied to many industrial processes. Such an approach assumes that the process signals are normally distributed and exhibit stationary behavior. These assumptions significantly limit the range of applicability of the methodology. In this paper, a monitoring scheme based on a maximum-likelihood PCA (MLPCA) mixture is proposed. The main idea behind the approach is that complex data patterns obtained from a process can be described by several local models. A learning algorithm that helps determine model structure, i.e., the number of local PCA models to be constructed and the number of principal components to be included in each local model, is proposed, resulting in the efficient determination of a MLPCA mixture model. Two monitoring statistics used in the MLPCA-based monitoring scheme are then derived and a two-step fault detection method is proposed. Several mathematical models that describe different process features, including multiple steady states, nonlinearity, and process dynamics, are used to demonstrate the potential of the proposed method. The approach is compared with previous approaches, including nonlinear PCA and dynamic PCA, and it is confirmed that the developed methodology is a good alternative. Finally, the method is applied for fault detection on a continuously stirred tank reactor process, and its performance for detecting six types of faults, in terms of false alarm rate, missed alarm rate, and detection delay, is shown to outperform current approaches.

1. Introduction

One of the more recent applications of principal component analysis (PCA) has been for the monitoring of the performance of industrial processes.1,2 Conventional PCA-based multivariate statistical process control (MSPC) schemes have been based on the assumption that the process variables are normally distributed and asymptotically stationary.
Occasionally, problems associated with the on-line monitoring scheme are encountered, including its inability to detect faults or, alternatively, the scheme causing a false alarm rate that is greater than that which is statistically acceptable. These may materialize as a consequence of the underlying assumptions of PCA-based MSPC theory being violated. For example, the process may exhibit nonlinear behavior, operate in different operational regions, or exhibit dynamic behavior. In other words, the historical data obtained from more-complex situations is not normally distributed and the process does not exhibit stationary behavior. By applying linear PCA-based algorithms to nonlinear processes, inefficient and unreliable process performance monitoring schemes will materialize, even where a large number of principal components are retained to explain a large proportion of the sample variance.

* To whom correspondence should be addressed. Tel.: +44 191 222 6231. Fax: +44 191 222 5748. E-mail: e.b.martin@newcastle.ac.uk.

Including too many principal components will result in an increase in the probability of false alarms occurring, particularly in the T2 statistic, as a result of the lower-order components only explaining minimal levels of variability. One approach to explaining a greater proportion of the sample variance using fewer components has been to utilize nonlinear PCA algorithms.3-6 Such nonlinear PCA approaches for the development of nonlinear monitoring schemes typically use arbitrary nonlinear functions to globally fit the process nonlinearity. An alternative paradigm is to divide a global nonlinear region into several stages and model each stage with a local linear model.7,8 From the perspective of process monitoring, this approach is attractive in that the monitoring concepts that are based on linear PCA, including the associated monitoring statistics, can be directly applied. Moreover, the application of local modeling methods is not limited to nonlinear processes. The structure of multivariate process data becomes more complex in the presence of process characteristics such as process dynamics and unavoidable phenomena such as process degradation. Similarly, a change in the process operating conditions can make it difficult to model the underlying data structure. A local modeling approach can help address these challenges by dividing the data into separate regions and modeling them separately. These challenges are explored in this paper. Whiteley et al.9 proposed an adaptive resonance theory (ART2) network, which is a form of the adaptive

10.1021/ie049081o CCC: $30.25 © 2005 American Chemical Society Published on Web 02/25/2005


clustering algorithm, to estimate the distribution of dynamic sensor data. They demonstrated its application for the monitoring of a dynamic process. Chen and Liu10 proposed a mixture PCA model that combined PCA and heuristic smoothing clustering (HSC) techniques. Their method consisted of two steps. The first step was to apply HSC to the data, which incorporated an automatic procedure for determining the number of clusters. The next step was to develop a PCA monitoring scheme for each cluster using the standard metrics: Hotelling's T2 statistic and the Q-statistic. More recently, Choi et al.11 proposed a fault detection and isolation procedure whereby a Gaussian mixture model was trained using the expectation-maximization (EM) algorithm, which simultaneously approximates the data distribution and defines groups of data that exhibit similar behavior. This is unlike previous approaches10,12 in which clustering and local modeling were performed sequentially. The rationale for this is that the effects of the model parameters and data partitioning on the EM algorithm are inter-related. In this paper, a probabilistic version of PCA and its mixture model13,14 is developed to examine complex data patterns. The proposed approach is implemented for process performance monitoring. Compared to conventional PCA, probabilistic PCA has several advantages from a process modeling perspective. First, it allows a PCA model to account for the measurement (instrument) noise, which is estimated as one of the model parameters in the EM algorithm. Second, both the measured variables and the derived principal components are represented by a probability density function, which allows an insight into system behavior to be obtained.
More importantly, in the case of a probabilistic PCA mixture model, data clustering and local modeling are simultaneously performed through the maximization of a single likelihood function.13 The methodology proposed is termed a maximum-likelihood PCA (MLPCA) mixture model. This paper proposes an overall procedure for process performance monitoring based on a MLPCA mixture model. First, the model structure is determined, including the number of, and dimensionality of, the local principal component models. As the number of local principal component models and measured variables increase, the number of possible solutions to be searched significantly increases, and, by utilizing an exhaustive search to identify the model, this stage becomes extremely time-consuming and potentially impractical. To identify an optimal model structure, a methodology for the efficient determination of the model structure combined with a model parameter learning procedure is proposed. Second, two monitoring statistics (using the probability density of the score vector) are calculated. Finally, a procedure for on-line fault detection is developed for efficient process performance monitoring. The remainder of this paper is structured as follows. Section 2 presents a detailed description of a MLPCA mixture model and many of its key properties. A model development procedure, including the derivation of the model structure and the learning of the parameters, is described in Section 3, along with the concept and algorithm of the proposed process monitoring scheme based on the MLPCA mixture model. In Section 4, the advantages of the proposed process monitoring methodology are demonstrated through several scenarios. Case studies considered include mathematical simulations that reflect multiple process operation (i.e., multiple clusters), nonlinear behavior, and dynamic behavior, prior to the methodology being applied to a continuous stirred tank reactor (CSTR) process. Conclusions are drawn in Section 5.

2. The Maximum-Likelihood Principal Component Analysis (MLPCA) Mixture Model

The mixture model, which is based on K local principal component models, has a distribution of the following form:

p(x) = Σ_{k=1}^{K} α_k p(x|k) = Σ_{k=1}^{K} α_k ∫ p(x|t,k) p(t|k) dt    (1)

where x ∈ R^m and t ∈ R^{p_k} are the observation and score vectors, respectively. If a score vector t is assumed to be distributed as a standard normal (i.e., N(0, I)), then each local model has a probability distribution expressed as p(x|t,k) = N(P_k t + µ_k, σ_k² I). Given N samples (D = {x_1, x_2, ..., x_N}), the model parameters Θ = {θ_k, α_k}_{k=1:K} need to be determined, where α_k is the mixing proportion, constrained by Σ_k α_k = 1, and θ_k = {P_k, µ_k, σ_k²}, where P_k is a loading matrix, µ_k is a mean vector, and σ_k² is the variance of the noise for the kth local principal component model. Assuming that all unknown parameters are constants and there is no prior information available, the parameters can be estimated by maximizing the likelihood function Π_{n=1}^{N} p(x_n).15 In situations where multiple parameters and a highly nonlinear likelihood function result, an extremely complex optimization problem materializes. One major advantage of the EM algorithm is that it provides a simple but practical iterative solution to obtaining the model parameters, thereby avoiding a complex optimization problem.16,17 The procedure for identifying the maximum-likelihood estimates of the model parameters of an underlying distribution from a given incomplete data set is a standard problem. Instead of maximizing the log-likelihood, L(Θ), the expected complete-data log-likelihood E(L_c) is iteratively maximized in the EM algorithm (see Appendix I). One of the desirable characteristics of the EM algorithm is that the log-likelihood L(Θ) never decreases between iterations, i.e., L(Θ^new) ≥ L(Θ^old). The objective function to be maximized in the EM algorithm for a MLPCA mixture model is the expected complete-data log-likelihood E(L_c), which is the expectation of

Σ_{n=1}^{N} Σ_{k=1}^{K} p(k|x_n, Θ^old) ln(p(x_n, t, k|Θ^new))    (2)

with respect to the posterior distribution p(t|k,x_n,θ^old) of a score vector (see Appendix I).

In the EM algorithm, each iteration consists of two steps: the expectation (the E-step) and the maximization (the M-step). In the E-step of the EM algorithm, given the parameters Θ^old obtained in the previous M-step and the nth sample x_n, p(k|x_n,Θ^old) and p(t|k,x_n,θ^old) are determined. The class posterior probability p(k|x_n,Θ^old) is obtained by applying Bayes' theorem:

p(k|x_n,Θ^old) = p(x_n|k,θ^old) p(k|Θ^old) / p(x_n|Θ^old)    (3)

where p(x_n|k,θ^old) is the multivariate normal density with a mean of

E[x_n|k,θ^old] = E[P_k t + µ_k|k,θ^old] = P_k E[t|k,θ^old] + µ_k = µ_k    (4)

and a covariance of

cov[x_n|k,θ^old] = P_k cov[t|k,θ^old] P_k^T + cov[e_k|t,k,θ^old] = P_k P_k^T + σ_k² I    (5)

that is, p(x|k,θ^old) = N(µ_k, G_k), where G_k = P_k P_k^T + σ_k² I. The scores posterior probability p(t|k,x_n,θ^old) is then derived. From Bayes' theorem, the scores posterior probability can be written as

p(t|k,x_n,θ^old) = p(x_n|t,k,θ^old) p(t|k,θ^old) / p(x_n|k,θ^old)    (6)

where p(x_n|t,k,θ^old) = N(P_k t + µ_k, σ_k² I) and p(t|k,θ^old) = N(0, I). Because x and t are jointly Gaussian-distributed, p(t|k,x_n,θ^old) is also normally distributed, with a mean of

E(t|k,x_n) = E(t|k,θ^old) + cov(t,x_n|k,θ^old)(cov(x_n|k,θ^old))^{-1}(x_n − E(x_n|k,θ^old)) = (σ_k² I + P_k^T P_k)^{-1} P_k^T (x_n − µ_k)    (7)

and a covariance of

cov(t|k,x_n) = cov(t|k) − cov(t,x_n|k)(cov(x_n|k))^{-1} cov(x_n,t|k) = I − P_k^T G_k^{-1} P_k    (8)

Because G_k P_k = P_k(P_k^T P_k + σ_k² I) ≡ P_k H_k and P_k H_k^{-1} = G_k^{-1} P_k, eq 8 can be rewritten as

cov(t|k,x_n) = I − P_k^T P_k H_k^{-1} = (H_k − P_k^T P_k) H_k^{-1} = σ_k² H_k^{-1}    (9)

Thereafter, in the M-step of the EM algorithm, the model parameters are updated by maximizing

E(L_c) = Σ_{n=1}^{N} Σ_{k=1}^{K} p(k|x_n,Θ^old) ln(α_k^new) + Σ_{n=1}^{N} Σ_{k=1}^{K} p(k|x_n,Θ^old) ∫ p(t|x_n,k,θ^old) ln(p(x_n,t|k,θ^new)) dt    (10)

(see eq A3 in Appendix I). The first term of the right-hand side (RHS) of eq 10, which includes α_k^new, and the second term, which incorporates P_k, µ_k, and σ_k², can be optimized independently, because they are unrelated.

First, α_k^new can be obtained by introducing a Lagrange multiplier λ and maximizing

Σ_{n=1}^{N} Σ_{k=1}^{K} p(k|x_n,Θ^old) ln(α_k^new) + λ(Σ_{k=1}^{K} α_k^new − 1)    (11)

with the constraint Σ_{k=1}^{K} α_k^new = 1. Setting the derivative of eq 11 with respect to α_k^new to zero, the expression α_k^new = −Σ_n p(k|x_n,Θ^old)/λ is obtained, and summing both sides over k gives λ = −N. The updated equation for α_k^new is now

α_k^new = (1/N) Σ_{n=1}^{N} p(k|x_n,Θ^old)    (12)

Second, the other model parameters, P_k, µ_k, and σ_k², are updated by maximizing the second term of the RHS of eq 10. Using E(t|k,x_n) and E(tt^T|k,x_n) obtained in the E-step, this term can be represented by

Σ_{n=1}^{N} Σ_{k=1}^{K} p(k|x_n,Θ^old) [−0.5 m ln((σ_k^new)²) − 0.5(σ_k^new)^{−2}{|x_n − µ_k^new|² − 2E(t^T|k,x_n)(P_k^new)^T(x_n − µ_k^new) + tr((P_k^new)^T(P_k^new) E(tt^T|k,x_n))} − 0.5 tr(E(tt^T|k,x_n))] + C    (13)

where tr is the trace operator and C is a constant that is not related to the model parameters (θ). Taking the derivative of eq 13, with respect to P_k, and setting it to zero gives

P_k^new = (Σ_{n=1}^{N} p(k|x_n,Θ^old) x̄_nk^new E(t^T|k,x_n)) (Σ_{n=1}^{N} p(k|x_n,Θ^old) E(tt^T|k,x_n))^{−1}    (14)

where x̄_nk^new is the value of x_n centered about the mean of the kth cluster (x̄_nk^new = x_n − µ_k^new). Similarly, the updating rules can be calculated for µ_k,

µ_k^new = Σ_n p(k|x_n,Θ^old)(x_n − P_k^new E(t|k,x_n)) / Σ_n p(k|x_n,Θ^old)    (15)

and for (σ_k^new)²,

(σ_k^new)² = Σ_n p(k|x_n,Θ^old){|x̄_nk^new|² − 2E(t^T|k,x_n)(P_k^new)^T x̄_nk^new + tr((P_k^new)^T(P_k^new) E(tt^T|k,x_n))} / [m Σ_n p(k|x_n,Θ^old)]    (16)
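The E-step responsibilities and posterior score moments (eqs 3, 7, and 9) and the M-step updates (eqs 12 and 14-16) can be sketched as a single EM iteration. The two-cluster synthetic data, the initialization, and the ordering of the M-step updates (µ_k with the current P_k, then P_k, then σ_k²) are illustrative assumptions for this sketch, not the paper's exact implementation; SciPy supplies the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_iteration(X, alphas, mus, Ps, sigma2s):
    """One EM iteration for a MLPCA mixture. Returns the log-likelihood under
    the entering parameters, then updates the parameter lists in place."""
    N, m = X.shape
    K = len(alphas)
    # E-step: responsibilities p(k|x_n, Theta_old) via Bayes' theorem (eq 3)
    R = np.zeros((N, K))
    for k in range(K):
        Gk = Ps[k] @ Ps[k].T + sigma2s[k] * np.eye(m)   # G_k = P_k P_k^T + sigma_k^2 I
        R[:, k] = alphas[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Gk)
    loglik = float(np.log(R.sum(axis=1)).sum())
    R /= R.sum(axis=1, keepdims=True)

    for k in range(K):
        P, s2 = Ps[k], sigma2s[k]
        H = P.T @ P + s2 * np.eye(P.shape[1])           # H_k = P_k^T P_k + sigma_k^2 I
        Hinv = np.linalg.inv(H)
        r = R[:, k]
        Et = (X - mus[k]) @ P @ Hinv                    # E(t|k,x_n), eq 7
        covt = s2 * Hinv                                # cov(t|k,x_n), eq 9
        # M-step updates
        alphas[k] = r.mean()                            # eq 12
        mus[k] = (r[:, None] * (X - Et @ P.T)).sum(axis=0) / r.sum()   # eq 15
        Xc = X - mus[k]
        S1 = (r[:, None] * Xc).T @ Et                   # sum_n r_nk xbar_nk E(t|k,x_n)^T
        S2 = (r[:, None] * Et).T @ Et + r.sum() * covt  # sum_n r_nk E(tt^T|k,x_n)
        Pn = S1 @ np.linalg.inv(S2)                     # eq 14
        PtP = Pn.T @ Pn
        cross = np.sum((Et @ Pn.T) * Xc, axis=1)        # E(t)^T P^T xbar, per sample
        quad = np.sum((Et @ PtP) * Et, axis=1)          # E(t)^T P^T P E(t), per sample
        num = (r * (np.sum(Xc**2, axis=1) - 2 * cross + quad)).sum() \
              + r.sum() * np.trace(PtP @ covt)
        Ps[k] = Pn
        sigma2s[k] = num / (m * r.sum())                # eq 16
    return loglik

# Two well-separated synthetic clusters in R^3; K = 2 local models, p_k = 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(5.0, 1.0, (100, 3))])
alphas = [0.5, 0.5]
mus = [np.zeros(3), np.full(3, 5.0)]
Ps = [0.1 * rng.standard_normal((3, 1)) for _ in range(2)]
sigma2s = [1.0, 1.0]
logliks = [em_iteration(X, alphas, mus, Ps, sigma2s) for _ in range(15)]
```

Running the iteration repeatedly illustrates the property quoted in the text: the log-likelihood never decreases between iterations.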

The iteration of the E- and M-steps in sequence is guaranteed to find a local maximum of L(Θ). In summary, the EM algorithm for learning a MLPCA mixture model is as follows:

E-step: Calculate p(k|x_n,Θ^old), E(t|k,x_n), and E(tt^T|k,x_n) = cov(t|k,x_n) + E(t|k,x_n) E(t|k,x_n)^T, based on Θ^old and D = {x_1, x_2, ..., x_N}.

M-step: Update the model parameters Θ^new = {P_k^new, µ_k^new, (σ_k^new)², α_k^new}_{k=1:K} = arg max_Θ E(L_c(Θ)).

3. Process Monitoring Using a MLPCA Mixture Model

3.1. MLPCA Mixture Model with Identical Noise Variance. In Section 2, the model parameters were estimated based on the assumption that a model structure is defined, i.e., the number of local principal component models and the number of principal components to be retained were determined prior to model parameter estimation. In this section, this model structure design problem is addressed for the MLPCA mixture model.

First, a simple model is considered. In the MLPCA mixture model representation described in the previous section, the variance of the noise of a local principal component model, representing the uncertainty in the measured variables, is assumed to differ from that of the other local models. However, it is unlikely that measurements collected from one sensor will exhibit different noise levels as a consequence of the magnitude of the signal (i.e., the noise will not be heteroscedastic). In this respect, it is reasonable to assume that all local models will exhibit the same variance in terms of the noise. Equation 16 in the M-step then can be replaced by

(σ^new)² = Σ_n Σ_k p(k|x_n,Θ^old){|x̄_nk^new|² − 2E(t^T|k,x_n)(P_k^new)^T x̄_nk^new + tr((P_k^new)^T(P_k^new) E(tt^T|k,x_n))} / (Nm)    (17)

which is a specific form of the updating equation for the noise variance.

3.2. Model Structure Design and Parameter Estimation. To construct a MLPCA mixture model for process monitoring, the number of local principal component models and the number of principal components to be included in each local model need to be determined prior to executing the EM algorithm for model parameter estimation. Searching all possible combinations of numbers of local models and model dimensionality to attain an optimal solution requires a large amount of processing time, because a space comprising m(m^K − 1)/(m − 1) candidate models must be searched, where m and K are the number of measured variables and local models, respectively. Cross-validation18 is commonly used to determine the number of principal components to retain in the individual local models. However, in practice, this may be computationally impractical. To reduce the computational time and simplify the learning procedure, the complexity of the model structure is restricted by additional constraints. In Bishop19 and Tipping and Bishop,14 it is assumed that all local principal component models have the same number of principal components. This significantly reduces the computation time needed to identify the most suitable model structure. With the aforementioned restriction on possible solutions, the number of local models and PCA model

dimensionality were determined using Bayesian PCA.19 However, it is unlikely that local principal component models with different covariance structures will have the same model dimensionality. Meinicke and Ritter20 handled this additional complexity with a resolution-based framework. Unlike the EM algorithm for a MLPCA mixture,14 where the noise variance of each local model is estimated with fixed model dimensionality, the resolution-based method finds the optimal model dimensionality for the local principal component models for a fixed noise variance. Each model dimensionality is determined as the number of principal components whose eigenvalues are greater than a fixed σ² value.20 This approach allows all local principal component models to have different dimensionalities. However, the number of local models is first determined, and then their appropriate dimensionality is calculated. That is, the final model dimensionalities and model parameters are not considered when determining the number of local models. In practice, however, the model structure, represented by the number of local models and their corresponding dimensionalities, and the model parameters are inter-related. In this study, the optimal number of local models, K°, is determined using the following criterion:

K° = arg min_K J(K) ≡ arg min_K [ −(1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} p(k|x_n,Θ) ln(p(x_n|k)) − Σ_{k=1}^{K} α_k ln α_k ]    (18)
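A minimal sketch of this criterion, here taken as J(K) = −(1/N) Σ_n Σ_k p(k|x_n) ln p(x_n|k) − Σ_k α_k ln α_k (a reconstruction; the exact sign convention should be checked against ref 21), together with the eigenvalue-threshold dimensionality rule of Meinicke and Ritter described above. All numerical inputs below (eigenvalue spectra, responsibilities, mixing proportions) are illustrative placeholders.

```python
import numpy as np

def local_dims(eig_spectra, sigma2):
    """p_k = sum_i I(lambda_ki - sigma^2): components whose eigenvalues exceed sigma^2."""
    return [int(np.sum(lams > sigma2)) for lams in eig_spectra]

def j_criterion(resp, log_px_given_k, alphas):
    """J(K) = -(1/N) sum_nk p(k|x_n) ln p(x_n|k) - sum_k alpha_k ln alpha_k."""
    N = resp.shape[0]
    return -(resp * log_px_given_k).sum() / N - np.sum(alphas * np.log(alphas))

# A 20-step geometric sweep of sigma^2 from sigma_max^2 down to sigma_min^2,
# with gamma = (sigma_min^2 / sigma_max^2)^(1/20).
sigma2_max, sigma2_min = 1.0, 1e-3
gamma = (sigma2_min / sigma2_max) ** (1 / 20)
grid = [sigma2_max * gamma**i for i in range(21)]

# Hypothetical eigenvalue spectra of S_k for K = 2 local models in R^3.
eigs = [np.array([0.25, 0.09, 0.01]), np.array([0.25, 0.01, 0.01])]
dims_at_each_sigma2 = [local_dims(eigs, s2) for s2 in grid]
```

As the sweep lowers σ², each local model retains progressively more components, which is what drives the dimensionality selection in the learning algorithm.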

This is a form of the complexity measure of the Bayesian Ying-Yang system.21 Because J(K) contains the model parameters Θ, J(K) is obtained after the implementation of the EM algorithm for a given value of K. Hence, the model structure and parameters are mutually interlinked. However, the number of local principal component models may be overestimated when the number of samples is not sufficiently large, in which case the model parameters will be poorly estimated. The number of local models can then alternatively be selected by finding the smallest integer that satisfies the expression |J(K) − J(K + 1)| > δ, where δ is a predefined threshold.

A new learning algorithm is proposed for identifying the model structure and parameters of a MLPCA mixture model by combining and modifying the original EM algorithm and the resolution-based framework:

1. Select a MLPCA mixture with K local models.
2. Identify the appropriate number of principal components for each local model (p_1:K ≡ {p_1, ..., p_K}) and estimate the model parameters Θ.
2.1. Identify the number of principal components. Let σ² = σ_max² initially.
2.1.1. For a fixed σ², calculate Θ = {P_k, µ_k, α_k}_{k=1:K}, using the EM algorithm.
2.1.2. Determine the dimensionality of the local models: p_k = Σ_{i=1}^{m} I(λ_ki − σ²), where I(t) is equal to 1 if t > 0 and zero otherwise (λ_ki is the ith eigenvalue of S_k = Σ_n p(k|x_n)(x_n − µ_k)(x_n − µ_k)^T/(α_k N)).20
2.1.3. Set σ² = γσ², where γ = (σ_min²/σ_max²)^{1/20}. If σ² < σ_min², go to step 2.1.4; otherwise, return to step 2.1.1. (Twenty distinct values of σ² in the range σ_max² to σ_min² were investigated. By defining γ as γ = (σ_min²/σ_max²)^{1/20} and starting from σ_max², σ² becomes σ_min² after 20 iterations.)
2.1.4. Select the optimal numbers of principal components as max_{p_1:K}[∆LL_K], where ∆LL is the change in the log-likelihood during the period when the mixture model includes p_1:K ≡ {p_1, ..., p_K}. (The log-likelihood monotonically increases with the number of principal components, that is, as model complexity increases. During the period when the mixture model includes the optimal numbers of principal components, the change in the log-likelihood is the largest.)
2.2. Given p_1:K ≡ {p_1, ..., p_K}, find the optimal model parameters Θ = {P_k, µ_k, σ_k², α_k}_{k=1:K} by applying the EM algorithm. (Although the model parameters are defined in step 2.1.1, the optimal value of σ² is not known exactly; that is the reason for the inclusion of this step.)
2.3. Calculate J(K) and go to step 1 until K = K_max.
3. Find the optimal number of local principal component models, K°, among K = {1, 2, ..., K_max}, using the criterion given in eq 18.

In this algorithm, the maximum and minimum values of the noise variance, σ_max² and σ_min², are determined by implementing the EM algorithm after the number of principal components is set to 1 and m − 1, respectively. The maximum number of local models (K_max) varies, depending on the data pattern; however, K_max values of 5-10 are typical.

3.3. Monitoring Statistics. After a MLPCA mixture model is constructed based on the data set obtained from normal operating conditions, two statistics, T² and Q, are used in MSPC to detect faults in the model and the residual subspaces, respectively. The T²-statistic is the Mahalanobis distance between a sample and its mean within the model subspace, whereas the Q-statistic is the Euclidean distance of a sample from the model subspace. In contrast to conventional PCA, a score vector in the probabilistic PCA model is represented by a Gaussian posterior distribution p(t|k,x) in the model subspace, not by a single fixed vector.13 Thus, the posterior mean considered is

t̄_k = E(t|k,x) = H_k^{-1} P_k^T x̄_k    (19)

where x̄_k = x − µ_k,

with E(t̄_k) = 0 and cov(t̄_k) = H_k^{-1} P_k^T G_k P_k H_k^{-1} = P_k^T G_k^{-1} P_k being representative of the score vector for the kth local principal component model. A reconstructed sample vector can be obtained by transforming the posterior mean, i.e., x̂_k = W_k t̄_k + µ_k, where W_k = P_k(P_k^T P_k)^{-1} H_k is the weight matrix that minimizes the sum of squares of deviations of the N observed samples (see Appendix II). Using the posterior mean t̄_k, the T²-statistic for the kth local principal component model is given by

T_k² = t̄_k^T (P_k^T G_k^{-1} P_k)^{-1} t̄_k = x̄_k^T P_k (P_k^T G_k P_k)^{-1} P_k^T x̄_k    (20)

where G_k = P_k P_k^T + σ_k² I. On the other hand, the Q-statistic for fault detection in the residual subspace of the kth local PCA model is expressed as

Q_k = |x̄_k − W_k t̄_k|² = |x̄_k − P_k(P_k^T P_k)^{-1} P_k^T x̄_k|² = x̄_k^T [I − P_k(P_k^T P_k)^{-1} P_k^T] x̄_k    (21)

Alternatively, the projected vector t̃_k = P_k^T x̄_k can be considered for calculating the monitoring statistics, as for conventional PCA. The mean and covariance of t̃_k are E(t̃_k) = 0 and cov(t̃_k) = P_k^T G_k P_k, respectively. It can be shown that the T²- and Q-statistics based on t̃_k are the same as those based on t̄_k. Note that t̄_k is equivalent to t̃_k as σ² approaches a value of zero.

As the number of local principal component models increases, the number of monitoring charts increases simultaneously; hence, it is tedious to monitor all of the charts. In fact, only the T² and Q monitoring charts for the local principal component model to which the current sample belongs require interrogation.11 Therefore, the class posterior probability p(k|x_new,Θ) is first investigated for a new sample x_new to identify which local model is responsible for the signal. The monitoring statistics for the corresponding local principal component model are then used to assess whether the behavior of the current process is normal. That is, if T²_{k*} ≥ δ_{T²}(k*) or Q_{k*} ≥ δ_Q(k*) (where k* = arg max_k p(k|x_new,Θ) for x_new), then the process is considered abnormal. Here, δ_{T²} and δ_Q are the control limits for the T²- and Q-statistics, respectively.

The control limits δ_{T²}(k) and δ_Q(k) are calculated based on the sample distributions of t̄_k (or t̃_k) and e_k = x̄_k − W_k t̄_k. The control limit for the T²-statistic for a significance level of α is given by

δ_{T²}(k;α) = [p_k(N_k² − 1)/(N_k(N_k − p_k))] F(α; p_k, N_k − p_k)    (22)

where N_k is the number of samples whose class posterior density for the kth local principal component model is a maximum among all class posterior densities and p_k is the number of principal components retained in the kth local principal component model.22 The control limit for the Q-statistic for a significance level of α is given by δ_Q(k;α) = [s/(2m̄)] χ_α²(2m̄²/s), where m̄ and s are the sample mean and variance, respectively, of the Q values from the training data.23 Occasionally, these control limits are poorly estimated by the aforementioned equations, because the measured samples are not described well by a multivariate normal distribution. In this case, the control limits can be estimated using nonparametric density estimation techniques, such as kernel density estimation.24

4. Application Studies and Results

Three mathematical simulations and a case study based on a continuous stirred tank reactor (CSTR) are considered for the investigation of the effectiveness and reliability of the proposed monitoring methodology based on a MLPCA mixture model. The performance of the proposed method is compared with PCA, dynamic PCA, and nonlinear PCA, in terms of the false alarm rate, missed detection rate, and the delay in the detection of a process change.

4.1. Example 1: Data from a Mixture of Local Structures. To test the appropriateness of the proposed learning algorithm to identify a model structure that is typically unable to be modeled directly by a standard PCA monitoring scheme, an artificial data set that is comprised of 3 variables and 600 observations is generated from three different local "process" structures. The measurement vector x̃ for the kth local structure was derived as follows. A variable vector was generated from x° ~ N(µ_k, Σ_k), where µ_k and Σ_k are the mean vector and the covariance matrix for the kth local model structure, respectively. The variable vector is then rotated, i.e., x

Table 1. Summary of the Performance for Selecting the Number of Principal Components in a Local Principal Component Analysis (PCA) Model for an Estimated Noise Variance

signal-to-noise (SN) ratio    correct selection (%)    σ̂² (a)
10                            100                      0.0107 ± 0.000885
7.5                           100                      0.0112 ± 0.000950
5                             100                      0.0133 ± 0.0015
2.5                           92                       0.0242 ± 0.008
2                             78                       0.0325 ± 0.014
1                             35                       0.0524 ± 0.0235

(a) Value given includes the standard deviation.

Figure 1. Sample patterns and box-plots of the J-values used to find the number of local PCAs: (a and b) µ1 = [3 0 0]^T, µ2 = [0 3 0]^T, and µ3 = [0 0 3]^T; (c and d) µ1 = [2 0 0]^T, µ2 = [0 2 0]^T, and µ3 = [0 0 2]^T; and (e and f) µ1 = [1 0 0]^T, µ2 = [0 1 0]^T, and µ3 = [0 0 1]^T.

= R_k x° (where R_k is a random rotation matrix) and, finally, white noise v ~ N(0, Σ_v) was added to x, i.e., x̃ = x + v. In this example, the covariance matrices for x° took three different local structures:

Σ_1 = diag(0.5², 0.3², 0.3²)    (23a)

Σ_2 = diag(0.5², 0.3², 0.1²)    (23b)

Σ_3 = diag(0.5², 0.1², 0.1²)    (23c)

It is expected that the dimensionality of the three local models of the MLPCA mixture will be 3, 2, and 1, respectively, if properly estimated, based on 0.1² being considerably smaller than 0.5² and 0.3², and, similarly, 0.3² being much smaller than 0.5².

First, the performance of the algorithm for estimating the number of local principal component models is investigated. Three data sets were generated by changing the distance between the centers of the local structures, i.e., the distances were √18, √8, and √2. A Monte Carlo simulation of 100 realizations was performed for each data set, and the results are displayed for the 100 realizations in terms of box plots (Figure 1). The J values are calculated from eq 18, and d² represents the squared distance between the centers of the local structures. The objective function value (J) has a minimum at K = 3 for most of the 100 realizations, indicating that the sample distribution is fairly well described by three local models. As the overlap of the distinct local distributions increases (that is, as the centers of the local structures converge), the probability of overestimating the number of local models increases if the selection criterion represented in eq 18 is used. In contrast, when selected by finding the smallest integer such that |J(K) − J(K + 1)| > δ, where δ was set to 0.1 × J(1), the true number of local principal component models was obtained for all realizations.

Second, the performance of the proposed algorithm for estimating the number of principal components was tested. The probability that the dimension of the local models is correctly derived was obtained for various signal-to-noise (SN) ratios, which are defined by the ratio of Σ to Σ_v. A Monte Carlo simulation of 100 realizations was performed for each SN ratio. The results reported in Table 1 indicate that the true number of principal components for each local model can be exactly determined if the SN ratio is not significantly low (i.e., SN ratios of >2.5). As expected, the magnitude of the estimated noise variance of the model and its variability increase as the SN ratio decreases. For a low SN ratio, the number of principal components retained in the local PCA model is underestimated if the estimated noise variance σ̂² exceeds 0.3². The results in this example indicate that the proposed approach is generally good at finding the structure of a MLPCA mixture, as defined in terms of the number of local models and the number of principal components in each local model.

4.2. Example 2: Simulated Nonlinear Process. To test the applicability of a MLPCA mixture model to nonlinear processes, the data set used in Dong and McAvoy4 was simulated and analyzed using PCA, principal curve-based nonlinear PCA, and MLPCA mixture models. The system consists of three variables that are nonlinearly related:

x = [t, t² − 3t, −t³ + 3t²]^T    (for t ∈ [0.01, 2])    (24)
(24)

The measured variables were contaminated with white Gaussian noise, N(0, 0.1²). The normal training data set was comprised of 200 samples obtained using eq 24. The test data set consisted of two parts: the first 200 samples (samples 1-200) were collected under the same conditions as the nominal data set, and the remaining 200 test samples (samples 201-400) were generated from the same system but with x3 expressed as x3 = −1.1t³ + 3.2t².
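The construction of this data set can be sketched directly from eq 24. The uniform sampling of t over [0.01, 2] is an assumption; the noise level and the faulty expression for x3 follow the text.

```python
import numpy as np

# Nominal system (eq 24): x = [t, t^2 - 3t, -t^3 + 3t^2]^T plus N(0, 0.1^2)
# noise; the faulty portion of the test set uses x3 = -1.1 t^3 + 3.2 t^2.

def generate(n, faulty=False, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    t = rng.uniform(0.01, 2.0, n)                       # assumed sampling of t
    x3 = -1.1 * t**3 + 3.2 * t**2 if faulty else -t**3 + 3.0 * t**2
    X = np.column_stack([t, t**2 - 3.0 * t, x3])
    return X + rng.normal(0.0, 0.1, X.shape)            # white Gaussian noise

X_train = generate(200)                                  # nominal training data
X_test = np.vstack([generate(200), generate(200, faulty=True)])   # samples 1-400
```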


Table 2. Monitoring Results Represented as False and Missed Alarm Rates

method              level of significance    false alarm rate (%)    missed alarm rate (%)
PCA                 0.022                    1.5                     98.5
nonlinear PCA (a)   0.01                     1                       74
nonlinear PCA (a)   0.05                     5                       68
MLPCA mixture (b)   0.022                    5                       47.5

(a) Monitoring results obtained using nonlinear PCA.4 (b) A MLPCA mixture model consisting of six local PCA models with a dimension of 1.

The comparison of conventional PCA, principal curve-based nonlinear PCA, and the MLPCA mixture for fault detection is summarized in Table 2. The false alarm rate and the missed alarm rate represent the proportion of samples whose T2 or Q indices exceed the control limits in the normal situation and the proportion that do not breach the limits in an abnormal situation, respectively. All the methods have an acceptable false alarm rate, whereas the missed alarm rate of the MLPCA mixture model is much lower than that of the other methods. The monitoring charts based on conventional PCA and the MLPCA mixture models are shown in Figure 2. Because the nonlinear patterns of the samples in the training and test data sets are very similar, the two monitoring charts based on conventional PCA have difficulty in detecting the fault condition. In contrast, the MLPCA mixture model, through the Q-statistic, could clearly detect the fault condition as a result of partitioning the nonlinear pattern into six linear local models. It is not claimed that this approach is necessarily better than the nonlinear PCA methodology: just that it is a good alternative that should be considered, particularly in nonlinear process monitoring.

Figure 2. Fault detection results: (a) PCA-based monitoring charts and (b) MLPCA mixture-based monitoring charts.

4.3. Example 3: Simulated Dynamic Process. When process inputs in a dynamic process are non-Gaussian, such as sinusoidal signals, samples measured at the steady-state processing condition do not necessarily exhibit multivariate Gaussian patterns. In this example, the objective is to show that a MLPCA mixture model can accommodate this issue. The following 4 × 4 multivariate dynamic process25 is considered:

x(t + 1) = A x(t) + B[u(t) - v(t)] + p(t)   (25a)
y(t) = C x(t) + D[u(t) - v(t)] + o(t)   (25b)

where u(t), y(t), and x(t) are the input, output, and state variables, respectively, and v(t), o(t), and p(t) define the input, output, and process noise, respectively. The system matrices A, B, C, and D are given by

A = [  0.67   0.67
      -0.67   0.67 ]

B = [ -0.4236   0.1253  -1.1465   1.1892
      -1.6656   0.2877   1.1909  -0.0376 ]

C = [  0.3273  -0.5883
       0.1746   2.1832
      -0.1867  -0.1364
       0.7258   0.1139 ]

D = [  1.0668   0.2944  -0.6918  -1.4410
       0.0593  -1.3362   0.8580   0.5711
      -0.0956   0.7143   1.2540  -0.3999
      -0.8323   1.6236  -1.5937   0.6900 ]

with the process input denoted by u(t) = [10 sin(t), 10 sin(2t), 10 sin(3t), 10 sin(4t)]^T. It is assumed that all noise terms are white and Gaussian, N(0,1). In this test, 200 samples formed both the training and the test data sets. The training process data were collected under normal conditions. To represent an abnormal situation, a drift-type sensor fault was introduced to the second input sensor after t = 100 in the test data, as follows:

ũ2(t) = u2(t) + s(t - 100)   (26)

where s is the drift rate, which was varied between 0.1 and 0.8. For each case, a Monte Carlo simulation of 100 realizations was performed and the average run length (ARL) was used as a comparison index, where the run length is defined as the time step at which the out-of-control signal was detected after the introduction of the fault. In practice, a signal was deemed out of control only if at least three consecutive points were outside the control limit.26 The monitoring results obtained using PCA, dynamic PCA,27 and the MLPCA mixture are summarized in Table 3. The 97.8% control limits for the T2- and Q-statistics, based on the kernel density estimation technique, were used for all the models. The numbers of principal components in PCA and dynamic PCA were selected as 5 and 7, which, on average, explained over 98% and 95% of the total sample variance, respectively. For the MLPCA mixture model, the two local PCA models both have five principal components, determined by the criterion described in example 1.
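The simulation of eqs 25a/25b and the drift fault of eq 26 can be sketched as follows. The element placement in C and D is reconstructed from the column-scrambled printed matrices and should be checked against the original reference; the order of the measurement and state-update steps is also an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# System matrices as printed in the text (C, D layout reconstructed)
A = np.array([[0.67, 0.67],
              [-0.67, 0.67]])
B = np.array([[-0.4236, 0.1253, -1.1465, 1.1892],
              [-1.6656, 0.2877, 1.1909, -0.0376]])
C = np.array([[0.3273, -0.5883],
              [0.1746, 2.1832],
              [-0.1867, -0.1364],
              [0.7258, 0.1139]])
D = np.array([[1.0668, 0.2944, -0.6918, -1.4410],
              [0.0593, -1.3362, 0.8580, 0.5711],
              [-0.0956, 0.7143, 1.2540, -0.3999],
              [-0.8323, 1.6236, -1.5937, 0.6900]])

def simulate(n_steps, drift_rate=0.0, fault_at=100):
    """Run eqs 25a/25b with the sinusoidal input u(t); when drift_rate > 0,
    a drift s*(t - 100) is added to the second input sensor (eq 26)."""
    x = np.zeros(2)
    Y = np.empty((n_steps, 4))
    for t in range(n_steps):
        u = 10.0 * np.sin(np.array([1.0, 2.0, 3.0, 4.0]) * t)
        if drift_rate > 0.0 and t > fault_at:
            u[1] += drift_rate * (t - fault_at)   # drift on sensor 2
        v = rng.normal(0.0, 1.0, 4)   # input noise
        o = rng.normal(0.0, 1.0, 4)   # output noise
        p = rng.normal(0.0, 1.0, 2)   # process noise
        Y[t] = C @ x + D @ (u - v) + o            # eq 25b
        x = A @ x + B @ (u - v) + p               # eq 25a
    return Y

Y_train = simulate(200)                  # normal operation
Y_test = simulate(200, drift_rate=0.5)   # drift fault on u2 after t = 100
```

Monitoring models (PCA, dynamic PCA, or the MLPCA mixture) would then be trained on Y_train and evaluated on Y_test.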

Table 3. False Alarm Rate, Missed Alarm Rate, and Empirical Average Run Length (ARL) for Three Model Types (PCA, Dynamic PCA, and MLPCA Mixture)

                                                          empirical ARL^a
model type      false alarm rate (%)  missed alarm rate (%)  Q         T2          overall

Drift Rate of s = 0.8
PCA             1.2                   9.6                    39.4      11.3        11.3
dynamic PCA     2.5                   8.5                    32.0      9.9         9.9
MLPCA mixture   2.1                   7.3                    39.7      9.9         9.9

Drift Rate of s = 0.5
PCA             1.5                   15.7                   62.4      17.4        17.4
dynamic PCA     2.1                   13.5                   41.1      14.3        14.3
MLPCA mixture   3.0                   9.3                    47.6      11.4        11.4

Drift Rate of s = 0.3
PCA             1.4                   73.4                   ND^b      28.2        28.2
dynamic PCA     2.56                  21.9                   76.2      23.0        23.0
MLPCA mixture   2.27                  14.4                   72.5      16.8        16.8

Drift Rate of s = 0.1
PCA             1.8                   73.3                   ND^b      69.7 [81]   69.7 [81]
dynamic PCA     2.4                   65.4                   ND^b      60.9 [98]   60.9 [98]
MLPCA mixture   2.85                  34.1                   77 [31]   37.1        37.1

^a The number in square brackets ([...]) represents the number of simulations out of 100 realizations in which fault detection using the T2- or Q-statistic was successful. ^b ND indicates that the fault was not detected by the monitoring chart in any of the 100 realizations.
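The run-length rule described above (an alarm is raised only after three consecutive out-of-control points) and the empirical ARL reported in Table 3 can be computed as in the following sketch; how undetected realizations enter the average is not stated in the text, so they are simply excluded here:

```python
def run_length(flags, fault_time, consecutive=3):
    """Detection delay: steps from the fault introduction until
    `consecutive` successive out-of-control points are first observed;
    None if the fault is never detected."""
    streak = 0
    for t in range(fault_time, len(flags)):
        streak = streak + 1 if flags[t] else 0
        if streak == consecutive:
            return t - fault_time
    return None

def empirical_arl(run_lengths):
    """Average run length over the realizations in which the fault was
    detected (undetected realizations are excluded, an assumption)."""
    detected = [r for r in run_lengths if r is not None]
    return sum(detected) / len(detected) if detected else None
```

Here `flags` is the boolean out-of-control indicator produced by comparing the T2- or Q-statistic against its control limit at each time step.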

As summarized in Table 3, the MLPCA mixture model has the lowest missed alarm rate and the smallest empirical ARL for all four cases of sensor faults. In particular, the superiority of the MLPCA mixture model over the PCA and dynamic PCA representations is evident when they are used to detect the slowly changing sensor drift (i.e., s = 0.1). For the drift-type sensor fault with s = 0.1, the PCA and dynamic PCA representations were unable to detect the fault in 19 and 2 cases out of the 100 realizations, respectively. The MLPCA mixture-based Q monitoring chart, on the other hand, successfully detected the change in all 100 realizations. Moreover, as the drift rate becomes small, the difference between the detection speed of the MLPCA mixture model and the other methods increases significantly.

4.4. Case Study: The Continuously Stirred Tank Reactor (CSTR) Process. The proposed method is also used to monitor a nonisothermal CSTR process28 (Figure 3). The reactor model is based on three assumptions: perfect mixing, constant physical properties, and negligible shaft work (i.e., work done by the system on the surroundings). In the reactor, reactant A is premixed with a solvent (the reactant mixture) and converted into product B at a rate of r = βr k0 e^(-E/(RT)) C. The dynamic behavior of the process is described by the mass balance of reactant A and the total energy balance for the reacting system:

V dC/dt = F(Ci - C) - Vr   (27)

VρCp dT/dt = ρCp F(Ti - T) - [UA/(1 + UA/(2Fc ρc Cpc))](T - Tc) + (-ΔHr)Vr   (28)

Figure 3. Nonisothermal continuously stirred tank reactor (CSTR) process and nine measured variables: (1) coolant temperature, (2) reactant mixture temperature, (3) concentration of reactant A, (4) solvent concentration, (5) solvent flow rate, (6) solute flow rate, (7) coolant flow rate, (8) outlet concentration, and (9) outlet temperature.

where the heat-transfer coefficient UA is empirically represented as UA = βUA a Fc^b. The concentration of the reactant mixture is calculated as Ci = (FaCa + FsCs)/(Fa + Fs). The outlet temperature (T) and the outlet concentration (C) are controlled by manipulating the inlet coolant flow rate (Fc) and the inlet reactant flow rate (Fa), respectively. The model parameters used in the CSTR model and the parameters of the two PI controllers are given in Table 4. In this study, some of the stochastic patterns of the inputs and disturbances used by Yoon and MacGregor28 are modified so that the process shows not only system dynamics but also periodic patterns. All process inputs and disturbances are generated using first-order autoregressive models or sinusoidal signals. In addition, all measured variables are contaminated with white Gaussian noise to describe the sensor noise (see Table 5). To evaluate the performance of the proposed method, six types of faults, including equipment faults, a disturbance change, and sensor faults, were considered. Table 6 describes the conditions and characteristics of the six faults. In detail, F1, F2, and F6 are categorized as sensor faults, F3 and F4 as equipment faults, and F5 as a disturbance change. A Monte Carlo simulation of 30 realizations was executed for each fault case. The fault magnitude in each realization was randomly chosen, which meant that no two realizations were the same. In all cases, the CSTR process was simulated for 500 min, with the fault being introduced at t = 301 min. Nine process variables were measured every minute (see Table 5). The samples obtained for t = 100-300 min were used as the training set, and the subsequent 200 samples (t = 301-500 min) were used as the test data set.
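The balance equations above can be sketched as right-hand-side functions for a numerical integrator. This is a minimal sketch using the Table 4 values: the mass balance is written with the inflow-minus-outflow convention F(Ci - C), and UA is passed in directly because the empirical constants a and b in UA = βUA·a·Fc^b are not reproduced in the text:

```python
import math

# Parameter values from Table 4 (consistent units assumed)
V, rho, rho_c = 1.0, 1.0e6, 1.0e6        # m^3, g/m^3, g/m^3
Cp, Cpc = 1.0, 1.0                       # cal g^-1 K^-1
dHr, k0, E_R = -1.3e7, 1.0e10, 8330.0    # cal/kmol, min^-1, K

def cstr_derivatives(C, T, Ci, Ti, Tc, F, Fc, UA, beta_r=1.0):
    """Right-hand sides of the mass balance (eq 27) and the energy
    balance (eq 28); UA is supplied externally (see lead-in)."""
    r = beta_r * k0 * math.exp(-E_R / T) * C          # reaction rate
    dC_dt = (F * (Ci - C) - V * r) / V                # eq 27
    dT_dt = (rho * Cp * F * (Ti - T)
             - UA / (1.0 + UA / (2.0 * Fc * rho_c * Cpc)) * (T - Tc)
             + (-dHr) * V * r) / (V * rho * Cp)       # eq 28
    return dC_dt, dT_dt
```

A full reproduction of the case study would wrap these derivatives in an integrator together with the two PI control loops on T and C.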


Table 4. Model Parameters Used in the CSTR Model and the PI Controller Parameters

notation   parameters and constants                          value
V          volume of reaction mixture in the tank            1 m^3
ρ          density of reaction mixture                       10^6 g/m^3
ρc         density of coolant                                10^6 g/m^3
Cp         specific heat capacity of the reaction mixture    1 cal g^-1 K^-1
Cpc        specific heat capacity of the coolant             1 cal g^-1 K^-1
ΔHr        heat of reaction                                  -1.3 × 10^7 cal/kmol
k0         pre-exponential kinetic constant                  10^10 min^-1
E/R        activation energy/ideal gas constant              8330 K
Kc(T)      temperature controller gain                       -1.5
τI(T)      integral time                                     5
Kc(C)      concentration controller gain                     0.4825
τI(C)      integral time                                     2

Table 5. Patterns of the Process Inputs, Disturbances, and Measurement Noise

Input and Disturbance^a
variable   mean value       φ      a      b      σv^2
Fs         0.9 m^3/min      0.1                  1.90 × 10^-2
Ti         370 K            0.2                  4.75 × 10^-2
Tc         365 K            0.2                  4.75 × 10^-2
βr         1                0.9                  1.90 × 10^-3
βUA        1                0.95                 9.75 × 10^-4
Ca         19.1 kmol/m^3           2      0.25   4.75 × 10^-2
Cs         0.3 kmol/m^3            0.2    0.2    1.875 × 10^-3

Measurements^b
variable   σe^2
(1) Tc     2.5 × 10^-3
(2) Ti     2.5 × 10^-3
(3) Ca     1.0 × 10^-2
(4) Cs     2.5 × 10^-5
(5) Fs     4.0 × 10^-6
(6) Fa     4.0 × 10^-6
(7) Fc     1.0 × 10^-2
(8) C      2.5 × 10^-5
(9) T      4.0 × 10^-4

^a All inputs and disturbances, except Ca and Cs, are modeled as x(t) = φx(t - 1) + v(t), where the process noise is given as v(t) ~ N(0, σv^2). Ca and Cs, on the other hand, are modeled as x(t) = a sin(bt) + v(t). ^b All measured variables x̃(t) are contaminated with white Gaussian noise e(t) ~ N(0, σe^2): x̃(t) = x(t) + e(t).
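The input and disturbance patterns in Table 5 can be generated as in the following sketch. How the listed mean value enters the AR(1) model (as an offset, as below, or through the noise) is an assumption, as is taking the listed σ² as the variance of v(t):

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1(mean, phi, var, n):
    """First-order autoregressive signal x(t) = phi*x(t-1) + v(t),
    v ~ N(0, var), offset by the listed mean value (footnote a)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, np.sqrt(var))
    return mean + x

def sinusoid(mean, a, b, var, n):
    """Sinusoidal signal x(t) = a*sin(b*t) + v(t) about its mean."""
    t = np.arange(n, dtype=float)
    return mean + a * np.sin(b * t) + rng.normal(0.0, np.sqrt(var), n)

Fs = ar1(0.9, 0.1, 1.90e-2, 500)               # solvent flow rate
Ca = sinusoid(19.1, 2.0, 0.25, 4.75e-2, 500)   # reactant A concentration
```

Measurement noise would then be added to the simulated process outputs with the σe² values from the lower half of the table.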

The performance of the PCA-based and MLPCA mixture-based monitoring methods was compared on the basis of the missed alarm rate and the empirical ARL. The number of principal components retained in the PCA model was chosen, by cross validation, to be six. In the case of the MLPCA mixture model, the number of local principal component models to be included in the model and the number of principal components need to be determined. Two MLPCA mixture models with different model structures were considered: (1) two local principal component models, each with six principal components (MLPCA1), and (2) three local principal component models, each with six principal components (MLPCA2). The 97.8% control limit, derived by applying kernel density estimation to the training samples, was used in all monitoring charts. Figure 4 shows the time-series patterns of the nine process measurements for one of the 30 realizations of fault case F6 (f6 = 8.4). The T2 and Q monitoring charts, based on PCA and MLPCA2, are compared in Figure 5. The MLPCA2-based monitoring charts were significantly superior to the PCA-based charts in detecting this fault. The monitoring performance of the three types of models for all six faults is summarized in Table 7. By adopting a monitoring scheme based on two or three local models, MLPCA1 and MLPCA2, a relatively low
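A control limit derived from kernel density estimation, as used for the 97.8% limits above, can be sketched as follows. This is a minimal sketch assuming a Gaussian kernel with Silverman's rule-of-thumb bandwidth (the paper does not state which bandwidth selector was used):

```python
import numpy as np

def kde_control_limit(stat, level=0.978, bandwidth=None, grid_size=2000):
    """Control limit for a monitoring statistic (T2 or Q): build a
    Gaussian-kernel density estimate from the training values and
    numerically invert its CDF at `level`."""
    stat = np.asarray(stat, dtype=float)
    n = stat.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption)
        bandwidth = 1.06 * stat.std(ddof=1) * n ** (-0.2)
    grid = np.linspace(stat.min() - 3.0 * bandwidth,
                       stat.max() + 3.0 * bandwidth, grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - stat[None, :]) / bandwidth) ** 2)
    dens = dens.sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, level)]
```

The same routine applies to each local model's statistics in the MLPCA mixture, since the limits are estimated nonparametrically from the training values alone.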

Figure 4. Time-series pattern of nine measurements in fault case F6 (f6 = 8.4).

Figure 5. T2 and Q monitoring charts obtained using (a) PCA and (b) MLPCA2 in fault case F6 (f6 = 8.4).

missed alarm rate materializes, compared with that for PCA. Also, the smallest fault detection delay, expressed in terms of the empirical ARL, is exhibited by MLPCA2. In particular, MLPCA2 was able to detect F2 and F6, on average, approximately 3 times faster than PCA. Figure 6 shows the fault detection rate for all fault cases. The fault detection rate is defined as the number of realizations in which each monitoring method has detected the fault at each time step.29 MLPCA2 significantly outperformed PCA for fault cases F2, F5, and F6. For these cases, the fault detection rate of PCA did not reach values close to 30, i.e., the total number of realizations, even at the end of the simulation. In other words, although the PCA-based monitoring method failed to detect the faults

Table 6. Six Fault Scenarios in the Continuously Stirred Tank Reactor (CSTR) Process

fault index   fault description                                                 operating conditions              magnitude^a
F1            outlet temperature (T) sensor fault: drift (ramp)                 x̃9(k) + f1(t - 300)               f1 ~ U(0, 0.02)
F2            outlet concentration (C) sensor fault: bias (step)                x̃8(k) + f2                        f2 ~ U(0.02, 0.04)
F3            cooling coil fouling: βUA(k) decrease (ramp)                      βUA(k) + f3(t - 300)              f3 ~ -U(10^-4, 2 × 10^-4)
F4            reaction poisoning: βr decrease (ramp)                            βr(k) + f4(t - 300)               f4 ~ -U(2 × 10^-4, 3 × 10^-4)
F5            inlet temperature (Ti) increase: ramp                             x2(k) + f5(t - 300)               f5 ~ U(0, 5 × 10^-3)
F6            reactant A flow rate (Fa) sensor fault: precision degradation     e6 ~ N(0, f6 × (4.0 × 10^-6))     f6 ~ U(8, 10)

^a U(a,b) represents the uniform distribution on the interval between a and b.

Table 7. Monitoring Results in Terms of Missed Alarm Rate and Empirical ARL

fault index   method    empirical ARL (min)   missed alarm rate (%)
F1            PCA       62.6                  42.6
              MLPCA1    57.8                  39.6
              MLPCA2    42.8                  36.7
F2            PCA       100.4                 76.9
              MLPCA1    67.2                  75.1
              MLPCA2    38.5                  66.4
F3            PCA       57.0                  35.0
              MLPCA1    46.8                  32.9
              MLPCA2    40.9                  28.7
F4            PCA       73.8                  44.1
              MLPCA1    67.3                  42.0
              MLPCA2    52.6                  38.4
F5            PCA       75.2                  65.0
              MLPCA1    42.4                  61.4
              MLPCA2    37.4                  55.5
F6            PCA       64.5                  67.1
              MLPCA1    28.8                  54.1
              MLPCA2    16.0                  43.8

for several realizations of F2, F5, and F6, MLPCA2 almost always detected them. These failures to detect may be a consequence of the fault magnitude; that is, it may be too small to enable differentiation between faulty and normal patterns. In summary, MLPCA mixture-based monitoring resulted in not only a low missed alarm rate but also a small detection time delay for all six faults, even though the CSTR model did not exhibit distinct nonlinear or dynamic behavior at steady state.
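The fault detection rate plotted in Figure 6 can be computed as in the following sketch, where each realization contributes its detection delay (the time step at which the monitoring method first confirms the fault), with None for undetected runs:

```python
import numpy as np

def detection_rate(delays, horizon):
    """Number of realizations in which the fault has been detected by
    each time step after its introduction (the quantity in Figure 6)."""
    rate = np.zeros(horizon, dtype=int)
    for d in delays:
        if d is not None and d < horizon:
            rate[d:] += 1   # detected at step d, and thereafter
    return rate
```

A curve that reaches the total number of realizations (30 here) quickly corresponds to a fast, reliable detector; a curve that plateaus below that total indicates missed faults.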

5. Conclusions

In this paper, a new fault detection method based on a maximum-likelihood principal component analysis (MLPCA) mixture model has been proposed. It accounts for measurement error by including an error term with an isotropic error covariance structure. The learning algorithm comprises two stages: finding the model structure and estimating the model parameters. An efficient learning method is proposed to identify the number of local models and the number of principal components to be retained in each local model, and these steps are linked to the expectation-maximization (EM) algorithm used to learn the model parameters. The proposed algorithm has two desirable properties: (1) the model structure and the parameters can influence one another and are thus determined simultaneously, and (2) each local principal component analysis (PCA) model is allowed to have a different number of principal components. In addition, the T2- and Q-statistics for each local PCA model were derived and a two-step fault detection framework was presented for efficient and effective process monitoring. Three types of examples, which reflect processes with multiple steady states, nonlinearity, and dynamics, respectively, and a continuously stirred tank reactor (CSTR) model were considered to evaluate the performance of the proposed MLPCA mixture-based monitoring scheme. The approach was compared with PCA, nonlinear PCA, and dynamic PCA in terms of false alarm rate, missed alarm rate, and detection delay, the latter represented by an empirical ARL. The simulation studies indicated that the MLPCA-based monitoring method gave not only the lowest number of missed alarms but also the smallest detection delay, with a false alarm rate similar to that of the other methods. In conclusion, the proposed localized monitoring approach can be successfully implemented for the monitoring of processes with multiple steady-state conditions. It is also a good alternative to other techniques for the monitoring of nonlinear or dynamic processes.

Acknowledgment

S.W.C. acknowledges the financial contribution of the Centre for Process Analytics and Control Technology. S.W.C. was also supported by the Postdoctoral Fellowship Program of the Korea Science and Engineering Foundation (KOSEF).

Appendix I. Expected Complete-Data Log Likelihood

Figure 6. Fault detection rate versus detection time delay.

Assuming that the samples D = {x1, x2, ..., xN} are independent and identically distributed, the log likelihood of the parameters for the given data set is given by

L(Θ|D) = ln ∏_{n=1}^{N} p(xn|Θ) = ∑_{n=1}^{N} ln[ ∑_{k=1}^{K} ∫ p(xn, t, k|Θ) dt ]   (A1)

This can be written in the form

L(Θ^new|D) = ∑_{n=1}^{N} ln[ ∑_{k=1}^{K} ∫ p(t, k|xn, Θ^old) (p(xn, t, k|Θ^new) / p(t, k|xn, Θ^old)) dt ]
           ≥ ∑_{n=1}^{N} ∑_{k=1}^{K} ∫ p(t, k|xn, Θ^old) ln( p(xn, t, k|Θ^new) / p(t, k|xn, Θ^old) ) dt
           = ∑_{n=1}^{N} ∑_{k=1}^{K} ∫ p(t, k|xn, Θ^old) ln( p(xn, t, k|Θ^new) ) dt + C   (A2)

where C = -∑n ∑k ∫ p(t, k|xn, Θ^old) ln(p(t, k|xn, Θ^old)) dt is a constant term that does not depend on the new model parameter vector Θ^new. The inequality follows from Jensen's inequality, f(∑i wi xi) ≥ ∑i wi f(xi), which holds for any concave function f(·) and weights wi (here f = ln). The first term on the right-hand side (RHS) of eq A2 is the expectation of the complete-data log likelihood, ln[p(xn, t, k|Θ^new)], with respect to the joint distribution of the hidden variables, p(t, k|xn, Θ^old), which can be rewritten as

E(L^c) = ∑_{n=1}^{N} ∑_{k=1}^{K} ∫ p(t, k|xn, Θ^old) ln( p(xn, t, k|Θ^new) ) dt
       = ∑_{n=1}^{N} ∑_{k=1}^{K} p(k|xn, Θ^old) ∫ p(t|xn, k, θ^old) ln( p(xn, t|k, θ^new) αk^new ) dt
       = ∑_{n=1}^{N} ∑_{k=1}^{K} p(k|xn, Θ^old) { ln(αk^new) + ∫ p(t|xn, k, θ^old) ln( p(xn, t|k, θ^new) ) dt }   (A3)

Combining eqs A2 and A3 gives L(Θ^new) ≥ E(L^c) + C, which implies that E(L^c) + C is a lower bound on L(Θ^new). In the EM algorithm, the expected complete-data log likelihood E(L^c) is maximized instead of L(Θ^new).

Appendix II. Reconstruction of a MLPCA Mixture Model

To obtain the optimal reconstruction of a sample x, a linear relationship between x and t̂k needs to be identified. Consider N samples {x1, x2, ..., xN} and their associated score vectors for the kth principal component model, {t̂1k, t̂2k, ..., t̂Nk}; an estimate of W° is sought that minimizes Qk:

Qk = ∑_{n=1}^{N} p(k|xn) |x̄nk - W° t̂nk|^2   (A4)

Equation A4 is the posterior-weighted sum of squares of the reconstruction errors; x̄nk is xn centered about the mean of the kth cluster. Note that the posterior p(k|xn) determines the contribution of each observation to the final parameter estimate, Wk, for this optimizing criterion. Equation A4 can be rewritten in matrix form as

Qk = |(X - Tk Wk°^T)^T Kk (X - Tk Wk°^T)|   (A5)

where X = [x̄1k x̄2k ... x̄Nk]^T ∈ R^{N×m}, Tk = [t̂1k t̂2k ... t̂Nk]^T ∈ R^{N×p}, and Kk = diag(p(k|x1), p(k|x2), ..., p(k|xN)) ∈ R^{N×N} is the diagonal matrix whose elements are the class posterior probabilities. Taking the derivative of Qk with respect to W° and setting it to zero gives

Wk = X^T Kk Tk (Tk^T Kk Tk)^{-1}   (A6)

Regarding the class index k as a missing variable, as for a Gaussian mixture model, the corresponding expected complete-data log likelihood is given by

L′(Θ|D) = ∑_{n=1}^{N} ∑_{k=1}^{K} p(k|xn) ln[ αk p(xn|k) ]   (A7)

Pk is calculated by maximizing L′(Θ|D), i.e., by taking its derivative with respect to Pk and setting it equal to zero at the local maxima:

Sk Gk^{-1} Pk = Pk   (A8)

where Sk = ∑n p(k|xn)(xn - μk)(xn - μk)^T / (αk N) is the posterior-weighted covariance matrix. Then, from eq A8, Sk^{-1} Pk = Gk^{-1} Pk = Pk Hk^{-1}. By inserting Tk = X Pk Hk^{-1} into eq A6 and letting Sk Pk = Pk Hk, eq A6 becomes

Wk = Pk (Pk^T Pk)^{-1} Hk   (A9)
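The posterior-weighted least-squares estimate of eq A6 can be sketched directly in matrix form. A minimal sketch, assuming X holds the cluster-mean-centered samples row-wise and T the corresponding score vectors row-wise:

```python
import numpy as np

def reconstruction_weights(X, T, posterior):
    """Posterior-weighted least-squares estimate of W_k (eq A6):
    W_k = X^T K_k T_k (T_k^T K_k T_k)^{-1}, K_k = diag(p(k|x_n))."""
    K = np.diag(np.asarray(posterior, dtype=float))
    return X.T @ K @ T @ np.linalg.inv(T.T @ K @ T)

# Sanity demo: if X is generated exactly as T @ W.T, W is recovered,
# regardless of the (positive) posterior weights.
rng = np.random.default_rng(0)
T_demo = rng.normal(size=(50, 2))
W_true = rng.normal(size=(3, 2))
W_hat = reconstruction_weights(T_demo @ W_true.T, T_demo,
                               rng.uniform(0.1, 1.0, 50))
```

Observations with a small posterior p(k|xn) contribute little to W_k, which is exactly the localization effect that eq A4 is designed to achieve.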

Literature Cited (1) Kresta, J.; MacGregor, J. F.; Marlin, T. E. Multivariate Statistical Monitoring of Process Operating Performance. Can. J. Chem. Eng. 1991, 69, 35-47. (2) Wise, B. M.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6, 329-348. (3) Kramer, M. A. Nonlinear Principal Component Analysis Using Autoassociative Neural Networks. AIChE J. 1991, 37, 233-243. (4) Dong, D.; McAvoy, T. J. Nonlinear Principal Component Analysis Based on Principal Curves and Neural Networks. Comput. Chem. Eng. 1996, 20, 65-78. (5) Tan, S.; Mavrovouniotis, M. L. Reducing Data Dimensionality through Optimizing Neural Networks Inputs. AIChE J. 1995, 41, 1471-1480. (6) Jia, F.; Martin, E. B.; Morris, A. J. Nonlinear Principal Component Analysis for Process Fault Detection. Comput. Chem. Eng. 1998, 22, S851-S854. (7) Hinton, G. E.; Dayan, P.; Revow, M. Modeling the Manifolds of Images of Handwritten Digits. IEEE Trans. Neural Networks 1997, 8 (1), 65-74. (8) Fletcher, N. M.; Martin, E. B.; Morris, A. J. Dynamic Approaches for Batch Process Performance Monitoring. European Symposium on Computer Aided Process Engineering-12; Grievink, J., van Schijndel, J., Eds.; 35th European Symposium of the Working Party on Computer Aided Process Engineering: ESCAPE-12, May 26-29, 2002, The Hague, The Netherlands; Elsevier: Amsterdam, 2002; pp 487-492. (9) Whiteley, J. R.; Davis, J. F.; Mehrotra, A.; Ahalt, S. C. Observations and Problems Applying ART2 for Dynamic Sensor Pattern Interpretation. IEEE Trans. Syst., Man, Cybernet., Part A 1996, 26 (4), 423-437. (10) Chen, J.; Liu, J. Mixture Principal Component Analysis Models for Process Monitoring. Ind. Eng. Chem. Res. 1999, 38, 1478-1488. (11) Choi, S. W.; Park, J. H.; Lee, I.-B. Process Monitoring Using a Gaussian Mixture Model via Principal Component Analysis and Discriminant Analysis. Comput. Chem. Eng. 2004, 28, 1377-1387. (12) Rengaswamy, R.; Venkatasubramanian, V. A Fast Training Neural Network and Its Updation for Incipient Fault Detection and Diagnosis. Comput. Chem. Eng. 2000, 24, 431-437. (13) Tipping, M. E.; Bishop, C. M. Probabilistic Principal Component Analysis. J. R. Stat. Soc., Ser. B: Stat. Method. 1999a, 61 (3), 611-622. (14) Tipping, M. E.; Bishop, C. M. Mixtures of Probabilistic Principal Component Analysers. Neural Comput. 1999b, 11 (2), 443-482. (15) Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; Wiley: New York, 2001. (16) Dempster, A. P.; Laird, N. M.; Rubin, D. B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc., Ser. B: Stat. Method. 1977, 39 (1), 1-38. (17) Bishop, C. M. Neural Networks for Pattern Recognition; Clarendon Press: Oxford, U.K., 1995. (18) Eastment, H. T.; Krzanowski, W. J. Cross-Validatory Choice of the Number of Components from a Principal Component Analysis. Technometrics 1982, 24, 73-77. (19) Bishop, C. M. Bayesian PCA. In 1998 Proceedings, NIPS; Neural Information Processing Systems Foundation: 1998, Vol. 11, pp 382-388.

(20) Meinicke, P.; Ritter, H. Resolution-Based Complexity Control for Gaussian Mixture Models. Technical Report, Faculty of Technology, University of Bielefeld, Germany, 1999. (Available via the Internet at http:// www.techfak.uni-bielefeld.de/gk/papers.) (21) Xu, L. Bayesian Ying-Yang Machine, Clustering and Number of Clusters. Pattern Recognit. Lett. 1997, 18, 1167-1178. (22) Tracy, N. D.; Young, J. C.; Mason, R. L. Multivariate Control Charts for Individual Observations. J. Quality Technol. 1992, 24 (2), 88-95. (23) Nomikos, P.; MacGregor, J. F. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 1995, 37, 41-59. (24) Martin, E. B.; Morris, A. J. Nonparametric Confidence Bounds for Process Performance Monitoring Charts. J. Process Control 1996, 6 (6), 349-358. (25) Qin, S. J.; Li, W. Detection and Identification of Faulty Sensors in Dynamic Processes. AIChE J. 2001, 47, 1581-1593. (26) van Sprang, E. N. M.; Ramaker, H.-J.; Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Critical Evaluation of Approaches for On-Line Batch Process Monitoring. Chem. Eng. Sci. 2002, 57, 3979-3991. (27) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance Detection and Isolation by Dynamic Principal Component Analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179-196. (28) Yoon, S.; MacGregor, J. F. Fault Diagnosis with Multivariate Statistical Models Part I: Using Steady-State Fault Signatures. J. Process Control 2001, 11, 387-400. (29) Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Evolution of Multivariate Statistical Process Control: Application of Independent Component Analysis and External Analysis. Comput. Chem. Eng. 2004, 28, 1157-1166.

Received for review September 20, 2004 Revised manuscript received December 12, 2004 Accepted January 6, 2005 IE049081O