A Nonlinear Probabilistic Method for Process Monitoring - American

Dec 29, 2009 - To improve monitoring performance, the traditional principal component ... probabilistic monitoring method and the kernel PCA model are...
0 downloads 3 Views 4MB Size
1770

Ind. Eng. Chem. Res. 2010, 49, 1770–1778

A Nonlinear Probabilistic Method for Process Monitoring Zhiqiang Ge* and Zhihuan Song State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Zhejiang UniVersity, Hangzhou 310027, Zhejiang, China

To improve monitoring performance, the traditional principal component analysis (PCA) based process monitoring approach has been extended to its probabilistic counterpart. However, its ability is limited in linear processes. This paper proposes a nonlinear probabilistic method for monitoring nonlinear processes, which is based on generative topographic mapping (GTM). Similar to traditional methods, the monitoring statistic and its corresponding fault diagnosis approach have both been developed. Two case studies are provided to evaluate the feasibility and efficiency of the proposed method. 1. Introduction

2. Probabilistic PCA and Kernel PCA Models

Data-based process monitoring methods have become very popular in recent years, especially the multivariate statistical process control (MSPC) based methods, such as principal component analysis (PCA) and partial least squares (PLS).1-5 To enhance the monitoring performance, some of these traditional methods have recently been extended to probabilistic counterparts such as probabilistic principal component analysis (PPCA) and factor analysis (FA).6-9 However, those developed probabilistic MSPC methods are limited in monitoring linear processes. For nonlinear process monitoring, several kinds of methods have been developed in past decades, such as principal curve, neural network, kernel PCA, and others.10-15 However, to the best of our knowledge, few probabilistic nonlinear methods have been developed for monitoring nonlinear processes up to now. Generative topographic mapping (GTM) is a form of a nonlinear latent variable model, which was originally developed by Bishop et al.16 As an alternative to the widely used selforganizing map (SOM) method, GTM overcomes most of the significant limitations of SOM, and provides a smooth mapping between the latent space and the data space. Compared to the linear probabilistic methods, GTM is also computationally tractable. This is because GTM is based on a constrained mixture of Gaussians, the parameters of which can be efficiently optimized through the well-known expectation-maximization (EM) algorithm. Although the GTM method has been used in many different areas,17-20 its application for process monitoring has not been reported. The aim of the present paper is to introduce the GTM method into the monitoring area for construction of a nonlinear probabilistic monitoring framework. As in the linear probabilistic monitoring model, a similar monitoring statistic can be constructed in the nonlinear case. Besides, the corresponding fault diagnosis strategy can also be developed. The rest of this paper is organized as follows. First, the preliminaries of the linear probabilistic monitoring method and the kernel PCA model are given in section 2, which is followed by the nonlinear probabilistic model based monitoring approach in section 3. Then two case studies are carried out in section 4. Finally, some conclusions are made in section 5.

2.1. Probabilistic PCA Model. The formulation of a linear probabilistic model is constructed in a generative manner, which is given as6

* Towhomcorrespondenceshouldbeaddressed.E-mail:[email protected]. edu.cn or [email protected].

x ) Pt + e

(1)

where x ∈ Rm is the original process variable vector, t ∈ Rk is the latent variable vector, P ∈ Rm×k is the parameter matrix, and e ∈ Rm is the noise vector, which is always assumed to follow the Gaussian distribution with zero mean and variance Σ. Therefore, the conditional distribution of x can be calculated as

{

p(x|t, P, Σ) ) (2πΣ)-m/2 exp -

|x - Pt| 2 2Σ

}

(2)

In the PPCA and FA probabilistic models, the prior distribution of the latent variable t is assumed to be a Gaussian distribution with zero mean and one variance. Then the distribution of x can be obtained as p(x|P, Σ) )

∫ p(x|t, P, Σ) p(t) dt

(3)

For a given data set X ) (x1, x2, ..., xn) of n data samples, the parameter matrix P and the noise variance Σ can be determined by maximizing the following log likelihood n

L(P, Σ) ) ln

∏ p(x |P, Σ)

(4)

i

i)1

After the probabilistic model has been determined, different statistics can be constructed for monitoring purposes.6 2.2. Kernel PCA Model. The KPCA algorithm was originally proposed by Scho¨lkopf.21 The feature space is constructed by using a nonlinear mapping: Φ(·)

Rm f Fh where Φ( · ) is a nonlinear mapping function and h is the dimension in feature space, which is assumed to be a large value. Similar to PCA, the covariance matrix in the feature space F can be calculated as ΣF )

1 n

n

∑ [Φ(x ) - c] [Φ(x ) - c] i

i)1

10.1021/ie900858v  2010 American Chemical Society Published on Web 12/29/2009

i

T

(5)

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

1771

Figure 1. Monitoring results of normal process data set for the simple numerical example: (a) GTM; (b) PPCA; (c) KPCA.

j (xi) where c is the sample mean in the feature space. Denote Φ ) Φ(xi) - c as the centered feature space sample; the KPCA model can be obtained through the decomposition of the following eigenvalue problem 1 n

λv ) ΣFv )

n

∑ [Φ¯ (x ) v]Φ¯ (x ) T

i

(6)

i

i)1

n

∑ r Φ¯ (x )

v)

i

(7)

i

i)1

where λ and v denote eigenvalue and eigenvector of the covariance matrix ΣF, respectively. ri is the coefficient of each j ij ) training data sample. By introducing a kernel matrix K j (xi) · Φ j (xj), the eigenvalue problem can be transferred as Φ follows13 1¯ r λr ) K n

(8)

The score vector of the qth observation is calculated as n

¯ (xq)] ) tq,k ) [vk·Φ

n

∑ r [Φ¯ (x ) Φ¯ (x )] ) ∑ r K¯ k i

q

k i

i

i)1

qi

i)1

(9) Then the traditional T2 and SPE statistics can be developed for process monitoring. Table 1. ARL Values in the Simple Example method

normal fault 1 fault 2

GTM

PPCA_T2

PPCA_SPE

KPCA_T2

KPCA_SPE

98.4 18.8 54.8

98.8 99.2 100

100 100 76.8

100 100 100

99.0 41.6 77.6

Figure 2. (a) Assignment results and (b) posterior probability values for the simple numerical example.

1772

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

Figure 3. Monitoring results of fault 1: (a) GTM; (b) PPCA; (c) KPCA.

3. Process Monitoring Based on Nonlinear Probabilistic Approach In this section, we intend to extend the linear probabilistic monitoring model to the nonlinear case. In the formulation of GTM, the prior distribution of the latent variable t is assumed to be a sum of delta functions centered on the nodes of a regular grid, which is given as16 p(t) )

1 K

K

∑ δ(t - t ) i

(10)

variable x corresponds to a constrained Gaussian mixture model, which can be obtained as16 p(x|Q, Σ) )

f(ti, Q) ) Qφ(ti)

K

∑ p(x|t , Q, Σ)

(12)

i

i)1

Similarly, the parameter matrix and the noise variance Σ can be determined by maximizing the log likelihood function given as follows

i)1

n

where the number K and locations of the sample points ti in the latent space could be selected previously. According to Bishop et al.,16 these two issues are not critical to the GTM method. Then each point ti is mapped to a corresponding point in the data space through a nonlinear function f( ti, Q), which follows the center of a Gaussian density function. The nonlinear function f(ti, Q) can be selected as a generalized linear regression model of the form

1 K

L(Q, Σ) )

{

K

∑ ln K1 ∑ p(x |t , Q, Σ) r)1

r i

i)1

}

(13)

As mentioned above, the solution of this problem can be obtained through the EM algorithm, which can be formulated into the E-step and the M-step, respectively. In the E-step, we use the old parameter values Qold and Σ old to evaluate the posterior probabilities of each Gaussian component for every data sample xr as

(11)

where Q ∈ R m×L is the mixing matrix, the elements of φ(ti) consist of L fixed basis functions φj(ti) (j ) 1, 2, ..., L). Particularly, the basis function can be chosen as a radially symmetric Gaussian form. Therefore, the distribution of process

PP(i, r) ) p(ti|xr, Qold, Σold) )

p(xr |ti, Qold, Σold) K

∑ p(x |t , Q r j

old, Σold)

j)1

(14)

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

1773

Figure 4. Monitoring results of fault 2: (a) GTM; (b) PPCA; (c) KPCA.

The expectation of the complete-data log likelihood can then be calculated as follows n

E{L(Qold, Σold)} )

According to Bishop et al.,16 by conveniently writing the vectors in matrix notation, maximizing eq 15 with respect to Qold can be simplified into the form as

K

∑ ∑ PP(i, r) ln{p(x |t , Q r i

old, Σold)}

r)1 i)1

(15)

Figure 5. Fault diagnosis results: (a) fault 1; (b) fault 2.

T ΦTGoldΦQnew ) ΦTPPX

(16)

1774

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

Figure 6. TE benchmark process. Table 2. Monitoring Variables in the TE Process no. measured variables no. 1 2 3 4 5 6 7 8

A feed D feed E feed A and C feeds recycle flow reactor feed rate reactor temperature purge rate

9 10 11 12 13 14 15 16

product separator temperature product separator pressure product separator underflow stripper pressure stripper temperature stripper steam flow reactor cooling water outlet temperature separator cooling water outlet temperature

where X is the data matrix, Φ is a K × L matrix with elements Φij ) φj(ti), and G is a K × K diagonal matrix with elements n PP(i,r). Hence, the new parameter matrix can be Gii ) ∑r)1 determined as T Qnew

-1

) X PP Φ(Φ GoldΦ) T

T

T

(17)

Similarly, maximizing eq 15 with respect to Σ leads to the following formula n

Σnew

PP(i, r)|Qnewφ(ti) - xr |

K

∑ p(x

new |tj, Q, Σ)

j)1

(19) Then the estimated value of xnew by ti can be calculated as xˆnew,i ) f(ti, Q) ) Qφ(ti)

(18)

i)1

Updating the E-step and the M-step repeatedly until convergence, the parameters {Q,Σ} can be finally determined. After the nonlinear probabilistic model has been constructed, we are in a position to develop the statistics for monitoring purposes. For a new data sample xnew, we first calculate the posterior probabilities of this data sample in each Gaussian component as

(20)

Finally, the estimated value of xnew can be obtained by summarizing all of the K estimated values xˆnew,i (i ) 1, 2, ..., K) K

xˆnew )

∑ pp(i, new)xˆ

K

new,i

i)1

)

∑ pp(i, new)Qφ(t ) i

i)1

(21) To monitor the process behavior, the squared predicted error (SPE) statistic can be constructed as SPEnew ) (xnew - xˆnew)T(xnew - xˆnew)

K

∑∑

1 ) nm r)1

p(xnew |ti, Q, Σ)

pp(i, new) ) p(ti |xnew, Q, Σ) )

measured variables

(22)

Theoretically, the confidence limit of the SPE statistic can be determined by a χ2 distributed function as1 SPElim ∼ g·χh2

(23)

where g and h are defined as g·h ) mean(SPEtr)2g2h ) var(SPEtr)

(24)

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

1775

Figure 7. Monitoring results of normal process data set for the TE benchmark process: (a) GTM; (b) PPCA; (c) KPCA.

Figure 8. (a) Assignment results and (b) posterior probability values for the TE benchmark process.

where SPEtr is the SPE values of the train data samples, mean(SPEtr) represents the mean value of SPEtr, and var(SPEtr) is the variance of SPEtr. Compared to the linear probabilistic monitoring methods such as PPCA, the T2 statistic is not presented in the new nonlinear probabilistic monitoring framework. This is because GTM is

typically designed to embed the data in one or two dimensions, and the embedded latent variables are restricted within a specific region. Any data sample (whatever normal or faulty) will be mapped in the same latent space. However, this restriction will not impact the performance of GTM, since the definition of the latent space is not important in this method.16

1776

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

Figure 9. Monitoring results of fault 4: (a) GTM; (b) PPCA; (c) KPCA.

x1 ) t + e1

Table 3. ARL Values in the TE Process

x2 ) t2 - 3t + e2

methods

normal fault 4 fault 15

2

GTM

PPCA_T

PPCA_SPE

KPCA_T

97.8 92.8 86.5

100 100 97.4

98.8 98.1 95.0

100 99.0 97.8

2

KPCA_SPE 96.5 93.8 86.6

When some fault has been successfully detected by the proposed method, the contribution plot method can be employed for fault diagnosis. Supposing the faulty data sample is represented as xfault, the contribution of each variable to the SPE statistic can be calculated as cont(i) ) (xfault,i - xˆfault,i)2

i ) 1, 2, ...,m

(25)

4. Results and Discussion In this section, we demonstrate the process monitoring performance of the proposed method by two case studies: a simple numerical example and the TE benchmark process. 4.1. Simple Numerical Example. A simple simulation case study was suggested by Dong and McAvoy,10 and was also used by others.12,13 This simple example contains three variables which are driven by the same factor

x3 ) -t3 + 3t2 + e3

(26)

where t ∈ [0.01,2], and e1, e2, and e3 are independent noises of zero means and variance 0.01. For model development, 500 samples were generated by the above equation as the training data set. In order to compare the proposed method to other monitoring approaches, the PPCA and KPCA models are also constructed. In the GTM method, 20 points of latent variables have been selected between -1 and 1. The number of basis functions is chosen as L ) 5. Two principal components are selected in PPCA and KPCA models. The confidence limits of all monitoring statistics are determined as 99%. To test the feasibility of the proposed method, a normal data set with 500 data samples has been generated. The monitoring results of GTM, PPCA, and KPCA for this normal data set are given in Figure 1, through which we can find that all methods can monitor the normal process very well. Precisely, the mean values of average run length (ARL) of 100 simulations are tabulated in Table 1. To extract more information from the GTM method, the posterior probabilities of each data sample in different Gaussian components and the assignment results (which correspond to the maximum value of posterior probability) are shown together in Figure 2. It can be seen that the

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

1777

Figure 10. Monitoring results of fault 15: (a) GTM; (b) PPCA; (c) KPCA.

monitored data samples have been assigned to different Gaussian components. Then two fault test data sets with 500 data samples are generated as follows: (1) Fault 1: A step bias of x2 by -0.5 was introduced starting from sample 251. (2) Fault 2: A ramp change of x1 by adding 0.002(k - 250) began with the 251th sample. The monitoring results of the first fault by the four methods are given in Figure 3, in which one can easily find that the new method shows the best performance. In contrast, the PPCA method cannot detect this fault, while the KPCA can detect it discontinuously. For the second fault, the superiority of the proposed method can also be found, which can be seen in Figure 4. Compared to other methods, the fault can be detected earlier by the new method. Similarly, the ARL values of the two faults are given in Table 1, in which we can also determine that the best monitoring result has been obtained by the GTM method. The diagnosis results of both two faults are presented in Figure 5, which are mean values of all 250 faulty data samples in these two fault cases. It can be clearly determined that the root causes of these two faults are variable 2 and variable 1, respectively. 4.2. TE Benchmark Process. As a well-known benchmark process, the TE process has been widely used for algorithm testing and evaluation in past decades. This process consists of 41 measured variables and 12 manipulated variables, and a set of 21 programmed faults can be simulated in this process. A

flow chart of the TE process can be described in Figure 6. A more detailed description of this process can be found in ref 22. In the present paper, 16 continuous process variables are selected for monitoring purposes, which are tabulated in Table 2. For GTM model development, 500 training data samples are used. Similarly, both the PPCA and KPCA models are also constructed. In the GTM method, 20 points of latent variables have been selected between -1 and 1. The number of basis functions is chosen as L ) 5. A total of five principal components are selected in PPCA and KPCA model structures. The confidence limits of all monitoring statistics are determined as 99%. First, a normal data set which contains 800 samples are tested by the three different monitoring methods; all results are shown together in Figure 7. It can be find that all of the GTM, PPCA, and KPCA methods are feasible for process monitoring, since they all work well under the normal process condition. Precisely, the ARL values of the three methods for the normal data set are very close to the correct value, which can be seen in the first row of Table 3. Similarly, the posterior probability and the Gaussian component assignment result of each monitored data sample are given in Figure 8. To evaluate the monitoring performance of the proposed GTM method, two selected process faults are tested. The first one is fault 4, which is a step change fault that influences the reactor cooling water inlet temperature. The monitoring results of this fault by three different methods are shown in Figure 9.

1778

Ind. Eng. Chem. Res., Vol. 49, No. 4, 2010

The second fault is a very small process fault, which cannot be easily detected. Figure 10 shows the monitoring results of this fault by GTM, PPCA, and KPCA. The ARL values of these two different faults are tabulated together in Table 3, through which we can judge that the GTM method performs better than other two methods. 5. Conclusions In the present paper, a new nonlinear probabilistic monitoring method has been proposed which is based on GTM. Compared to the existing nonlinear and probabilistic monitoring methods, the proposed method has both nonlinear and probabilistic characteristics. Hence, it can represent the process information more correctly, and thus improve the monitoring performance of the process behavior. However, one disadvantage of the proposed method is that the information of the latent variable is lost, which is also very important for process monitoring purposes. How to incorporate the latent variable information into the monitoring model is one of our future works. Acknowledgment This work was supported by the National Natural Science Foundation of China (60774067), the National 863 High Technology Research and Development Program of China (2009AA04Z154), and the Natural Scinence Foundation of Zhejiang Province (Y1080871). Literature Cited (1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 44, 1361–1375. (2) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault detection and diagnosis in industrial systems; Springer-Verlag: London, 2001. (3) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, 480–502. (4) Zhao, C. H.; Wang, F. L.; Mao, Z. H.; Lu, N. Y.; Jia, M. X. Quality prediction based on phase-specific average trajectory for batch processes. AIChE J. 2008, 54, 693–705. (5) Kruger, U.; Dimitriadis, G. Diagnosis of process faults in chemical systems using a local partial least squares approach. AIChE J. 2008, 54, 2581–2596. (6) Kim, D. S.; Lee, I. B. Process monitoring based on probabilistic PCA. Chem. Intell. Lab. Syst. 2003, 67, 109–123.

(7) Kim, D. S.; Yoo, C. K.; Kim, Y. I.; Jung, J. K.; Lee, I. B. Calibration, prediction and process monitoring model based on factor analysis for incomplete process data. J. Chem. Eng. Jpn. 2005, 38, 1025–1034. (8) Chen, T.; Sun, Y. Probabilistic contribution analysis for statistical process monitoring: A missing variable approach. Control Eng. Pract. 2009, 17, 469–477. (9) Ge, Z. Q.; Xie, L.; Song, Z. H. A novel statistical-based monitoring approach for complex multivariate processes. Ind. Eng. Chem. Res. 2009, 48, 4892–4898. (10) Dong, D.; McAvoy, T. J. Nonlinear principal component analysisbased on principal curves and neural networks. Comput. Chem. Eng. 1996, 20, 65–78. (11) Hiden, H. G.; Willis, M. J.; Tham, M. T.; Montague, G. A. Nonlinear principal component analysis using genetic programming. Chem. Intell. Lab. Syst. 1999, 23, 413–425. (12) Cho, J. H.; Lee, J. M.; Choi, S. W.; Lee, D. W.; Lee, I. B. Fault identification for process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2005, 60, 279–288. (13) Choi, S. W.; Lee, C. K.; Lee, J. M.; Park, J. H.; Lee, I. B. Fault detection and identification of nonlinear processes based on kernel PCA. Chem. Intell. Lab. Syst. 2005, 75, 55–67. (14) Wang, X.; Kruger, U.; Irwin, G. W.; McCullough, G.; McDowell, N. Nonlinear PCA with the Local Approach for Diesel Engine Fault Detection and Diagnosis. IEEE Trans. Control Syst. Technol. 2007, 16, 122–129. (15) Ge, Z. Q.; Yang, C. J.; Song, Z. H. Improved kernel-PCA based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64, 2245–2255. (16) Bishop, C. M.; Svensen, M.; Williams, C. K. I. The generative topographic mapping. Neural Comput. 1998, 10, 215–234. (17) Andrade, A. O.; Nasuto, S.; Kyberd, P.; Sweeney-Reed, C. M. Generative topographic mapping applied to clustering and visualization of motor unit action potentials. Biosystems 2005, 82, 273–284. (18) Vellido, A.; Lisboa, P. J. G.; Vicente, D. Robust analysis of MRS brain tumour data using t-GTM. Neurocomputing 2006, 69, 754–768. (19) Fyfe, C.; Barbakh, W.; Ooi, W. C.; Ko, H. Topological mappings for video and audio data. Int. J. Neural Syst. 2008, 18, 481–489. (20) Bose, I.; Chen, X. A method for extension of generative topographic mapping for fuzzy clustering. J. Am. Soc. Inf. Sci. Technol. 2009, 60, 363– 371. (21) Scho¨lkopf, B.; Smola, A. J.; Mu¨ller, K. Nonlinear component analysis as a kernel eignvalue problem. Neural Comput. 1998, 10, 1000– 1016. (22) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255.

ReceiVed for reView May 25, 2009 ReVised manuscript receiVed October 9, 2009 Accepted December 7, 2009 IE900858V