Kernel Generalization of PPCA for Nonlinear Probabilistic Monitoring


Ind. Eng. Chem. Res. 2010, 49, 11832–11836

Zhiqiang Ge* and Zhihuan Song

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Zhejiang University, Hangzhou 310027, Zhejiang, China

For probabilistic monitoring of nonlinear processes, the traditional probabilistic principal component analysis (PPCA)-based monitoring method is generalized through the kernel method, and a probabilistic kernel PCA method is proposed for process monitoring. Unlike the traditional PPCA method, the new approach can extract the nonlinear relationships between process variables. On the basis of the proposed nonlinear probabilistic monitoring approach, the monitoring performance for nonlinear processes can be improved. To demonstrate the feasibility and efficiency of the proposed method, a case study on the Tennessee Eastman (TE) benchmark process is provided.

1. Introduction

Data-based process monitoring has become very popular in recent years, especially methods based on multivariate statistical process control (MSPC), such as principal component analysis (PCA) and partial least squares (PLS) (refs 1-7). By projecting the data into a lower-dimensional space, these MSPC methods can characterize the operating state of the monitored process. Statistics such as $T^2$ and SPE have been constructed for monitoring purposes. In practice, it is well-known that almost all process variables are contaminated by random noise, so the measured process variables are inherently random variables. However, most traditional MSPC methods are deterministic, and thus the monitoring models are not probabilistic. In our opinion, all statistical inference and monitoring decisions should be made with respect to these random process variables. In other words, because the process variables are random, probabilistic models should be developed for process monitoring.
Recently, the traditional PCA method has been extended to its probabilistic counterpart, probabilistic principal component analysis (PPCA), and used for process monitoring (ref 8). Under the PPCA model structure, the noise information of the process data can be modeled explicitly, whereas the traditional PCA method cannot efficiently incorporate noise information into the model. Compared to the PCA model, the PPCA model therefore represents the process data more precisely and practically. Moreover, for statistical monitoring purposes, it is more meaningful and understandable for the monitoring decision to be based on statistical inference and probabilistic interpretation. Compared to the traditional method, more satisfactory performance has been obtained by the PPCA approach (ref 8). However, the PPCA-based monitoring method developed so far is limited to linear processes. For nonlinear processes, a nonlinear PPCA-based monitoring method should be developed.

As an efficient method for nonlinear system modeling, the kernel method has been widely used for process modeling and monitoring in the last decades (refs 9-12). However, to the best of our knowledge, most kernel-based methods used in the process systems engineering field are not probabilistic, and thus they cannot be directly employed for the probabilistic monitoring considered in the present paper. Although several probabilistic kernel PCA models have already been proposed for data clustering, face recognition, image processing, etc. (refs 13-15), their use for process monitoring has rarely been reported. This paper intends to generalize the traditional PPCA-based monitoring method to its nonlinear form through the kernel method, so that a nonlinear probabilistic monitoring method can be developed.

The rest of this paper is organized as follows. First, the traditional PPCA method is briefly introduced in section 2, followed by a detailed description of the kernel generalization of PPCA and its corresponding monitoring approach in section 3. A benchmark case study of the TE process is then carried out in section 4, and some conclusions are drawn in section 5.

* To whom all correspondence should be addressed. E-mail: [email protected] or [email protected].

2. Probabilistic Principal Component Analysis

As an extension of PCA, PPCA was first proposed by Tipping and Bishop (ref 16) and is defined by the following generative model

$$x = Pt + e \tag{1}$$

where $x \in \mathbb{R}^m$ represents the process variable, $t \in \mathbb{R}^k$ is the latent variable, $P \in \mathbb{R}^{m \times k}$ specifies the relationship between the latent space and the original data space, and $e \in \mathbb{R}^m$ is the noise term, which is always assumed to follow the Gaussian distribution with zero mean and variance $\beta^{-1}I$, and thus $p(e) = N(e|0, \beta^{-1}I)$. Therefore, the conditional distribution of x can be calculated as

$$p(x|t, P, \beta) = N(x|Pt, \beta^{-1}I) \tag{2}$$

In the PPCA model, the prior distribution of the latent variable t is assumed to be a Gaussian distribution with zero mean and unit variance, $p(t) = N(t|0, I)$. For a given data set $X = (x_1, x_2, \ldots, x_n)$, P and β can be determined by maximizing the following likelihood function

$$L(P, \beta) = \ln \prod_{i=1}^{n} p(x_i|P, \beta) \tag{3}$$
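As an illustration of eqs 1-3, the following Python sketch draws data from the generative model and evaluates the likelihood of eq 3. It relies on the standard result that integrating out t gives the marginal $p(x|P, \beta) = N(x|0, PP^T + \beta^{-1}I)$; the function name `ppca_log_likelihood`, the dimensions, and the random test data are assumptions made only for this example.

```python
import numpy as np

def ppca_log_likelihood(X, P, beta):
    """Eq 3: sum_i ln p(x_i | P, beta), using p(x) = N(x | 0, P P^T + I / beta).

    X holds one (mean-centered) sample per row.
    """
    n, m = X.shape
    cov = P @ P.T + np.eye(m) / beta                  # marginal covariance of x
    _, logdet = np.linalg.slogdet(cov)
    quad = np.sum(X * np.linalg.solve(cov, X.T).T)    # sum_i x_i^T cov^{-1} x_i
    return -0.5 * (n * m * np.log(2.0 * np.pi) + n * logdet + quad)

# Draw synthetic data from the generative model of eq 1 (illustrative sizes).
rng = np.random.default_rng(0)
m, k, n, beta = 5, 2, 200, 4.0
P = rng.standard_normal((m, k))
T = rng.standard_normal((n, k))                       # t_i ~ N(0, I)
E = rng.standard_normal((n, m)) / np.sqrt(beta)       # e_i ~ N(0, I / beta)
X = T @ P.T + E                                       # x_i = P t_i + e_i, one per row
log_lik = ppca_log_likelihood(X, P, beta)
```

Working with `slogdet` and `solve` rather than an explicit determinant and inverse is a design choice for numerical robustness, not something prescribed by the model.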

Ind. Eng. Chem. Res., Vol. 49, No. 22, 2010. 10.1021/ie100852s © 2010 American Chemical Society. Published on Web 09/30/2010.

3. Kernel Generalization of PPCA for Process Monitoring

In the case of PPCA, an expectation-maximization (EM) algorithm has been used for parameter learning. The EM algorithm iterates two steps, expectation (E-step) and maximization (M-step), until convergence, and convergence to a local maximum of the data likelihood can be guaranteed. In the E-step, the model parameters are fixed for calculation of the expected distribution of the latent variable, which is given as follows

$$E(t) = (P^T P + \beta^{-1}I)^{-1} P^T x \tag{4}$$

$$E(tt^T) = \beta^{-1}(P^T P + \beta^{-1}I)^{-1} + E(t)E^T(t) \tag{5}$$

where the process variable x is assumed to be mean-centered. In the M-step, the complete log-likelihood of the process data (eq 3) is maximized with respect to the model parameters, which can be done by setting the partial derivative with respect to each parameter to zero. Therefore, the following updates can be obtained

$$\hat{P} = \Big(\sum_{i=1}^{n} x_i t_i^T\Big)\Big(\sum_{i=1}^{n} t_i t_i^T\Big)^{-1} \tag{6}$$

$$\beta^{-1} = \sum_{i=1}^{n} \{x_i^T x_i - 2E^T(t_i|x_i)\hat{P}^T x_i + \mathrm{Tr}[E(t_i t_i^T|x_i)\hat{P}^T\hat{P}]\}/(mn) \tag{7}$$

By iterating the E-step and the M-step until convergence, the optimal parameter values can be obtained.

For kernel PPCA model learning, a nonlinear EM algorithm should be developed (ref 15). This can be done by introducing the kernel trick in the E-step and the M-step, respectively. First, we define the sum of covariance as $C = \sum_{i=1}^{n} t_i t_i^T$. Noticing that $X = [x_1, x_2, \ldots, x_n]^T$ and $T = [t_1, t_2, \ldots, t_n]^T$, the parameter matrix $\hat{P}$ can be rewritten as (ref 15)

$$P = X^T T C^{-1} \tag{8}$$

To calculate the expected distribution of the latent variable in the E-step, we need to determine $P^T P$ and $P^T x$, which are given as

$$P^T P = C^{-1} T^T X X^T T C^{-1} \tag{9}$$

$$P^T x = C^{-1} T^T X x \tag{10}$$

If we introduce the kernel matrix $K = XX^T$, in which each entry is the inner product of the data points of the corresponding row and column, $K_{ij} = \langle\phi(x_i), \phi(x_j)\rangle = \phi^T(x_i)\phi(x_j)$, eqs 9 and 10 can be calculated as

$$P^T P = C^{-1} T^T K T C^{-1} \tag{11}$$

$$P^T x = C^{-1} T^T k \tag{12}$$

where $k = Xx$ is a kernel vector. On the basis of eqs 11 and 12, the expected distribution of the latent variable can be obtained as

$$E(t) = (C^{-1}T^T K T C^{-1} + \beta^{-1}I)^{-1} C^{-1} T^T k \tag{13}$$

$$E(tt^T) = \beta^{-1}(C^{-1}T^T K T C^{-1} + \beta^{-1}I)^{-1} + E(t)E^T(t) \tag{14}$$

The matrices T and C can then be recalculated as

$$\hat{T} = KTC^{-1}(C^{-1}T^T K T C^{-1} + \beta^{-1}I)^{-1} \tag{15}$$

$$\hat{C} = n\beta^{-1}(C^{-1}T^T K T C^{-1} + \beta^{-1}I)^{-1} + T^T T \tag{16}$$

It can be found that the expected distribution of the latent variable is directly obtained from the process variable. With the introduction of the kernel trick, the explicit nonlinear relationship between the original variable and the latent variable is avoided. However, the following data recentering in the feature space is necessary for the nonlinear EM algorithm

$$\bar{K} = K - 1_n 1_n^T K/n - K 1_n 1_n^T/n + 1_n 1_n^T K 1_n 1_n^T/n^2 \tag{17}$$

$$\bar{k} = k - 1_n 1_n^T k/n - K 1_n/n + 1_n 1_n^T K 1_n/n^2 \tag{18}$$
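The kernel EM recursion can be sketched in Python as follows. This is a minimal illustration of eqs 13 and 15-18 under stated assumptions, not the authors' implementation: T is initialized randomly, a fixed number of iterations replaces a convergence test, β is held fixed, the $T^T T$ term of eq 16 is read as using the freshly updated T, and a linear kernel $K = XX^T$ on random data stands in for a real kernel. All function names are hypothetical.

```python
import numpy as np

def center_kernel(K):
    """Eq 17: recenter the kernel matrix in feature space."""
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)                 # 1_n 1_n^T / n
    return K - J @ K - K @ J + J @ K @ J

def center_kernel_vector(k, K):
    """Eq 18: recenter the kernel vector of a new sample (K is uncentered)."""
    return k - k.mean() - K.mean(axis=1) + K.mean()

def kppca_em(K, n_latent, beta, n_iter=50, seed=0):
    """Iterate eqs 15 and 16 on a centered kernel matrix with beta fixed."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((n, n_latent))       # assumed random start
    C = T.T @ T
    for _ in range(n_iter):
        A = T @ np.linalg.inv(C)                 # T C^{-1}
        M = A.T @ K @ A + np.eye(n_latent) / beta
        M_inv = np.linalg.inv(M)
        T = K @ A @ M_inv                        # eq 15
        C = (n / beta) * M_inv + T.T @ T         # eq 16, with the updated T (assumption)
    return T, C

def latent_mean(k, K, T, C, beta):
    """Eq 13: expected latent variable for a centered kernel vector k."""
    A = T @ np.linalg.inv(C)
    M = A.T @ K @ A + np.eye(C.shape[0]) / beta
    return np.linalg.solve(M, A.T @ k)

# Illustrative run with a linear kernel on random data.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
K_raw = X @ X.T
K = center_kernel(K_raw)
T, C = kppca_em(K, n_latent=2, beta=10.0)
x_new = rng.standard_normal(3)
k_new = center_kernel_vector(X @ x_new, K_raw)
t_new = latent_mean(k_new, K, T, C, beta=10.0)
```

With the linear kernel, the centered quantities coincide with those computed from mean-centered data, which gives a simple way to check the centering formulas before switching to a nonlinear kernel.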

In eqs 17 and 18, $1_n = [1, \ldots, 1]^T$ is a column vector of ones of length n. In the PPCA model, the dimensionality of the latent variable is much smaller than that of the process variable. However, in the kernel PPCA model, the dimensionality of the feature space may be very large or even infinite. Therefore, the estimated noise variance may tend to zero, that is, the precision β may tend to infinity. This may lead to numerical instability when calculating the inverse of the matrix. Hence, β is recommended to be treated as a tuning parameter rather than as a parameter to be optimized. In summary, the EM algorithm of the kernel PPCA method iterates eqs 15 and 16 until the following log-likelihood function converges (ref 15)

$$L(X) = -\frac{n}{2}\Big\{\ln|C^{-1}T^T K T C^{-1} + \beta^{-1}I| - \frac{\beta}{n}\,\mathrm{tr}[KTC^{-1}(C^{-1}T^T K T C^{-1} + \beta^{-1}I)^{-1}C^{-1}T^T K]\Big\} + \mathrm{const} \tag{19}$$
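A convergence check based on eq 19 might be coded as in the Python sketch below. The form implemented is $-\frac{n}{2}\{\ln|M| - \frac{\beta}{n}\mathrm{tr}[KTC^{-1}M^{-1}C^{-1}T^TK]\}$ with $M = C^{-1}T^TKTC^{-1} + \beta^{-1}I$, which is what the Gaussian marginal likelihood reduces to after applying the matrix determinant lemma and the Woodbury identity with $P = X^TTC^{-1}$; treating this as the intended reading of eq 19 is an assumption. The function name and the random data are hypothetical, and the linear kernel is used only so that the result can be compared with a direct evaluation.

```python
import numpy as np

def kppca_log_likelihood(K, T, C, beta):
    """Eq 19 without the constant term (assumed form, see the text above)."""
    n = K.shape[0]
    A = T @ np.linalg.inv(C)                          # T C^{-1}
    M = A.T @ K @ A + np.eye(C.shape[0]) / beta
    _, logdet = np.linalg.slogdet(M)
    KA = K @ A                                        # K T C^{-1}
    trace_term = np.trace(KA @ np.linalg.solve(M, KA.T))
    return -0.5 * n * (logdet - beta / n * trace_term)

# Sanity check with a linear kernel, where P = X^T T C^{-1} (eq 8) is explicit.
rng = np.random.default_rng(2)
n, d, q, beta = 30, 4, 2, 5.0
X = rng.standard_normal((n, d))
T = rng.standard_normal((n, q))
C = T.T @ T
P = X.T @ T @ np.linalg.inv(C)
cov = P @ P.T + np.eye(d) / beta
_, logdet_cov = np.linalg.slogdet(cov)
direct = -0.5 * (n * d * np.log(2.0 * np.pi) + n * logdet_cov
                 + np.trace(np.linalg.solve(cov, X.T @ X)))
K = X @ X.T
# Terms that do not depend on T and C (the "const" of eq 19 for this kernel).
const = -0.5 * n * (d * np.log(2.0 * np.pi) + (q - d) * np.log(beta)
                    + beta / n * np.trace(K))
kernel_form = kppca_log_likelihood(K, T, C, beta) + const
```

In an EM loop, one would typically stop when the change in this value between iterations falls below a small tolerance; the tolerance itself is a tuning choice not specified in the text.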

In eq 19, tr(·) denotes the trace operator, and const represents the constant term in the log-likelihood function.

4. Benchmark Case Study of TE Process

The TE process is a benchmark simulation process that has been widely used for algorithm testing and evaluation in the past decades (refs 17-21). The flowchart of this process is given in Figure 1. The process consists of five major unit operations: a reactor, a condenser, a compressor, a separator, and a stripper. Four reactants (A, C, D, and E) and one inert (B) are fed to the reactor, in which the products G and H are formed together with a byproduct F. In this process, 41 process variables and 12 manipulated variables are measured, including 19 component variables. A detailed description of this process can be found in ref 17. In the present paper, 16 continuous process variables are selected for monitoring purposes; they are tabulated in Table 1. For process monitoring purposes, a set of 21 programmed faults can be simulated in the TE process. Detailed descriptions of these 21 process faults are tabulated in Table 2. As seen in this table, the 21 process faults include 7 step faults, 5 random faults, 3 sticking and slow-change faults, and 6 unknown process faults. On the basis of previous studies, faults 3, 9, 15, and 19 are very difficult to detect by traditional monitoring methods. Therefore, this case study focuses mainly on these four process faults. However, before developing a nonlinear model for process monitoring, a nonlinearity test is necessary. In the present paper, the method proposed by Kruger et al. (ref 22) can be employed for nonlinearity measurement of the process data.

To develop the proposed KPPCA monitoring model, 500 data samples were collected for training. For comparison, the traditional PPCA and KPCA models are also constructed. The dimensionality of the feature space of KPPCA is simply selected as 30.
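The kernel matrix K and the kernel vector k of section 3 have to be built from the training samples. The Python sketch below shows one common way to do this with a Gaussian kernel. The functional form $\exp(-\|x - y\|^2/c)$, the reading of the value 2000 used in this case study as the width c, and the function name are assumptions, since the text specifies only the kernel type and the parameter value. Random numbers stand in for the 500 × 16 training matrix.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, c):
    """Pairwise Gaussian kernel exp(-||x - y||^2 / c) between rows of X and Y."""
    sq_dist = (np.sum(X**2, axis=1)[:, None]
               + np.sum(Y**2, axis=1)[None, :]
               - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / c)

rng = np.random.default_rng(3)
X_train = rng.standard_normal((500, 16))     # placeholder for 500 training samples
x_new = rng.standard_normal(16)              # placeholder for one monitored sample

K = gaussian_kernel_matrix(X_train, X_train, c=2000.0)            # 500 x 500
k = gaussian_kernel_matrix(X_train, x_new[None, :], c=2000.0)[:, 0]  # length 500
```

Both K and k would still need the recentering of eqs 17 and 18 before being used in the EM iteration.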
The widely used Gaussian kernel is employed in the kernel methods, with its kernel parameter selected as 2000. To be fair, the numbers of components of the KPPCA, KPCA, and PPCA models are all selected as 5. The confidence limits of all monitoring statistics are set as 99%.

Figure 1. TE benchmark process.

Table 1. Monitoring Variables in the TE Process

no.   measured variable
1     A feed
2     D feed
3     E feed
4     A and C feed
5     recycle flow
6     reactor feed rate
7     reactor temperature
8     purge rate
9     product separator temperature
10    product separator pressure
11    product separator underflow
12    stripper pressure
13    stripper temperature
14    stripper steam flow
15    reactor cooling water outlet temperature
16    separator cooling water outlet temperature

Table 2. Process Disturbances

fault no.   process variable                                          type
1           A/C feed ratio, B composition constant (stream 4)         step
2           B composition, A/C ratio constant (stream 4)              step
3           D feed temperature (stream 2)                             step
4           reactor cooling water inlet temperature                   step
5           condenser cooling water inlet temperature                 step
6           A feed loss (stream 1)                                    step
7           C header pressure loss-reduced availability (stream 4)    step
8           A, B, C feed composition (stream 4)                       random variation
9           D feed temperature (stream 2)                             random variation
10          C feed temperature (stream 4)                             random variation
11          reactor cooling water inlet temperature                   random variation
12          condenser cooling water inlet temperature                 random variation
13          reaction kinetics                                         slow drift
14          reactor cooling water valve                               sticking
15          condenser cooling water valve                             sticking
16          unknown                                                   unknown
17          unknown                                                   unknown
18          unknown                                                   unknown
19          unknown                                                   unknown
20          unknown                                                   unknown
21          valve position constant (stream 4)                        constant position

First, fault 3 is tested by these three monitoring models. Unfortunately, this fault can hardly be detected by any of the methods. Similarly, fault 9 is also difficult to detect. In contrast, fault 15 and fault 19 can be successfully detected by the proposed method. Therefore, the monitoring results of these two faults are detailed as follows.

Fault 15 is a small fault that causes a sticking problem of the condenser cooling water valve. According to Chiang et al. (ref 17), traditional statistics can hardly detect this fault. Monitoring results of this fault by KPPCA, PPCA, and KPCA are shown together in Figure 2. As can be seen, compared to the traditional PCA method, which cannot detect this fault, the probabilistic monitoring approach improves the performance: with PPCA, fault 15 can be detected by the SPE statistic after sample 750. In comparison, the monitoring performance has been further improved by the proposed KPPCA-based method, which can be seen in Figure 2a. Although the $T^2$ statistic of PPCA can hardly detect the fault, it can be successfully detected by the $T^2$ statistic of KPPCA. Compared to the SPE monitoring result of PPCA, the fault can be detected much earlier by the SPE statistic of KPPCA. For clarity, the improved performance is highlighted by several ellipses in Figure 2a. Although KPCA can also successfully detect this fault, the $T^2$ monitoring performance of the proposed method is much better than that of the KPCA-based method.

Fault 19 is an unknown process fault, which is also very difficult to detect by traditional MSPC methods. The monitoring result of KPPCA for this fault is given in Figure 3a, from which one can find that both the $T^2$ and SPE statistics can successfully detect the fault. However, only the SPE statistics of PPCA and KPCA can successfully detect this fault; these results are shown in panels b and c of Figure 3, respectively. Therefore, on the basis of these results, the KPPCA method is more sensitive for fault detection than the PPCA- and KPCA-based monitoring methods.
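How a monitoring chart of this kind is read can be illustrated with a short Python sketch. It assumes, purely for illustration, that the 99% control limit is taken as the empirical 99th percentile of a statistic computed on normal training data, and that a fault is declared at the first sample that starts a run of consecutive limit violations. The text does not state how the limits were computed, and the names `control_limit` and `first_detection`, the run length of 3, and the synthetic SPE values are all hypothetical.

```python
import numpy as np

def control_limit(stat_normal, confidence=0.99):
    """Empirical control limit: quantile of the statistic under normal operation."""
    return np.quantile(stat_normal, confidence)

def first_detection(stat, limit, run=3):
    """Index of the first sample starting `run` consecutive violations, else None."""
    count = 0
    for i, exceeded in enumerate(stat > limit):
        count = count + 1 if exceeded else 0
        if count == run:
            return i - run + 1
    return None

# Synthetic statistic values, not TE process data.
spe_train = np.linspace(0.0, 1.0, 500)                    # normal operation
limit = control_limit(spe_train)
spe_test = np.concatenate([np.full(160, 0.5),             # normal segment
                           np.full(40, 2.0)])             # faulty segment
onset = first_detection(spe_test, limit)
detection_rate = np.mean(spe_test[160:] > limit)          # fraction of faulty samples flagged
```

Requiring several consecutive violations is one common way to reduce false alarms; with a 99% limit, about 1% of normal samples are expected to exceed the limit by chance.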
It can be inferred that the monitoring performance could be greatly improved if the process has strong nonlinearities among different process variables. On the basis of the case study of the TE process, it can be found that the monitoring performance can be enhanced by the proposed nonlinear probabilistic method.

Figure 2. Monitoring results of fault 15: (a) KPPCA; (b) PPCA; (c) KPCA.

Figure 3. Monitoring results of fault 19: (a) KPPCA; (b) PPCA; (c) KPCA.

5. Conclusions

In the present paper, the traditional PPCA-based probabilistic monitoring method has been generalized by the kernel method, which makes it more efficient for nonlinear process monitoring. With the introduction of the kernel trick, the linear probabilistic model can be easily extended to its nonlinear form, upon which the nonlinear probabilistic monitoring approach can be developed. The simulation case study of the TE process demonstrates the feasibility and efficiency of the proposed method.


Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61004134, 60974056), and China Postdoctoral Science Foundation (20090461370).

Note Added after ASAP Publication: This paper was published on the Web on September 30, 2010 with three extra references. References 5, 8, and 10 were deleted and the corrected version was reposted on October 12, 2010.

Literature Cited

(1) Wang, X.; Kruger, U.; Lennox, B. Recursive partial least squares algorithms for monitoring complex industrial processes. Control Eng. Pract. 2003, 11, 613–632.
(2) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, 480–502.
(3) Simoglou, A.; Georgieva, P.; Martin, E. B.; Morris, A. J.; Feyo de Azevedo, S. On-line monitoring of a sugar crystallization process. Comput. Chem. Eng. 2005, 29, 1411–1422.
(4) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691–5702.
(5) Kruger, U.; Dimitriadis, G. Diagnosis of process faults in chemical systems using a local partial least squares approach. AIChE J. 2008, 54, 2581–2596.
(6) AlGhazzawi, A.; Lennox, B. Monitoring a complex refining process using multivariate statistics. Control Eng. Pract. 2008, 16, 294–307.
(7) Chen, T.; Sun, Y. Probabilistic contribution analysis for statistical process monitoring: A missing variable approach. Control Eng. Pract. 2009, 17, 469–477.
(8) Kim, D.; Lee, I. B. Process monitoring based on probabilistic PCA. Chemom. Intell. Lab. Syst. 2003, 67, 109–123.
(9) Vapnik, V. N. The Nature of Statistical Learning Theory; Springer: New York, 1995.
(10) Lee, J. M.; Yoo, C. K.; Choi, S. K.; Vanrolleghem, P. A.; Lee, I. B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223–234.
(11) Ge, Z. Q.; Yang, C. J.; Song, Z. H. Improved kernel-PCA based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64, 2245–2255.
(12) Zhang, Y. W. Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM. Chem. Eng. Sci. 2009, 64, 801–811.
(13) Lawrence, N. D. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Machine Learn. Res. 2005, 6, 1783–1816.
(14) Zhou, S. H. Probabilistic Analysis of Kernel Principal Components: Mixture Modeling, And Classification; Technical Report; University of Maryland: College Park, MD, 2003.
(15) Yu, S. P. Advanced Probabilistic Models for Clustering and Projection. Ph.D. dissertation, Ludwig-Maximilians Universität München, Munich, Germany, 2006.
(16) Tipping, M. E.; Bishop, C. M. Probabilistic principal component analysis. J. R. Stat. Soc. 1999, 61, 611–622.
(17) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer-Verlag: London, 2001.
(18) Kano, M.; Nagao, K.; Hasebe, H.; Hashimoto, I.; Ohno, H.; Strauss, R.; Bakshi, B. R. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Comput. Chem. Eng. 2002, 26, 161–174.
(19) Singhal, A.; Seborg, D. E. Evaluation of a pattern matching method for the Tennessee Eastman challenge process. J. Process Control 2006, 16, 601–613.
(20) Ge, Z. Q.; Yang, C. J.; Song, Z. H.; Wang, H. Q. Robust online monitoring for multimode processes based on nonlinear external analysis. Ind. Eng. Chem. Res. 2008, 47, 4775–4783.
(21) Zhou, D. H.; Li, G.; Qin, S. J. Total projection to latent structures for process monitoring. AIChE J. 2010, 56, 168–178.
(22) Kruger, U.; Antory, D.; Hahn, J.; Irwin, G. W.; McCullough, G. Introduction of a nonlinearity measure for principal component models. Comput. Chem. Eng. 2005, 29, 2355–2362.

Received for review April 10, 2010. Revised manuscript received July 14, 2010. Accepted September 14, 2010.

IE100852S