Ensemble Kernel Principal Component Analysis for Improved Nonlinear Process Monitoring

Nan Li* and Yupu Yang

Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China

ABSTRACT: Kernel principal component analysis (KPCA) has been widely applied to nonlinear process monitoring. Conventionally, a single Gaussian kernel function, with its width parameter determined empirically, is selected to build a single KPCA model. Determining a single Gaussian kernel function purely by experience is largely a blind choice, especially when fault information is unavailable, and if a poor kernel function happens to be selected, the detection performance may be degraded greatly. Furthermore, a single kernel function is usually not the most effective for all faults; that is, different faults may need different width parameters to maximize their respective monitoring performance. To address these issues, we improve the KPCA-based process monitoring method by incorporating the ensemble learning approach with a Bayesian inference strategy. As a result, the monitoring performance is not only more robust to the width parameter selection but also significantly enhanced. This is validated by two case studies, a simple nonlinear process and the Tennessee Eastman benchmark process.

1. INTRODUCTION

The purpose of fault detection and diagnosis is to determine the occurrence of an abnormal event in a process in a timely manner and to accurately identify its reason or source.1 Process monitoring is therefore essential for ensuring plant and staff safety as well as the yield and quality of products.2−4 As a typical data-based method, multivariate statistical process monitoring (MSPM) has been widely used for process monitoring in chemical processes.5−10 Among all kinds of MSPM methods, two classical methods, principal component analysis (PCA) and partial least-squares (PLS), are the most representative and have been studied intensively and applied extensively.11−17 In addition, some other complementary MSPM methods, such as independent component analysis (ICA),18−20 canonical variate analysis (CVA),21,22 and locality preserving projections (LPP),23,24 have also been successfully used to address some special process monitoring problems.

PCA is a linear dimensionality reduction technique that can handle high-dimensional and correlated data. It decomposes the original data space into two subspaces: the principal component subspace (PCS), which contains most of the variation in the data, and the residual subspace (RS), which corresponds to the noise part of the data. For process monitoring, two statistics, Hotelling's T2 statistic25 and the squared prediction error (SPE),5 are used by PCA to detect changes in the PCS and the RS, respectively. However, the intrinsic linearity assumption makes PCA perform poorly in complex cases with nonlinearly correlated variables.26−29

To overcome the shortcomings of PCA in dealing with nonlinearly correlated data, various nonlinear extensions of PCA were developed.30−36 However, most of these methods have to solve nonlinear optimization problems. Kernel principal component analysis (KPCA) is another powerful tool for nonlinear dimension reduction and has been widely used for nonlinear process monitoring.29,37−39 The key idea of KPCA is first to map the input space into a high-dimensional feature space via an implicit nonlinear mapping and then to conduct linear PCA in the feature space. Through kernel functions, the explicit nonlinear mapping and the associated nonlinear optimization are avoided; this is the main advantage of KPCA over the other nonlinear PCA methods. Similar to the conventional linear PCA-based monitoring method, KPCA also utilizes the T2 and SPE statistics to monitor the running state.

Among all kinds of kernel functions, the Gaussian kernel function, k(x,y) = exp[−(∥x−y∥²/c)], is the most commonly used. However, its performance is heavily affected by the width parameter c. To date, there is no well-established criterion for determining this parameter, and it is usually set empirically.40 Obviously, the fault detection performance is not taken into account by the conventional KPCA-based methods when the Gaussian kernel function is determined.29,37−39 If information on a specific fault is available, i.e., its fault data is recorded, it is more reasonable to select the width parameter that maximizes the detection performance for this fault. However, a single selected Gaussian kernel function is usually not the most effective for all faults; that is, different faults may need different width parameters to maximize their respective fault detection performance. This phenomenon will be shown in the following experiments. Most commonly, though, fault information is unavailable: only data collected under normal conditions is provided to construct the monitoring system, which is the main reason why most MSPM methods are unsupervised. In this situation, it is even more difficult, if not impossible, to determine the appropriate width parameter without fault information, and determining a single kernel function only by experience is largely a blind choice. If a poor one is selected, the monitoring performance may be degraded greatly.

Received: July 30, 2014
Revised: December 5, 2014
Accepted: December 10, 2014
Published: December 10, 2014

Ind. Eng. Chem. Res. 2015, 54, 318−329. dx.doi.org/10.1021/ie503034j


Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. It is primarily used to improve the performance of a model (in classification, prediction, function approximation, etc.) or to reduce the likelihood of the unfortunate selection of a poor one.41 To yield satisfactory results, high diversity among the different base models is needed.42,43 The diversity can come not only from the training data but also from the base models themselves.44 Inspired by the idea of ensemble learning, we observe that diversity among KPCA models can be easily obtained by selecting different width parameters, because different width parameters correspond to different feature spaces.30

In this paper, we use the ensemble learning approach to improve the nonlinear process monitoring performance of KPCA. Differing from the conventional KPCA method, multiple base KPCA models with different width parameters are developed from the ensemble learning viewpoint. When these different KPCA models are used for process monitoring, different monitoring results can be generated. To combine the different base monitoring results into a final ensemble result, the Bayesian inference strategy is introduced to transform the conventional monitoring statistic values into fault probabilities, from which an efficient combination strategy can be constructed. Compared to the conventional single-KPCA method, the nonlinear fault detection performance is improved by the new ensemble KPCA (EKPCA) method.

The rest of this paper is organized as follows. In section 2, the conventional KPCA-based process monitoring method is briefly reviewed. The details of the proposed EKPCA method are then presented in section 3. To demonstrate the effectiveness and superiority of the EKPCA method, two case studies, a simple nonlinear process and the Tennessee Eastman (TE) benchmark process, are conducted in section 4. Finally, section 5 presents the conclusions.

λ v = Σ v =

i=1

n

1 n

∑ ⟨ϕ̅ (x i), v⟩ϕ̅ (x i) i=1

(2)

n

v=

∑ αjϕ̅ (x j) (3)

j=1

By substituting eq 3 into eq 2 and multiplying both sides of eq 2 with ϕ̅ (xk)T, the following equation is obtained: n

λ ∑ αj⟨ϕ ̅ (x k), ϕ ̅ (x j)⟩ = j=1

1 n

n

n

∑ ∑ αj⟨ϕ̅ (x i), ϕ̅ (x j)⟩⟨ϕ̅ (xk), ϕ̅ (x i)⟩ i=1 j=1

(4)

Note that the eigenvalue problem in eq 4 involves only inner products in the feature space. A centered n × n kernel matrix K̅ can be defined such that K̅ ij = ⟨ϕ̅ (xi),ϕ̅ (xj)⟩. Therefore, the eigenvalue problem can be further transformed into λα =

1 K̅ α n

(5)

where α should be scaled as ∥α∥2 = 1/nλ to ensure the normality of v.29,37−39 The kth score of a new observation x can be obtained by projecting ϕ̅ (x) onto vk in  as follows: n

tk = ⟨vk , ϕ ̅ (x)⟩ =

∑ αik⟨ϕ̅ (x i), ϕ̅ (x)⟩ i=1

k = 1, ..., p

(6)

where p is the number of the retained kernel principal components. To solve the eigenvalue problem of eq 5 and to project from the input space into the KPCA space using eq 6, one can avoid performing the nonlinear mappings and computing both the inner products in the feature space by introducing a kernel function of the form k(x,y) = ⟨ϕ(x),ϕ(y)⟩.45 Among all types of kernel functions, the Gaussian kernel function, k(x,y) = exp[−(∥x−y∥2/c)], is the most popular.30,38−40 In KPCA-based process monitoring, the T2 and SPE statistics are constructed to monitor the running state.29,39 The T2 statistic is the sum of normalized squared scores

m ϕ(•)  f ⎯⎯⎯⎯⎯⎯⎯⎯⎯→

T 2 = tTΛ−1t

where ϕ(•) is an implicit nonlinear mapping function and f is the dimension of the feature space which is assumed to be a large value. The covariance matrix in the feature space  can be estimated as 1 n

n

∑ [ϕ̅ (x i)T v]ϕ̅ (x i) =

where λ and v denote the eigenvalue and eigenvector of the covariance matrix Σ  , respectively, and ⟨•,•⟩ denotes the inner product. This implies that the eigenvector v is spanned by training samples in the feature space. Thus, there exist coefficients αi, i = 1,...,n such that

2. KPCA-BASED PROCESS MONITORING KPCA was originally proposed by Schölkopf et al.45 In KPCA, the observations are first mapped to a high-dimensional feature space, and then the linear PCA is used to extract the nonlinear correlations between the variables.29,37−39 Let X = [x1, x2 , ..., x n] ∈ m × n denote the original training data set, where m is the number of process variables, n is the number of observations, and x i ∈ m denotes the ith m-dimensional observation. The feature space is constructed by an implicit nonlinear mapping

Σ =

1 n

where t is the p-dimensional score vector obtained by eq 6 and Λ−1 is a diagonal matrix consisting of the inverse of the retained eigenvalues. The upper control limit (UCL) of T2 can be estimated using the F-distribution46

n

∑ [ϕ(x i) − c][ϕ(x i) − c]T i=1

(7)

Tp2, n , α ∼ (1)

p(n − 1) Fp , n − p , α n−p

(8)

The SPE statistic is calculated by

where c is the sample mean in the feature space. For convenience, let ϕ̅ (xi) = ϕ(xi) − c denote the centered feature space sample. The kernel principal component can be obtained by solving the following eigenvalue problem:29,37−39

2

SPE = || ϕ ̅ (x) − ϕp̅ (x)|| =

n0

∑ ti i=1

319

p 2



∑ ti 2 i=1

(9)
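To make the procedure above concrete, the following is a minimal NumPy/SciPy sketch of Gaussian-kernel KPCA monitoring (eqs 1−10). It is our own illustration, not the authors' code; the class name, the eigenvalue tolerance, and the data layout (rows are samples) are all assumptions.

```python
# Minimal sketch of KPCA-based monitoring (eqs 1-10); names are ours.
import numpy as np
from scipy import stats

def gaussian_K(X, Y, c):
    # Gaussian kernel k(x, y) = exp(-||x - y||^2 / c) for all row pairs
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / c)

class KPCAMonitor:
    def __init__(self, c, p, alpha=0.01):
        self.c, self.p, self.alpha = c, p, alpha

    def fit(self, X):
        """Train on normal data X of shape (n samples, m variables)."""
        self.X, n = X, X.shape[0]
        K = gaussian_K(X, X, self.c)
        H = np.eye(n) - 1.0 / n                   # centering matrix
        Kc = H @ K @ H                            # centered kernel matrix K-bar
        lam, A = np.linalg.eigh(Kc / n)           # eigenvalue problem, eq 5
        order = np.argsort(lam)[::-1]             # sort eigenvalues descending
        lam, A = lam[order], A[:, order]
        keep = lam > 1e-10                        # the n0 nonzero eigenvalues
        self.lam = lam[keep]
        self.A = A[:, keep] / np.sqrt(n * self.lam)  # ||alpha||^2 = 1/(n*lam)
        self.K, self.n = K, n
        # T2 control limit from the F-distribution, eq 8
        p, sig = self.p, self.alpha
        self.t2_lim = p * (n - 1) / (n - p) * stats.f.ppf(1 - sig, p, n - p)
        # SPE control limit g*chi2_h with g = b/(2a), h = 2a^2/b, eq 10
        T = Kc @ self.A                           # training scores
        spe = (T ** 2).sum(1) - (T[:, :p] ** 2).sum(1)
        a, b = spe.mean(), spe.var()
        self.spe_lim = b / (2 * a) * stats.chi2.ppf(1 - sig, 2 * a * a / b)
        return self

    def statistics(self, Xnew):
        """Return (T2, SPE) arrays for new samples Xnew, eqs 6, 7, and 9."""
        n, t = self.n, len(Xnew)
        Kt = gaussian_K(Xnew, self.X, self.c)
        # center the test kernel rows against the training kernel matrix
        Ktc = (Kt - Kt @ np.ones((n, n)) / n
                  - np.ones((t, n)) @ self.K / n
                  + np.ones((t, n)) @ self.K @ np.ones((n, n)) / n**2)
        T = Ktc @ self.A                          # scores, eq 6
        t2 = ((T[:, :self.p] ** 2) / self.lam[:self.p]).sum(1)   # eq 7
        spe = (T ** 2).sum(1) - (T[:, :self.p] ** 2).sum(1)      # eq 9
        return t2, spe
```

A sample is flagged when T2 exceeds `t2_lim` or SPE exceeds `spe_lim`; the same class is reused for the ensemble sketch in section 3.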


3. EKPCA FOR IMPROVED NONLINEAR PROCESS MONITORING

In this section, we first describe, through a simple motivating example, the effect of different width parameter selections on the fault detection performance and show that a KPCA model based on a single Gaussian kernel function cannot perform best for all faults. The details of the development of EKPCA and the online monitoring strategy are then given in the next subsection.

3.1. Simple Motivation Example. To describe the effect of different width parameter selections on the fault detection performance, a simple process with three variables but only one factor, originally suggested by Dong et al.,31 is simulated here and analyzed using KPCA with different width parameters. The process is given by

x1 = u + e1    (11)

x2 = u² − 3u + e2    (12)

x3 = −u³ + 3u² + e3    (13)

where e1, e2, and e3 are independent noise variables distributed as N(0, 0.01) and u is sampled uniformly from [0.01, 2]. Normal data comprising 100 samples is generated according to these equations. Two sets of test data comprising 300 samples each are also generated, with the following two faults applied separately during their generation:

Fault 1: A step change of x1 by −0.4 is introduced from sample 101.

Fault 2: A step change of x3 by 0.4 is introduced from sample 101.

Two KPCA models with different width parameters, c = 30 and c = 300, are constructed separately for this simple process. The numbers of retained principal components determined by the average eigenvalue approach29 are both 3 for the two models. The T2 and SPE charts of faults 1 and 2 are shown in Figures 1 and 2, respectively.

Figure 1. KPCA monitoring charts of fault 1: (a) c = 30 and (b) c = 300.

Figure 2. KPCA monitoring charts of fault 2: (a) c = 30 and (b) c = 300.

As shown in Figure 1, the T2 statistic with c = 300 detects the occurrence and propagation of fault 1 more effectively than that with c = 30. However, Figure 2 shows that the SPE statistic with c = 30 detects the occurrence and propagation of fault 2 more effectively than that with c = 300. A single KPCA model is thus usually not the most effective for all faults; that is, different faults may need different width parameters to maximize their respective detection performance. Hence, for more effective process monitoring, one should ideally select a specific Gaussian kernel function, i.e., a specific width parameter, to build a specific KPCA model according to the specific fault information. This requires knowing the fault information in advance. Most commonly, however, the fault information is unavailable: only data under normal conditions is provided to construct the monitoring system, which is the main reason why most MSPM methods are unsupervised. In this situation, it is difficult or even impossible to determine the appropriate width parameter without fault information, and determining a single kernel function only by experience is largely a blind choice. If a poor one is selected, the monitoring performance may be degraded greatly.
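For readers who want to reproduce this example, the process of eqs 11−13 and the two step faults can be simulated as follows. This is a hypothetical sketch: the random seed and function names are our choices, and the paper does not publish simulation code.

```python
# Simulation of the motivation example, eqs 11-13, with the two step faults.
import numpy as np

rng = np.random.default_rng(0)                    # seed is our choice

def simulate(n, fault=None):
    u = rng.uniform(0.01, 2.0, n)                 # factor u ~ U[0.01, 2]
    e = rng.normal(0.0, 0.1, (n, 3))              # N(0, 0.01) -> std = 0.1
    X = np.column_stack([u, u**2 - 3*u, -u**3 + 3*u**2]) + e
    if fault == 1:
        X[100:, 0] -= 0.4                         # step in x1 from sample 101
    elif fault == 2:
        X[100:, 2] += 0.4                         # step in x3 from sample 101
    return X

X_train = simulate(100)                           # normal data, 100 samples
X_test1 = simulate(300, fault=1)                  # 300-sample test set, fault 1
X_test2 = simulate(300, fault=2)                  # 300-sample test set, fault 2
```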

3.2. Model Development and the Online Monitoring Strategy. In conventional KPCA-based process monitoring, a single Gaussian kernel function k(x,y) = exp[−(∥x−y∥²/c)] is widely employed to build a single-KPCA model, where c is the width parameter.

The width parameter is usually determined empirically as c = rmσ²,40 where r is a constant, m is the dimension of the input space, and σ² is the variance of the data. If the data is normalized so that σ² = 1, then c = rm.

To overcome the shortcomings of the single KPCA model-based method described in section 3.1, ensemble learning is applied to the KPCA model structure. Instead of selecting a single Gaussian kernel function, a series of Gaussian kernel functions with different width parameters is selected:

k^(i)(x, y) = exp(−∥x − y∥²/c_i)    (14)

where c_i = 2^{i−1}rmσ² and i = 1, ..., ns indexes the kernel functions. These kernel functions are then used to construct different base KPCA models:

λ^(i)α^(i) = (1/n) K̅^(i)α^(i)    (15)

Following the KPCA-based monitoring approach given in section 2, two statistics, T2(i) (i = 1, ..., ns) and SPE(i) (i = 1, ..., ns), can be constructed for each KPCA submodel, and their UCLs, T2(i)_limit (i = 1, ..., ns) and SPE(i)_limit (i = 1, ..., ns), can also be estimated.

Before the proposed EKPCA method is used to monitor a new data sample x, an important issue to be solved is how to combine the monitoring results from the different submodels into a final decision. Because each submodel has its own kernel function and can exhibit different monitoring behavior, it is quite difficult to combine them into a single monitoring result. A simple solution is a voting strategy, in which a fault is declared once a preset number of submodels flag the monitored sample; however, it is not straightforward to decide how many limit-violating submodels should be required to declare a fault. In this paper, we instead turn the conventional monitoring statistic value of each submodel into a fault probability by the well-known Bayesian method. The combination of the results from the different submodels then becomes much easier, and the interpretation of the fault behavior is also more straightforward.

In the first step, the conventional monitoring results of each KPCA submodel are obtained as

t_k^(i) = ⟨v_k^(i), ϕ̅^(i)(x)⟩ = Σ_{j=1}^{n} α_j^(i)k ⟨ϕ̅^(i)(x_j), ϕ̅^(i)(x)⟩,  k = 1, ..., p^(i)    (16)

T2(i) = t^(i)T (Λ^(i))^{−1} t^(i)    (17)

SPE(i) = ∥ϕ̅^(i)(x) − ϕ̅_{p(i)}^(i)(x)∥² = Σ_{j=1}^{n0(i)} t_j^(i)² − Σ_{j=1}^{p(i)} t_j^(i)²    (18)

where i = 1, ..., ns, and p(i) and n0(i) denote the number of retained principal components and the number of nonzero eigenvalues in the ith submodel, respectively. Then, the Bayesian method is used to turn the two conventional monitoring results into fault probabilities:

P_T2^(i)(F|x) = P_T2^(i)(x|F) P_T2^(i)(F) / P_T2^(i)(x)    (19)

P_SPE^(i)(F|x) = P_SPE^(i)(x|F) P_SPE^(i)(F) / P_SPE^(i)(x)    (20)

where the two evidence terms P_T2^(i)(x) and P_SPE^(i)(x) are calculated as

P_T2^(i)(x) = P_T2^(i)(x|N) P_T2^(i)(N) + P_T2^(i)(x|F) P_T2^(i)(F)    (21)

P_SPE^(i)(x) = P_SPE^(i)(x|N) P_SPE^(i)(N) + P_SPE^(i)(x|F) P_SPE^(i)(F)    (22)

where P_T2^(i)(N), P_T2^(i)(F), P_SPE^(i)(N), and P_SPE^(i)(F) are the prior probabilities of the process being normal (N) and faulty (F) in the PCS and the RS of the ith KPCA submodel. When the significance level is selected as α, P_T2^(i)(N) and P_SPE^(i)(N) can be set to 1 − α, while P_T2^(i)(F) and P_SPE^(i)(F) can be set to α. To calculate the fault probabilities of the new data sample, P_T2^(i)(F|x) and P_SPE^(i)(F|x), we also need the likelihood terms P_T2^(i)(x|N), P_T2^(i)(x|F), P_SPE^(i)(x|N), and P_SPE^(i)(x|F). In this paper, these likelihood terms are defined as follows:

P_T2^(i)(x|N) = exp(−T2(i)/T2(i)_limit)    (23)

P_SPE^(i)(x|N) = exp(−SPE(i)/SPE(i)_limit)    (24)

P_T2^(i)(x|F) = exp(−T2(i)_limit/T2(i))    (25)

P_SPE^(i)(x|F) = exp(−SPE(i)_limit/SPE(i))    (26)

After all the submodel monitoring results have been turned into fault probabilities, we can easily combine them through the following weighted combination strategy:

ET2 = Σ_{i=1}^{ns} [P_T2^(i)(F|x)² / Σ_{j=1}^{ns} P_T2^(j)(F|x)]    (27)

ESPE = Σ_{i=1}^{ns} [P_SPE^(i)(F|x)² / Σ_{j=1}^{ns} P_SPE^(j)(F|x)]    (28)

Note that large weights are thus placed on the submodels with large fault probabilities. Because the process is probably faulty once an obvious fault behavior has appeared in any submodel, it is reasonable to use this weighted combination strategy in the EKPCA framework. We can then judge the process behavior by simply checking whether the two combined monitoring statistics violate their confidence limits: when ET2 < α and ESPE < α, the process is considered normal; otherwise, a fault alarm is triggered.

The scheme of EKPCA-based process monitoring is outlined as follows (a code sketch is given after the lists):

Offline modeling:
(1) Acquire the training data set X ∈ R^{m×n} from the normal operating condition and normalize the data using the mean and standard deviation of each variable, where m is the number of variables and n is the number of samples.
(2) Select a series of Gaussian kernel functions with different width parameters, k^(i)(x,y) = exp[−(∥x−y∥²/c_i)], where c_i = 2^{i−1}rmσ² and i = 1, ..., ns indexes the kernel functions.
(3) For each kernel function, construct a KPCA model by solving the eigenvalue problem λ^(i)α^(i) = (1/n)K̅^(i)α^(i), and then normalize α_k^(i) such that ⟨α_k^(i), α_k^(i)⟩ = 1/(nλ_k^(i)).
(4) Using each submodel separately, extract the nonlinear components of each training data sample x via t_k^(i) = ⟨v_k^(i), ϕ̅^(i)(x)⟩ = Σ_{j=1}^{n} α_j^(i)k ⟨ϕ̅^(i)(x_j), ϕ̅^(i)(x)⟩, k = 1, ..., p^(i).
(5) Calculate the monitoring statistics (T2 and SPE) of the normal operating data in each submodel.
(6) Determine the control limits of the T2 and SPE statistics for each submodel based on a specified significance level α.

Online monitoring:
(1) Normalize the new data sample x with the mean and standard deviation values of the training data.
(2) Using each submodel separately, extract the nonlinear components of the new sample via t_k^(i) = ⟨v_k^(i), ϕ̅^(i)(x)⟩ = Σ_{j=1}^{n} α_j^(i)k ⟨ϕ̅^(i)(x_j), ϕ̅^(i)(x)⟩, k = 1, ..., p^(i).
(3) Calculate the monitoring statistics (T2 and SPE) of the new sample in each submodel.
(4) Compute the fault probability of x in each submodel by eqs 19−26.
(5) Calculate the final two indices ET2 and ESPE through the weighted combination strategy given in eqs 27 and 28.
(6) Monitor whether ET2 or ESPE exceeds its control limit α.

To show the monitoring flow more intuitively, the steps of EKPCA for process monitoring are illustrated in Figure 3.

Figure 3. Flowchart of EKPCA-based process monitoring.
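As referenced after the lists above, the offline and online procedures can be sketched in a few lines of Python by reusing the KPCAMonitor class from the section 2 sketch. The ensemble construction (eq 14), the Bayesian fault probabilities (eqs 19−26), and the weighted combination (eqs 27 and 28) follow the steps above, but every code-level detail (names, the numerical guard, the default r and ns) is our own assumption.

```python
# Sketch of the EKPCA procedure (eqs 14-28), reusing KPCAMonitor from above.
import numpy as np

def fit_ekpca(X, r=5, n_s=10, p=3, alpha=0.01):
    # c_i = 2^(i-1) * r * m * sigma^2, eq 14 (sigma^2 = 1 for normalized data)
    m = X.shape[1]
    widths = [2**i * r * m for i in range(n_s)]
    return [KPCAMonitor(c, p, alpha).fit(X) for c in widths]

def fault_probability(stat, limit, alpha):
    # Bayesian inference per submodel, eqs 19-26
    stat = np.maximum(stat, 1e-12)                # guard against division by 0
    p_xN = np.exp(-stat / limit)                  # P(x|N), eqs 23 and 24
    p_xF = np.exp(-limit / stat)                  # P(x|F), eqs 25 and 26
    p_x = p_xN * (1 - alpha) + p_xF * alpha       # evidence, eqs 21 and 22
    return p_xF * alpha / p_x                     # P(F|x), eqs 19 and 20

def ekpca_indices(models, Xnew, alpha=0.01):
    # weighted combination, eqs 27 and 28: weights proportional to P(F|x)
    PT, PS = [], []
    for mdl in models:
        t2, spe = mdl.statistics(Xnew)
        PT.append(fault_probability(t2, mdl.t2_lim, alpha))
        PS.append(fault_probability(spe, mdl.spe_lim, alpha))
    PT, PS = np.array(PT), np.array(PS)           # shape (n_s, n_samples)
    ET2 = (PT**2).sum(0) / PT.sum(0)
    ESPE = (PS**2).sum(0) / PS.sum(0)
    return ET2, ESPE

# usage: models = fit_ekpca(X_train); ET2, ESPE = ekpca_indices(models, X_test1)
# a sample is flagged as faulty when ET2 > alpha or ESPE > alpha
```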

4. CASE STUDIES

In this section, the proposed EKPCA-based monitoring method is applied to the simple nonlinear process introduced in section 3.1 and to the TE benchmark process to evaluate its performance.

4.1. Case Study of the Simple Nonlinear Process. First, the simple nonlinear process discussed in the motivation example of section 3.1 is considered. When building the EKPCA model, a series of Gaussian kernel functions is selected as k(x,y) = exp[−(∥x−y∥²/c_i)], c_i = 2^{i−1} × m, i = 1, ..., 10; the number of principal components is set to 3 for each submodel; and the significance levels of the monitoring statistics are all selected as α = 0.01. The monitoring results of faults 1 and 2, which were analyzed in section 3.1, are plotted in Figures 4 and 5, respectively.

Figure 4. EKPCA monitoring charts of fault 1.

Figure 5. EKPCA monitoring charts of fault 2.

For fault 1, it can be clearly seen from Figures 1 and 4 that, compared to the results of a single KPCA model (c = 30 or c = 300), ET2 and ESPE are both more sensitive to the fault occurrence and propagation. For fault 2, the comparison of Figures 2 and 5 shows that although ET2 does not perform significantly better than the T2 of c = 30 or c = 300, ESPE performs much better than the SPEs of c = 30 and c = 300. To compare the results quantitatively, the fault detection rates of these two faults are tabulated in Table 1.

Table 1. Fault Detection Rates (%) of Two Faults in the Simple Nonlinear Process

             KPCA (c = 30)     KPCA (c = 300)    EKPCA
fault index  T2      SPE       T2      SPE       ET2     ESPE
1            9.00    54.50     50.00   47.00     77.00   78.50
2            2.00    44.50     6.50    8.50      6.00    65.00

From the results presented in Table 1, one can see that the shortcomings of a single selected Gaussian kernel function are largely avoided; i.e., for both fault 1 and fault 2, EKPCA produces better or comparable monitoring results at the same time. One may note that for fault 2, the fault detection rate of ET2 is slightly lower than that of the T2 of c = 300. By reexamining the experimental result for each value of i = 1, ..., 10, we find that for i = 1, 2, 3, 4, 5, 6, 7, and 10, the fault detection rates of T2 are all lower than that of c = 300. For most of the submodels, the detection performance of T2 is thus not better than that of c = 300. As a result, when the results of all the submodels are combined into the final result by weighted averaging, the final result for fault 2 is not as good as that of the T2 of c = 300.

4.2. Case Study of the TE Benchmark Process. The Tennessee Eastman benchmark process developed by Downs and Vogel48 has been widely used to evaluate process monitoring methods. This process consists of five main unit operations: a reactor, a condenser, a separator, a stripper, and a recycle compressor. The control structure of the TE process is shown in Figure 6. There are four gaseous reactants (A, C, D, and E), two liquid products (G and H), a byproduct (F), and an inert (B). The gaseous reactants are fed to the reactor, where the liquid products are produced. There are 22 continuous process variables, 19 noncontinuous (composition) variables sampled by 3 composition analyzers, and 12 manipulated variables in the TE process. In this study, all 22 continuous variables and 11 manipulated variables (all except the agitation speed of the reactor's stirrer) are selected for process monitoring. The details of these 33 variables are listed in Table 2.

Figure 6. Control structure of the TE process. Reprinted with permission from ref 48. Copyright 1993 Elsevier.

Table 2. Monitoring Variables in the TE Process

no.  description                           no.  description
1    A feed (stream 1)                     18   stripper temperature
2    D feed (stream 2)                     19   stripper steam flow
3    E feed (stream 3)                     20   compressor work
4    total feed (stream 4)                 21   reactor cooling water outlet temp
5    recycle flow (stream 8)               22   condenser cooling water outlet temp
6    reactor feed rate (stream 6)          23   D feed flow (stream 2)
7    reactor pressure                      24   E feed flow (stream 3)
8    reactor level                         25   A feed flow (stream 1)
9    reactor temperature                   26   total feed flow (stream 4)
10   purge rate                            27   compressor recycle valve
11   separator temperature                 28   purge valve
12   separator level                       29   separator pot liquid flow (stream 10)
13   separator pressure                    30   stripper liquid product flow
14   separator underflow (stream 10)       31   stripper steam valve
15   stripper level                        32   reactor cooling water flow
16   stripper pressure                     33   condenser cooling water flow
17   stripper underflow (stream 11)

The TE simulator can generate 21 different types of faults, as listed in Table 3. To generate the closed-loop simulated process data for each fault, the plant-wide control scheme recommended by Lyman and Georgakis49 was implemented. The training data set contains 500 samples collected under normal conditions, and the testing data set for each fault contains 960 samples collected under the corresponding abnormal condition. Each fault is introduced from the 161st sample onward. The sampling interval of all these data is 3 min.

Table 3. Process Faults of the TE Process

fault index  description                                              type
IDV(1)       A/C feed ratio, B composition constant (stream 4)        step
IDV(2)       B composition, A/C ratio constant (stream 4)             step
IDV(3)       D feed temperature (stream 2)                            step
IDV(4)       reactor cooling water inlet temperature                  step
IDV(5)       condenser cooling water inlet temperature                step
IDV(6)       A feed loss (stream 1)                                   step
IDV(7)       C header pressure loss, reduced availability (stream 4)  step
IDV(8)       A, B, C feed composition (stream 4)                      random variation
IDV(9)       D feed temperature (stream 2)                            random variation
IDV(10)      C feed temperature (stream 4)                            random variation
IDV(11)      reactor cooling water inlet temperature                  random variation
IDV(12)      condenser cooling water inlet temperature                random variation
IDV(13)      reaction kinetics                                        slow drift
IDV(14)      reactor cooling water valve                              sticking
IDV(15)      condenser cooling water valve                            sticking
IDV(16)      unknown                                                  unknown
IDV(17)      unknown                                                  unknown
IDV(18)      unknown                                                  unknown
IDV(19)      unknown                                                  unknown
IDV(20)      unknown                                                  unknown
IDV(21)      valve of stream 4 fixed at the steady-state position     constant position

Cho et al.37 and Lee et al.29 used single-KPCA models with the width parameter empirically determined as c = 5m and c = 10m, respectively, for process monitoring. In this case study, these two KPCA models are used for comparative experiments, and a series of Gaussian kernel functions with c_i = 2^{i−1} × 5m, i = 1, ..., 11, is used to build the EKPCA model. Obviously, c = 5m and c = 10m are two of the EKPCA submodels. In each submodel of EKPCA and in the two comparative models, 30 principal components are retained to calculate the T2 statistics and 40 principal components are used to calculate the SPE statistics; the significance level is selected to be 0.01.

First, the feasibility of the proposed method is evaluated by monitoring the normal testing data set. The monitoring results of KPCA (c = 5m), KPCA (c = 10m), and EKPCA are shown in Figure 7 and Table 4. It can be seen that although the false alarm rates of EKPCA are slightly higher, they are still acceptable.

Figure 7. Monitoring charts of the normal conditions: (a) KPCA (c = 5m), (b) KPCA (c = 10m), and (c) EKPCA.

Table 4. False Alarm Rates of Three Models

KPCA (c = 5m)       KPCA (c = 10m)      EKPCA
T2        SPE       T2        SPE       ET2       ESPE
0.0262    0.0037    0.0587    0.0050    0.0575    0.0175

Next, all 21 faults are tested by KPCA (c = 5m), KPCA (c = 10m), and EKPCA. The fault detection rates and the detection delays are tabulated in Tables 5 and 6, respectively. It can be seen that for almost all faults, EKPCA detects their occurrence and propagation more quickly and more effectively. Especially for faults 5, 10, 16, and 19, not only are the detection delays reduced, but the fault detection rates are also significantly improved.

Table 5. Fault Detection Rates (%) of Three Models for All 21 Faults

             KPCA (c = 5m)     KPCA (c = 10m)    EKPCA
fault index  T2      SPE       T2      SPE       ET2     ESPE
1            99.75   0         99.88   98.38     100     99.25
2            98.50   0.25      98.62   98.00     98.75   98.25
3            10.00   1.25      17.75   3.00      18.25   6.50
4            100     0.13      100     3.13      100     27.63
5            33.12   3.50      36.75   23.25     38.00   100
6            100     0         100     99.00     100     100
7            100     2.38      100     54.63     100     100
8            99.38   3.75      99.38   90.13     99.38   98.25
9            8.87    2.13      14.87   2.62      14.37   5.87
10           52.25   21.00     58.75   36.13     88.62   73.38
11           74.75   2.62      82.00   20.50     84.00   48.50
12           99.62   10.50     99.75   95.25     99.75   99.62
13           95.25   3.13      95.37   92.63     95.50   94.25
14           100     0         100     80.25     100     99.88
15           16.13   4.37      24.50   6.12      25.37   13.88
16           37.62   15.75     49.38   22.50     94.13   78.75
17           95.37   0.75      96.63   66.25     96.88   83.50
18           90.13   0         91.00   88.12     91.25   89.38
19           8.13    2.38      49.00   1.50      91.87   47.38
20           61.88   6.63      70.87   32.25     83.00   73.12
21           44.37   0.63      51.50   20.87     56.87   32.37

Table 6. Detection Delays (in samples) of Three Models for All 21 Faults

             KPCA (c = 5m)     KPCA (c = 10m)    EKPCA
fault index  T2      SPE       T2      SPE       ET2     ESPE
1            3       800       2       13        1       7
2            12      17        12      17        9       15
3            10      15        4       15        5       15
4            1       316       1       61        1       1
5            1       57        1       2         1       1
6            1       800       1       9         1       1
7            1       268       1       1         1       1
8            1       229       1       31        1       1
9            1       6         1       1         1       6
10           15      38        8       22        6       16
11           6       181       6       7         6       7
12           3       22        3       8         3       3
13           38      149       38      50        26      47
14           1       800       1       2         1       2
15           92      261       86      92        92      92
16           32      31        24      31        1       10
17           22      61        20      31        2       26
18           61      800       18      95        15      61
19           2       6         1       117       2       2
20           79      79        40      79        63      70
21           251     257       251     421       40      257

Fault 5 is a step change in the condenser cooling water inlet temperature. As it occurs, the flow rate from the condenser outlet to the gas−liquid separator increases, leading to increases in the gas−liquid temperature and the separator cooling water outlet temperature. The closed control loops can compensate for these changes and bring the separator temperature back near its set point; about 32 variables exhibit similar transient behavior. The monitoring charts of KPCA (c = 5m), KPCA (c = 10m), and EKPCA are shown in Figure 8. It can be seen that ET2 produces results comparable with those of the T2 statistics of the two single-KPCA models, while ESPE performs much better than the two comparative SPE statistics. During the whole period of the fault condition, ESPE can almost always indicate the existence of fault 5.

Fault 10 is a random change in the temperature of stream 4 (C feed). The monitoring results for this fault are shown in Figure 9. Compared to KPCA (c = 5m) and KPCA (c = 10m), EKPCA performs much better than the two single comparative KPCA models. In particular, many faulty samples in the interval between samples 400 and 600 fail to be detected by the two T2 statistics, and many faulty samples in the interval between samples 600 and 800 fail to be detected by the two SPE statistics, whereas ET2 and ESPE detect almost all the faulty samples in these two intervals, respectively.
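For clarity, the two summary metrics reported in Tables 5 and 6 can be computed from a boolean alarm sequence as sketched below. The exact delay convention used by the authors is not stated, so this helper (our own) counts the first alarmed sample after the fault onset and returns the full faulty length (800 here) when no alarm ever occurs.

```python
# Detection rate and detection delay from an alarm sequence; each TE test set
# has 960 samples with the fault active from sample 161 (0-based index 160).
import numpy as np

def detection_rate_and_delay(alarms, fault_start=160):
    """alarms: boolean array, True where a statistic exceeds its limit."""
    faulty = np.asarray(alarms)[fault_start:]     # samples under the fault
    rate = 100.0 * faulty.mean()                  # detection rate in percent
    hits = np.flatnonzero(faulty)
    delay = int(hits[0]) + 1 if hits.size else len(faulty)  # delay in samples
    return rate, delay
```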


Figure 8. Monitoring charts of fault 5: (a) KPCA (c = 5m), (b) KPCA (c = 10m), and (c) EKPCA.

Figure 9. Monitoring charts of fault 10: (a) KPCA (c = 5m), (b) KPCA (c = 10m), and (c) EKPCA.

Faults 16 and 19 are two types of unknown faults. The monitoring results for these two faults are shown in Figures 10 and 11, respectively.


Figure 10. Monitoring charts of fault 16: (a) KPCA (c = 5m), (b) KPCA (c = 10m), and (c) EKPCA.

Figure 11. Monitoring charts of fault 19: (a) KPCA (c = 5m), (b) KPCA (c = 10m), and (c) EKPCA.

From Figure 10, one can clearly see that both ET2 and ESPE are much more sensitive than the T2 and SPE statistics of the two comparative single KPCA models. Similarly, for fault 19, it can be seen from Figure 11 that EKPCA produces far superior results compared with the two comparative single KPCA models.



5. CONCLUSIONS

To overcome the shortcomings of conventional single KPCA model-based nonlinear process monitoring, a new EKPCA-based method incorporating the ensemble learning approach and the Bayesian inference strategy has been proposed. The EKPCA method first builds multiple base KPCA models with different Gaussian kernel functions from the ensemble learning viewpoint and then calculates the final fault probabilities by combining the results of the KPCA submodels through Bayesian inference. The experimental results of two case studies have shown that, compared to a single empirically determined KPCA model, EKPCA not only is more robust to the width parameter selection but also improves the monitoring performance significantly. This is consistent with the properties of the ensemble learning approach. In this paper, we have focused only on the application of ensemble learning to kernel-based nonlinear fault detection. In future work, it would be valuable to apply the ensemble learning approach to kernel-based nonlinear fault diagnosis tasks such as fault identification and fault classification.


AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.


ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (61273161) and the National High Technology Research and Development Program of China ("863" Program) (2011AA040605).


REFERENCES

(1) Yoon, S.; MacGregor, J. F. Fault diagnosis with multivariate statistical models part I: using steady state fault signatures. J. Process Control 2001, 11 (4), 387−400.
(2) Zhang, M.; Ge, Z.; Song, Z.; Fu, R. Global−local structure analysis model and its application for fault detection and identification. Ind. Eng. Chem. Res. 2011, 50 (11), 6837−6848.
(3) Yu, J. Local and global principal component analysis for process monitoring. J. Process Control 2012, 22 (7), 1358−1373.
(4) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543−3562.
(5) Kresta, J. V.; MacGregor, J. F.; Marlin, T. E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69 (1), 35−47.
(6) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36 (2), 220−234.
(7) Yin, S.; Ding, S. X.; Haghani, A.; Hao, H.; Zhang, P. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Control 2012, 22 (9), 1567−1581.
(8) Chiang, L. H.; Braatz, R. D.; Russell, E. L. Fault Detection and Diagnosis in Industrial Systems; Springer: New York, 2001.
(9) Russell, E.; Chiang, L. H.; Braatz, R. D. Data-Driven Methods for Fault Detection and Diagnosis in Chemical Processes; Springer: New York, 2000.
(10) Joe Qin, S. Statistical process monitoring: Basics and beyond. J. Chemom. 2003, 17 (8−9), 480−502.
(11) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemom. Intell. Lab. Syst. 2000, 50 (2), 243−252.
(12) Kruger, U.; Dimitriadis, G. Diagnosis of process faults in chemical systems using a local partial least squares approach. AIChE J. 2008, 54 (10), 2581−2596.
(13) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 1995, 30 (1), 179−196.
(14) Muradore, R.; Fiorini, P. A PLS-based statistical approach for fault detection and isolation of robotic manipulators. IEEE Trans. Ind. Electron. 2012, 59 (8), 3167−3175.
(15) Jiang, Q.; Yan, X.; Zhao, W. Fault detection and diagnosis in chemical processes using sensitive principal component analysis. Ind. Eng. Chem. Res. 2013, 52 (4), 1635−1644.
(16) MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutoudi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40 (5), 826−838.
(17) Jiang, Q.; Yan, X. Just-in-time reorganized PCA integrated with SVDD for chemical process monitoring. AIChE J. 2014, 60 (3), 949−965.
(18) Lee, J.-M.; Yoo, C.; Lee, I.-B. Statistical process monitoring with independent component analysis. J. Process Control 2004, 14 (5), 467−485.
(19) Lee, J. M.; Qin, S. J.; Lee, I. B. Fault detection and diagnosis based on modified independent component analysis. AIChE J. 2006, 52 (10), 3501−3514.
(20) Zhang, Y.; Zhang, Y. Fault detection of non-Gaussian processes based on modified independent component analysis. Chem. Eng. Sci. 2010, 65 (16), 4630−4639.
(21) Russell, E. L.; Chiang, L. H.; Braatz, R. D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. 2000, 51 (1), 81−93.
(22) Odiowei, P.; Cao, Y. State-space independent component analysis for nonlinear dynamic process monitoring. Chemom. Intell. Lab. Syst. 2010, 103 (1), 59−65.
(23) Hu, K.; Yuan, J. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18 (7), 797−807.
(24) Shao, J.-D.; Rong, G.; Lee, J. M. Generalized orthogonal locality preserving projections for nonlinear fault detection and diagnosis. Chemom. Intell. Lab. Syst. 2009, 96 (1), 75−83.
(25) MacGregor, J. F.; Kourti, T. Statistical process control of multivariate processes. Control Eng. Prac. 1995, 3 (3), 403−414.
(26) Cui, P.; Li, J.; Wang, G. Improved kernel principal component analysis for fault detection. Expert Syst. Appl. 2008, 34 (2), 1210−1219.
(27) Ge, Z.; Yang, C.; Song, Z. Improved kernel PCA-based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64 (9), 2245−2255.
(28) Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991, 37 (2), 233−243.
(29) Lee, J.-M.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59 (1), 223−234.
(30) Kramer, M. A. Autoassociative neural networks. Comput. Chem. Eng. 1992, 16 (4), 313−328.
(31) Dong, D.; McAvoy, T. J. Nonlinear principal component analysis based on principal curves and neural networks. Comput. Chem. Eng. 1996, 20 (1), 65−78.
(32) Hiden, H. G.; Willis, M. J.; Tham, M. T.; Montague, G. A. Nonlinear principal components analysis using genetic programming. Comput. Chem. Eng. 1999, 23 (3), 413−425.
(33) Cheng, C.; Chiu, M.-S. Nonlinear process monitoring using JITL-PCA. Chemom. Intell. Lab. Syst. 2005, 76 (1), 1−13.
(34) Zhao, S.; Xu, Y. Multivariate statistical process monitoring using robust nonlinear principal component analysis. Tsinghua Sci. Technol. 2005, 10 (5), 582−586.
(35) Maulud, A.; Wang, D.; Romagnoli, J. A. A multi-scale orthogonal nonlinear strategy for multi-variate statistical process monitoring. J. Process Control 2006, 16 (7), 671−683.


(36) Kruger, U.; Antory, D.; Hahn, J.; Irwin, G. W.; McCullough, G. Introduction of a nonlinearity measure for principal component models. Comput. Chem. Eng. 2005, 29 (11), 2355−2362.
(37) Cho, J.-H.; Lee, J.-M.; Wook Choi, S.; Lee, D.; Lee, I.-B. Fault identification for process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2005, 60 (1), 279−288.
(38) Choi, S. W.; Lee, C.; Lee, J.-M.; Park, J. H.; Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemom. Intell. Lab. Syst. 2005, 75 (1), 55−67.
(39) Choi, S. W.; Lee, I.-B. Nonlinear dynamic process monitoring based on dynamic kernel PCA. Chem. Eng. Sci. 2004, 59 (24), 5897−5908.
(40) Mika, S.; Schölkopf, B.; Smola, A. J.; Müller, K.-R.; Scholz, M.; Rätsch, G. Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems 11: Proceedings of the 1998 Conference; Kearns, M. J., Solla, S. A., Cohn, D. A., Eds.; MIT Press: Cambridge, MA, 1999; pp 536−542.
(41) Polikar, R. Ensemble learning. Scholarpedia 2009, 4, 2776. DOI: 10.4249/scholarpedia.2776.
(42) Kuncheva, L. I.; Whitaker, C. J. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 2003, 51 (2), 181−207.
(43) Sollich, P.; Krogh, A. Learning with ensembles: How over-fitting can be useful. In Advances in Neural Information Processing Systems 8: Proceedings of the 1995 Conference; Touretzky, D. S., Mozer, M. C., Hasselmo, M. E., Eds.; MIT Press: Cambridge, MA, 1996; pp 190−196.
(44) Harrington, P. Machine Learning in Action; Manning Publications Co.: Shelter Island, NY, 2012.
(45) Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 1998, 10 (5), 1299−1319.
(46) Tracy, N.; Young, J.; Mason, R. Multivariate control charts for individual observations. J. Quality Technol. 1992, 24 (2).
(47) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37 (1), 41−59.
(48) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17 (3), 245−255.
(49) Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Comput. Chem. Eng. 1995, 19 (3), 321−331.
