Robust Online Monitoring Based on Spherical-Kernel Partial Least Squares for Nonlinear Processes with Contaminated Modeling Data

Yi Hu, Hehe Ma, and Hongbo Shi*

Key Laboratory of Advanced Control and Optimization for Chemical Processes, East China University of Science and Technology, Ministry of Education, Shanghai 200237, China

ABSTRACT: Kernel partial least squares (KPLS) is a very efficient technique for tackling complex nonlinear data sets by mapping an original input space into a high-dimensional feature space. However, KPLS may not function well when the modeling data set is contaminated to a large extent by outliers. In this Article, a robust version of KPLS called spherical-kernel partial least squares (SKPLS) is introduced for monitoring nonlinear processes. The key idea of SKPLS is to project all of the feature vectors in the feature space onto a unit sphere and then to perform KPLS on the sphered feature vectors. The effects of the outliers in the original data are eliminated or diminished because of the sphering. Robust monitoring statistics and robust control limits are derived for process monitoring purposes. The simulation results show that the proposed process monitoring strategy not only works well when the modeling data set does not contain outliers but also offers satisfactory efficiency when the modeling data set is highly contaminated by outliers.

Received: May 6, 2012. Revised: May 3, 2013. Accepted: June 12, 2013.

1. INTRODUCTION

Fault detection and fault diagnosis in chemical and biological processes are very important for plant safety and quality consistency. During the past two decades, multivariate statistical process control (MSPC) has been successfully used for process monitoring in different industrial processes.1−9 Multivariate statistical methods based on principal component analysis (PCA)4−6 and partial least squares (PLS)7,8 are the most widely applied; they deal with high-dimensional and highly correlated data by performing dimension reduction and constructing a lower-dimensional subspace. Unlike the PCA-based monitoring method, the PLS method utilizes not only the process variables but also the quality variables to construct the subspace, which makes it especially suited for strongly collinear data. However, for complex physical and chemical systems that exhibit significant nonlinear characteristics, linear approaches such as PLS are improper for describing the underlying data structure because of their linearity assumption. To solve the nonlinear problem of observed data, an efficient nonlinear PLS technique called kernel PLS (KPLS) has been developed by Rosipal et al.10,11 The main idea of this technique is that the original input data are nonlinearly transformed into a feature space of arbitrary dimensionality via nonlinear mapping, and then a linear PLS model is created in the feature space.12 KPLS can explicitly characterize the nonlinear relationship among variables in high-dimensional feature spaces by the use of nonlinear kernel functions. On the basis of this merit, KPLS has shown better performance than linear PLS in regression and feature extraction for nonlinear systems. Recently, there have been many applications of KPLS in the process monitoring and quality prediction domains.13−15 At the same time, various variants of KPLS have been presented for different data characteristics, including modified KPLS for non-Gaussian process data,16 multiblock KPLS for large-scale processes,17 multiscale KPLS for multiscale process data,18,19 and hierarchical KPLS for batch processes.20

However, despite these extensions of the KPLS algorithm, it still possesses some shortcomings. One shortcoming is that it suffers from low efficiency in the presence of outliers. Practical data can contain outliers, and these outliers can distort the distribution of the underlying data. Both the conventional PLS method and the KPLS method are usually based on model assumptions such as a normal distribution. Therefore, the principal components obtained from the conventional methods will not describe the majority of the data well, which may lead to deceptive results. To resist these outliers in the modeling data, robustness considerations have been integrated into multivariate statistical process monitoring techniques. Romagnoli et al. proposed three robust monitoring strategies based on PCA,21 nonlinear PCA,22 and multiscale PCA.23 These approaches combine an outlier rejection step and a denoising step for the filtering of process signals corrupted with noise and outliers with a minimum loss of information. Ge et al.24 adopted a combined monitoring index to screen the outliers; that is, each new incoming observation is tested for whether it is an outlier. If the new sample is an outlier, then it is eliminated in the next recursive modeling step to maintain the robustness of the regression model. Although Jia et al.25 have presented a robust version of KPLS, this robust strategy is based on a reweighting method and has to tune additional parameters.

In this Article, more attention is paid to the problem of outliers that strongly affect the KPLS model, and a novel robust improvement to KPLS is presented for industrial process monitoring. On the basis of the concept of spherical PCA proposed by Locantore et al.,26 we extend conventional KPLS to spherical KPLS (SKPLS). Spherical PCA is a very attractive method because of its good robust properties that make the method valuable for modeling highly contaminated data. Moreover, the idea behind spherical PCA is simple, and the method is computationally fast. In the proposed method, the sphering can heavily reduce the effects of the outliers in the original data. Therefore, robust scores can be obtained by projecting the unsphered feature vectors on the extracted principal components. By considering the fact that the outliers will manifest themselves in the score spaces, a robust estimate of the covariance matrix of the score matrix is obtained by the minimum covariance determinant (MCD) method so that the monitoring statistics can be calculated accurately. Finally, the performance of the SKPLS approach for process monitoring is compared to the conventional KPLS approach through the simulations of a numerical example and the Tennessee Eastman (TE) process.

The remainder of this Article is organized as follows. The concept of KPLS is introduced in section 2. In section 3, spherical KPLS is proposed. Next, a robust online monitoring strategy based on spherical KPLS is developed in section 4. In section 5, a numerical example and the TE process benchmark case study are given to illustrate the analysis results. Finally, some conclusions are presented in section 6.

2. KERNEL PARTIAL LEAST SQUARES (KPLS)

Partial least squares (PLS) is a wide class of multivariate regression algorithms for modeling the relations between input X and output Y by means of latent variables. Although PLS can work well on a set of observations that vary linearly, it does not show superiority when the variations are nonlinear because of its intrinsic linearity. To deal with nonlinear problems, the data can be mapped into a high-dimensional space, which is referred to as feature space F. To derive KPLS, we first assume a nonlinear transformation of the input variables {x_i}_{i=1}^n into feature space F, that is, a mapping Φ: x_i ∈ R^m → Φ(x_i) ∈ F, where it is assumed that Σ_{i=1}^n Φ(x_i) = 0 (i.e., mean centering in the high-dimensional space should be performed before applying KPLS) and Φ(·) is a nonlinear mapping function that projects the input vectors from the original space to F. Our goal is to construct a linear PLS model in the space F. Note that the dimensionality of the feature space is arbitrarily large and can even be infinite. Φ denotes the n × S matrix whose ith row is the vector Φ(x_i) in the S-dimensional feature space F. By means of the kernel trick, Φ(x_i)^T Φ(x_j) = K(x_i, x_j), one can avoid both performing explicit nonlinear mapping and computing dot products in the feature space. Thus, ΦΦ^T represents the (n × n) kernel Gram matrix K of the cross dot products between all mapped input data points {Φ(x_i)}_{i=1}^n (i.e., [K]_{i,j} = K_{i,j} = ⟨Φ(x_i), Φ(x_j)⟩). The steps of the KPLS algorithm are described in Table 1.

Table 1. KPLS Algorithm
(1) Set i = 1, K_1 = K, and Y_1 = Y;
(2) Initialize u_i: set u_i equal to any column of Y_i;
(3) t_i = K_i u_i, t_i ← t_i/||t_i||;
(4) q_i = Y_i^T t_i;
(5) u_i = Y_i q_i, u_i ← u_i/||u_i||;
(6) If t_i has converged, go to step 7; else return to step 3;
(7) K_{i+1} = K_i − t_i t_i^T K_i − K_i t_i t_i^T + t_i t_i^T K_i t_i t_i^T, Y_{i+1} = Y_i − t_i t_i^T Y_i;
(8) Set i = i + 1 and return to step 2. Terminate if i > A.

According to the KPLS algorithm, the score matrix T of the high-dimensional data Φ is expressed as T = [t_1, ..., t_A], and the score matrix U of output Y is expressed as U = [u_1, ..., u_A] (A is the number of retained latent variables). Then, the score matrix T can be computed as follows10,27

T = \Phi R    (1)

where R = Φ^T U(T^T K U)^{−1}. Thus, for testing points {x_i^{test}}_{i=1}^{n_t}, the score matrix T_{test} can be calculated by

T_{test} = \Phi_{test} R = \Phi_{test} \Phi^T U (T^T K U)^{-1} = K_{test} U (T^T K U)^{-1}    (2)

where Φ_{test} is the matrix of the mapped testing points and K_{test} is the (n_t × n) test kernel matrix whose elements are [K_{test}]_{i,j} = K(x_i^{test}, x_j), where {x_i^{test}}_{i=1}^{n_t} and {x_j}_{j=1}^n are the testing and training points, respectively.
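To make the algorithm in Table 1 concrete, the following is a minimal NumPy sketch of KPLS score extraction (eqs 1 and 2). It is our own illustrative reading of the steps above, not the authors' code; the function names, the convergence tolerance, and the use of the radial-basis kernel (introduced later in section 5) are assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, c):
    """Radial-basis kernel K(x, y) = exp(-||x - y||^2 / c), as used in section 5."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / c)

def kpls_fit(K, Y, A, tol=1e-10, max_iter=500):
    """Table 1: extract A latent variables from the (centered) kernel matrix K."""
    n = K.shape[0]
    Ki, Yi = K.copy(), Y.copy()
    T, U = np.zeros((n, A)), np.zeros((n, A))
    for a in range(A):
        u = Yi[:, [0]]                              # step 2: init u with a column of Y
        t_old = np.zeros((n, 1))
        for _ in range(max_iter):
            t = Ki @ u                              # step 3
            t /= np.linalg.norm(t)
            q = Yi.T @ t                            # step 4
            u = Yi @ q                              # step 5
            u /= np.linalg.norm(u)
            if np.linalg.norm(t - t_old) < tol:     # step 6: convergence of t
                break
            t_old = t
        # step 7: deflation, K <- (I - t t^T) K (I - t t^T), Y <- Y - t t^T Y
        P = np.eye(n) - t @ t.T
        Ki = P @ Ki @ P
        Yi = Yi - t @ (t.T @ Yi)
        T[:, [a]], U[:, [a]] = t, u
    return T, U

def kpls_scores(K_test, K, T, U):
    """Eq 2: scores of new points, T_test = K_test U (T^T K U)^{-1}."""
    return K_test @ U @ np.linalg.inv(T.T @ K @ U)
```

The training scores returned by kpls_fit correspond to T in eq 1, and kpls_scores applies eq 2 to a test kernel matrix.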
3. SPHERICAL-KERNEL PARTIAL LEAST SQUARES (SKPLS)

In our work, a robust approach is developed by implementing the spherical PCA method of Locantore et al.26 within the KPLS framework. The core idea of spherical PCA is to project the data onto a sphere of unit radius around the spatial median and then perform PCA on the sphered data. By using the principal components obtained from the data projected onto a sphere, it is possible to project the original data onto the new principal components to obtain robust scores and loadings.28 In this way, the influence of outliers can be bounded. Recently, Debruyne et al.29,30 incorporated kernels into the spherical-PCA algorithm and used the scores from spherical-kernel PCA for detecting points that are influential for ordinary kernel PCA. In the same way, we extend KPLS to spherical KPLS (SKPLS) and apply this new method to fault detection and process monitoring.

3.1. Spatial Median. In classical multivariate statistical methods, the data are usually centered around the mean in the first step. However, the mean is not a robust measure of the center. In this section, we propose to use the L1 M-estimate of location, a multivariate extension of the univariate median that has been around for a long time.31 This location measure is also known as the spatial median, and it is defined as follows

\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} \| x_i - \mu \|    (3)

where ||·|| is the Euclidean norm and μ̂ is the location estimator. The L1 M-estimate of location can be regarded as the point in the multidimensional data space that minimizes the sum of the Euclidean distances between itself and all data points. For the computation of this center, the following simple iterative algorithm exists.31 With a given initial guess μ^(0) ∈ R^m, we iteratively define

\mu^{(k)} = \frac{\sum_{i=1}^{n} \omega_i x_i}{\sum_{i=1}^{n} \omega_i}    (4)

where

\omega_i = \frac{1}{\| x_i - \mu^{(k-1)} \|}    (5)

Although other algorithms exist, we stick to the algorithm above because it allows for the extension to a kernelized version.
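The iteration in eqs 4 and 5 is a fixed-point scheme often called the Weiszfeld algorithm. A minimal sketch follows; the starting guess (the coordinatewise median), the tolerance, and the small guard against division by zero are our own assumptions, not part of the Article.

```python
import numpy as np

def spatial_median(X, tol=1e-8, max_iter=100, eps=1e-12):
    """L1 M-estimate of location (eqs 3-5): an iteratively reweighted mean."""
    mu = np.median(X, axis=0)                                       # assumed start
    for _ in range(max_iter):
        w = 1.0 / np.maximum(np.linalg.norm(X - mu, axis=1), eps)   # eq 5
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()             # eq 4
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```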
3.2. Spatial Median in the Feature Space. Assume again that the inputs {x_i}_{i=1}^n are mapped into the high-dimensional feature space F, and define the n × n kernel matrix as K_{i,j} = K(x_i, x_j). In the following, K_{·,j} will stand for the jth column of this matrix. Similar to the algorithm in section 3.1, we want to find μ̂ ∈ F such that

\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} \| \Phi(x_i) - \mu \|    (6)

Because the spatial median lies in the space spanned by the n inputs, and any point in this space can be parametrized as a linear combination of the inputs, the spatial median in F can be written as

\mu = \sum_{k=1}^{n} \gamma_k \Phi(x_k)    (7)

where γ = (γ_1, ..., γ_n)^T is the vector of coefficients. For the computation of the vector γ, the iterative algorithm in section 3.1 can easily be modified to be computed in a kernel-induced feature space using only the kernel inner product. With the initialization γ^(0) = (1/n, ..., 1/n)^T ∈ R^n, we iteratively compute

\gamma^{(k)} = \frac{\omega}{\sum_{i=1}^{n} \omega_i}    (8)

where the components of the vector ω ∈ R^n are given by

\omega_i = \frac{1}{\sqrt{ K_{i,i} - 2 (\gamma^{(k-1)})^T K_{\cdot,i} + (\gamma^{(k-1)})^T K \gamma^{(k-1)} }}    (9)

Then, γ^(k) converges to γ. Usually, the simple algorithm above converges in fewer than 10 steps, and a good approximation is obtained. When we have found the n coefficients γ_k such that the spatial median equals Σ_{k=1}^n γ_k Φ(x_k), we center the data in feature space around the spatial median by defining a new feature map

\tilde{\Phi}(x) = \Phi(x) - \sum_{k=1}^{n} \gamma_k \Phi(x_k)    (10)

where Φ̃(x) represents the centered feature vectors. Before applying SKPLS, the feature vectors should be centered around the spatial median in this way.

3.3. Spherical KPLS. In the first step of the SKPLS procedure, all feature vectors are projected onto the unit sphere around the spatial median Σ_{k=1}^n γ_k Φ(x_k) in feature space, and the new feature vectors are given as follows

\Phi^{*}(x_i) = \frac{ \Phi(x_i) - \sum_{k=1}^{n} \gamma_k \Phi(x_k) }{ \| \Phi(x_i) - \sum_{k=1}^{n} \gamma_k \Phi(x_k) \| }    (11)

where Φ*(x_i) is the result of the projection of the ith feature vector onto the sphere and is called the sphered feature vector. This implies that

\langle \Phi^{*}(x_i), \Phi^{*}(x_j) \rangle = \left\langle \frac{ \Phi(x_i) - \sum_{k=1}^{n} \gamma_k \Phi(x_k) }{ \| \Phi(x_i) - \sum_{k=1}^{n} \gamma_k \Phi(x_k) \| }, \frac{ \Phi(x_j) - \sum_{k=1}^{n} \gamma_k \Phi(x_k) }{ \| \Phi(x_j) - \sum_{k=1}^{n} \gamma_k \Phi(x_k) \| } \right\rangle    (12)

In terms of the original, uncentered kernel matrix K, this leads to a sphered kernel matrix K* with entries

K^{*}_{i,j} = \langle \Phi^{*}(x_i), \Phi^{*}(x_j) \rangle = \frac{ K_{i,j} - \gamma^T K_{\cdot,i} - \gamma^T K_{\cdot,j} + \gamma^T K \gamma }{ \sqrt{ K_{i,i} - 2\gamma^T K_{\cdot,i} + \gamma^T K \gamma } \, \sqrt{ K_{j,j} - 2\gamma^T K_{\cdot,j} + \gamma^T K \gamma } }    (13)

Therefore, once the vector γ is computed, we can easily obtain the sphered kernel matrix K*, which belongs to the sphered data, from the kernel matrix K of the original data. Then, we implement the eight steps of KPLS in Table 1 with K replaced by K*, obtaining the spherical score matrices T* and U* and, consequently, R* = Φ*^T U*(T*^T K* U*)^{−1}. Given any point x_new, according to eq 2 the score can be calculated by

t_{new} = \tilde{\Phi}(x_{new}) R^{*} = \left( \Phi(x_{new}) - \sum_{l=1}^{n} \gamma_l \Phi(x_l) \right) \Phi^{*T} U^{*} (T^{*T} K^{*} U^{*})^{-1} = \tilde{k}_{new} U^{*} (T^{*T} K^{*} U^{*})^{-1}    (14)

where k̃_new is a centered vector with entries

[\tilde{k}_{new}]_j = \left( \Phi(x_{new}) - \sum_{l=1}^{n} \gamma_l \Phi(x_l) \right) (\Phi^{*}(x_j))^T = \frac{ K(x_{new}, x_j) - \sum_{k=1}^{n} \gamma_k K(x_{new}, x_k) - \gamma^T K_{\cdot,j} + \gamma^T K \gamma }{ \sqrt{ K_{j,j} - 2\gamma^T K_{\cdot,j} + \gamma^T K \gamma } }    (15)

Therefore, for the training data, the robust score matrix is obtained by

T_{rob} = \tilde{K} U^{*} (T^{*T} K^{*} U^{*})^{-1}    (16)

where K̃ is a centered matrix with entries

\tilde{K}_{i,j} = \left( \Phi(x_i) - \sum_{l=1}^{n} \gamma_l \Phi(x_l) \right) (\Phi^{*}(x_j))^T = \frac{ K_{i,j} - \gamma^T K_{\cdot,i} - \gamma^T K_{\cdot,j} + \gamma^T K \gamma }{ \sqrt{ K_{j,j} - 2\gamma^T K_{\cdot,j} + \gamma^T K \gamma } }    (17)
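Equations 8, 9, 13, and 17 act only on kernel entries, so the whole sphering step can be written directly against the Gram matrix. Below is a minimal sketch under stated assumptions (tolerances, numerical guards, and function names are ours); it computes γ, the sphered kernel K* of eq 13, and the centered kernel K̃ of eq 17.

```python
import numpy as np

def kernel_spatial_median(K, tol=1e-10, max_iter=100):
    """Eqs 8-9: coefficients gamma of the spatial median in feature space."""
    n = K.shape[0]
    g = np.full(n, 1.0 / n)                 # gamma^(0) = (1/n, ..., 1/n)
    for _ in range(max_iter):
        # squared distance of each Phi(x_i) to the current median candidate
        d2 = np.diag(K) - 2.0 * (K @ g) + g @ K @ g
        w = 1.0 / np.sqrt(np.maximum(d2, 1e-15))    # eq 9
        g_new = w / w.sum()                         # eq 8
        if np.linalg.norm(g_new - g) < tol:
            return g_new
        g = g_new
    return g

def sphere_kernel(K, g):
    """Eq 13 (K*) and eq 17 (K~) from the uncentered Gram matrix K."""
    Kg = K @ g                                      # entries gamma^T K_{.,i}
    gKg = g @ Kg
    num = K - Kg[:, None] - Kg[None, :] + gKg       # centered cross products
    norms = np.sqrt(np.maximum(np.diag(K) - 2.0 * Kg + gKg, 1e-15))
    K_star = num / np.outer(norms, norms)           # eq 13: sphered kernel
    K_tilde = num / norms[None, :]                  # eq 17: centered kernel
    return K_star, K_tilde
```

Feeding K* into the KPLS routine of Table 1 and projecting K̃ as in eq 16 then yields the robust training scores.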
n

Φ(xj) − ∑k = 1 γk Φ(xk) n

|| Φ(xj) − ∑k = 1 γk Φ(xk)||

K (xnew , xj) − ∑k = 1 γkK (xnew , xk) − γ TK ., j + γ TKγ

Therefore, for the training data the robust score matrix is obtained by

where Φ*(xi) is the result of the projection of the ith feature vector onto a sphere and is called the sphered feature vector. This implies that ∗

∑ γl Φ(xl),

Φ(xj) − ∑k = 1 γk Φ(xk)

k=1



∑ γl Φ(xl), Φ∗(xj)⟩ l=1

n

Φ (xi) =

∑ γl Φ(xl))(Φ∗(xj))T l=1

Then, γ(k) converges to γ. Usually, the simple algorithm above converges in less than 10 steps and a good approximation will be obtained. When we find the n coefficients γk such that the spatial median equals Σnk=1γkΦ(xk), to center the data in feature space around the spatial median we define a new feature map as



∑ γl Φ(xl))Φ∗TU ∗(T ∗TK ∗U ∗)−1 l=1

(8)

1

Φ̃(x) = Φ(x) −

∑ γl Φ(xl))R∗ l=1

where the components of the vector ω ∈ n are given by ωi =

K i , i − 2γ TK ., i + γ TKγ Kj , j − 2γ TK ., j + γ TKγ (13)

Because the spatial median lies in the space spanned by the n inputs and any point in this space can be parametrized as a linear combination of the inputs, the spatial median in  can be written as μ=

K i , j − γ TK ., j − γ TK ., i + γ TKγ

(12)

In terms of the original and uncentered kernel matrix K, this lead to a sphered kernel matrix K* with entries C

dx.doi.org/10.1021/ie4008776 | Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Industrial & Engineering Chemistry Research

Article

and the standard deviation is substituted by estimator Qn, which is defined as32

requiring the choice of an appropriate weight function that usually implies an a priori assumption on the distribution of the data. This would be a difficult choice to make, especially in a general kernel-induced feature space.

Q n = c{|xi − xj| ; i < j}(k)

where k = = h(h − 1)/2, h = [n/2] + 1 and [·] denotes the integer part of the number. The pairwise difference between the two observations xi and xj is determined for i < j, which results in n(n − 1)/2 numbers. These numbers are sorted in ascending order, and the kth number multiplied with the constant c yields Qn. The constant c is a size-dependent correction factor, which makes Qn an unbiased estimate of the standard deviation for Gaussian data. Robust scaling is often considered extremely effective in practice; thus, in our trial, robust scaling is employed for the high-dimensional complex data containing outliers. 4.2. Off-Line Analysis. To monitor the process, Hotelling’s T2 statistic and the SPE statistic are used in the feature space. The statistic T2 is defined as follows

4. ROBUST ONLINE MONITORING STRATEGY BASED ON SKPLS

Although KPLS has been successfully used for the modeling and monitoring of various industrial processes, these models can deteriorate seriously when there are a considerable number of outliers in the modeling data. In this section, a robust online monitoring strategy based on SKPLS is developed. In our robust monitoring strategy, we not only use the robust KPLS method to extract the robust principal components but also adopt robust data scaling and calculate robust statistics and robust control limits. This means that the whole procedure in our proposed monitoring strategy is resistant to outliers and therefore robust. The difference between the traditional monitoring strategies and the proposed monitoring strategy is illustrated in Figure 1.

Figure 1. Comparison between the traditional monitoring strategies and the proposed strategy.

4.1. Robust Scaling. Autoscaling is commonly applied to multivariate data. For a data sequence {x_i}_{i=1}^n, the autoscaling procedure follows

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad d_i = \frac{x_i - \bar{x}}{s}    (18)

where x̄ is the mean of the variable and s is the standard deviation. Generally, the sample mean and variance provide a good estimation of data location and scatter if the sample is not contaminated by outliers. However, when a data set contains outliers, even a single out-of-scale observation, the sample mean and variance may deviate significantly. To reduce the effect of multiple outliers in estimating the mean and standard deviation, robust scaling has been suggested.31 In our strategy of robust scaling, the mean is replaced with the median

x_{median} = \mathrm{median}(x_1, x_2, ..., x_n)    (19)

and the standard deviation is substituted by the estimator Q_n, which is defined as32

Q_n = c \, \{ |x_i - x_j| ; i < j \}_{(k)}    (20)

where k = \binom{h}{2} = h(h−1)/2, h = [n/2] + 1, and [·] denotes the integer part of the number. The pairwise differences between observations x_i and x_j are determined for i < j, which results in n(n−1)/2 numbers. These numbers are sorted in ascending order, and the kth number multiplied by the constant c yields Q_n. The constant c is a size-dependent correction factor that makes Q_n an unbiased estimate of the standard deviation for Gaussian data. Robust scaling is often considered extremely effective in practice; thus, in our trial, robust scaling is employed for the high-dimensional complex data containing outliers (a code sketch of this scaling and of the control limits below follows section 4.3).

4.2. Off-Line Analysis. To monitor the process, Hotelling's T² statistic and the SPE statistic are used in the feature space. The T² statistic is defined as follows

T^2 = t_{new}^T \Lambda^{-1} t_{new}    (21)

where t_new is obtained from eq 14 and Λ is the covariance matrix of the robust score matrix T_rob. It should be emphasized that the robust score matrix is calculated from training data that may contain outliers, so it is necessary to estimate the covariance matrix of the score matrix robustly. In this Article, the robust covariance matrix is obtained with the minimum covariance determinant (MCD) approach.33 The objective of MCD is to find a subset of the data with the smallest determinant of the covariance matrix. The subset fulfilling this criterion is called a clean subset and serves to derive the robust estimates of the data mean and scale. It should be noted that the MCD approach will fail for multimodal operational data because the characteristics of the process data, such as the mean and covariance, vary significantly when the operating mode switches from one to another. To apply the MCD method and demonstrate the validity of the proposed robust monitoring strategy, the training and test data in this Article are composed of samples from one operating mode. The confidence limit for T² can be obtained by using the F distribution

T^2_{A,n,\alpha} \approx \frac{A(n^2 - 1)}{n(n - A)} F_{A, n-A, \alpha}

where A denotes the number of KPLS components, n is the number of training samples, and α is the significance level. Another measure of the variation within the spherical KPLS model is given by the squared prediction error (SPE), which is also known as the Q statistic. Lee et al.5 proposed a simple calculation of SPE in the feature space, which is shown as follows

\mathrm{SPE} = \sum_{j=1}^{\dim F} t_j^2 - \sum_{j=1}^{A} t_j^2 = \sum_{j=A+1}^{\dim F} t_j^2    (22)

where t_j is the jth KPLS component and dimF is the dimensionality of the feature space. Unlike in the linear PLS method, the number of KPLS components can be larger than the number of process variables. However, the dimension of the feature space is arbitrarily high; thus, the confidence limit for the SPE statistic may be unrealistically large.34 To avoid this problem, the dimension dimF can be determined empirically; it is usually about twice the number of retained latent variables. In our experiments, we varied dimF around twice the number of retained latent variables and judged whether the monitoring results were good enough; too large or too small a value of dimF will lead to an unreasonable SPE monitoring chart. The confidence limit for the SPE can be calculated from its approximate distribution

\mathrm{SPE}_\alpha \approx g \chi_h^2

where g is a weighting parameter that accounts for the magnitude of SPE and h accounts for the degrees of freedom. If a and b are the estimated mean and variance, respectively, of the SPE from the training data, then g and h can be approximated by g = b/2a and h = 2a²/b. Note that this way of determining the confidence limit can go wrong when there are outliers in the data; hence, we need robust estimates of the mean and variance. The mean is replaced with the median of the SPE, and the estimator Q_n described in section 4.1 is applied to estimate the standard deviation of the SPE; thus, b is the square of Q_n.

4.3. Outline of the Online SKPLS Monitoring. The proposed algorithm mainly consists of normal operating condition (NOC) model development and online monitoring. The framework of the proposed method is exhibited in Figure 2, and the detailed procedures are given as follows.

Figure 2. Flowchart of the online monitoring based on SKPLS.

4.3.1. Developing the Normal Operating Condition (NOC) Model.
(1) Acquire an operating data set during normal process conditions and implement robust scaling on the data;
(2) Given the set of m-dimensional scaled normal operating data {x_i}_{i=1}^n, compute the kernel matrix K ∈ R^{n×n} by [K]_{i,j} = K_{i,j} = ⟨Φ(x_i), Φ(x_j)⟩ = K(x_i, x_j);
(3) With the initialization γ^(0) = (1/n, ..., 1/n)^T ∈ R^n, iterate eqs 8 and 9 until γ^(k) converges to γ;
(4) Project all feature vectors onto the unit sphere around the spatial median; the sphered kernel matrix K* is acquired by eq 13;
(5) Carry out the KPLS algorithm on K* to obtain the spherical score matrices T* and U*;
(6) Compute the centered kernel matrix K̃ by eq 17;
(7) For the normal operating data, compute the robust score matrix T_rob by T_rob = K̃U*(T*^T K* U*)^{−1};
(8) Calculate the robust covariance matrix Λ of the score matrix T_rob and compute the SPE statistic of the normal operating data;
(9) Determine the control limits of the T² and SPE charts.

4.3.2. Online Monitoring.
(1) Obtain the new data for each sample and scale it by the same method used in the modeling procedure;
(2) Given the m-dimensional scaled test data x_new, compute the kernel vector k_new ∈ R^{1×n} by [k_new]_j = ⟨Φ(x_new), Φ(x_j)⟩ = K(x_new, x_j), where {x_j}_{j=1}^n is the normal operating data;
(3) Compute the new centered kernel vector k̃_new by eq 15, where K and γ are obtained from the modeling procedure;
(4) For the test data x_new, compute the score vector t_new by t_new = k̃_new U*(T*^T K* U*)^{−1}, where T*, U*, and K* are also obtained from the modeling procedure;
(5) Calculate the monitoring statistics T² and SPE for the test data;
(6) Monitor whether T² or SPE exceeds the corresponding control limit calculated in the modeling procedure.
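As an illustration of section 4, here is a minimal sketch of the robust scaling of eqs 18–20 and of the two control limits; it is our own reading, not the authors' code. The bias-correction constant for Q_n is taken as the asymptotic value 2.2219 from Rousseeuw and Croux,32 ignoring the small-sample correction, and SciPy is assumed for the F and χ² quantiles.

```python
import numpy as np
from scipy import stats

def qn_scale(x):
    """Eq 20: Q_n estimator, the k-th order statistic of pairwise differences."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    i, j = np.triu_indices(n, 1)
    diffs = np.sort(np.abs(x[i] - x[j]))
    return 2.2219 * diffs[k - 1]            # assumed asymptotic constant c

def robust_scale(X):
    """Eqs 19-20: center by the median, scale by Q_n, column by column."""
    med = np.median(X, axis=0)
    q = np.array([qn_scale(col) for col in X.T])
    return (X - med) / q, med, q

def t2_limit(A, n, alpha=0.01):
    """F-distribution confidence limit for Hotelling's T^2 (section 4.2)."""
    return A * (n**2 - 1) / (n * (n - A)) * stats.f.ppf(1 - alpha, A, n - A)

def spe_limit(spe_train, alpha=0.01):
    """SPE_alpha ~ g*chi2_h with robust a (median) and b (Qn^2), per section 4.2."""
    a = np.median(spe_train)
    b = qn_scale(spe_train) ** 2
    g, h = b / (2 * a), 2 * a**2 / b
    return g * stats.chi2.ppf(1 - alpha, h)
```

For the robust covariance Λ in eq 21, a standard MCD implementation such as scikit-learn's MinCovDet can be substituted for the estimator of ref 33.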
5. CASE STUDY ON SIMULATION EXAMPLES

In this section, two case studies are performed to illustrate the performance of the SKPLS approach, whose monitoring performance is compared with that of KPLS.

5.1. Numerical Example. Consider first the following nonlinear numerical example13,35

x_1 = t^2 - t + 1 + e_1    (23)
x_2 = \sin t + e_2    (24)
x_3 = t^3 + t + e_3    (25)
y = x_1^2 + x_1 x_2 + 3 \cos x_3 + e_4    (26)

where t is uniformly distributed over [−1, 1] and e_i (i = 1, 2, 3, 4) are noise components uniformly distributed over [−0.1, 0.1]. In this simulation, we used 200 samples generated under normal operating conditions to train the monitoring scheme.
A fault data set with 1000 samples is also generated; a step change of −0.8 in x_2 is introduced starting at sample 201. The fault data are monitored in real time to evaluate the performance. To test the superiority of SKPLS, the training data set is contaminated by outliers: some samples are selected randomly and replaced with faulty samples, and we introduce a contamination fraction ε that represents the percentage of outliers. For example, when the training data set contains 200 samples and ε = 5%, 10 outliers exist in the training data. The confidence level of the control limits is set to 99%.

First, for the training data sets under several contamination fractions, the KPLS and SKPLS models are built. The radial-basis kernel function k(x, y) = exp(−||x − y||²/c) is selected as the kernel function, where c is a constant determined by considering the process. The selection of the kernel parameter c is still an open question in the machine-learning area, and there is no optimal parameter-selection theory directed at industrial applications. Usually, with a larger c, the monitoring scheme will be less sensitive; in contrast, a small c will increase the sensitivity for detecting incipiently developing faults. Here, by testing the monitoring performance for various values of c, we found that c = 0.06 is appropriate for monitoring this process. The number of latent variables needed to build the models is determined by cross validation; here, the number of latent variables is set to A = 12, and the dimension of the feature space is set to dimF = 26. The performance of fault detection is then evaluated on the faulty data. Table 2 shows the fault detection rates for contamination fractions from 0 to 20%.

Table 2. Fault Detection Rates (%) of the Two Methods under Different Contamination Fractions

contamination fraction (ε, %) | KPLS T² | KPLS SPE | SKPLS T² | SKPLS SPE
0    | 100  | 100  | 100  | 100
2.5  | 100  | 95.3 | 100  | 98.3
5    | 99.9 | 89.0 | 100  | 98.0
10   | 61.6 | 37.8 | 100  | 67.9
15   | 38.3 | 26.3 | 100  | 44.3
20   | 21.5 | 17.8 | 57.5 | 35.9

As we can see, when the training data set is not contaminated (ε = 0%), both the KPLS and SKPLS methods achieve a high detection rate. However, with an increasing fraction of outliers in the data, the KPLS method is strongly influenced by the outliers and gives a poor detection rate. In contrast, because of its robust properties, the SKPLS method retains a satisfactory detection rate even when the training data set is highly contaminated. When ε = 20%, the data structure may be badly destroyed by the outliers, which results in low detection rates for both methods; even so, the detection rate of the SKPLS method is still much higher than that of the KPLS method. Besides, we can also see from Table 2 that, for the SKPLS method, the monitoring performance with the T² statistic is better than that with the SPE statistic, and the more outliers the modeling data contain, the larger this gap becomes. The reason is that the calculation of the T² statistic is based on both robust scores and a robust covariance, whereas the calculation of the SPE statistic involves only robust scores. This is also verified by the monitoring results in the next simulation experiment.

Figure 3 shows the monitoring charts of both methods in the case where ε = 15%; the dotted line is the control limit of the computed statistic.

Figure 3. Monitoring charts of both methods in the case where ε = 15%. (a) KPLS and (b) SKPLS.

From this figure, it is observed that the T² and SPE charts of KPLS do not exceed the control limits for most samples after sample 200, whereas the monitoring charts of SKPLS show essentially correct fault detection. The T² chart of SKPLS provides an excellent result: only two points are above the control limit before sample 200, and all points are above the control limit after sample 200. In addition, we can also see from Figure 3 that the threshold of the SPE statistic for the KPLS method is very high, leading to no false positives but a low fault-detection rate, whereas the threshold of the SPE statistic for the SKPLS method is appropriate for achieving good monitoring results. This is because the conventional KPLS method is not robust with respect to contaminated modeling data and cannot obtain robust latent variables. Because the latent variables extracted by KPLS are influenced by outliers and the SPE statistic is distance-based, the decision boundary of the SPE cannot closely encompass the normal region, and the threshold will be inflated. The proposed method has the ability to detect the disturbance not only when the training data set is clean but also when the training data set is contaminated by outliers.
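The data-generating equations 23–26 and the contamination protocol are simple enough to reproduce; the following sketch is our own construction (the seed and the fault-injection bookkeeping are assumptions), building the training and fault sets described above.

```python
import numpy as np

def generate_data(n, rng, fault_at=None):
    """Eqs 23-26: three inputs and one quality variable; optional step fault in x2."""
    t = rng.uniform(-1.0, 1.0, n)
    e = rng.uniform(-0.1, 0.1, (4, n))
    x1 = t**2 - t + 1 + e[0]
    x2 = np.sin(t) + e[1]
    x3 = t**3 + t + e[2]
    if fault_at is not None:                 # step change of -0.8 in x2
        x2[fault_at:] -= 0.8
    y = x1**2 + x1 * x2 + 3 * np.cos(x3) + e[3]
    return np.column_stack([x1, x2, x3]), y

rng = np.random.default_rng(0)               # assumed seed
X_train, y_train = generate_data(200, rng)
X_fault, y_fault = generate_data(1000, rng, fault_at=200)   # fault from sample 201

# contaminate eps = 5% of the training set with faulty samples
eps = 0.05
idx = rng.choice(200, int(eps * 200), replace=False)
src = rng.choice(800, len(idx), replace=False)
X_train[idx] = X_fault[200:][src]
y_train[idx] = y_fault[200:][src]
```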
will be inflated. The proposed method has the ability to detect the disturbance not only when the training data set is clean but also when the training data set is contaminated by outliers. 5.2. Case Study on the TE Process. 5.2.1. Process Description and Simulation Design. TE process is a wellknown benchmark that has been widely used for testing the performance of various monitoring methods.2,36 This process consists of five principal operation units: a reactor, a product condenser, a vapor−liquid separator, a recycle compressor, and a product stripper. The TE process contains two blocks of variables: 41 measured variables (including 22 continuous process measurements and 19 discrete composition measurements) and 12 manipulated variables, and it can produce 21 programmed fault modes. A detailed description of the TE process is available in a book by Chiang et al.2 In this study, 22 process measurements and 11 manipulated variables are chosen as input variables X, and the composition of G in stream nine is chosen as the quality variable y. The TE process simulator is used to generate normal data and faulty data. Five-hundred normal samples are used as the training data, and the faulty data is used for testing. Each fault type contains 960 observations, and the fault is introduced at sample 161 with a sampling interval of 3 min. To justify the advantageous performance of SKPLS over KPLS in its application to outliercontaminated data, 120 outliers are added artificially by randomly selecting 120 normal samples in the training data and replacing them with the faulty data. The original data (without outliers) and the corrupted data (with 120 outliers) are shown in Figure 4 a,b, respectively. The normal probability plots

Figure 5. Normal plots of the data. (a) The first variable in the original data and (b) the first variable in the corrupted data.

validation procedure. Here, the number of latent variables is set as A = 27, and the dimension of the feature space is set as dimF = 50. When implementing KPLS and SKPLS algorithms, the radialbasis kernel function k(x,y) = exp(−∥x − y∥2/c) is also selected as the kernel function with c = 0.65. For the purpose of process monitoring, the confidence limit of all monitoring statistics in the two models is set as 99%. To test the validation of SKPLS when applied to data without outliers, the KPLS and SKPLS models are first built based on the original data. Figure 6 shows the value comparison for the first

Figure 6. Value comparison for the first scores from KPLS and SKPLS for the original data.

Figure 4. Training data plots. (a) Original data and (b) corrupted data.

of the first variable in the two data sets are plotted precisely in Figure 5. The plot has the sample data displayed with the plot symbol +. If the data does come from a normal distribution, then the plot will appear linear. It is clear that the first variable in the corrupted data is not normally distributed and has heavy tails. Similar results can be obtained if the normal probability plots of the other variables are displayed. Therefore, if we still adopt the classical methods that are based on the assumption that data are normally distributed, then the monitoring performance will be deteriorated. 5.2.2. Results and Discussion. On the basis of the training data set, the KPLS and SKPLS models are developed. Robust scaling is applied to the training data, and the number of latent variables retained for the two models is determined by the cross-

score (the first column of score matrix T) extracted by the two methods, and one can see that for the data without outliers the score extracted by SKPLS is very close to that extracted by KPLS. When using the original data as the training data set, the results of detecting all of 21 faults in the TE process based on the two methods are listed in Table 3. These results show that when the two methods are applied to the normal distributed data for modeling they can provide similar results. It also demonstrates that SKPLS can work well when the modeling data set does not contain outliers and achieve a good monitoring performance, though in the SKPLS approach the original KPLS algorithm is improved and a sphering strategy is integrated into the modeling procedures. The implementation of projecting the feature G

dx.doi.org/10.1021/ie4008776 | Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Industrial & Engineering Chemistry Research

Article

the principal components extracted by SKPLS can describe the majority of the data well. To compare the process monitoring performance of SKPLS with KPLS when applied to contaminated modeling data, all of the 21 faults in the TE process are tested again, and the testing results are tabulated in Table 4. It should be note that outliers are

Table 3. Fault Detection Rates (%) of the Two Methods for the Original Modeling Data KPLS 2

fault mode

T

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

100 98.9 8.5 100 100 100 100 98.9 6.0 91.6 82.3 99.8 95.4 100 17.4 92.6 97.0 90.9 88.9 85.0 59.1

SKPLS 2

SPE

T

SPE

99.8 98.5 25.3 98.9 100 100 100 99.4 22.8 91.3 80.1 99.9 95.4 100 27.3 83.1 93.9 91.8 74.6 87.9 55.6

100 98.9 7.4 100 100 100 100 98.9 4.9 91.3 82.1 99.8 95.4 100 16.0 92.4 96.9 90.6 87.8 83.3 58.1

99.8 98.5 21.6 96.6 100 100 100 99.4 19.4 89.8 77.1 99.9 95.0 100 25.6 80.4 93.0 91.3 65.3 84.0 50.6

Table 4. Fault Detection Rates (%) of the Two Methods for the Contaminated Modeling Data KPLS

vectors onto a sphere can restrict the influence of outliers and will not change the main characteristic of the original data. However, when using the corrupted data as the training data the two methods of KPLS and SKPLS provide very different results. Figure 7 shows the first score obtained by the two methods. In this figure, all dotted lines represent the score from the methods when applied to the original data, and all solid lines represent the score from the methods when applied to the corrupted data. One can see from Figure 7a that the score obtained by KPLS for the corrupted data is very different from that for the original data, so the principal components from KPLS is not reliable and may lead to invalid results. By comparison, as shown in Figure 7b, the score obtained by SKPLS for the corrupted data is close to that for the original data; thus,

SKPLS

fault mode

T2

SPE

T2

SPE

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

93.9 28.6 0.1 2.9 31.5 87.1 100 92.8 0.4 76.1 39.6 98.5 94.8 99.9 0 57.6 95.1 89.8 27.3 57.0 31.6

98.6 95.3 2.5 14.3 99.8 93.4 100 97.9 3.3 86.5 43.8 99.9 95.6 100 6.4 88.3 86.8 89.6 88.4 84.3 46.3

99.9 98.5 7.1 100 100 100 100 98.0 7.1 85.9 72.8 99.9 95.5 100 8.9 77.3 96.8 90.9 83.3 65.8 65.9

99.9 98.8 13.0 50.6 100 100 100 98.3 10.0 92.8 73.0 99.9 95.5 100 17.8 90.0 94.0 90.9 84.4 89.9 54.5

not introduced into the faulty data, and all of the outliers in the test data will be detected and identified as faulty samples. In this table, the fault detection rates for all faults are listed, and the maximum rate for each statistic in each fault case is marked with a bold number. It can be seen that the method based on SKPLS resulted in the highest fault detection rates for most faults. For faults 2, 4, 5, 11, and 19 especially, SKPLS has much better results

Figure 7. Plots of the first scores from both methods. (a) KPLS and (b) SKPLS. H

dx.doi.org/10.1021/ie4008776 | Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Industrial & Engineering Chemistry Research

Article

Figure 8. Online monitoring charts for both methods in the case of fault 4. (a) KPLS and (b) SKPLS.

Figure 9. Online monitoring charts for both methods in the case of fault 5. (a) KPLS and (b) SKPLS.

reactor will increase suddenly. For this fault, as shown in Figure 8a,b, KPLS fails to detect this type of fault because even at very few samples, the T2 and SPE values of KPLS exceed the control limits, whereas the T2 chart of SKPLS shows a distinct change from sample 161 and clearly exceeds the control limit. The SPE chart of SKPLS also indicates the presence of abnormalities, although it is less reliable than the T2 chart. In the case of fault 5, the monitoring charts are shown in Figure 9. From Figure 9a we can see that the SPE chart of KPLS can effectively detect the fault, but the T2 values fall below the control limit after sample 350 even though the fault is still present. This result will mislead engineers in judging the process status. Figure 9b shows the monitoring results of SKPLS for fault 5, and we can see that both T2 and SPE charts exceed the control limit at sample 161 and stay above the control limit for the entire time.

than KPLS. On the basis of the previous literature, some faults, such as faults 3, 9, and 15, are known to be difficult to detect. Though the detection rates of both methods for these three faults are very low, SKPLS still shows a relatively better monitoring performance than KPLS. After comparing Tables 3 and 4, one can find that the results of SKPLS when applied to the contaminated modeling data are nearly consistent with the results of both methods when applied to the original data, whereas the results of KPLS for the contaminated modeling data are quite different. This suggests that the robust approach can result in models that are more accurate than the nonrobust approach when the training data is contaminated. Therefore, SKPLS outperforms KPLS for process monitoring when there are outliers in the training data. The monitoring results of faults 4 and 5 in particular are shown in Figures 8 and 9, respectively. In the case of fault 4, there is a step change in the reactor-cooling water-inlet temperature at sample 161, and after the fault occurs the temperature of the I

dx.doi.org/10.1021/ie4008776 | Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Industrial & Engineering Chemistry Research

Article

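For completeness, the detection rates reported in Tables 2–4 amount to simple bookkeeping over the monitoring statistics and their limits; the following short sketch is our own summary of that computation, with the fault onset index as used for the TE data (fault introduced at sample 161).

```python
import numpy as np

def detection_rate(stat, limit, onset):
    """Percentage of post-onset samples whose statistic exceeds its control limit."""
    alarms = np.asarray(stat)[onset:] > limit
    return 100.0 * alarms.mean()

# e.g., detection_rate(t2_values, t2_lim, onset=160) for a fault starting at sample 161
```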

6. CONCLUSIONS

In the present Article, a robust online monitoring approach based on spherical-kernel partial least squares (SKPLS) is developed for nonlinear process monitoring. By projecting the feature vectors onto a unit sphere, we obtain new feature vectors, and ordinary KPLS is then applied to these new feature vectors. Because of the sphering, the influence of outliers is bounded, and their negative effect is heavily reduced. Compared with ordinary KPLS, the proposed method incorporates robustness for coping with contaminated modeling data. Moreover, SKPLS is superior to linear robust methods because it can effectively capture the nonlinear relationships among the process variables. By integrating a robust method into a kernel-based method, the proposed method has both robust and nonlinear characteristics and can be applied to obtain more accurate models for process monitoring purposes. The simulation results for a numerical example and the TE benchmark process demonstrate the superiority of SKPLS.

AUTHOR INFORMATION

Corresponding Author

*Tel: +86-021-64252189. Fax: +86-021-65252189. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This research is supported by the National Natural Science Foundation of China (No. 61074079) and the Shanghai Leading Academic Discipline Project (No. B504).

REFERENCES

(1) MacGregor, J. F.; Kourti, T. Statistical Process Control of Multivariate Processes. Control Eng. Pract. 1995, 3, 403.
(2) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer-Verlag: London, 2001.
(3) Qin, S. J. Statistical Process Monitoring: Basics and Beyond. J. Chemom. 2003, 17, 480.
(4) Nomikos, P.; MacGregor, J. F. Monitoring Batch Processes Using Multiway Principal Component Analysis. AIChE J. 1994, 40, 1361.
(5) Lee, J.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I. Nonlinear Process Monitoring Using Kernel Principal Component Analysis. Chem. Eng. Sci. 2004, 59, 223.
(6) Alcala, C. F.; Qin, S. J. Reconstruction-Based Contribution for Process Monitoring with Kernel Principal Component Analysis. Ind. Eng. Chem. Res. 2010, 49, 7849.
(7) Kruger, U.; Chen, Q.; Sandoz, D. J.; McFarlane, R. C. Extended PLS Approach for Enhanced Condition Monitoring of Industrial Processes. AIChE J. 2001, 47, 2076.
(8) Lee, H. W.; Lee, M. W.; Park, J. M. Multi-Scale Extension of PLS Algorithm for Advanced On-Line Process Monitoring. Chemom. Intell. Lab. Syst. 2009, 98, 201.
(9) Zhang, Y.; Chai, T.; Li, Z.; Yang, C. Modeling and Monitoring of Dynamic Processes. IEEE Trans. Neural Network Learn. Syst. 2012, 23, 277.
(10) Rosipal, R.; Trejo, L. J. Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space. J. Mach. Learn. Res. 2001, 2, 97.
(11) Rosipal, R. Kernel Partial Least Squares for Nonlinear Regression and Discrimination. Neural Netw. World 2003, 13, 291.
(12) Kim, K.; Lee, J.; Lee, I. A Novel Multivariate Regression Approach Based on Kernel Partial Least Squares with Orthogonal Signal Correction. Chemom. Intell. Lab. Syst. 2005, 79, 22.
(13) Zhang, X.; Yan, W.; Shao, H. Nonlinear Multivariate Quality Estimation and Prediction Based on Kernel Partial Least Squares. Ind. Eng. Chem. Res. 2008, 47, 1120.
(14) Zhang, Y.; Fan, Y.; Zhang, P. Combining Kernel Partial Least-Squares Modeling and Iterative Learning Control for the Batch-to-Batch Optimization of Constrained Nonlinear Processes. Ind. Eng. Chem. Res. 2010, 49, 7470.
(15) Zhang, Y.; Zhang, Y. Complex Process Monitoring Using Modified Partial Least Squares Method of Independent Component Regression. Chemom. Intell. Lab. Syst. 2009, 98, 143.
(16) Zhang, Y.; Teng, Y.; Zhang, Y. Complex Process Quality Prediction Using Modified Kernel Partial Least Squares. Chem. Eng. Sci. 2010, 65, 2153.
(17) Zhang, Y.; Hong, Z.; Qin, S. J.; Chai, T. Decentralized Fault Diagnosis of Large-Scale Processes Using Multiblock Kernel Partial Least Squares. IEEE Trans. Ind. Inform. 2010, 6, 3.
(18) Zhang, Y.; Hu, Z. Multivariate Process Monitoring and Analysis Based on Multi-Scale KPLS. Chem. Eng. Res. Des. 2011, 89, 2667.
(19) Zhang, Y.; Ma, C. Fault Diagnosis of Nonlinear Processes Using Multiscale KPCA and Multiscale KPLS. Chem. Eng. Sci. 2011, 66, 64.
(20) Zhang, Y.; Hu, Z. On-Line Batch Process Monitoring Using Hierarchical Kernel Partial Least Squares. Chem. Eng. Res. Des. 2011, 89, 2078.
(21) Chen, J.; Bandoni, A.; Romagnoli, J. A. Robust Statistical Process Monitoring. Comput. Chem. Eng. 1996, 20, 497.
(22) Doymaz, F.; Chen, J.; Romagnoli, J. A.; Palazoglu, A. A Robust Strategy for Real-Time Process Monitoring. J. Process Control 2001, 11, 343.
(23) Wang, D.; Romagnoli, J. A. Robust Multi-Scale Principal Components Analysis with Applications to Process Monitoring. J. Process Control 2005, 15, 869.
(24) Ge, Z.; Yang, C.; Song, Z.; Wang, H. Robust Online Monitoring for Multimode Processes Based on Nonlinear External Analysis. Ind. Eng. Chem. Res. 2008, 47, 4775.
(25) Jia, R. D.; Mao, Z. Z.; Chang, Y. Q.; Zhang, S. N. Kernel Partial Robust M-Regression As a Flexible Robust Nonlinear Modeling Technique. Chemom. Intell. Lab. Syst. 2010, 100, 91.
(26) Locantore, N.; Marron, J. S.; Simpson, D. G.; Tripoli, N.; Zhang, J. T.; Cohen, K. L.; Boente, G.; Fraiman, R.; Brumback, B.; Croux, C. Robust Principal Component Analysis for Functional Data. Test 1999, 8, 1.
(27) De Jong, S. SIMPLS: An Alternative Approach to Partial Least Squares Regression. Chemom. Intell. Lab. Syst. 1993, 18, 251.
(28) Daszykowski, M.; Kaczmarek, K.; Stanimirova, I.; Heyden, Y. V.; Walczak, B. Robust SIMCA-Bounding Influence of Outliers. Chemom. Intell. Lab. Syst. 2007, 87, 95.
(29) Debruyne, M.; Hubert, M.; Van Horebeek, J. Detecting Influential Observations in Kernel PCA. Comput. Stat. Data Anal. 2010, 54, 3007.
(30) Debruyne, M.; Verdonck, T. Robust Kernel Principal Component Analysis and Classification. Adv. Data Anal. Classif. 2010, 4, 151.
(31) Huber, P. J. Robust Statistics; Wiley: New York, 1981.
(32) Rousseeuw, P. J.; Croux, C. Alternatives to the Median Absolute Deviation. J. Am. Stat. Assoc. 1993, 88, 1273.
(33) Rousseeuw, P. J.; Van Driessen, K. A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics 1999, 41, 212.
(34) Ge, Z.; Yang, C.; Song, Z. Improved Kernel PCA-Based Monitoring Approach for Nonlinear Processes. Chem. Eng. Sci. 2009, 64, 2245.
(35) Zhao, S. J.; Zhang, J.; Xu, Y. M.; Xiong, Z. H. Nonlinear Projection to Latent Structures Method and Its Applications. Ind. Eng. Chem. Res. 2006, 45, 3843.
(36) Downs, J. J.; Vogel, E. F. A Plant-Wide Industrial Process Control Problem. Comput. Chem. Eng. 1993, 17, 245.
