Article pubs.acs.org/IECR

Shrinking Principal Component Analysis for Enhanced Process Monitoring and Fault Isolation

Lei Xie,*,† Xiaozhong Lin,† and Jiusun Zeng*,‡

†National Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
‡College of Metrology & Measurement Engineering, China Jiliang University, Hangzhou 310018, China

ABSTRACT: This article proposes a shrinking principal component analysis (ShPCA) for fault detection and isolation. Unlike traditional PCA, ShPCA employs a sparse representation of the loading vectors by incorporating the l1 norm into the original objective function of PCA. An iterative quadratic programming (QP) algorithm is used to solve the optimization problem, and a Bayesian information criterion (BIC) is proposed to automatically determine the weight on the l1 norm. With the sparse representation, ShPCA produces an easy-to-interpret residual space with close relation to the process mechanism. A fault detection and isolation strategy is then proposed based on ShPCA that uses the nonzero variables in the residual components to construct partial PCA models that are sensitive to specific process faults. Studies of the application of ShPCA to a simulation example and to the Tennessee Eastman (TE) process illustrate its applicability.

1. INTRODUCTION

Principal component analysis (PCA) is a widely used dimension-reduction technique for the statistical monitoring of complex multivariate processes.1−5 PCA determines an orthogonal transformation of the original variables that captures the maximum variance.6 Let X = [x(1) x(2) ··· x(N)]^T ∈ R^{N×n} denote a data set storing the variables collected from a process under normal operating conditions, where N and n are the numbers of samples and variables, respectively. Without loss of generality, assume that each column of X is mean-centered and appropriately scaled. The optimization problem of PCA can then be described as

p̂_i = arg max_p ||Xp||_2^2
s.t. ||p||_2^2 = 1
     p^T p̂_j = 0,  j = 1, 2, ..., i − 1        (1)

where ||p||_2 = (p^T p)^{1/2} denotes the l2 norm. The first K < min(N, n) loading vectors P̂ = [p̂_1 p̂_2 ··· p̂_K] ∈ R^{n×K} constitute the PCA model subspace, whereas the n − K discarded vectors P̃ = [p̂_{K+1} p̂_{K+2} ··· p̂_n] ∈ R^{n×(n−K)} span the residual subspace. A proper value of K ensures that the model space explains most of the variance in X and that the residual space describes the multicollinearity among variables, which is caused by process material and energy balances. In multivariate process monitoring, two statistics, T2 and SPE, are defined in the model and residual spaces to determine whether a process is in statistical control.

The use of PCA in process monitoring has the following limitations:7−9 (i) The loading vector elements are typically all nonzero, so that each principal component is a linear combination of almost all of the observed variables. Each principal component is thus related to all of the variables, rather than to a small number of them, which renders the interpretation of the physical meaning of the extracted principal components difficult. (ii) PCA provides little support for fault isolation.

In the area of multivariate statistical analysis, various approaches have been proposed to address the first drawback of PCA.7,10−14 By incorporating a specific norm of p into either the constraint or the objective function of problem 1, these approaches obtain sparse loading vectors with a few nonzero elements. Thus, the loading vectors are much easier to interpret. Reference 10 first proposed a natural extension of PCA, called SCoTLASS (PCA based on the least absolute shrinkage and selection operator), in which an l1 constraint on p is introduced into PCA optimization problem 1. Zou and Hastie11 introduced a regression-type sparse PCA (SPCA) formulation, in which the sparse loading vectors are obtained by imposing the lasso or elastic net penalty on the regression coefficients. In contrast to SCoTLASS, SPCA is computationally efficient, at the cost of the loss of orthogonality among loading vectors. Shen and Huang12 suggested a sequential approach (sPCA-rSVD, sparse principal component analysis via regularized singular value decomposition) to find the sparse loading vectors using regularized singular value decomposition, but the nonorthogonality remains a problem. An ad hoc thresholding approach to obtain a sparse loading vector is to artificially set the elements of p with absolute values smaller than a threshold to zero; this approach, as pointed out by Cadima and Jolliffe,15 is potentially misleading in various respects. More recently, Qi et al.7 proposed a new norm constraint on p, mixing the l1 and l2 norms, along with an efficient iterative algorithm to solve the optimization problem.

It is problematic to employ the above approaches directly in process monitoring for the following reasons: (i) For process monitoring, it is more desirable to have interpretable loading vectors in the PCA residual space. Because the residual space typically describes the material and energy balances, a violation of the SPE statistic indicates a breakdown of the normal operating conditions that results from process faults, including pipe leakage and sensor faults. Thus, an interpretable residual space can facilitate the diagnosis of incipient faults. The approaches mentioned above, such as SCoTLASS, SPCA, and sPCA-rSVD, however, pay more attention to the model space and seek the variables that dominate the variance in X. This is a natural choice for handling biological or social science data, but it is not suitable for complex industrial process monitoring. (ii) Efficient criteria are lacking for determining the parameter that controls the sparsity of the loading vectors. Zou and Hastie11 studied the performance of the SPCA algorithm for different values of λ, which balances the variance of the principal component against the sparsity of the loading vector, but the selection of λ remains a matter of personal preference.

The second limitation of PCA-based process monitoring relates to its limited capability for fault isolation. Contribution charts, variable reconstruction, and structured residuals are the main approaches for isolating a fault root.1 Contribution charts allow for the comparison of the effects of different process variables on the T2 and SPE statistics that are out of statistical control. The variables with large contributions are regarded as having close connections to the faults. Despite the reported successful applications of contribution charts,1,3 the fault isolation results can be unreliable and misleading.1,16 The variable reconstruction approach tries to remove the influence of the fault on the process variables.17 If the reconstructed variable is in statistical control, the corresponding fault is isolated for further analysis. Note that the application of the variable reconstruction approach is restricted to situations in which the fault condition can be described by a linear subspace. In contrast, the structured residuals approach exploits the equivalence between PCA and parity relations.9 A set of residuals, each of which is sensitive to a particular set of faults, is generated in the PCA residual space. Fault isolation then reduces to the task of testing the significance of these residuals. Depending on the dimension n − K of the PCA residual space, one structured residual might be sensitive to several faults simultaneously. In addition, it is also hard to provide a physical interpretation of the structured residuals.

To address the above issues, this article proposes a new shrinking principal component analysis (ShPCA) method with the following advantages: (i) ShPCA produces an easy-to-interpret residual space with a close connection to the underlying process mechanism. (ii) The ShPCA problem can be solved efficiently by an iterative quadratic programming algorithm. (iii) A Bayesian information criterion (BIC) is presented to automatically balance the sparsity of the loading vectors against the minimization of the residual variance. (iv) The sparsity of the ShPCA loading vectors allows for the natural definition of a set of partial monitoring models that are sensitive to specific process faults.

The rest of this article is organized as follows: Section 2 provides a brief summary of recent work on sparse PCA. The shrinking principal component analysis method, including the problem formulation, the iterative quadratic programming algorithm, and the Bayesian information criterion, is detailed in section 3. Section 4 discusses the use of ShPCA in process monitoring and fault isolation. This is followed by studies of the application of ShPCA to a simulation example and to the TE process in sections 5 and 6, respectively. Finally, section 7 provides a concluding summary.

Received: April 1, 2013
Revised: October 30, 2013
Accepted: November 2, 2013
Published: November 5, 2013

dx.doi.org/10.1021/ie401030t | Ind. Eng. Chem. Res. 2013, 52, 17475−17486

2. PRELIMINARIES

SPCA,11 SCoTLASS,10 and sPCA-rSVD12 belong to the same category of improved PCA algorithms for the sparse representation of the model space; a comparison among these algorithms can be found in ref 7. This section presents a summary of the widely adopted sparse PCA (SPCA) method, which solves the following regression-type problem:11,18

min_{P,A} ||X − XPA^T||_2^2 + λ_2 Σ_{j=1}^K ||p_j||_2^2 + Σ_{j=1}^K λ_{1,j} ||p_j||_1
s.t. A^T A = I_{K×K}        (2)

where P = [p_1 p_2 ··· p_K] ∈ R^{n×K} and ||p_j||_1 = Σ_{k=1}^n |p_{jk}| is the l1 norm. For the case of N > n, which typically holds true for data sets recorded from industrial processes, λ_2 can be set to zero by default, and λ_{1,j} > 0 is employed to tune the compromise between variance and sparsity. The main difference between SPCA and ShPCA lies in their objective functions. In eq 2, a regression-type error is considered by minimizing the residuals of the SPCA model. In contrast, eq 3 (see section 3.1) directly minimizes the variance of the extracted principal component, which is exactly the objective function of PCA. Zou et al.18 developed a computationally efficient algorithm to obtain the optimal matrices A and P in an iterative and sequential manner. It takes the PCA loading matrix P̂ as the initial guess of A; in each iteration step, P is first regressed with the elastic net algorithm for the estimate of A from the previous step, and then A is updated by the singular value decomposition of X^T XP. As pointed out in ref 19, the standard SPCA approach lacks a criterion for selecting the tuning parameters λ_{1,j} and produces a nonorthogonal set of loading vectors. In addition, the first term ||X − XPA^T|| in the objective function of SPCA problem 2 yields a best approximation of X rather than an interpretable linear relationship among the variables.

3. SHRINKING PRINCIPAL COMPONENT ANALYSIS

This section presents the problem formulation of shrinking principal component analysis (ShPCA). The word shrinking highlights that the new approach is designed to obtain a sparse representation of the residual space; that is, both the variance of the principal component and the number of nonzero elements in the loading vectors are minimized. An iterative quadratic programming (QP) approach is proposed to solve the ShPCA optimization problem, and its convergence is established in section 3.2. Section 3.3 introduces a BIC-type criterion to select the ShPCA parameter.

3.1. Definition of ShPCA. The ith loading vector of ShPCA is the solution to the optimization problem

p̂_ShPCA,i = arg min_p ||Xp||_2^2 + λ||p||_1
s.t. ||p||_2^2 = 1
     p̂_ShPCA,j^T p = 0,  j = 1, 2, ..., i − 1        (3)

where ||Xp||_2^2 + λ||p||_1 is denoted by F(p) and λ > 0 controls the sparsity of p. For the first loading vector (i = 1), the orthogonality constraints vanish. Denote p_k as the kth element of p, that is, p = [p_1 ··· p_k ··· p_n]^T, and let P̂_ShPCA,i−1 = [p̂_ShPCA,1 p̂_ShPCA,2 ··· p̂_ShPCA,i−1]. Taking the definitions of the l1 and l2 norms into account, ShPCA optimization problem 3 can be reformulated as


p̂_ShPCA,i = arg min_{p,t} p^T X^T X p + λ Σ_{k=1}^n t_k
s.t. p^T p = 1
     −t_k ≤ p_k ≤ t_k,  k = 1, 2, ..., n
     P̂_ShPCA,i−1^T p = 0        (4)

where t = [t_1 t_2 ··· t_n] collects the augmented variables that handle the absolute-value terms of the l1 norm. Note that the presence of the constraint p^T p = 1 renders optimization problem 4 nonconvex. To develop an efficient algorithm for solving ShPCA, the next section approximates the l2 norm constraint with a linear equality.

3.2. Iterative QP Algorithm for ShPCA. Without loss of generality, only the first loading vector of ShPCA is considered in this section; the discussion extends straightforwardly to the other loading vectors. The proposed iterative QP algorithm proceeds as follows:

Algorithm 1 (Iterative QP Algorithm for Solving ShPCA). (i) Start the algorithm by setting the last PCA loading vector as the initial solution, that is, p^(0) = p̂_n. (ii) For m ≥ 1, given p^(m−1) from the previous iteration, solve the quadratic program

p* = arg min_{p,t} p^T X^T X p + λ Σ_{k=1}^n t_k
s.t. p^T p^(m−1) = 1
     −t_k ≤ p_k ≤ t_k,  k = 1, 2, ..., n        (5)

(iii) Obtain p^(m) by normalizing p*: p^(m) = p*/||p*||. (iv) Repeat steps ii−iii until convergence, that is, until ||p^(m) − p^(m−1)|| ≤ ε, where ε is a predetermined threshold.

To obtain the remaining loading vectors, procedures similar to algorithm 1 can be employed. However, a deflation procedure (e.g., projection deflation or Schur complement deflation20) should be included before the normalization stage of step iii to ensure that the orthogonality constraints are satisfied. Algorithm 1 relies on the approximation of the l2 norm constraint on p by the linear equality p^T p^(m−1) = 1 and converts nonconvex optimization problem 4 into a series of standard QP problems. A wide variety of computer routines, such as quadprog in MATLAB, are available for solving the QP problem. The convergence of algorithm 1 is established in theorem 1.

Theorem 1. Iterative QP algorithm 1 converges to a local optimum of ShPCA optimization problem 4.

Proof. In the mth iteration of algorithm 1, the point p^(m−1) is feasible for convex problem 5, which implies that the objective function of problem 5 either decreases or stays the same as in the previous iteration. Moreover, the objective function also decreases in the normalization step, as it always holds that

F(p*/||p*||) ≤ F(p*)/||p*||

According to the monotonic convergence of real sequences,21 algorithm 1 thus converges for any initial point, as the objective function is always positive because λ > 0. Finally, with the method of Lagrange multipliers, it can be verified that problem 4 and problem 5 have the same first-order Lagrange function derivatives, except that the multiplier of the constraint p^T p^(m−1) = 1 is twice that of p^T p = 1. Therefore, upon convergence, p^(m) is also a local-optimal solution of problem 4. This proves the convergence of the objective function.

Furthermore, the convergence of p* can be proved. Assume that there exists a constant δ > 0 such that ||p* − p^(m−1)|| > δ. Then ||p*|| > (1 + δ^2)^{1/2}. Because the objective function also decreases during normalization, this cannot happen, as

F[p^(m)] < F[p^(m−1)]/(1 + δ^2)^{1/2}

would contradict the convergence of the objective function. The proof is complete.

It is also interesting to study the behavior of algorithm 1 for λ = 0. Again taking the first loading vector of ShPCA into account, we have the following theorem:

Theorem 2. For a nonsingular sample covariance S_xx = [1/(N − 1)]X^T X and λ = 0, iterative QP algorithm 1 converges to the last loading vector p̂_n for any initial feasible point.

Proof. Because λ = 0, problem 5 reduces to

p* = arg min_p p^T X^T X p
s.t. p^T p^(m−1) = 1        (6)

Define the Lagrange function as

L(β, p) = p^T X^T X p − β[p^T p^(m−1) − 1]
        = (N − 1) p^T S_xx p − β[p^T p^(m−1) − 1]        (7)

Setting the first-order derivatives of L(β, p) to zero yields

∂L/∂β = p*^T p^(m−1) − 1 = 0        (8)

∂L/∂p = 2(N − 1) S_xx p* − β p^(m−1) = 0
⇒ p* = [β/(2(N − 1))] S_xx^{−1} p^(m−1)        (9)

Equation 9 holds true because S_xx is nonsingular. Substituting eq 9 into eq 8 gives

β = 2(N − 1)/[p^(m−1)T S_xx^{−1} p^(m−1)]        (10)

Then we have

p* = S_xx^{−1} p^(m−1)/[p^(m−1)T S_xx^{−1} p^(m−1)]

After normalization step iii of algorithm 1, it can be observed that, for λ = 0, algorithm 1 is the power method for finding the largest eigenvalue of the matrix S_xx^{−1}, and the convergence of the power method has been well studied in the literature.22 Thus, algorithm 1 converges to the eigenvector corresponding to the smallest eigenvalue of S_xx, and this concludes the proof.

3.3. BIC for ShPCA Parameter Selection. Section 3.2 introduced the iterative QP algorithm for solving ShPCA for a given λ value. Like the parameter λ_{1,j} in SPCA, λ determines the number of nonzero elements in p̂_ShPCA,i. A larger λ produces a sparser loading vector with fewer nonzero elements but increases the variance of the obtained residual. To evaluate the performance of ShPCA for different λ values, this section presents the following Bayesian information criterion (BIC)
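As an aside, the λ = 0 behavior established in theorem 2 is easy to verify numerically. The following is an illustrative sketch (numpy assumed; function name and details are our own, not the authors' code): each QP step then has the closed-form solution of eqs 9 and 10, so the iteration is the power method on S_xx^{−1}.

```python
import numpy as np

def shpca_lambda0(X, n_iter=500, tol=1e-12):
    """Algorithm 1 with lambda = 0: each subproblem has the closed-form
    solution p* = Sxx^{-1} p / (p^T Sxx^{-1} p), followed by the
    normalization step -- i.e., the power method applied to Sxx^{-1}."""
    Xc = X - X.mean(axis=0)
    N = Xc.shape[0]
    Sxx = Xc.T @ Xc / (N - 1)
    Sinv = np.linalg.inv(Sxx)
    # step i: initialize with the last PCA loading vector
    p = np.linalg.svd(Xc, full_matrices=False)[2][-1]
    for _ in range(n_iter):
        p_star = Sinv @ p / (p @ Sinv @ p)       # eqs 9 and 10 combined
        p_new = p_star / np.linalg.norm(p_star)  # step iii
        if p_new @ p < 0:                        # fix the sign ambiguity
            p_new = -p_new
        if np.linalg.norm(p_new - p) <= tol:     # step iv
            return p_new
        p = p_new
    return p
```

On data with well-separated covariance eigenvalues, the returned vector aligns with the eigenvector of the smallest eigenvalue of S_xx, as theorem 2 predicts.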


Table 1. Results of Numerical Examples: Variance versus Sparsity — Explained Variance (%) (NNE = number of nonzero elements)

PC | PCA | SPCA | SPCA | ShPCA | ShPCA
1 | 53.98 (10) | 22.72 (4) | 33.66 (6) | 53.98 (10) | 53.91 (10)
2 | 35.42 (10) | 25.46 (4) | 33.14 (6) | 35.34 (8) | 35.48 (8)
3 | 2.08 (10) | 2.05 (3) | 2.04 (3) | 2.08 (10) | 1.88 (10)
4 | 1.88 (10) | 5.09 (3) | 4.86 (3) | 3.62 (2) | 2.04 (2)
5 | 1.52 (10) | 4.30 (2) | 1.93 (3) | 1.53 (4) | 1.52 (4)
6 | 1.45 (10) | 6.31 (2) | 1.44 (3) | 8.78 (2) | 1.43 (2)
7 | 1.29 (10) | 8.78 (2) | 14.04 (3) | 8.52 (2) | 1.32 (2)
8 | 0.85 (10) | 0.84 (2) | 0.84 (3) | 0.85 (4) | 0.86 (4)
9 | 0.78 (10) | 17.10 (2) | 0.81 (3) | 0.86 (2) | 0.80 (2)
10 | 0.75 (10) | 0.84 (2) | 5.62 (3) | 2.00 (2) | 0.76 (2)
total variance | 100 | 93.89 | 98.37 | 79.24 | 100

BIC_{λ,i} = log(p̂_ShPCA,i^T S_xx p̂_ShPCA,i) + f(p̂_ShPCA,i)·(log N)/N        (11)

where f(p̂_ShPCA,i) is the number of nonzero elements in the vector p̂_ShPCA,i. The optimal λ value is selected as the one minimizing eq 11. Because f(·) is not continuous and p̂_ShPCA,i cannot be expressed explicitly in terms of λ, it is difficult to optimize BIC_{λ,i} directly. A practical approach is to restrict λ to a finite set, λ ∈ {(N − 1)2^l : l ∈ Z, l_min ≤ l ≤ l_max}. Empirical evidence suggests that, if all of the variables are scaled to unit variance, l_min = −6 and l_max = 2 are sufficient to select a proper λ value.
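The criterion of eq 11 is straightforward to evaluate for a candidate loading vector. A minimal sketch (numpy assumed; function names and the zero_tol threshold are illustrative choices, not from the paper):

```python
import numpy as np

def shpca_bic(p_hat, Sxx, N, zero_tol=1e-8):
    """BIC of eq 11: log of the variance along p_hat plus a penalty of
    log(N)/N per nonzero element. zero_tol decides which coefficients
    count as nonzero (an implementation choice)."""
    resid_var = float(p_hat @ Sxx @ p_hat)
    nne = int(np.sum(np.abs(p_hat) > zero_tol))
    return np.log(resid_var) + nne * np.log(N) / N

def select_lambda(candidates, Sxx, N):
    """Pick the lambda whose fitted loading vector minimizes the BIC.
    candidates: dict mapping a lambda value -> its loading vector."""
    return min(candidates, key=lambda lam: shpca_bic(candidates[lam], Sxx, N))
```

In practice, the candidate loading vectors come from running algorithm 1 over the grid λ ∈ {(N − 1)2^l, l = −6, ..., 2} described above.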

4. PROCESS MONITORING AND FAULT ISOLATION BASED ON SHPCA

With the ShPCA algorithm, a process monitoring and fault isolation method can be developed. The strategy consists of the following steps: (i) Collect the normal data set X = [x(1) x(2) ··· x(N)]^T ∈ R^{N×n} and perform traditional PCA (total PCA); obtain the confidence limits of the T2 and SPE statistics. (ii) Obtain the residual components representing the minimum variances, p̂_ShPCA,i, i = 1, ..., K_ShPCA, by solving the optimization problem in eq 3 using algorithm 1. (iii) Obtain the variables with nonzero coefficients in the ith loading vector of the residual space, denoted as V_i (i = 1, ..., K_ShPCA). (iv) For each V_i, perform traditional PCA to obtain a partial PCA model, along with the confidence limits of its T2 and SPE statistics. (v) When a new measurement arrives, calculate the T2 and SPE statistics using the total PCA model and the K_ShPCA partial PCA models, so that K_ShPCA + 1 sets of monitoring statistics are obtained. (vi) If any of the K_ShPCA + 1 sets of monitoring statistics violates its confidence limit, an out-of-statistical-control situation is reported. The monitoring results can also be used for fault isolation. Note that the loading vectors of ShPCA typically contain only a few nonzero elements; thus, the partial PCA models produce naturally structured monitoring statistics,9 and it is convenient to find the root cause of an incipient fault. Moreover, the influence of the fault can be analyzed by inspecting which partial PCA models are capable of detecting the fault and how they do so.
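The steps above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: for simplicity, the confidence limits here are empirical 99th percentiles of the training statistics rather than the F and χ2 limits used in the paper, and numpy is assumed.

```python
import numpy as np

def fit_pca_monitor(X, n_pc):
    """Fit a (total or partial) PCA monitor on training data X and
    store empirical 99% limits for the T2 and SPE statistics."""
    mu = X.mean(axis=0)
    Xc = X - mu
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T                          # model-space loadings
    pc_var = s[:n_pc] ** 2 / (len(X) - 1)    # PC variances for T2
    T = Xc @ P
    t2 = np.sum(T ** 2 / pc_var, axis=1)
    spe = np.sum((Xc - T @ P.T) ** 2, axis=1)
    return {"mu": mu, "P": P, "pc_var": pc_var,
            "t2_lim": np.percentile(t2, 99),
            "spe_lim": np.percentile(spe, 99)}

def monitor(model, x):
    """Return the T2 and SPE statistics of one new sample."""
    xc = x - model["mu"]
    t = xc @ model["P"]
    t2 = float(np.sum(t ** 2 / model["pc_var"]))
    spe = float(np.sum((xc - model["P"] @ t) ** 2))
    return t2, spe
```

A partial PCA model on, say, the variable set {x1, x2} is simply fit_pca_monitor(X[:, [0, 1]], n_pc=1); a sample is flagged when either of its statistics exceeds the stored limit.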

Figure 1. Selection of parameter λ using the BIC.

5. NUMERICAL EXAMPLE

This section presents a simulation example, modified from ref 18, that involves 10 process variables and two hidden variables. The first four variables are observations of the first hidden variable corrupted by Gaussian-distributed measurement noise, whereas the next four variables relate to the second hidden variable. The last two variables record linear combinations of the two hidden variables contaminated by Gaussian noise:

x = As + e

A = [1 1 1 1 0 0 0 0 −0.6 −0.6; 0 0 0 0 1 1 1 1 0.8 0.8]^T

s = [s1 s2]^T ~ N([0 0]^T, diag(0.98, 1))        (12)

The measurement noise e follows a zero-mean Gaussian distribution with covariance Σee = diag([0.09 0.09 0.09 0.09 0.16 0.16 0.16 0.16 0.25 0.25]). To contrast the performance of the proposed ShPCA method with those of SPCA and traditional PCA, a total of N = 500 samples were generated on the basis of eq 12 as the reference data set. A MATLAB toolbox, SpaSM, developed by Sjöstrand et al.,23 was employed to carry out the SPCA analysis. The SpaSM toolbox allows for direct control of the loading vector sparsity by setting the number of nonzero elements (NNE) in p_j; the corresponding parameters λ_{1,j} in eq 2 are calculated by the toolbox automatically.


Table 2. Sparse Loading Vectors p̂_ShPCA,i Obtained by ShPCA

PC | λ | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | variable set
10 | 2^−5 | 0 | −0.708 | −0.706 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | {x2, x3}
9 | 2^−5 | 0.711 | 0 | 0 | −0.704 | 0 | 0 | 0 | 0 | 0 | 0 | {x1, x4}
8 | 2^−5 | −0.500 | 0.501 | 0.503 | −0.500 | 0 | 0 | 0 | 0 | 0 | 0 | {x1, x2}
7 | 2^−5 | 0 | 0 | 0 | 0 | 0.692 | −0.722 | 0 | 0 | 0 | 0 | {x5, x6}
6 | 2^−5 | 0 | 0 | 0 | 0 | 0 | 0 | −0.702 | −0.712 | 0 | 0 | {x7, x8}
5 | 2^−4 | 0 | 0 | 0 | 0 | −0.518 | −0.496 | 0.500 | 0.489 | 0 | 0 | {x5, x7}
4 | 2^−1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.704 | −0.710 | {x9, x10}
3 | 2^−4 | −0.174 | −0.173 | −0.174 | −0.176 | 0.230 | 0.220 | 0.233 | 0.230 | −0.581 | −0.577 | {x1, x5, x10}
2 | 2^−5 | −0.397 | −0.395 | −0.400 | −0.401 | −0.305 | −0.293 | −0.310 | −0.305 | 0 | 0 | —
1 | 2^2 | −0.248 | −0.247 | −0.248 | −0.250 | 0.328 | 0.314 | 0.332 | 0.328 | 0.408 | 0.404 | —

Figure 2. Monitoring results for a sensor bias of 1.2 in x1: (a) total PCA, (b) partial PCA {x1, x4}, (c) partial PCA {x1, x2}.

From the underlying data generation structure and the noise levels in eq 12, it can be observed that there are eight independent linear relationships among the variables, namely, 0.6x1 − 0.8x5 + x9, 0.6x1 − 0.8x5 + x10, x5 − x6, x7 − x8, x5 − x7, x1 − x2, x3 − x4, and x1 − x3. Thus, the intuitive selection of the numbers of nonzero elements for the SPCA loading vectors is NNE = [4 4 3 3 2 2 2 2 2 2], according to the discussion in ref 18. The first two numbers are selected as 4 because the first two principal components (PCs) explained 89.1% of the total variance, and it is expected that SPCA can recover the variance-dominant hidden variables s1 and s2 with x1−x4 and x5−x8, respectively.18
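The data-generating model of eq 12, and the near-zero variance of the eight linear relationships, can be checked numerically. A sketch (numpy assumed; the function name is illustrative):

```python
import numpy as np

def generate_reference_data(N=500, seed=1):
    """Simulate x = A s + e of eq 12: two hidden variables s1, s2
    observed through ten sensors with the stated noise variances."""
    rng = np.random.default_rng(seed)
    A = np.array([[1, 1, 1, 1, 0, 0, 0, 0, -0.6, -0.6],
                  [0, 0, 0, 0, 1, 1, 1, 1,  0.8,  0.8]], float).T
    s = rng.normal(0.0, np.sqrt([0.98, 1.0]), size=(N, 2))
    e = rng.normal(0.0, np.sqrt([0.09] * 4 + [0.16] * 4 + [0.25] * 2),
                   size=(N, 10))
    return s @ A.T + e

X = generate_reference_data()
# e.g., the relationships x1 - x2 and 0.6x1 - 0.8x5 + x9 carry noise only,
# so their sample variances are small relative to the variable variances
r1 = X[:, 0] - X[:, 1]
r2 = 0.6 * X[:, 0] - 0.8 * X[:, 4] + X[:, 8]
```

The residual directions found by ShPCA in Table 2 correspond exactly to such low-variance combinations.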

Table 1 summarizes a comparison of the results of PCA, SPCA, and ShPCA. The analysis of Table 1 highlights the following points: (i) In contrast to the PCA and ShPCA approaches, SPCA does not explain 100% of the total variance because of the loss of orthogonality among loading vectors. As a consequence, information in the original data set can be lost, which renders some incipient faults undetectable. (ii) SPCA is sensitive to the choice of NNE. Changing the NNE from 4 to 6 for the first PC increases the explained variance by nearly 11%; however, SPCA has no systematic way to balance sparsity against explained variance.


Figure 3. Monitoring results for a step change of 2 in s1: (a) total PCA, (b) partial PCA {x2, x3}, (c) partial PCA {x1, x4}, (d) partial PCA {x1, x2}, (e) partial PCA {x9, x10}, (f) partial PCA {x3, x6, x10}.

It is not practical to select the NNE by hand for industrial process data with a large number of variables. In addition, a priori knowledge of the underlying data structure and the NNE does not help SPCA to obtain a meaningful residual space either. (iii) At the same sparsity level of the loading vectors, ShPCA generally produces principal components with larger variances than SPCA, which can be explained by the difference between the design objectives, eqs 2 and 3, of the two approaches. (iv) Compared with PCA, ShPCA yields a sparser representation of the residual space based on the BIC without sacrificing much explained variance. More precisely, the maximum variance difference appears between the third PC of PCA and the fourth PC of ShPCA, and it is around 0.04% of the total variance. Note that the PCs of ShPCA do not rank strictly according to their variances because of the presence of the l1 norm term in eq 3.

Figure 1 illustrates the results of using the BIC technique presented in section 3.3 to select the best λ value for the last five loading vectors of ShPCA. BIC_{λ,i} changes only slightly once λ/(N − 1) decreases below 2^−1; thus, the range [2^−6(N − 1), 2^2(N − 1)] is adequate for selecting a proper λ value. The analyses of the first five loading vectors yield similar results and are thus not provided here. Table 2 lists the loading vectors obtained by ShPCA, as well as the λ values selected by the BIC technique. According to the


Figure 4. TE process and monitored variables.

Table 3. Selected Variables for Fault Detection and Isolation in the TE Process

variable no. | description | variable no. | description | variable no. | description
y1 | feed A | y12 | product separator level | u1 | feed D flow valve
y2 | feed D | y13 | product separator pressure | u2 | feed E flow valve
y3 | feed E | y14 | product separator underflow | u3 | feed A flow rate
y4 | total feed | y15 | stripper level | u4 | total feed flow valve
y5 | recycle flow | y16 | stripper pressure | u5 | compressor recycle valve
y6 | reactor feed rate | y17 | stripper underflow | u6 | purge valve
y7 | reactor pressure | y18 | stripper temperature | u7 | separator pot liquid flow valve
y8 | reactor level | y19 | stripper steam flow | u8 | stripper liquid product flow valve
y9 | reactor temperature | y20 | compressor work | u9 | stripper steam valve
y10 | purge rate | y21 | RCW outlet temperature | u10 | RCW flow
y11 | product separator temperature | y22 | SCW outlet temperature | u11 | CCW flow

explained variances given in Table 1, the dimension of the residual space is selected as 8. The third, fifth, and eighth loading vectors contain relatively larger numbers of nonzero elements because of the orthogonal constraints. Their sparsity can be increased further by taking the generation of PCs 4, 6, 7, 9, and 10 into account. More precisely, the minor variances of these five PCs suggest that the following collinearities exist: x2 ≈ x3, x1 ≈ x4, x5 ≈ x6, x7 ≈ x8, and x9 ≈ x10. Therefore, the variables associated with the fifth and eighth PCs can be reduced to {x1, x2}and {x5 , x7 }, respectively. This, in turn, allows the variable set of the third PC to be simplified to {x1, x5, x10}. The last column of Table 2 summarizes the variable sets related to the loading vectors in the residual space. Following the discussion in section 4, eight partial PCA models were established, each one corresponding to one

variable set listed in Table 2. Because each variable set contains only one multicollinearity, the partial PCA models were constructed to have one-dimensional residual spaces. In contrast, the total PCA model retains a two-dimensional model space that explains over 89% of the total variance in the reference data set. The confidence limits of the T2 and SPE statistics were determined by the corresponding F and χ2 distributions1 for a significance level of 1%. To compare the fault detection and diagnosis performance of the partial PCA models with that of the traditional total PCA model, two data sets of 300 samples were simulated using eq 12 to produce the following two cases: (i) a sensor bias of 1.2 superimposed on the first recorded variable after the 150th sample

x_f1 = As + e + [1.2 0 0 0 0 0 0 0 0 0]^T        (13)


Figure 3 summarizes the application of the total and partial PCA approaches to the data set representing case ii. Five partial PCA models responded to the step change added to the hidden variable s1 (Figure 3b−f), whereas the remaining three did not trigger alarms. Inspection of the fault condition given in eq 14 suggests that it does affect the linear dependencies among the variables. Unlike in case i, the SPE statistic of the total PCA model correctly indicated the impact of the incipient fault, as the partial PCA models did. Nevertheless, the T2 statistic of the total PCA model responded less significantly to the event than those of the partial PCA models. More precisely, Figure 3a illustrates that around 12% of the samples violated the 99% confidence limit of the total PCA T2 statistic, which is much less than the 50% observed for the partial PCA models in Figure 3b−d,f. Note that the T2 statistic of the partial PCA model {x9, x10} in Figure 3e is less sensitive because the step magnitude of s1 is multiplied by −0.6 to generate x9 and x10. Because the three partial PCA models {x5, x6}, {x7, x8}, and {x5, x7} are not sensitive to the fault condition, it can be concluded that some common factors beyond variables x1−x4, x9, and x10 are out of statistical control. Both cases i and ii highlight that the main advantage of the proposed ShPCA approach is the increased interpretability of the monitoring results, which simplifies the task of fault isolation without a priori knowledge of the process. This advantage is further elaborated in the next section, which analyzes data from the TE process.
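The isolation logic used in cases i and ii — intersect the variable sets of the partial models that alarm — can be sketched as a simple helper (our own illustration, not from the paper):

```python
def isolate_candidates(variable_sets, alarmed):
    """Return the variables shared by all alarmed partial PCA models.

    variable_sets: dict mapping model name -> set of variable names
    alarmed: iterable of model names whose T2/SPE violated their limits
    """
    alarmed = list(alarmed)
    if not alarmed:
        return set()
    common = set(variable_sets[alarmed[0]])
    for name in alarmed[1:]:
        common &= set(variable_sets[name])
    return common
```

In case i, the models {x1, x4} and {x1, x2} alarm, and their intersection singles out x1, the biased sensor. This is only a first-pass heuristic: when the intersection is empty, as with the step change of case ii, the union of the alarmed sets points to a common latent factor instead.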

Table 4. TE Process Faults

fault no. | description | type
1 | A/C feed ratio | step change
2 | component B | step change
3 | feed D temperature | step change
4 | RCW inlet temperature | step change
5 | CCW inlet temperature | step change
6 | feed A loss | step change
7 | C header pressure loss | step change
8 | feed A−C components | random variation
9 | feed D temperature | random variation
10 | feed C temperature | random variation
11 | RCW inlet temperature | random variation
12 | CCW inlet temperature | random variation
13 | reaction kinetics | slow drift
14 | RCW valve | sticking
15 | CCW valve | sticking
16 | unknown fault | unknown
17 | unknown fault | unknown
18 | unknown fault | unknown
19 | unknown fault | unknown
20 | unknown fault | unknown
21 | valve position (stream 4) | constant

Table 5. Loading Vectors Extracted by ShPCA

loading vector (LV) no. | variance | variables with nonzero coefficients
1 | 4.37 × 10−8 | y12, u7
2 | 5.00 × 10−8 | y15, u8
3 | 3.96 × 10−4 | y17, u11
4 | 0.0047 | y7, y13
5 | 0.0040 | y1, u3
6 | 0.0137 | y5, y7, y13, y16, u5
7 | 0.0241 | y19, u9
8 | 0.0411 | y18, y19, u9
9 | 0.0566 | y10, u6
10 | 0.1749 | y16, y20, u5
11 | 0.5098 | y11, y18, y22
12 | 0.3609 | y9, u10

6. TE APPLICATION STUDY

This section examines the performance of the proposed ShPCA-based fault detection and isolation strategy on the Tennessee Eastman (TE) process, a widely used benchmark for testing process control and monitoring techniques.24 The process involves five operating units: a reactor, a flash separator, a stripper, a condenser, and a recycle compressor; two products and one byproduct are produced. A flowchart of the TE process is shown in Figure 4. During operation, a total of 41 measured variables and 12 manipulated variables are recorded; a detailed description of the process and control structure can be found in ref 24. For process monitoring, 22 measured variables and 11 manipulated variables were selected, as listed in Table 3, where RCW stands for reactor cooling water, CCW for condenser cooling water, and SCW for separator cooling water.

For fault detection and isolation, a normal data set containing N = 960 samples and 21 faulty data sets, each also containing 960 samples, were generated as in ref 25. Each simulation started from constant, fault-free operating conditions, and the faults were introduced 1 and 8 h into the simulations for the training and test data sets, respectively (i.e., from the 161st sample of the test sets). The data were sampled every 3 min, and the plantwide control structure recommended by ref 26 was used. Descriptions of the 21 faults are provided in Table 4.

To construct the normal operation model, the ShPCA approach proposed in section 3 was applied to the normal data set, with the BIC used to determine the value of λ during the optimization. A total of 21 components with relatively large variances were extracted to represent the model space, capturing about 96.39% of the total variance.
As the residual space typically describes the material and energy balances, the 12 loading vectors were used to construct partial PCA models for process monitoring.

and (ii) a step change of 2 to the first hidden variable after the 100th sample

x_f1 = A s_f + e,    s_f = s + [2 0]^T    (14)
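Equation 14 can be reproduced with a toy simulation. The mixing matrix A, the noise level, and the sample count below are illustrative stand-ins for a two-hidden-variable example; only the fault construction (a step of 2 on the first hidden variable after the 100th sample) follows the text.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2))          # illustrative mixing matrix (4 vars, 2 hidden)
N = 200

S = rng.standard_normal((N, 2))          # hidden variables s
E = 0.05 * rng.standard_normal((N, 4))   # measurement noise e

# Case ii: step change of 2 on the first hidden variable after sample 100
Sf = S.copy()
Sf[100:, 0] += 2.0                       # s_f = s + [2 0]^T
Xf = Sf @ A.T + E                        # x_f1 = A s_f + e
```

After the 100th sample, the mean of the measured variables shifts along the direction of the first column of A, which is what the residual-space statistics are meant to pick up.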

Figure 2 shows the monitoring results of applying the total PCA model with all variables and three partial PCA models to the data set representing case i. The remaining five partial PCA models did not detect the introduced sensor bias, and their results are not given. As depicted in Figure 2a, the total PCA model detected this event in the residual space, whereas the T2 statistic did not show a significant number of violations. Although the fault was successfully detected, only a small portion of the SPE values (24.7%) exceeded the confidence limit. In contrast, the partial PCA models, as shown in Figure 2b,c, correctly revealed the impact of the fault on the residual space. More precisely, the sensor bias caused larger departures of the faulty samples from the reference model planes of {x1, x4} and {x1, x2}, with violation rates of 82.5% and 76.8%, respectively. Note that the two partial PCA models share one common variable, x1, which facilitates isolation of the incipient fault type.
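The violation percentages quoted above are fractions of faulty samples whose SPE exceeds its confidence limit. A minimal sketch of that computation follows; the data are toy stand-ins, and the limit here is a simple empirical quantile rather than the theoretical SPE limit used in the paper.

```python
import numpy as np

def spe(X, P):
    """Squared prediction error (SPE) of each row of X against a PCA
    model whose plane is spanned by the orthonormal loadings P."""
    R = X - X @ P @ P.T               # residual part of each sample
    return np.sum(R ** 2, axis=1)

def violation_rate(X_ref, X_test, P, alpha=0.99):
    """Fraction of test samples whose SPE exceeds an empirical
    alpha-quantile limit estimated from reference data."""
    limit = np.quantile(spe(X_ref, P), alpha)
    return float(np.mean(spe(X_test, P) > limit))

# Toy data: samples near a 2-D plane in 4-D space; the "fault" shifts
# them off the plane so that the SPE limit is violated
rng = np.random.default_rng(2)
P, _ = np.linalg.qr(rng.standard_normal((4, 2)))
X_ref = rng.standard_normal((500, 2)) @ P.T + 0.01 * rng.standard_normal((500, 4))
X_fault = X_ref[:200] + 0.5 * rng.standard_normal(4)
rate = violation_rate(X_ref, X_fault, P)
```

A high violation rate for a partial model whose plane involves the biased variable, alongside a low rate for the others, is what enables the isolation logic described in the text.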

dx.doi.org/10.1021/ie401030t | Ind. Eng. Chem. Res. 2013, 52, 17475−17486

Industrial & Engineering Chemistry Research

Article

Figure 5. Monitoring results for fault 5: (a) total PCA, (b) partial PCA 3 {y17, u11}, (c) partial PCA 7 {y19, u9}, (d) partial PCA 11 {y11, y18, y22}.

Figure 6. Impacts of fault 5 on (a) y18, (b) y22, (c) u9, (d) u11.

With the penalty on the l1 norm, a sparse representation of the loading vectors was achieved. Table 5 gives the variables with nonzero coefficients in each of the 12 loading vectors, as well as the variance explained along each loading vector.
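Building one partial PCA model per variable set can be sketched as follows. The three variable sets are taken from Table 5, but the column positions in the data matrix are hypothetical, and the fitting itself is plain PCA on the chosen columns rather than the paper's full monitoring machinery.

```python
import numpy as np

# Hypothetical mapping from variable names to data-matrix columns;
# the variable sets follow Table 5 (only three of the twelve shown)
columns = {"y17": 16, "u11": 32, "y19": 18, "u9": 30,
           "y11": 10, "y18": 17, "y22": 21}
variable_sets = {3: ["y17", "u11"], 7: ["y19", "u9"], 11: ["y11", "y18", "y22"]}

def fit_partial_pca(X, names, n_pc=1):
    """Fit an ordinary PCA on only the named columns of X and return
    the column indices together with the retained loadings."""
    idx = [columns[v] for v in names]
    Xs = X[:, idx]
    Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return idx, Vt[:n_pc].T

rng = np.random.default_rng(3)
X = rng.standard_normal((960, 33))   # stand-in for the reference data
models = {k: fit_partial_pca(X, v) for k, v in variable_sets.items()}
```

Each partial model then supplies its own T2 and SPE statistics over just its variable subset, which is what makes the models selectively sensitive to specific faults.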


Figure 7. Monitoring results for fault 10: (a) total PCA, (b) partial PCA 7 {y19, u9}, (c) partial PCA 8 {y18, y19}, (d) partial PCA 11 {y11, y18, y22}.

Figure 8. Impacts of fault 10 on (a) y11, (b) y18, (c) y19, (d) u9.

As shown in Table 5, the ShPCA algorithm proposed in this article obtained a sparse residual space with some loading vectors (LVs) closely related to the process mechanism. For example, the projected variance along LV3 is close to zero, meaning that the dominant variables in LV3 are perfectly correlated. This can be easily explained, as the stripper underflow (y17) is affected by


the operators into believing that the fault has been corrected by the control actions. The monitoring results of most partial PCA models are similar, except for {y17, u11}. As illustrated in Figure 5b, partial PCA model {y17, u11} provides a different picture, and the fault remains detectable after the 370th sample. Figure 6 shows some of the process variables affected by fault 5. The manipulated variable u9 (stripper steam valve) was adjusted at the 161st sample and returned to normal after the 360th sample; this control action pulled the measured variables y18 (stripper temperature) and y22 (separator cooling water outlet temperature) back to normal. In contrast, only the manipulated variable u11 (condenser cooling water flow) remained higher than its normal value, in accordance with the monitoring results in Figure 5. Thus, it can be concluded that, by setting a different condenser cooling water flow rate, the control system can recover normal process operation, and the incipient fault is most likely caused by a change in the cooling water temperature.

Fault 10, on the other hand, is a random variation in the feed C temperature in stream 4. This variation changes the conditions of the stripper and then of the condenser. The monitoring results of total PCA and ShPCA are shown in Figure 7. Figure 7a shows that the random variation in the feed C temperature can be detected by the total PCA, but with low sensitivity. In contrast, the results in Figure 7c reveal that the fault is detected by the partial PCA model {y18, y19} with higher sensitivity. Note that only three partial PCA models are constantly sensitive to fault 10, which reveals important information on how the fault propagates, as these three models are connected by the variable y18 (the stripper temperature).
This can be easily explained as follows: the change in the feed C temperature first influenced the stripper temperature, and the stripper steam valve was then manipulated to reject the disturbance, which, in turn, influenced the steam flow rate y19. Figure 8 illustrates the impacts of fault 10 on the variables y11, y18, y19, and u9; larger fluctuations can be observed in all four variables after the 161st sample, in accordance with the above analysis.

By analyzing the monitoring results of the partial PCA models, it is possible to divide the faults into two categories, namely, plantwide faults and local faults. Plantwide faults affect a large number of variables and, hence, can be detected by many partial PCA models, whereas local faults can be detected by only a few partial PCA models. Table 6 gives the specific fault detection results for the partial PCA models. The bold numbers highlight that the corresponding partial PCA models are constantly sensitive to the faults; that is, at least one of the T2 and SPE statistics keeps violating its confidence limit, and the control system cannot compensate for the fault effects. It can be seen that faults 3, 4, 9, 11, 17, and 19 can be detected by only a few partial PCA models and can therefore be classified as local faults. On the other hand, faults 6, 7, 8, 12, 13, and 18 can be detected by almost all of the partial PCA models except {y12, u7} and {y15, u8} and can be classified as plantwide faults.

Table 6. Fault Detection Results of the Partial PCA Models

fault no. | detected by partial PCA models
1 | 3, 4, 5, 6, 7, 8, 9, 10, 11
2 | 3, 7, 8, 9, 10, 11
3 | 7, 8, 10, 11
4 | 12
5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
6 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
7 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
8 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
9 | 3, 7, 10, 11
10 | 3, 4, 5, 6, 7, 8, 10, 11
11 | 4, 7, 8, 12
12 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
14 | 12
15 | 3, 4, 7, 8
16 | 3, 4, 6, 7, 8, 10, 11
17 | 4, 6, 8, 10, 11
18 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
19 | 6, 7, 10
20 | 3, 4, 6, 8, 10, 11
21 | 3, 4, 6, 7, 8, 10, 11
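The plantwide/local split amounts to thresholding the number of partial models that flag each fault. A sketch over a small illustrative subset of the fault-to-models mapping; the threshold of 8 models is an assumed cutoff, not a figure from the paper.

```python
# Illustrative subset of the fault -> detecting-partial-models map;
# the threshold of 8 models is an assumed cutoff
detected_by = {
    4: [12],
    6: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    9: [3, 7, 10, 11],
    12: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    19: [6, 7, 10],
}

def classify(detected_by, threshold=8):
    """Label faults detected by at least `threshold` partial models
    as plantwide and the rest as local."""
    return {fault: ("plantwide" if len(models) >= threshold else "local")
            for fault, models in detected_by.items()}

labels = classify(detected_by)
```

With this cutoff, faults flagged by nearly every partial model (e.g., 6 and 12) come out plantwide, while faults confined to one or a few models (e.g., 4, 9, and 19) come out local, matching the discussion in the text.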

the condenser cooling water flow (u11). Similarly, the variance along LV4 is also close to zero, reflecting the relation between the reactor pressure (y7) and the product separator pressure (y13). LV6 and LV10 likewise capture relationships among the pressures in the process tanks, and LV11 reflects the temperatures in the tanks.

It should be noted that some of the loading vectors are related to feedback links. For example, LVs 1, 2, 5, 7, and 9 exhibit very high correlations (close to 1) between variables that are not related by any process mechanism; these are probably caused by controller effects. In such cases, operators might need to intervene and judge whether the relations reflect true process behavior. Even the relations associated with feedback links, however, are helpful for fault detection and can provide clues for fault isolation: because the controller acts on the process, a stable relation induced by the controller also reflects the process itself. This clearly illustrates one of the most important advantages of ShPCA, namely, that the extracted LVs are easy to interpret. Note that the sparsity of LV6 and LV8 can be increased by removing the bold variables, as the collinearities y7 ≈ y13 and y19 ≈ u9 exist.

For fault detection, 12 partial PCA models were then constructed from the dominant variables in each LV, each model corresponding to one variable set in Table 5. For comparison, a total PCA model was also constructed, retaining a 14-dimensional model space that explains over 85% of the total variance in the reference data set. The confidence levels of the T2 and SPE statistics were set to 99%. Following the discussion in ref 27, the monitoring results for two typical faults, fault 5 and fault 10, are considered. Fault 5 is a step change in the CCW inlet temperature, which alters the cooling capacity and thereby influences the vapor/liquid ratio of the separator input flow.
The monitoring results of the total PCA and partial PCA models are shown in Figure 5. As shown in Figure 5a, the total PCA successfully detected the fault at an early stage, but the fault went undetected after the 370th sample as a result of control compensation. This might mislead

7. CONCLUSIONS

Process monitoring and fault isolation using traditional PCA have been extensively studied in the literature. However, the main disadvantage of traditional PCA is that the principal components are difficult to interpret and, hence, it is not straightforward to use the


(23) Sjöstrand, K.; Clemmensen, L. H.; Larsen, R.; Ersbøll, B. SpaSM: A Matlab Toolbox for Sparse Statistical Modeling. J. Stat. Software, manuscript submitted. Details available at http://www2.imm.dtu.dk/projects/spasm/.
(24) Downs, J. J.; Vogel, E. F. Comput. Chem. Eng. 1993, 17, 245−255.
(25) Chiang, L. H.; Russel, E. L.; Braatz, R. D. Chemom. Intell. Lab. Syst. 2000, 50, 243−252.
(26) Lyman, P. R.; Georgakis, C. Comput. Chem. Eng. 1995, 19, 321−331.
(27) Ge, Z. Q.; Song, Z. H. Ind. Eng. Chem. Res. 2013, 52, 1947−1957.

monitoring results for fault isolation. This article has proposed a sparse form of PCA, shrinking PCA (ShPCA), for fault detection and isolation. ShPCA extracts the residual components and achieves sparsity in the loading vectors by imposing an l1-norm penalty in the objective function of PCA. The optimization problem is solved by an iterative QP algorithm, and the weight on the l1 norm is determined by a BIC. A fault detection and isolation strategy based on ShPCA was then proposed by introducing a set of partial PCA models. A simulation example and an application to the TE process show that the proposed strategy is more efficient than traditional PCA.



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected].
*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Grants 61374121, 61203088, and 61134007), the Science and Technology Project of Zhejiang Province (Grant 2012C01026-1), and the Natural Science Foundation of Zhejiang Province (Grant LQ12F03015).



REFERENCES

(1) Kruger, U.; Xie, L. Advances in Statistical Monitoring of Complex Multivariate Processes: With Applications in Industrial Process Control; Wiley: Chichester, U.K., 2012.
(2) Narasimhan, S.; Shah, S. L. Control Eng. Pract. 2008, 16, 146−155.
(3) Qin, S. J. J. Chemom. 2003, 17, 480.
(4) Yu, J.; Qin, S. J. AIChE J. 2008, 54, 1811−1829.
(5) Cheng, H.; Nikus, M.; Jamsa-Jounela, S.-L. Chemom. Intell. Lab. Syst. 2008, 92, 186−199.
(6) Anderson, T. W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; John Wiley & Sons: New York, 2003.
(7) Qi, X.; Luo, R. Y.; Zhao, H. Y. J. Multivariate Anal. 2013, 114, 127−160.
(8) Gertler, J. Control Eng. Pract. 1997, 5, 653−661.
(9) Gertler, J.; Cao, J. J. Process Control 2005, 15, 585−603.
(10) Jolliffe, I. T.; Trendafilov, N. T.; Uddin, M. J. Comput. Graph. Stat. 2003, 12, 531−547.
(11) Zou, H.; Hastie, T. J. R. Stat. Soc., Ser. B 2005, 67, 301−320.
(12) Shen, H.; Huang, J. Z. J. Multivariate Anal. 2008, 99, 1015−1034.
(13) Yu, J. J. Process Control 2012, 22, 778−788.
(14) Yu, J. Chem. Eng. Sci. 2012, 68, 506−519.
(15) Cadima, J.; Jolliffe, I. T. J. Appl. Stat. 1995, 22, 203−214.
(16) Ge, Z.; Kruger, U.; Lamont, L.; Xie, L.; Song, Z. Mech. Syst. Signal Process. 2010, 24, 2972−2984.
(17) Dunia, R.; Qin, S. J. Comput. Chem. Eng. 1998, 22, 927−943.
(18) Zou, H.; Hastie, T.; Tibshirani, R. J. Comput. Graph. Stat. 2006, 15, 265−286.
(19) An, B.; Guo, J.; Wang, H. Comput. Stat. Data Anal. 2013, 62, 93−107.
(20) Mackey, L. Deflation Methods for Sparse PCA. Presented at Neural Information Processing Systems (NIPS '08), Vancouver, Canada, Dec 8−11, 2008.
(21) Trench, W. F. Introduction to Real Analysis; Prentice Hall/Pearson Education: Upper Saddle River, NJ, 2003.
(22) Golub, G. H.; van Loan, C. F. Matrix Computations, 3rd ed.; Johns Hopkins Studies in the Mathematical Sciences; Johns Hopkins University Press: Baltimore, MD, 1996.
