Process Monitoring with Global–Local Preserving Projections


Lijia Luo*
College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China

ABSTRACT: A novel dimensionality reduction algorithm named "global−local preserving projections" (GLPP) is proposed. Different from locality preserving projections (LPP) and principal component analysis (PCA), GLPP aims at preserving both the global and the local structure of the data set by solving a dual-objective optimization problem. A weighting coefficient is introduced to adjust the trade-off between the global and local structures, and an efficient strategy for selecting this parameter is proposed. Compared with PCA and LPP, GLPP is more general and flexible in practical applications, and both LPP and PCA can be interpreted under the GLPP framework. A GLPP-based online process monitoring approach is then developed, in which two monitoring statistics, the D and Q statistics, are constructed for fault detection and diagnosis. A case study on the Tennessee Eastman process illustrates the effectiveness and advantages of the GLPP-based monitoring method.

1. INTRODUCTION

Process safety and high product quality are two perennially important issues in industrial processes.1 With the development of modern technology, industrial processes operate at larger scales, at higher speeds, and under more complex and even more dangerous operating conditions.2 Therefore, process safety and quality consistency attract more and more attention. Process monitoring is an efficient technique for ensuring safe production and high-quality products by detecting and diagnosing faults induced by process upsets, equipment malfunctions, or other special events as early as possible. Effective process monitoring provides the opportunity to take corrective actions before a fault disrupts production or damages equipment. However, monitoring industrial processes is a challenging problem, because they exhibit complicated characteristics such as many parameters, multiple operating modes, strong coupling, nonlinearity, time-varying behavior, and high uncertainty.

In the past two decades, with the rapid development and application of modern measurement technology, large amounts of process data have been recorded and collected, and data-driven process monitoring, or multivariate statistical process monitoring (MSPM), methods have become more and more popular in industrial processes.1,3 A common feature of MSPM methods is to perform dimensionality reduction on the process data to obtain latent variables and then to develop the monitoring model using the latent variables.4 So far, many MSPM methods have been proposed based on principal component analysis (PCA),5 partial least squares (PLS),6 independent component analysis (ICA),7 etc. Extensions of these methods to deal with nonlinear, time-varying, and multimode process data are also available, such as kernel PCA (KPCA),8 kernel PLS (KPLS),9 kernel ICA (KICA),10 recursive PCA,11 recursive PLS,12 dynamic PCA and PLS,13 and moving-window PCA.14 Detailed overviews of MSPM methods can be found in recent survey papers.1,3

The PCA-based process monitoring method and its various extensions are among the most widely used MSPM approaches. A major drawback of these methods is that PCA focuses only on preserving the variance information, i.e., the global Euclidean structure, of the data set, while it ignores the local neighborhood structure.

However, the local structure is also an important aspect of the data characteristics, because it defines the internal organization of the data points in the data set. The loss of such crucial information may make PCA-based methods more easily affected by outliers and noise, and thus less discriminative,15 reducing the process monitoring efficiency. As opposed to PCA, manifold learning algorithms in the pattern recognition area aim to preserve the local structure of the data set, e.g., locality preserving projections (LPP),15 isometric feature mapping,16 locally linear embedding (LLE),17 and the Laplacian eigenmap (LE).18 In particular, LPP, proposed by He and Niyogi,15 is a linear projection algorithm that can optimally preserve the neighborhood structure of the data set. Several LPP-based process monitoring methods have been proposed and have exhibited better performance than conventional PCA-based methods.19,20 However, a main drawback of LPP-based methods is that they do not explicitly take the global structure of the data set into account, which may lead to the loss of variance information. Therefore, if the data set has significant directions of variance, LPP-based methods may perform poorly.

To develop an efficient process monitoring method, both the global Euclidean structure and the local neighborhood structure of the data set need to be properly considered in the data projection. However, PCA-based and LPP-based methods each consider only one of these aspects. By combining PCA and LPP, Zhang et al.4 and Yu21 proposed global−local structure analysis (GLSA) and local and global principal component analysis (LGPCA) for process monitoring, respectively, and their performances have been shown to be better than those of PCA-based and LPP-based monitoring methods.4,21 However, these two methods simply bond PCA and LPP together and fail to reveal the intrinsic relationships of the combined model with PCA and LPP.

In this paper, a new data projection algorithm named "global−local preserving projections" (GLPP) is proposed. GLPP aims at preserving the global and local structures of the data set simultaneously. The objective function of GLPP consists of two subobjective functions, each corresponding to one aspect of structure preservation, and a weighting coefficient is introduced to adjust the trade-off between the two subobjectives. Both PCA and LPP are interpreted under the GLPP framework. Based on GLPP, a process monitoring method is then developed. Similar to PCA, two statistics, the SPE (Q) and Hotelling's T² (D) statistics, are built for fault detection, and the contribution plot is used for fault diagnosis. The performance of the GLPP-based monitoring method is tested on the Tennessee Eastman process (TEP); the results indicate that it outperforms the PCA- and LPP-based methods.


2. BACKGROUND TECHNIQUES

2.1. Principal Component Analysis. Principal component analysis (PCA) is a classical dimensionality reduction algorithm. The basic idea of PCA is to find a low-dimensional representation of the data set that describes as much of the data variance as possible. Let x_1, x_2, ..., x_n ∈ ℜ^m denote a set of samples. PCA seeks a projection axis p such that the mean square of the Euclidean distances between all pairs of projected points y_1, y_2, ..., y_n (y_i = p^T x_i) is maximized:

J_{PCA}(p) = \max_p \frac{1}{n} \sum_i (y_i - \bar{y})(y_i - \bar{y})^T, \quad \text{s.t. } p^T p = 1 \quad (1)

where \bar{y} = \sum_i y_i / n. For the data matrix X = [x_1, x_2, ..., x_n]^T ∈ ℜ^{n×m}, PCA decomposes it as

X = \sum_{i=1}^{l} t_i p_i^T + E = T P^T + E \quad (2)

where t_i is the score vector, also called the principal component (PC) vector; T is the score matrix; p_i is the loading vector; P is the loading matrix; and E is the residual matrix. By retaining the first l PCs, most of the variance information of the data set is extracted and the dimension is reduced. The cumulative percent variance (CPV) and cross-validation methods are widely used to determine the most appropriate number of retained PCs. More detailed introductions to PCA can be found in refs 22 and 23.

2.2. Locality Preserving Projections. Locality preserving projections (LPP) is a recently developed dimensionality reduction algorithm.15 It is a linearized version of the Laplacian eigenmap.24 Different from PCA, LPP finds a mapping that optimally preserves the local neighborhood structure of the data set. For the m-dimensional data set X = [x_1, x_2, ..., x_n] ∈ ℜ^{m×n} with n samples, LPP seeks a transformation matrix A ∈ ℜ^{m×l} (l ≤ m) that maps X to Y = [y_1, y_2, ..., y_n] ∈ ℜ^{l×n}, such that y_i represents x_i, i.e., y_i = A^T x_i. Considering first the problem of mapping X to a vector y = [y_1, y_2, ..., y_n]^T by a transformation vector a, i.e., y^T = a^T X, the objective function of LPP is

J_{LPP}(a) = \min_a \frac{1}{2} \sum_{ij} (y_i - y_j)^2 W_{ij}, \quad \text{s.t. } a^T X D X^T a = 1 \quad (3)

where y_i is the projection of x_i, W is an adjacency weighting matrix, and D is a diagonal matrix with D_{ii} = \sum_j W_{ij}. Each element W_{ij} of W encodes the neighborhood relationship between x_i and x_j and incurs a heavy penalty if the neighboring points x_i and x_j are mapped far apart. The construction of W can be found in the literature.24 The optimization problem of eq 3 can be transformed into the generalized eigenvalue problem

X L X^T a = \lambda X D X^T a \quad (4)

where L = D − W is the Laplacian matrix. Solving eq 4 yields a set of eigenvectors a_1, a_2, ..., a_l, ordered according to their eigenvalues λ_1 < λ_2 < ... < λ_l. The projection is then constructed as

x_i → y_i = A^T x_i, \quad A = [a_1, a_2, ..., a_l] \quad (5)
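For concreteness, a minimal NumPy/SciPy sketch of eqs 3−5 is given below. This is illustrative code, not part of the original paper: the function name `lpp` and its interface are ours, and X D X^T is assumed to be positive definite (as required by the generalized eigensolver).

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, l):
    """X: m x n data matrix (columns are samples); W: n x n adjacency weights."""
    D = np.diag(W.sum(axis=1))          # D_ii = sum_j W_ij
    L = D - W                           # graph Laplacian L = D - W
    # Generalized eigenvalue problem X L X^T a = lambda X D X^T a (eq 4)
    vals, vecs = eigh(X @ L @ X.T, X @ D @ X.T)
    A = vecs[:, :l]                     # eigenvectors of the l smallest eigenvalues
    return A, A.T @ X                   # transformation matrix and projections (eq 5)
```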

3. GLOBAL−LOCAL PRESERVING PROJECTIONS

3.1. Algorithm Description. Both PCA and LPP are capable of projecting high-dimensional data into a lower-dimensional space, but they are based on totally different projection principles. PCA focuses on preserving the global Euclidean structure (variance information) of the data set without considering the local neighborhood relationships among data points, which may destroy the internal geometric structure of the data set. By contrast, LPP optimally preserves the local neighborhood structure of the data set but does not guarantee a good projection of the global Euclidean structure: it may map distant points into a compact area, leading to the loss of variance information and the breaking of the global structure of the data set.

To overcome these drawbacks of PCA and LPP, a new dimensionality reduction algorithm named "global−local preserving projections" (GLPP) is proposed. GLPP aims to find a mapping that preserves both the local and the global structure of the data set. Given the m-dimensional data set X = [x_1, x_2, ..., x_n] ∈ ℜ^{m×n} with n samples, GLPP seeks a transformation matrix A ∈ ℜ^{m×l} that projects X to Y = [y_1, y_2, ..., y_n] ∈ ℜ^{l×n} (l ≤ m), where y_i = A^T x_i, such that Y retains both structures of X well. Considering the problem of mapping the data set X to a one-dimensional vector y = [y_1, y_2, ..., y_n]^T, i.e., y^T = a^T X with a transformation vector a, the objective function of GLPP is constructed as

\min_a \{J_{Local}(a), J_{Global}(a)\} \quad (6)

where the subobjective function J_{Local}(a) = \frac{1}{2}\sum_{ij}(y_i - y_j)^2 W_{ij} corresponds to the preservation of the local structure, and the global structure is preserved through the subobjective function J_{Global}(a) = -\frac{1}{2}\sum_{ij}(y_i - y_j)^2 \bar{W}_{ij}. W is an adjacency weighting matrix whose element W_{ij} represents the adjacency relationship between x_i and x_j: a nonzero W_{ij} is specified for points x_i and x_j that are adjacent to each other in the input space. A possible way of defining W_{ij} is

W_{ij} = \begin{cases} e^{-\|x_i - x_j\|^2/\sigma} & \text{if } x_j \in \Omega(x_i) \text{ or } x_i \in \Omega(x_j) \\ 0 & \text{otherwise} \end{cases} \quad (7)

where σ is a suitable constant to be determined empirically.24 The function exp(−‖x_i − x_j‖²/σ) is the heat kernel,18 ‖·‖ is the Euclidean norm, and Ω(x) denotes the neighborhood of x, which can be defined in two ways: k nearest neighbors or ε neighbors.15,17


In this paper, the k-nearest-neighbor definition is used. As opposed to W, W̄ is a nonadjacency weighting matrix whose element W̄_{ij} represents the nonadjacency relationship between x_i and x_j: a nonzero W̄_{ij} is specified for points x_i and x_j that are not adjacent to each other. W̄_{ij} is defined as

\bar{W}_{ij} = \begin{cases} e^{-\|x_i - x_j\|^2/\sigma} & \text{if } x_j \notin \Omega(x_i) \text{ and } x_i \notin \Omega(x_j) \\ 0 & \text{otherwise} \end{cases} \quad (8)

Clearly, both W_{ij} and W̄_{ij} fall into the interval [0, 1] for any pair of x_i and x_j, and they decrease monotonically as the distance between x_i and x_j increases.
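The following sketch illustrates one possible construction of W and W̄ under the stated assumptions (heat-kernel weights with a k-nearest-neighbor neighborhood); the helper name `adjacency_matrices` and its interface are ours, not from the paper.

```python
import numpy as np

def adjacency_matrices(X, k=10, sigma=1.0):
    """X: m x n (columns are samples). Returns W (eq 7) and W_bar (eq 8)."""
    n = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # squared distances
    heat = np.exp(-d2 / sigma)                                 # heat-kernel weights
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]                    # k nearest neighbors
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    adj |= adj.T                        # x_j in Omega(x_i) OR x_i in Omega(x_j)
    W = np.where(adj, heat, 0.0)        # eq 7
    W_bar = np.where(~adj, heat, 0.0)   # eq 8: neither point in the other's Omega
    np.fill_diagonal(W_bar, 0.0)        # (y_i - y_i)^2 = 0, so the diagonal is irrelevant
    return W, W_bar
```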

The subobjective function J_{Local}(a) can be rewritten as

J_{Local}(a) = \frac{1}{2}\sum_{ij}(y_i - y_j)^2 W_{ij} = \sum_i y_i D_{ii} y_i^T - \sum_{ij} y_i W_{ij} y_j^T = \sum_i a^T x_i D_{ii} x_i^T a - \sum_{ij} a^T x_i W_{ij} x_j^T a = a^T X (D - W) X^T a = a^T X L X^T a \quad (9)

where X = [x_1, x_2, ..., x_n], D is a diagonal matrix with D_{ii} = \sum_j W_{ij}, and L = D − W is the Laplacian matrix. Similarly, the subobjective function J_{Global}(a) can be reduced to

J_{Global}(a) = -\frac{1}{2}\sum_{ij}(y_i - y_j)^2 \bar{W}_{ij} = -\sum_i y_i \bar{D}_{ii} y_i^T + \sum_{ij} y_i \bar{W}_{ij} y_j^T = -\sum_i a^T x_i \bar{D}_{ii} x_i^T a + \sum_{ij} a^T x_i \bar{W}_{ij} x_j^T a = -a^T X (\bar{D} - \bar{W}) X^T a = -a^T X \bar{L} X^T a \quad (10)

where D̄ is a diagonal matrix with \bar{D}_{ii} = \sum_j \bar{W}_{ij}, and L̄ = D̄ − W̄ is the corresponding Laplacian matrix.

Equation 6 is a dual-objective optimization problem. In general, it is difficult to obtain an absolutely optimal solution for such a problem because of the conflict between the two subobjectives. In this paper, the weighted sum method is applied. By introducing a weighting coefficient η ∈ [0, 1], the dual-objective function eq 6 is transformed into the single-objective function

J_{GLPP}(a) = \min_a \{\eta J_{Local}(a) + (1 - \eta) J_{Global}(a)\} = \min_a \frac{1}{2}\Big\{\eta \sum_{ij}(y_i - y_j)^2 W_{ij} - (1 - \eta)\sum_{ij}(y_i - y_j)^2 \bar{W}_{ij}\Big\} = \min_a \frac{1}{2}\sum_{ij}(y_i - y_j)^2 R_{ij} \quad (11)

where R_{ij} = ηW_{ij} − (1 − η)W̄_{ij}. The corresponding matrix R, obtained as R = ηW − (1 − η)W̄, can be considered a complete relationship matrix for any pair of x_i and x_j in the data set X.

3.2. Selection of the Parameter η. The value of η has a great effect on the performance of GLPP, since it determines the relative importance of the two subobjective functions J_{Local}(a) and J_{Global}(a). In other words, η controls the trade-off between preserving the local structure and preserving the global structure of the data set. Inspired by ref 4, the scales of J_{Local}(a) and J_{Global}(a) can be defined as

S_{Local} = \rho(L), \quad S_{Global} = \rho(\bar{L}) \quad (12)

where ρ(·) denotes the spectral radius of a matrix. To balance J_{Local}(a) and J_{Global}(a), η is selected according to the principle

\eta S_{Local} = (1 - \eta) S_{Global} \quad (13)

Thus, the parameter η is

\eta = \frac{\rho(\bar{L})}{\rho(L) + \rho(\bar{L})} \quad (14)

Besides eq 14, η can also be chosen based on other selection principles according to the features of different data sets, which makes GLPP more flexible in practical applications.
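A direct implementation of the eq 14 rule might look as follows (a sketch; the helper name `select_eta` is ours, and the spectral radii are obtained from symmetric eigendecompositions of L and L̄):

```python
import numpy as np

def select_eta(W, W_bar):
    """Weighting coefficient eta from eq 14."""
    L = np.diag(W.sum(axis=1)) - W                # L = D - W
    L_bar = np.diag(W_bar.sum(axis=1)) - W_bar    # L_bar = D_bar - W_bar
    rho = lambda M: np.max(np.abs(np.linalg.eigvalsh(M)))  # spectral radius
    return rho(L_bar) / (rho(L) + rho(L_bar))
```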

3.3. Algorithm Formulation. Once the weighting coefficient η has been selected, the objective function of GLPP described by eq 11 reduces to

J_{GLPP}(a) = \min_a \frac{1}{2}\sum_{ij}(y_i - y_j)^2 R_{ij} = \min_a \Big\{\sum_i y_i H_{ii} y_i^T - \sum_{ij} y_i R_{ij} y_j^T\Big\} = \min_a \Big\{\sum_i a^T x_i H_{ii} x_i^T a - \sum_{ij} a^T x_i R_{ij} x_j^T a\Big\} = \min_a a^T X (H - R) X^T a = \min_a a^T X M X^T a \quad (15)

where H is a diagonal matrix with H_{ii} = \sum_j R_{ij}, and M = H − R is the Laplacian matrix. Because a bigger value of H_{ii} denotes a more "important" y_i, and also to avoid the singularity of X H X^T in the undersampled situation, the following constraint is imposed on the objective function of GLPP:

a^T (\eta X H X^T + (1 - \eta) I) a = a^T N a = 1 \quad (16)

Finally, the optimization problem of GLPP is

\min_a a^T X M X^T a, \quad \text{s.t. } a^T N a = 1 \quad (17)

Thus, the optimal transformation vector a can be found by solving the generalized eigenvector problem

X M X^T a = \lambda N a \quad (18)

Let the eigenvectors a_1, a_2, ..., a_l be the solutions of eq 18, ordered according to their eigenvalues λ_1 < λ_2 < ... < λ_l. The desired transformation matrix A for preserving both the global and the local structure of the data set X is then constructed as

x_i → y_i = A^T x_i, \quad A = [a_1, a_2, ..., a_l] \quad (19)
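Putting eqs 15−18 together, a minimal GLPP fitting routine could be sketched as follows. This is illustrative code under stated assumptions: the weighting matrices come from the `adjacency_matrices` sketch above, N is assumed positive definite (as implied by eq 16), and the function name `glpp` is ours.

```python
import numpy as np
from scipy.linalg import eigh

def glpp(X, W, W_bar, eta, l):
    """X: m x n (columns are samples). Returns A (m x l) and Y = A^T X (l x n)."""
    R = eta * W - (1.0 - eta) * W_bar       # complete relationship matrix
    H = np.diag(R.sum(axis=1))              # H_ii = sum_j R_ij
    M = H - R                               # Laplacian M = H - R
    N = eta * (X @ H @ X.T) + (1.0 - eta) * np.eye(X.shape[0])  # constraint, eq 16
    vals, vecs = eigh(X @ M @ X.T, N)       # X M X^T a = lambda N a (eq 18)
    A = vecs[:, :l]                         # l smallest eigenvalues (eq 19)
    return A, A.T @ X
```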


3.4. Algorithm Analysis. Similar to LPP and PCA, GLPP is a linear dimensionality reduction algorithm that provides a linear mapping from the original data space to a lower-dimensional space; their starting points and objective functions, however, are different. To give a better understanding of the relationships of GLPP with PCA and LPP, a theoretical analysis of GLPP is presented below.

1. Relationship with PCA. The relationship between GLPP and PCA can be expressed as the following lemma.

Lemma 1: If η = 0 and k (or ε) is sufficiently small such that neighborhood relationships among data points are not considered, then GLPP is converted into a weighted PCA. If, further, the parameter σ is sufficiently large or the nonadjacency weighting matrix W̄ = 1_n 1_n^T, then GLPP reduces to normal PCA.

Proof: The objective function eq 1 of PCA can be transformed as

J_{PCA}(p) = \max_p \frac{1}{n}\sum_i (y_i - \bar{y})(y_i - \bar{y})^T
= \max_p \Big[\frac{1}{n}\sum_i y_i y_i^T - \bar{y}\bar{y}^T\Big]
= \max_p \frac{1}{n^2}\Big[n \sum_i y_i y_i^T - \Big(\sum_i y_i\Big)\Big(\sum_j y_j^T\Big)\Big]
= \max_p \frac{1}{2n^2}\Big[2n \sum_i y_i y_i^T - 2\sum_{ij} y_i y_j^T\Big]
= \max_p \frac{1}{2n^2}\sum_{ij} (y_i y_i^T - 2 y_i y_j^T + y_j y_j^T)
= \max_p \frac{1}{2n^2}\sum_{ij} (y_i - y_j)^2 \quad (20)

Under the assumptions of lemma 1, W = 0, so the optimization problem of GLPP reduces to

J_{GLPP}(a) = \max_a \frac{1}{2}\sum_{ij}(y_i - y_j)^2 \bar{W}_{ij}, \quad \text{s.t. } a^T a = 1 \quad (21)

which is the form of a weighted PCA. If, further, the parameter σ is sufficiently large or the nonadjacency weighting matrix W̄ = 1_n 1_n^T, such that W̄_{ij} = 1, then J_{GLPP}(a) is proportional to J_{PCA}(p), i.e., J_{GLPP}(a) ∝ J_{PCA}(p), which means that GLPP is consistent with normal PCA.

Lemma 1 indicates that PCA can be viewed as one limiting case of GLPP, which retains the global structure of the data set to the maximum extent and totally ignores the preservation of the local structure. Lemma 1 also proves that the global characteristic of GLPP is completely in accord with PCA.

2. Relationship with LPP. The relationship between GLPP and LPP can be expressed as the following lemma.

Lemma 2: If the weighting coefficient η is set to its maximum value 1, then GLPP reduces to LPP.

Proof: GLPP introduces the weighting coefficient η to adjust the trade-off between local structure preservation and global structure preservation. If η = 1, only the preservation of the local structure is considered. In this case, the objective of GLPP is to minimize J_{Local}(a), and its constraint becomes a^T X D X^T a = 1. This is exactly the optimization problem of LPP. Thus, LPP can be viewed as the other limiting case of GLPP, which only considers the preservation of the local structure. Furthermore, LPP solves the generalized eigenvector problem X L X^T a = λ X D X^T a, which may suffer from the singularity of X D X^T in the undersampled situation; GLPP overcomes this deficiency by adding the term (1 − η)I in the constraint eq 16.

3. Computation of GLPP. The computation of GLPP mainly consists of three steps: the search for the k nearest neighbors, the calculation of the parameter η, and the solution of a generalized eigenvector problem. The computational complexities of these steps are O[(m + k)n²], O(2n²), and O(lm²), respectively. As a result, the computational complexity of GLPP is about O[(m + k + 2)n² + lm²]. In contrast, the computational complexities of PCA and LPP are O(lm²) and O[(m + k)n² + lm²], respectively. Therefore, the computational complexity of GLPP is at the same level as that of LPP, but larger than that of PCA.

4. PROCESS MONITORING WITH GLPP

4.1. Development of the NOC Model with GLPP. Based on the proposed GLPP algorithm, a process monitoring method can be developed. Let X = [x_1, x_2, ..., x_n]^T ∈ ℜ^{n×m} denote a training data set obtained under normal operating conditions (NOC), where n and m are the numbers of samples and of measured process variables, respectively. Performing GLPP on the data set X, the NOC model is built as

X = Y A^T + E, \quad Y = X A (A^T A)^{-1}, \quad E = X - Y A^T \quad (22)

where Y = [y_1, y_2, ..., y_n]^T ∈ ℜ^{n×l} is the latent variable matrix, A = [a_1, a_2, ..., a_l] ∈ ℜ^{m×l} is the transformation matrix, l is the number of latent variables, and E ∈ ℜ^{n×m} is the residual matrix. The matrices Y and A play roles similar to the score matrix and the loading matrix of PCA. A new data sample x_new ∈ ℜ^m can be projected onto the NOC model:

x_{new} = A y_{new} + e_{new}, \quad y_{new} = (A^T A)^{-1} A^T x_{new}, \quad e_{new} = x_{new} - A y_{new} \quad (23)
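A sketch of the projection step of eq 23 (the helper name `project` is ours; `A` is the transformation matrix from the GLPP sketch above, and `x_new` is assumed to be already normalized):

```python
import numpy as np

def project(A, x_new):
    """Project a normalized sample onto the NOC model (eq 23)."""
    y_new = np.linalg.solve(A.T @ A, A.T @ x_new)  # y = (A^T A)^{-1} A^T x
    e_new = x_new - A @ y_new                      # residual part
    return y_new, e_new
```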

4.2. Fault Detection and Diagnosis. Similar to the PCA-based monitoring method, Hotelling's T² (D) and SPE (Q) statistics are applied for GLPP-based process monitoring. The D statistic is defined as

D_{new} = y_{new}^T S^{-1} y_{new} \quad (24)

where S is the covariance matrix of the latent variable matrix Y of the training data set, i.e., S = Y^T Y/(n − 1). The Q statistic is defined as

Q_{new} = \|e_{new}\|^2 = \|x_{new} - A y_{new}\|^2 \quad (25)
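Both statistics for a single sample can be computed as in the following sketch, which reuses the `project` helper above (the interface is ours):

```python
import numpy as np

def monitoring_statistics(Y, A, x_new):
    """Y: n x l latent matrix of the training set; returns (D, Q) for x_new."""
    S = Y.T @ Y / (Y.shape[0] - 1)                # covariance of the latent variables
    y_new, e_new = project(A, x_new)              # eq 23
    D = float(y_new @ np.linalg.solve(S, y_new))  # D = y^T S^{-1} y (eq 24)
    Q = float(e_new @ e_new)                      # Q = ||e||^2 (eq 25)
    return D, Q
```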

To determine whether the process is operating under normal conditions, the control limits of the D and Q statistics must be calculated. Under the assumption that the samples follow a multivariate normal distribution, the D statistic follows an F-distribution.25 Thus, the control limit of the D statistic is calculated as26,27

D_{c,\alpha} \sim \frac{l(n^2 - 1)}{n(n - l)} F_\alpha(l, n - l) \quad (26)

where F_α(l, n − l) is an F-distribution with l and n − l degrees of freedom at significance level α. The control limit of the Q statistic can be obtained from a weighted χ²-distribution.27 In this paper, a normal distribution is used to approximate the χ²-distribution of the Q statistic:26,28

Q_{c,\alpha} = \theta_1 \Big[1 - \frac{\theta_2 h_0 (1 - h_0)}{\theta_1^2} + \frac{z_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1}\Big]^{1/h_0} \quad (27)

where h_0 = 1 − 2θ_1θ_3/(3θ_2²), θ_1 = tr(V), θ_2 = tr(V²), θ_3 = tr(V³), V is the covariance matrix of E, and z_α is the standardized normal variable with (1 − α) confidence limit, having the same sign as h_0. A fault is detected if D_new > D_{c,α} or Q_new > Q_{c,α}.
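Both limits can be computed with scipy.stats, as in the following sketch (the helper name `control_limits` is ours; V is estimated as the sample covariance of the n × m training residual matrix E):

```python
import numpy as np
from scipy import stats

def control_limits(E, n, l, alpha=0.01):
    """Control limits of the D (eq 26) and Q (eq 27) statistics."""
    D_lim = l * (n ** 2 - 1) / (n * (n - l)) * stats.f.ppf(1 - alpha, l, n - l)
    V = np.cov(E, rowvar=False)             # covariance matrix of the residuals
    th = [np.trace(np.linalg.matrix_power(V, i)) for i in (1, 2, 3)]
    h0 = 1 - 2 * th[0] * th[2] / (3 * th[1] ** 2)
    z = stats.norm.ppf(1 - alpha)           # standardized normal variable
    Q_lim = th[0] * (1 - th[1] * h0 * (1 - h0) / th[0] ** 2
                     + z * np.sqrt(2 * th[1] * h0 ** 2) / th[0]) ** (1 / h0)
    return D_lim, Q_lim
```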

After a fault has been successfully detected, it is important to diagnose its root cause. In this paper, the contribution plot26,29,30 is used for fault diagnosis. The contribution plot shows the contribution of each process variable to a monitoring statistic; the process variable with the highest contribution is usually responsible for the fault. Equation 24 can be rewritten as

D_{new} = y_{new}^T S^{-1} y_{new} = y_{new}^T S^{-1} [x_{new}^T A (A^T A)^{-1}]^T = \sum_{j=1}^{m} y_{new}^T S^{-1} [x_{new,j} a_j (A^T A)^{-1}]^T = \sum_{j=1}^{m} c_j^D \quad (28)

where a_j is the jth row vector of the matrix A. Thus, the contribution of process variable x_{new,j} to the D statistic is defined as

c_j^D = y_{new}^T S^{-1} [x_{new,j} a_j (A^T A)^{-1}]^T \quad (29)

The contribution of x_{new,j} to the Q statistic is calculated as

c_j^Q = e_{new,j}^2 = (x_{new,j} - \hat{x}_{new,j})^2 \quad (30)

where e_{new,j} is the residual of x_{new,j}, and x̂_{new,j} is the part of x_{new,j} that is explained by the NOC model, i.e., x̂_{new,j} = a_j y_{new}.
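The contributions of eqs 29 and 30 can be computed as in this sketch (illustrative; the helper name `contributions` is ours, and it reuses the `project` helper):

```python
import numpy as np

def contributions(A, S, x_new):
    """Per-variable contributions to D (eq 29) and Q (eq 30)."""
    y_new, e_new = project(A, x_new)
    G = np.linalg.inv(A.T @ A)
    Sy = np.linalg.solve(S, y_new)                 # S^{-1} y_new
    # c_j^D = y^T S^{-1} [x_j a_j (A^T A)^{-1}]^T, with a_j the j-th row of A
    cD = np.array([Sy @ (G.T @ (x_new[j] * A[j, :])) for j in range(A.shape[0])])
    cQ = e_new ** 2                                # (x_j - x_hat_j)^2
    return cD, cQ
```

By construction, the c_j^D values sum to the D statistic of the sample, so the plot decomposes an alarm into per-variable shares.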

4.3. Process Monitoring Procedure. The complete GLPP-based process monitoring procedure is summarized as follows.

NOC Model Development Procedure:
1. Construct the training data set X under normal operating conditions, and normalize it to zero mean and unit variance.
2. Select appropriate parameters for GLPP, and perform GLPP on X to obtain the transformation matrix A.
3. Develop the GLPP-based NOC model.
4. Calculate the control limits of the D and Q statistics.


Online Monitoring Procedure:
1. Normalize a new sample x_new to zero mean and unit variance (using the mean and variance of the training data).
2. Use eq 23 to project x_new onto the NOC model.
3. Compute the D and Q statistics using eqs 24 and 25, respectively.
4. Check whether the D and Q statistics exceed their control limits D_{c,α} and Q_{c,α}, respectively.
5. Once a fault is detected, diagnose which process variable causes the fault by means of contribution plots.

A compact sketch chaining both procedures is given below.
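This sketch strings together the hypothetical helpers defined in the previous sections (it is our illustrative composition, not the authors' code); the training and test matrices have samples in rows, as in the paper, and l = 9 and k = 10 match the TE study of section 5.2:

```python
import numpy as np

def monitor(X_train, X_test, k=10, l=9, alpha=0.01):
    """GLPP-based monitoring of section 4.3; returns a boolean alarm sequence."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Xn = ((X_train - mu) / sd).T                 # normalize; columns are samples
    W, W_bar = adjacency_matrices(Xn, k=k)       # eqs 7 and 8
    eta = select_eta(W, W_bar)                   # eq 14
    A, _ = glpp(Xn, W, W_bar, eta, l)            # eqs 15-19
    Y = np.linalg.solve(A.T @ A, A.T @ Xn).T     # n x l latent matrix (eq 22)
    E = Xn.T - Y @ A.T                           # residual matrix (eq 22)
    D_lim, Q_lim = control_limits(E, Xn.shape[1], l, alpha)
    alarms = []
    for x in (X_test - mu) / sd:                 # online monitoring loop
        D, Q = monitoring_statistics(Y, A, x)    # eqs 24 and 25
        alarms.append(D > D_lim or Q > Q_lim)
    return np.array(alarms)
```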

5. CASE STUDIES

5.1. Synthetic Examples. The dimensionality reduction ability of GLPP is first tested on the Swiss-roll and intersecting data sets, each containing 1800 points. LPP and PCA are used for comparison. The neighbor number k is set to 15 for both LPP and GLPP. The weighting coefficient η of GLPP is calculated by eq 14 as η = 0.433 for the Swiss-roll data and η = 0.439 for the intersecting data. Figure 1 shows the projection results of the Swiss roll generated by GLPP, LPP, and PCA, where the data points are colored according to their positions; the projection results for the intersecting data are shown in Figure 2.

Figure 1. (a) Swiss-roll data set. (b) The 2-D projection result of PCA. (c) The 2-D projection result of LPP. (d) The 2-D projection result of GLPP.

Figure 2. (a) Intersecting data set. (b) The 2-D projection result of PCA. (c) The 2-D projection result of LPP. (d) The 2-D projection result of GLPP.

It is evident that GLPP preserves the original geometric structure of each data set well by combining global and local information extraction, since the projected points are well arranged in the two-dimensional (2-D) space. By contrast, PCA and LPP each exhibit only one aspect of the original geometric structure: PCA projects the data set along the directions of maximum variance but fails to preserve the manifold structure, whereas LPP captures the manifold structure well at the cost of ignoring the variance information.

To further illustrate the relationships of GLPP with PCA and LPP, Figure 3 shows the projection results of the Swiss-roll and intersecting data generated by GLPP with different values of η, k, and σ. These parameters clearly have a great effect on the performance of GLPP. With η = 0, k = 0, and σ → ∞, the projection result of GLPP is the same as that of PCA, whereas with η = 1, GLPP produces the same projection result as LPP. These results are completely in accord with the theoretical analysis in section 3.4. In short, GLPP with η calculated by eq 14 gives the best projection result, which shows that the proposed selection strategy for η is efficient.

Figure 3. The 2-D projection results of GLPP with different parameters. (a and b) Swiss-roll data set; (c and d) intersecting data set.

5.2. Tennessee Eastman Process. The Tennessee Eastman (TE) process developed by Downs and Vogel31 is a well-known benchmark for testing the performance of monitoring methods. The process consists of five major unit operations: a reactor, a product condenser, a vapor−liquid separator, a recycle compressor, and a product stripper. It has 12 manipulated variables and 41 measurements and can generate 21 programmed fault modes. As suggested by Lee et al.,32 the 33 variables listed in Table 1 are selected for process monitoring in this paper; Table 2 gives detailed information on the 21 process faults. A total of 22 data sets are collected under different operating conditions: one normal data set and 21 fault data sets, each containing 960 samples. The normal data set is used as the training data to develop the NOC model, and the 21 fault data sets are used as test data. All faults are introduced into the process at sample 160. The GLPP-based monitoring method is compared with the PCA- and LPP-based methods. The first nine PCs are selected for PCA by cross-validation, and the same number of latent variables is retained for LPP and GLPP for a fair comparison. A neighbor number k = 10 is chosen for both LPP and GLPP.

Table 1. Monitoring Variables in the Tennessee Eastman Process

| no. | variable name | no. | variable name |
|---|---|---|---|
| 1 | A feed (stream 1) | 18 | stripper temp |
| 2 | D feed (stream 2) | 19 | stripper steam flow |
| 3 | E feed (stream 3) | 20 | compressor work |
| 4 | A and C feed (stream 4) | 21 | reactor cooling water outlet temp |
| 5 | recycle flow (stream 8) | 22 | separator cooling water outlet temp |
| 6 | reactor feed rate (stream 6) | 23 | D feed flow (stream 2) |
| 7 | reactor press. | 24 | E feed flow (stream 3) |
| 8 | reactor level | 25 | A feed flow (stream 1) |
| 9 | reactor temp | 26 | A and C feed flow (stream 4) |
| 10 | purge rate (stream 9) | 27 | compressor recycle valve |
| 11 | product separator temp | 28 | purge valve (stream 9) |
| 12 | product separator level | 29 | separator pot liquid flow (stream 10) |
| 13 | product separator press. | 30 | stripper liquid product flow (stream 11) |
| 14 | product separator underflow | 31 | stripper steam valve |
| 15 | stripper level | 32 | reactor cooling water valve |
| 16 | stripper press. | 33 | condenser cooling water flow |
| 17 | stripper underflow (stream 11) | | |

Table 2. Process Faults for the Tennessee Eastman Process

| no. | process variable | type |
|---|---|---|
| 1 | A/C feed ratio, B composition constant (stream 4) | step |
| 2 | B composition, A/C ratio constant (stream 4) | step |
| 3 | D feed temp (stream 2) | step |
| 4 | reactor cooling water inlet temp | step |
| 5 | condenser cooling water inlet temp | step |
| 6 | A feed loss (stream 1) | step |
| 7 | C header press. loss, reduced availability (stream 4) | step |
| 8 | A, B, C feed composition (stream 4) | random variation |
| 9 | D feed temp (stream 2) | random variation |
| 10 | C feed temp (stream 4) | random variation |
| 11 | reactor cooling water inlet temp | random variation |
| 12 | condenser cooling water inlet temp | random variation |
| 13 | reaction kinetics | slow drift |
| 14 | reactor cooling water valve | sticking |
| 15 | condenser cooling water valve | sticking |
| 16 | unknown | unknown |
| 17 | unknown | unknown |
| 18 | unknown | unknown |
| 19 | unknown | unknown |
| 20 | unknown | unknown |
| 21 | valve for stream 4 fixed at steady-state position | constant position |

Table 3. Fault Detection Rates/False Alarm Rates (%) of Three Methods for the TE Process

| fault no. | PCA (D) | PCA (Q) | LPP (D) | LPP (Q) | GLPP (D) | GLPP (Q) |
|---|---|---|---|---|---|---|
| 1 | 99/0 | 100/1 | 99/1 | 99/0 | 100/1 | 99/0 |
| 2 | 98/1 | 96/0 | 98/0 | 98/0 | 98/1 | 98/1 |
| 3 | 2/1 | 1/1 | 2/0 | 1/0 | 1/1 | 4/0 |
| 4 | 7/1 | 100/0 | 8/1 | 81/0 | 63/0 | 77/1 |
| 5 | 25/1 | 18/0 | 100/1 | 24/0 | 100/1 | 24/1 |
| 6 | 99/1 | 100/0 | 100/0 | 99/0 | 100/1 | 100/1 |
| 7 | 42/0 | 100/1 | 94/1 | 100/0 | 100/1 | 100/0 |
| 8 | 97/0 | 89/2 | 98/1 | 97/0 | 98/1 | 97/0 |
| 9 | 2/1 | 1/1 | 1/1 | 1/1 | 2/1 | 1/1 |
| 10 | 32/1 | 17/1 | 64/1 | 38/0 | 83/1 | 38/0 |
| 11 | 22/1 | 72/0 | 43/1 | 54/0 | 73/0 | 53/0 |
| 12 | 97/1 | 90/1 | 100/2 | 98/1 | 100/1 | 99/1 |
| 13 | 93/0 | 95/1 | 94/0 | 94/0 | 94/0 | 94/0 |
| 14 | 81/0 | 100/2 | 99/1 | 100/0 | 99/1 | 100/0 |
| 15 | 1/1 | 2/1 | 3/0 | 3/0 | 5/2 | 3/0 |
| 16 | 14/4 | 16/1 | 87/2 | 18/6 | 88/1 | 18/5 |
| 17 | 74/1 | 93/2 | 87/0 | 86/0 | 88/1 | 87/0 |
| 18 | 89/0 | 90/2 | 89/1 | 89/0 | 90/1 | 89/0 |
| 19 | 1/0 | 29/0 | 79/0 | 2/0 | 85/1 | 2/0 |
| 20 | 32/0 | 45/2 | 69/0 | 42/0 | 90/0 | 43/0 |
| 21 | 34/0 | 46/0 | 33/3 | 37/0 | 40/2 | 40/0 |


Table 4. Comparison of Average Fault Detection Rates/False Alarm Rates (%) among Different Methods

| statistic | PCA | LPP | GLPP | LGPCA | GLSA |
|---|---|---|---|---|---|
| D | 49.6/0.7 | 68.9/0.8 | 76.0/0.9 | 73.7/1.4 | 64.2/− |
| Q | 61.9/0.9 | 60.0/0.4 | 60.3/0.5 | 60.2/0.6 | 62.5/− |

Figure 4. Monitoring charts of (a) PCA, (b) LPP, and (c) GLPP for fault 5.

Figure 5. Monitoring charts of (a) PCA, (b) LPP, and (c) GLPP for fault 10.

The control limits of all monitoring statistics of the three methods are set at the 99% confidence level. The fault detection rate (FDR) and the false alarm rate (FAR) are used to quantify the performance of the different methods: the FDR is defined as the percentage of samples outside the control limit under fault conditions, and the FAR is the percentage of samples outside the control limit before the introduction of the fault (a sketch of this computation is given below). The FDRs and FARs of the PCA-, LPP-, and GLPP-based monitoring methods for all 21 faults are listed in Table 3.

It can be seen that the GLPP-based method outperforms the other two methods in most of the fault cases. In fault cases 1, 2, 6, 13, and 18, the FDRs of the three methods are almost the same, because the fault magnitudes are large. However, for faults 5, 10, 16, 19, and 20, the FDRs of the GLPP-based method are larger than those of PCA or LPP, which indicates that the GLPP-based method is more sensitive to small faults that are difficult to detect with PCA or LPP. The average FDRs of the three methods over all 21 faults are compared in Table 4. To further verify the performance of GLPP, results of the GLSA- and LGPCA-based monitoring methods taken from the literature4,21 are also listed in Table 4.
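A minimal sketch of the FDR/FAR computation for one fault data set (the helper name is ours; the fault is introduced at sample 160, so the first 160 samples of a run are taken as fault-free):

```python
import numpy as np

def fdr_far(alarms, fault_start=160):
    """alarms: boolean alarm sequence from monitor(); returns (FDR, FAR) in %."""
    alarms = np.asarray(alarms, dtype=bool)
    fdr = 100.0 * alarms[fault_start:].mean()   # alarms after the fault starts
    far = 100.0 * alarms[:fault_start].mean()   # alarms before the fault starts
    return fdr, far
```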


It is evident that the GLPP-based method has the best overall performance: the D statistic of the GLPP-based monitoring method shows the largest average FDR.

To illustrate the superiority of the GLPP-based monitoring method, the monitoring charts of GLPP, LPP, and PCA for faults 5 and 10 are compared in Figures 4 and 5, respectively. Figure 4 shows that the PCA-based D and Q statistics successfully detect the fault from sample 160 to sample 345, but fail to detect it after sample 346. Both the GLPP- and LPP-based D statistics detect most of the faulty samples from sample 160 to the end of the run, while the GLPP-based D statistic has a higher FDR and a lower FAR. Figure 5 likewise indicates that the GLPP-based method is more efficient for the detection of fault 10.

After faults have been detected, the contribution plot is used for fault diagnosis. One step fault (fault 5) and one sticking fault (fault 14) are selected for testing. The contribution plots of the GLPP-based D and Q statistics at sample 210 are shown in Figure 6.

Figure 6. Contribution plots of GLPP-based D and Q statistics for (a) fault 5 and (b) fault 14.

In each subplot, the variable that provides the highest contribution is identified as the root cause of the fault. Accordingly, variable 33 (condenser cooling water flow) is identified as the root cause of fault 5, and variable 32 (reactor cooling water valve) as the root cause of fault 14. Fault 5 involves a step change in the condenser cooling water inlet temperature, whose main effect is a step change in the condenser cooling water flow rate. Since the TE data set does not include the condenser cooling water inlet temperature, it is reasonable to identify the condenser cooling water flow rate as the fault variable; a similar result has been reported in the literature.21 For fault 14, the identified fault variable is exactly consistent with the preset fault variable in Table 2. These results validate the effectiveness of the GLPP-based method for fault diagnosis.

6. CONCLUSION

In this paper, a new dimensionality reduction method called "global−local preserving projections" (GLPP) was proposed. Different from PCA and LPP, GLPP aims to preserve both the global and the local structure of the data set and introduces a parameter η to keep a proper trade-off between them. It was proven that PCA and LPP can be viewed as two limiting cases of GLPP. Therefore, with appropriate parameter selection, GLPP preserves more meaningful information during data projection and gives a more faithful representation of the data set in the lower-dimensional space than PCA and LPP. In comparison with PCA and LPP, GLPP is also more general in practical applications, because its parameters can be chosen flexibly according to the features of different data sets.

A GLPP-based process monitoring method was then developed, and its performance was tested on the TE process. The monitoring results of the GLPP-based method were compared with those obtained from the PCA- and LPP-based methods. The comparison shows that the GLPP-based monitoring method outperforms the other two methods in terms of fault detection rates and fault sensitivity, indicating that it has good potential in practical applications.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This study was supported by the National Natural Science Foundation of China (No. 61304116) and the Zhejiang Provincial Natural Science Foundation of China (No. LQ13B060004).

REFERENCES

(1) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543−3562.
(2) Jiang, Q.; Yan, X.; Zhao, W. Fault detection and diagnosis in chemical processes using sensitive principal component analysis. Ind. Eng. Chem. Res. 2013, 52, 1635−1644.
(3) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220−234.
(4) Zhang, M.; Ge, Z.; Song, Z.; Fu, R. Global−local structure analysis model and its application for fault detection and identification. Ind. Eng. Chem. Res. 2011, 50, 6387−6848.
(5) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361−1375.
(6) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97−108.
(7) Kano, M.; Tanaka, S.; Hasebe, S.; Hashimoto, I.; Ohno, H. Monitoring independent components for fault detection. AIChE J. 2003, 49, 969−976.
(8) Choi, S. W.; Lee, C.; Lee, J.; Park, J. H.; Lee, I. Fault detection and identification of nonlinear processes based on kernel PCA. Chemom. Intell. Lab. Syst. 2005, 75, 55−67.



(9) Zhang, Y.; Zhou, H.; Qin, S.; Chai, T. Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares. IEEE Trans. Ind. Inf. 2010, 6, 3−10.
(10) Zhang, Y.; An, J.; Zhang, H. Monitoring of time-varying processes using kernel independent component analysis. Chem. Eng. Sci. 2013, 88, 23−32.
(11) Li, W.; Yue, H. H.; Valle-Cervantes, S.; Qin, S. J. Recursive PCA for adaptive process monitoring. J. Process Control 2000, 10, 471−486.
(12) Qin, S. J. Recursive PLS algorithms for adaptive data monitoring. Comput. Chem. Eng. 1998, 22, 503−514.
(13) Chen, J.; Liu, K. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 14, 63−75.
(14) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691−5702.
(15) He, X. F.; Niyogi, P. Locality preserving projections. In Proceedings of the Conference on Advances in Neural Information Processing Systems, Dec 8−13, 2003, Vancouver, Canada; MIT Press: Cambridge, MA, 2004.
(16) Tenenbaum, J. B.; de Silva, V.; Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319−2323.
(17) Roweis, S. T.; Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323−2326.
(18) Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Neural Inf. Process. Syst. 2002, 1, 585−592.
(19) Hu, K. L.; Yuan, J. Q. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18, 797−807.
(20) Shao, J. D.; Rong, G.; Lee, J. M. Generalized orthogonal locality preserving projections for nonlinear fault detection and diagnosis. Chemom. Intell. Lab. Syst. 2009, 96, 75−83.
(21) Yu, J. Local and global principal component analysis for process monitoring. J. Process Control 2012, 22, 1358−1373.
(22) Jackson, J. E. A User's Guide to Principal Components; Wiley: New York, 1991.
(23) Jolliffe, I. T. Principal Component Analysis; Springer: New York, 2002.
(24) Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373−1396.
(25) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, 480−502.
(26) Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Generalized contribution plots in multivariate statistical process monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95−114.
(27) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37, 41−59.
(28) Jackson, J. E.; Mudholkar, G. S. Control procedures for residuals associated with principal component analysis. Technometrics 1979, 21, 341−349.
(29) Miller, P.; Swanson, R. E.; Heckler, C. E. Contribution plots: the missing link in multivariate quality control. Appl. Math. Comput. Sci. 1998, 8, 775−782.
(30) Kourti, T.; MacGregor, J. F. Multivariate SPC methods for process and product monitoring. J. Qual. Technol. 1996, 28, 409−428.
(31) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245−255.
(32) Lee, J. M.; Qin, S. J.; Lee, I. B. Fault detection and diagnosis based on modified independent component analysis. AIChE J. 2006, 52, 3501−3514.
