
Global–Local Structure Analysis Model and Its Application for Fault Detection and Identification†

Muguang Zhang, Zhiqiang Ge,* Zhihuan Song,* and Ruowei Fu

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Zhejiang University, Hangzhou 310027, Zhejiang, People's Republic of China

ABSTRACT: In this paper, a new fault detection and identification scheme based on the global–local structure analysis (GLSA) model is proposed. By exploiting the underlying geometrical manifold while simultaneously keeping the global data information, the GLSA model constructs a dual-objective optimization function for dimension reduction of the process dataset. It combines the advantages of both locality preserving projections (LPP) and principal component analysis (PCA) under a unified framework. Meanwhile, GLSA avoids the singularity problem that may occur in LPP and shares the orthogonality property of the PCA method. To balance the two subobjectives (corresponding to global and local structure preservation), a tuning parameter is introduced, and an energy-function-based strategy is proposed to determine its value. For the purpose of fault detection, two statistics are constructed based on the GLSA model. Furthermore, a Bayesian inference algorithm is introduced upon the two monitoring statistics for fault identification. Two case studies are provided to demonstrate the efficiency of the GLSA model.

1. INTRODUCTION

Because of the increased attention on plant safety and quality consistency, early fault detection and diagnosis has become more and more important in chemical processes. In particular, as a typical data-based methodology, multivariate statistical process control (MSPC) has been widely used for monitoring industrial processes,1–6 with representative methods such as principal component analysis (PCA)7,8 and partial least squares (PLS).9–11 A common feature of MSPC-based methods is to perform dimension reduction on the original process data, trying to extract most of the data variation through the first few components. From this point of view, the performance of dimension reduction directly affects the feature extraction step, upon which the monitoring performance also depends. Although the extracted principal components of the PCA method retain most of the data variance, they capture only the global structure of the process data; in other words, the detailed local structure among the process data is ignored.12 However, the inner data structure represents the detailed relationship between different data samples, which is also an important aspect of the dataset. The loss of this crucial information may have a great impact on dimension reduction performance, and thus the process monitoring efficiency may also be influenced. Fortunately, a class of dimension reduction techniques known as manifold learning has been developed in the pattern recognition area, including isometric feature mapping (Isomap),13 locally linear embedding (LLE),14 Laplacian eigenmaps (LE),15 locality preserving projections (LPP),16 etc. These techniques are mainly focused on preserving the local structure of datasets such as image and document data.13–17 Among them, LPP is a well-known linear projection method that can exploit the underlying geometrical manifold of the dataset. Recently, it has been introduced for process monitoring, and successful applications have been reported for batch process monitoring and nonlinear fault detection.18,19 However, one critical

drawback of the LPP method is that it does not explicitly consider the global structure of the process data; it only defines the neighborhood relationships between different samples. Without preserving the variance information, the outer shape of the dataset may be broken after the dimension reduction procedure. Hence, the modeling performance of LPP may be degraded if a process dataset has significant variance directions. As a compromise, both local and global data information should be considered for dimension reduction: while the global structure defines the outer shape of the process dataset, the local structure provides its inner organization. However, PCA and LPP each consider only one of these two aspects.

In this paper, a new dimension reduction method, called global–local structure analysis (GLSA), is proposed. Different from LPP, whose objective is to preserve the local structure, and PCA, which focuses on the global structure of the process dataset, GLSA takes both aspects into account. Compared with wavelet-based approaches such as multiscale PCA,20 which balance local and global information by modeling low- and high-frequency content at multiple scales, GLSA balances the local and global structure within a single model, from the spatial perspective. To this end, we first construct a dual-objective function for GLSA; a tuning parameter is then introduced to balance the two types of structure preservation, which takes advantage of both PCA and LPP. Based on the developed GLSA model, a fault detection and identification scheme is then constructed in this paper. Similar to PCA and LPP, two statistics are built for the purpose of fault detection, corresponding to the latent variable space and the residual space, respectively. To identify the type of the detected


fault, Bayesian inference21 is introduced upon the two monitoring statistics, based on which the posterior probability of the current fault with respect to each possible process fault can be calculated; the fault mode with the largest posterior probability is then deemed to be the type of the current abnormal situation.

The remainder of this paper is organized as follows. First, a brief review and analysis of PCA and LPP is provided in section 2. The GLSA model is then developed in section 3, followed by some analyses of the GLSA model in section 4. The GLSA-based fault detection and identification method is developed in section 5, and two case studies are presented in section 6. Finally, conclusions are given.

2. ANALYSES OF PRINCIPAL COMPONENT ANALYSIS (PCA) AND LPP

Suppose X is a mean-centered D-dimensional dataset with size n (X = \{x_i\}_{i=1}^{n} \subset R^D). The purpose of a linear dimension reduction technique is to find a transformation matrix A \in R^{D \times d} that maps the original space X to a lower d-dimensional space Y = \{y_i\}_{i=1}^{n} \subset R^d, where y_i = A^T x_i, such that Y represents X well. Different definitions of "representation" lead to different objective functions and dimension reduction techniques.

2.1. Principal Component Analysis (PCA). The basic idea of PCA is to find a low-dimensional representation in which the data variance is maximal:

J(a)_{PCA} = \max_a \sum_{i=1}^{n} (y_i - \bar{y})^2    (1)

where \bar{y} = (1/n) \sum_{i=1}^{n} y_i and a is the transformation vector. The low-dimensional space Y shares a similar outer shape with the original space X, since they have the same directions of maximal variance. However, without considering the local relationships between different data points, PCA cannot preserve the intrinsic geometrical structure of the dataset. Suppose there is a trail of points x_1, x_2, ..., x_{m-1}, x_m, where [x_{l-1}, x_l] are nearby points in the original space X for l = 2, 3, ..., m. Since PCA ignores these neighborhood relationships in the projection step, the ordering of the corresponding points y_1, y_2, ..., y_{m-1}, y_m in the low-dimensional space may be destroyed.

2.2. Locality Preserving Projections (LPP). Different from PCA, which focuses on preserving the global structure, LPP takes local structure preservation into account and tries to find the embedded geometrical manifold;16 the objective function of LPP is given as

J(a)_{LPP} = \min_a \sum_{ij} (y_i - y_j)^2 S_{ij}    (2)

where S is the n \times n proximity matrix, whose element S_{ij} represents the neighborhood relationship between x_i and x_j; its value is determined by the distance between x_i and x_j (see eq 6), with closer points receiving larger weights. Suppose x_i and x_j are two nearby data points that form a neighborhood structure; minimizing the objective function of LPP guarantees that their projections y_i and y_j are also nearby, because S_{ij} has been assigned a large value based on the distance. However, without a constraint on the faraway points, which represent the directions of significant variance in the space X, LPP may lose the variance information and squeeze distant points into a small region. As a result, the outer shape of the dataset may be broken. Furthermore, if the faraway points belong to different classes, the discriminating efficiency of LPP will also be affected, since it can hardly distinguish these points after projection.
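To make the contrast between the two criteria concrete, here is a minimal numerical sketch (an illustration added in editing, not part of the original paper) that evaluates both objectives for a candidate projection vector a on a toy 2-D dataset; the unit-weight neighbor graph is a simplification of the heat-kernel weights defined later in eq 6, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))              # D x n toy data, columns are samples
X -= X.mean(axis=1, keepdims=True)         # mean-center, as assumed in the text

def j_pca(a, X):
    """Eq (1): variance of the 1-D projections y_i = a^T x_i."""
    y = a @ X
    return np.sum((y - y.mean()) ** 2)

def j_lpp(a, X, S):
    """Eq (2): weighted squared distances between projected neighbors."""
    y = a @ X
    return np.sum(S * (y[:, None] - y[None, :]) ** 2)

# a crude proximity matrix: unit weight for the K = 5 nearest neighbors
n = X.shape[1]
d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # pairwise squared distances
S = np.zeros((n, n))
for i in range(n):
    for j in np.argsort(d2[i])[1:6]:       # skip index 0 (the point itself)
        S[i, j] = S[j, i] = 1.0

a = np.array([1.0, 0.0])                   # a candidate projection direction
print(j_pca(a, X), j_lpp(a, X, S))         # PCA wants the first large, LPP the second small
```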

3. GLOBAL–LOCAL STRUCTURE ANALYSIS (GLSA)

By taking advantage of both PCA and LPP, GLSA constructs a dual-objective optimization function that provides a unified framework combining the merits of the two algorithms. It is expected that, after projection, nearby points, which represent neighborhood relationships of each other, remain close in the latent space, while faraway points remain distinct from each other, which retains the variance information. However, these two objectives are difficult to achieve at the same time, because they usually project the dataset along different directions. To address this problem, a parameter η is introduced to balance the two subobjectives, corresponding to global and local structure preservation. The parameter η is important in the construction of the GLSA model, since it describes the different roles of the two types of preservation in the process dataset and thus determines their effects on the extracted latent variables. Hence, the objective function of GLSA is given as

J(a)_{GLSA} = \eta \max_a \sum_{i=1}^{n} (y_i - \bar{y})^2 - (1 - \eta) \min_a \sum_{ij} (y_i - y_j)^2 S_{ij} = \eta J(a)_{Global} - (1 - \eta) J(a)_{Local}    (3)

where J(a)_{Global} is used to preserve the global structure of the dataset, and the local structure is preserved through J(a)_{Local}. The value of η is constrained as 0 ≤ η ≤ 1; the setting of this parameter is discussed in section 3.2.

3.1. Reformulations of Global and Local Structure Preservation. First, consider J(a)_{Global}, which is given as

J(a)_{Global} = \max_a \sum_{i=1}^{n} (y_i - \bar{y})^2 = \max_a \sum_{i=1}^{n} a^T (x_i - \bar{x})(x_i - \bar{x})^T a = \max_a a^T C a    (4)

where C = (1/n) \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T and \bar{x} = (1/n) \sum_{i=1}^{n} x_i. This objective function is the same as J(a)_{PCA}, which only preserves the global structure. In contrast, the local structure preserving objective can be reformulated as follows:

J(a)_{Local} = \min_a \sum_{ij} (y_i - y_j)^2 S_{ij}
             = \min_a \sum_{ij} (y_i - y_j)(y_i - y_j)^T S_{ij}
             = \min_a 2\left( \sum_{i=1}^{n} y_i D_{ii} y_i^T - \sum_{ij} y_i S_{ij} y_j^T \right)
             = \min_a a^T X (D - S) X^T a
             = \min_a a^T X L X^T a
             = \min_a a^T L' a    (5)

(the constant factor 2 does not change the minimizer and is therefore dropped)


where D is an n × n diagonal matrix with D_{ii} = \sum_j S_{ij}, and L = D − S. S_{ij} represents the weighted neighborhood relationship between different data samples, calculated as

S_{ij} = \begin{cases} \exp\!\left( -\dfrac{\| x_i - x_j \|^2}{t} \right), & \text{if } x_i, x_j \in L(x_i; x_j) \\ 0, & \text{otherwise} \end{cases} \quad (1 \le i \le n,\ 1 \le j \le n)    (6)

where t is a parameter for adjusting S_{ij}, and L(x_i; x_j) denotes that x_i and x_j belong to the same "neighborhood", which can be defined in two ways.14–16 The first way uses K nearest neighbors (K \in N): if x_j is among the K nearest neighbors of x_i, or x_i is among the K nearest neighbors of x_j, then x_i, x_j \in L(x_i; x_j). The other way is based on ε neighbors (ε \in R): if \|x_i - x_j\|^2 \le ε, then x_i, x_j \in L(x_i; x_j). In the present paper, we adopt K nearest neighbors to define L(x_i; x_j), since it is difficult to choose a proper ε in practical applications.22

3.2. Setting of Parameter η. Generally, a dual-objective optimization problem fails to attain an absolutely optimal solution, because of the conflict between the two subobjectives. Although we cannot optimize both subobjectives simultaneously, a relatively optimal solution can be obtained by keeping an equilibrium between them. To achieve this, two important issues should be noted:23,24
• Different subobjective functions may have different scales.
• Different subobjective functions may have different speeds of convergence.
Therefore, proper parameters should be chosen carefully to unify the scales in the initialization step and to match the speeds during the optimization search; otherwise, one subobjective may overpower the other and the desired balance is broken. In this study, the optimization solution is obtained by solving an eigenvector problem, in which no iterative search is involved; the convergence-speed problem is therefore avoided, and only the scales of the subobjective functions need to be considered. Inspired by ref 25, the scales of J(a)_{Global} and J(a)_{Local} can be defined as

S_{Global} = \rho(C), \quad S_{Local} = \rho(L')    (7)
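As a concrete reference, the proximity matrix S of eq 6 and the matrices D, L, and L' = XLX^T of eq 5 could be assembled as in the sketch below (an editorial illustration, not code from the paper); the kernel-width heuristic used when t is not supplied is my assumption, and all names are illustrative.

```python
import numpy as np

def build_graph_matrices(X, K=10, t=None):
    """X is D x n (columns are samples). Returns S, D, L from eqs (5)-(6)
    and L' = X L X^T, the matrix minimized in the local objective."""
    n = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # squared distances
    if t is None:
        # kernel-width heuristic (an assumption, not from the paper):
        # mean squared distance to the K-th nearest neighbor
        t = np.mean(np.sort(d2, axis=1)[:, K])
    S = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:K + 1]:   # K nearest neighbors of x_i
            w = np.exp(-d2[i, j] / t)          # heat-kernel weight, eq (6)
            S[i, j] = S[j, i] = w              # either-way neighborship symmetrizes S
    D = np.diag(S.sum(axis=1))                 # D_ii = sum_j S_ij
    L = D - S                                  # graph Laplacian
    Lp = X @ L @ X.T                           # L' of eq (5)
    return S, D, L, Lp
```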

Figure 1. The 1-D projection results of PCA, LPP, and GLSA on a two-dimensional (2-D) dataset X.

where \rho(\cdot) is the spectral radius of the corresponding matrix. To unify the different scales of global and local structure preservation, the two aspects must enter the objective function of GLSA equivalently, which can be achieved by

\eta S_{Global} = (1 - \eta) S_{Local}    (8)

Therefore, the parameter η can be set as

\eta = \frac{S_{Local}}{S_{Global} + S_{Local}}    (9)
     = \frac{\rho(L')}{\rho(C) + \rho(L')}    (10)

Simulations in the case-study part show that this strategy balances the global and local manners very well; thus, the dimension reduction performance of GLSA can be guaranteed. In fact, η is an adaptive parameter that changes with different process datasets; therefore, GLSA is flexible for practical applications.

3.3. Formulation of Global–Local Structure Analysis (GLSA). Substituting J(a)_{Global} and J(a)_{Local} into the objective function described by eq 3, J(a)_{GLSA} can be converted to a maximum problem with a new criterion:

J(a)_{GLSA} = \eta \max_a a^T C a - (1 - \eta) \min_a a^T L' a
            = \max_a \{ \eta a^T C a - (1 - \eta) a^T L' a \}
            = \max_a a^T (\eta C - (1 - \eta) L') a
            = \max_a a^T M a    (11)

where M = \eta C - (1 - \eta) L'. We impose an orthogonal constraint on the objective function of GLSA for the following reasons:
• The orthogonal constraint is advantageous for constructing monitoring statistics and calculating reconstruction errors, which is desirable for fault detection.
• The orthogonal constraint avoids the singularity problem of LPP.
• The orthogonal constraint can effectively enhance the discriminating power of dimension reduction methods,26 which is desirable for fault identification.
The constrained problem is

\max_a a^T M a \quad \text{s.t.} \quad a^T a = 1    (12)

The transformation vector a can be found by solving the eigenvector problem

M a = \lambda a    (13)

Suppose a_0, a_1, ..., a_{d-1} are the eigenvectors corresponding to the d largest eigenvalues \lambda_0 > \lambda_1 > \cdots > \lambda_{d-1} of eq (13). The projection designed to keep both the global and the local structure is then constructed as

x_i \to y_i = A^T x_i, \quad A = [a_0, a_1, ..., a_{d-1}]    (14)
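Putting eqs 7–14 together, the whole GLSA fitting step admits a compact sketch such as the following (again an editorial illustration, not the authors' code); build_graph_matrices is the helper sketched in section 3.1, and the spectral radius is computed as the largest absolute eigenvalue.

```python
import numpy as np

def glsa_fit(X, d, K=10, t=None):
    """X is D x n and assumed mean-centered. Returns the loading matrix
    A = [a_0, ..., a_{d-1}] of eq (14) and the balance parameter eta of eq (10)."""
    n = X.shape[1]
    C = (X @ X.T) / n                          # covariance matrix C of eq (4)
    _, _, _, Lp = build_graph_matrices(X, K, t)
    rho = lambda M: np.max(np.abs(np.linalg.eigvalsh(M)))  # spectral radius
    eta = rho(Lp) / (rho(C) + rho(Lp))         # eqs (7)-(10)
    M = eta * C - (1.0 - eta) * Lp             # eq (11)
    lam, vecs = np.linalg.eigh(M)              # eqs (12)-(13): M is symmetric,
    order = np.argsort(lam)[::-1]              # so eigh yields orthonormal a's
    A = vecs[:, order[:d]]                     # d largest eigenvalues, eq (14)
    return A, eta

# usage: A, eta = glsa_fit(X, d); Y = A.T @ X gives the latent scores
```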

4. ALGORITHM ANALYSES

To give a deeper perspective on the proposed algorithm, some theoretical analyses of GLSA are provided here.


Figure 2. (a) The Swiss-roll dataset, which contains 1800 points. (b) The 2-D projection result of LPP. (c) The 2-D projection result of PCA. (d) The 2-D projection result of GLSA.

Figure 3. (a) The S-curve dataset, which contains 1800 points. (b) The 2-D projection result of LPP. (c) The 2-D projection result of PCA. (d) The 2-D projection result of GLSA.

First, the main characteristics of the proposed method are highlighted as follows:
• Local structure: GLSA shares similar properties with LPP, preserving the local structure of the dataset; thus, the underlying geometrical manifold of the dataset can be exploited.
• Global structure: GLSA shares similar properties with PCA, preserving the global structure of the dataset.


Figure 4. The 2-D projection result of GLSA with different values of η: (a) Swiss-roll dataset and (b) S-curve dataset.

Therefore, the variance information of the dataset is retained after projection.
• Computation: GLSA avoids the singularity problem of LPP, because the orthogonal constraint is involved; besides, no generalized eigenvector problem needs to be solved.
• Linear mapping: Similar to LPP and PCA, GLSA is a linear projection method that explicitly provides a linear mapping from the original space to the low-dimensional space.

4.1. Relationships between GLSA and PCA/LPP. As has been demonstrated, GLSA takes both the global and the local data structure into account, while LPP and PCA each focus on only one of these aspects. The relationships between GLSA and PCA/LPP are stated in the following lemmas.

Lemma 1: If the parameter K (or ε) is assigned a value so small that the relationships among different data points are not considered, the GLSA algorithm reduces to the ordinary PCA method.

Proof: When no neighbor is assigned to any data point, the matrix S becomes diagonal; thus D = S and L = D − S = 0, so the local term vanishes and the objective function J(a)_{GLSA} reduces to (a scaled version of) the PCA objective J(a)_{PCA}. ∎

Lemma 2: If we set η = 1 in the objective function of GLSA, the GLSA algorithm is converted to PCA; if η = 0, it is converted to LPP with an orthogonal constraint. Therefore, PCA and LPP can in fact be considered two particular forms of GLSA. ∎


Figure 5. Tennessee–Eastman (TE) benchmark process.

Table 1. Monitoring Variables in the Tennessee–Eastman (TE) Process

No.  measured variable               No.  measured variable                            No.  manipulated variable
1    A feed                          12   product separator level                      23   D feed flow valve
2    D feed                          13   product separator pressure                   24   E feed flow valve
3    E feed                          14   product separator underflow                  25   A feed flow valve
4    total feed                      15   stripper level                               26   total feed flow valve
5    recycle flow                    16   stripper pressure                            27   compressor recycle valve
6    reactor feed rate               17   stripper underflow                           28   purge valve
7    reactor pressure                18   stripper temperature                         29   separator pot liquid flow valve
8    reactor level                   19   stripper steam flow                          30   stripper liquid product flow valve
9    reactor temperature             20   compressor work                              31   stripper steam valve
10   purge rate                      21   reactor cooling water outlet temperature     32   reactor cooling water flow
11   product separator temperature   22   separator cooling water outlet temperature   33   condenser cooling water flow

4.2. Computational Complexity Analyses. In this section, we focus on the complexity analysis of the GLSA algorithm, which can be divided into three parts: the search for the K nearest neighbors, the calculation of the parameter η, and the solution of an eigenvector problem. First, consider the K-nearest-neighbors search, whose complexity is given as27

T_{KNN\,search} = O((D + k) n^2)    (15)

For the calculation of the parameter η, the largest eigenvalues of two D × D matrices must be computed; therefore, the computational complexity is

T_{\eta} = O(2 D^2)    (16)

Solving the eigenvector problem Ma = λa of the GLSA algorithm, where M \in R^{D \times D}, has the same complexity as PCA:

T_{eig\,problem} = T_{PCA} = O(d D^2)    (17)

As a result, the overall complexity of GLSA is

T_{GLSA} = T_{KNN\,search} + T_{\eta} + T_{eig\,problem} = O((D + k) n^2 + (d + 2) D^2)    (18)


Table 2. Process Disturbances

fault no.  process variable                                        type
1          A/C feed ratio, B composition constant (stream 4)      step
2          B composition, A/C ratio constant (stream 4)           step
3          D feed temperature (stream 2)                          step
4          reactor cooling water inlet temperature                step
5          condenser cooling water inlet temperature              step
6          A feed loss (stream 1)                                 step
7          C header pressure loss-reduced availability (stream 4) step
8          A, B, C feed composition (stream 4)                    random variation
9          D feed temperature (stream 2)                          random variation
10         C feed temperature (stream 4)                          random variation
11         reactor cooling water inlet temperature                random variation
12         condenser cooling water inlet temperature              random variation
13         reaction kinetics                                      slow drift
14         reactor cooling water valve                            sticking
15         condenser cooling water valve                          sticking
16         unknown                                                unknown
17         unknown                                                unknown
18         unknown                                                unknown
19         unknown                                                unknown
20         unknown                                                unknown
21         valve position constant (stream 4)                     constant position

Table 3. Fault Detection Results of the Three Methods in the TE Process

           LPP             PCA             GLSA
fault      T²      SPE     T²      SPE     T²      SPE
1          0.99    0.99    0.99    1       1       0.99
2          0.99    0.98    0.99    0.96    0.96    0.99
3          0.02    0.01    0.02    0.02    0.03    0.04
4          0.08    0.84    0.07    1       0.04    0.92
5          0.98    0.22    0.25    0.19    1       0.26
6          1       1       0.99    1       1       0.99
7          0.40    1       0.42    1       0.41    1
8          0.97    0.96    0.97    0.90    0.95    0.97
9          0.02    0.01    0.02    0.02    0.01    0.04
10         0.41    0.29    0.32    0.20    0.89    0.35
11         0.23    0.61    0.22    0.74    0.12    0.64
12         0.97    0.98    0.97    0.91    1       0.98
13         0.94    0.94    0.93    0.95    0.94    0.94
14         0.83    1       0.81    1       0.29    1
15         0.02    0.03    0.01    0.02    0.06    0.03
16         0.29    0.16    0.14    0.20    0.93    0.21
17         0.73    0.88    0.74    0.94    0.76    0.90
18         0.89    0.89    0.89    0.90    0.90    0.90
19         0.01    0.03    0.01    0.34    0.90    0.09
20         0.27    0.40    0.32    0.47    0.91    0.46
21         0.37    0.39    0.34    0.47    0.39    0.43

In contrast, LPP must solve a K-nearest-neighbors search problem and a generalized eigenvector problem; thus, its complexity is given as

T_{LPP} = O((D + k) n^2 + (D + d) D^2)    (19)

Usually, D > 2 holds in practical applications. Therefore, the computational complexities of the three methods can be ranked as

T_{PCA} < T_{GLSA} < T_{LPP}    (20)

5. FAULT DETECTION AND IDENTIFICATION BASED ON GLOBAL–LOCAL STRUCTURE ANALYSIS (GLSA)

5.1. Fault Detection. Based on the proposed GLSA model, a fault detection method can be developed as follows. First, normal operating data are collected as training samples; then, the GLSA algorithm is used for dimension reduction, dividing the original space into two spaces: the feature space (FS) and the residual space (RS). Thus,

X = \hat{X} + \tilde{X} = AY + \tilde{X} = AA^T X + \tilde{X}    (21)

where \hat{X} is the projection result in the FS, with FS given by S_f = span\{A\}, and \tilde{X} is the projection result in the RS.

To give an insight into why the GLSA-based fault detection method can achieve better performance, consider the simple example in Figure 1, which shows the one-dimensional (1-D) projection results of PCA, LPP, and GLSA on a two-dimensional (2-D) dataset X. The projection direction of GLSA lies in the middle, because GLSA can be considered a compromise between PCA and LPP; the angle θ is determined by the value of η calculated in eq (10). As can be seen, PCA cannot detect fault point x_1, and LPP is insensitive to fault point x_2, whereas GLSA successfully detects both points. This indicates that a fault detection method based on only one structure may miss a fault that violates only the restriction of the other structure. Therefore, to enhance fault detection performance, the process should be monitored on both the global and the local structure, which is exactly the idea of the GLSA model.

Next, Hotelling's T² and squared prediction error (SPE) statistics, corresponding to the FS and the RS, respectively, are constructed for fault detection. A new data sample x_new can be divided into two parts:

x_{new} = \hat{x}_{new} + \tilde{x}_{new} = AA^T x_{new} + \tilde{A}\tilde{A}^T x_{new} = A y_{new} + (I - AA^T) x_{new}    (22)

where y_{new} = A^T x_{new} and \tilde{A} = [a_d, a_{d+1}, ..., a_{D-1}] collects the residual eigenvectors of Ma = λa. The T² statistic can be constructed as

T^2_{new} = y_{new}^T \Lambda^{-1} y_{new}    (23)

where \Lambda = YY^T/(n - 1) is the covariance matrix of the projection vectors of the training dataset. The SPE statistic measures the variability that breaks the normal process correlation:

SPE_{new} = \|\tilde{x}_{new}\|^2 = \|(I - AA^T) x_{new}\|^2    (24)

To determine whether the process is operating under normal conditions, proper control limits must be defined.


Figure 6. Performance comparisons of different approaches for five process faults.

Similar to the PCA method, the control limit of the T² statistic can be calculated as7,28

T^2 \le \frac{d(n-1)}{n-d} F_{d, n-d, \alpha}    (25)

where F_{d,n-d,\alpha} is the upper α percentile of the F-distribution with d and n − d degrees of freedom. Similarly, the control limit of the SPE statistic can be obtained from a weighted χ²-distribution:7

SPE \le g \chi^2_{h,\alpha}, \quad g = \frac{\theta_2}{\theta_1}, \quad h = \frac{\theta_1^2}{\theta_2}    (26)

where \theta_i = \sum_{j=d+1}^{D} \lambda_j^i (i = 1, 2).
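For illustration, the offline and online parts of the GLSA-based detection of eqs 21–26 might be sketched as follows (an editorial sketch, not the authors' code; `glsa_fit` is the earlier helper, `lam_res` stands for the residual eigenvalues λ_{d+1}, ..., λ_D of eq 13, and all names are assumptions).

```python
import numpy as np
from scipy import stats

def fit_limits(X, A, lam_res, alpha=0.99):
    """Offline: score covariance (eq 23) and the control limits of eqs (25)-(26).
    X is D x n normal operating data, A is the D x d loading matrix, and
    lam_res holds the residual eigenvalues of the GLSA eigenproblem (eq 13)."""
    n = X.shape[1]
    d = A.shape[1]
    Y = A.T @ X
    Lam = (Y @ Y.T) / (n - 1)                                       # eq (23)
    t2_lim = d * (n - 1) / (n - d) * stats.f.ppf(alpha, d, n - d)   # eq (25)
    # eq (26), following the paper's formula; note that residual eigenvalues
    # of M may be small or even negative in practice
    theta1, theta2 = np.sum(lam_res), np.sum(lam_res ** 2)
    g, h = theta2 / theta1, theta1 ** 2 / theta2
    spe_lim = g * stats.chi2.ppf(alpha, h)
    return Lam, t2_lim, spe_lim

def monitor(x_new, A, Lam):
    """Online: T^2 and SPE of a new sample, eqs (22)-(24)."""
    y = A.T @ x_new                            # scores in the feature space
    t2 = y @ np.linalg.solve(Lam, y)           # eq (23)
    resid = x_new - A @ y                      # (I - A A^T) x_new, eq (22)
    return t2, resid @ resid                   # eq (24): SPE = ||resid||^2
```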

5.2. Fault Identification. When a fault has been successfully detected, the fault identification issue is considered. In the present paper, a Bayesian inference algorithm is proposed for fault identification, developed on top of the GLSA-based fault detection method. Suppose there are in total F faulty modes in the process, and denote the detected faulty data sample as x_{fault}. First, the GLSA models corresponding to all known process fault modes are developed beforehand; their eigenvector and eigenvalue matrices are represented as

\{A_1, A_2, ..., A_f, ..., A_F\} \quad \text{and} \quad \{\Lambda_1, \Lambda_2, ..., \Lambda_f, ..., \Lambda_F\}

Based on these GLSA models, the T² and SPE statistic values of the faulty data sample under each faulty operation mode can be calculated using the procedures given in the last subsection:

y_{f,fault} = A_f^T x_{fault}
T^2_{f,fault} = y_{f,fault}^T \Lambda_f^{-1} y_{f,fault}
SPE_{f,fault} = \| (I - A_f A_f^T) x_{fault} \|^2    (27)

where f = 1, 2, ..., F. Based on Bayesian inference, the posterior probability of the faulty sample belonging to a specific faulty mode f can be calculated as

P(f \mid x_{fault}) = \frac{P(f, x_{fault})}{P(x_{fault})} = \frac{P(x_{fault} \mid f) P(f)}{\sum_{i=1}^{F} P(x_{fault} \mid i) P(i)}    (28)

where P(f) is the prior probability of each faulty mode, which can be assigned an equal value.

Figure 7. Average fault detection rates of different approaches.

P(x_{fault} | f) is the conditional probability of the faulty data sample in each specific fault mode and is defined as follows:29

P_{T^2}(x_{fault} \mid f) = \exp\left\{ -\frac{T^2_{f,fault}}{T^2_{f,lim}} \right\}
P_{SPE}(x_{fault} \mid f) = \exp\left\{ -\frac{SPE_{f,fault}}{SPE_{f,lim}} \right\}
P(x_{fault} \mid f) = \omega P_{T^2}(x_{fault} \mid f) + (1 - \omega) P_{SPE}(x_{fault} \mid f)    (29)

where 0 ≤ ω ≤ 1 is a weighting parameter between the T² and SPE statistics. Therefore, based on the posterior probabilities of the faulty data sample, the mode of the currently detected fault can be identified.
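A sketch of the identification step of eqs 27–29 (editorial illustration): `models` is a hypothetical list that stores, for each known fault mode f, the loadings A_f, score covariance Λ_f, and the two control limits used as normalizers in eq 29; equal priors are assumed, so they cancel in eq 28.

```python
import numpy as np

def identify_fault(x_fault, models, w=0.5):
    """Eqs (27)-(29): posterior probability of each known fault mode.
    models[f] = (A_f, Lam_f, t2_lim_f, spe_lim_f); equal priors assumed."""
    F = len(models)
    likelihoods = np.empty(F)
    for f, (A, Lam, t2_lim, spe_lim) in enumerate(models):
        y = A.T @ x_fault                                  # eq (27)
        t2 = y @ np.linalg.solve(Lam, y)
        r = x_fault - A @ y
        spe = r @ r
        p_t2 = np.exp(-t2 / t2_lim)                        # eq (29)
        p_spe = np.exp(-spe / spe_lim)
        likelihoods[f] = w * p_t2 + (1 - w) * p_spe
    posterior = likelihoods / likelihoods.sum()            # eq (28), equal P(f)
    return int(np.argmax(posterior)), posterior
```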

6. CASE STUDIES

In this section, two case studies are performed to evaluate the proposed GLSA algorithm. First, two synthetic examples show the effectiveness of GLSA in dimension reduction; then, the TE benchmark process is used to exhibit the fault detection and identification ability of GLSA. In both case studies, the performance of the proposed method is compared with LPP and PCA.

6.1. Synthetic Examples. The Swiss roll and the S-curve are two synthetic datasets that are widely used to test the dimension reduction ability of different methods.30,31 Here, each dataset contains 1800 points, and K = 15 is selected for both LPP and GLSA. Figure 2 shows the projection results of PCA, LPP, and GLSA on the Swiss roll, where the parameter η is calculated as η = 0.176, indicating that the local structure must be emphasized in the Swiss-roll dataset. Similarly, comparisons of PCA, LPP, and GLSA on the S-curve are exhibited in Figure 3 with η = 0.11, which again places the emphasis on the local structure of the S-curve dataset. Based on these two results, it can be found that GLSA preserves the original geometric structures well, since the special inner orders illustrated by different colors are well arranged in the 2-D space, whereas the PCA and LPP methods fail to exhibit these features.

Figure 4 shows the projection results of the Swiss roll and the S-curve obtained by GLSA with different choices of the η value. It can be seen that even a slight change of η (from 0.11 or 0.176 to 0.2) seriously degrades the performance. Therefore, η is a crucial parameter of the method, with a great impact on dimension reduction performance.
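For reference, a usage sketch of the `glsa_fit` helper from section 3.3 on a Swiss-roll dataset; the generator comes from scikit-learn (`make_swiss_roll`), which the paper does not specify, so the resulting η need not reproduce the reported value exactly.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

X3, color = make_swiss_roll(n_samples=1800, random_state=0)
X = X3.T - X3.T.mean(axis=1, keepdims=True)   # D x n, mean-centered as assumed above

A, eta = glsa_fit(X, d=2, K=15)               # K = 15, as in section 6.1
Y = A.T @ X                                   # the 2-D embedding of eq (14)
print("eta =", round(eta, 3))                 # the paper reports eta = 0.176 for its Swiss roll
```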


Figure 8. Detection results of fault 10: (a) GLSA, (b) LPP, and (c) PCA.

Figure 9. Detection results of fault 19: (a) GLSA, (b) LPP, and (c) PCA.

Besides, the usefulness of the proposed strategy for calculating η in eq 10 has also been verified.

6.2. Tennessee–Eastman Benchmark Process. The Tennessee–Eastman (TE) process is a well-known benchmark for


Figure 10. Fault identification results based on mean values of the posterior probabilities: (a) fault 2, (b) fault 8, (c) fault 13, (d) fault 14, and (e) fault 20.

testing the performance of various fault detection methods.3235 This process consists of five major unit operations: a reactor, a condenser, a compressor, a separator, and a stripper. The TE process has 41 measured variables and 12 manipulated variables, and it can produce 21 programmed fault modes. A flowchart of the TE process is shown in Figure 5. In the present paper, we select 33 variables for monitoring, which are listed in Table 1.35 The simulation data that we have collected was separated into two parts: the training datasets and the testing datasets, where each operating mode (1 normal and 21 fault) contains 960 observations. All of the faults are introduced into the process at sample 160. Detailed descriptions of the 21 process faults are given in Table 2. Based on the normal process dataset, the GLSA, LPP, and PCA models are developed. Here, the numbers of latent variables have been selected as the same value as d = 9 for all of the three methods. A value of K = 10 is chosen for both the LPP and GLSA methods. As a result, the value of the balance parameter η is determined to be 0.63, which indicates that the

This indicates that the global structure must be emphasized in this process. For the purpose of fault detection, the confidence limits of all monitoring statistics in the three methods are set to 99%. To compare the fault detection performance of GLSA with LPP and PCA, all 21 faults of the TE process have been tested; the detection results are tabulated in Table 3. It can be seen that the GLSA-based method performs better than the other two methods in most fault cases. For example, significant improvements have been made for five faults, which are shown in Figure 6, where only the best detection rate among the monitoring statistics of each method is illustrated. Compared with the results of LPP and PCA, the fault detection rates of GLSA for these five process faults are much higher, which means that GLSA is more sensitive to these faults. The mean detection rates of the three methods over all 21 process faults are compared in Figure 7, where the results of both monitoring statistics are provided. It can be seen that the T² statistic has performed much better than the SPE statistic.


Figure 11. Fault identification results among the five selected fault cases.

Particularly, the detailed fault detection results of faults 10 and 19 are shown in Figures 8 and 9, respectively. From these two comparisons, it can be found that the fault detection performance is greatly improved by the GLSA method, especially through the T² statistic.

After a fault has been detected, the next step is to identify it. Therefore, GLSA models for all 21 faults in the TE process have been developed. To demonstrate the performance of the fault identification method, 5 representative faults are selected, corresponding to one step fault (fault 2), one random-variation fault (fault 8), one slow-drift fault (fault 13), one sticking fault (fault 14), and one unknown fault (fault 20). The weighting parameter ω in eq 29 is selected as 0.5. Detailed descriptions of these 5 faults can be found in Table 2. Based on the Bayesian posterior probability, a fault identification result can be generated for each data sample under each fault condition. The mean values over all data samples under each fault condition are shown in panels a–e of Figure 10. It can be seen that all 5 faults have been correctly identified, since the posterior probability of the corresponding mode for each fault is more significant than those of the other fault cases. Furthermore, the detailed comparison results of these five selected faults are provided in Figure 11, from which we can also conclude that the right fault modes have been identified.

7. CONCLUSIONS

In the present paper, a new dimension reduction method, called global–local structure analysis (GLSA), has been developed and used for fault detection and identification. Different from principal component analysis (PCA) and locality preserving projections (LPP), GLSA takes both the global and the local data structure into account and introduces a parameter η to keep a proper tradeoff between them. Hence, GLSA can extract more meaningful information in the low-dimensional latent variable space. With appropriate parameter selections, LPP and PCA can be considered two special forms of the GLSA model; compared with the two conventional methods, GLSA is therefore more general and more flexible for practical applications. Based on the simulation results of the two case studies, it can be concluded that the GLSA model is more efficient for data information extraction, and the fault detection sensitivity of the corresponding monitoring method is improved accordingly.

It is noted that the "ordering of the data" we consider in this paper is the spatial structure, which is described by the Euclidean distances among different samples. In contrast, canonical variate analysis (CVA) based approaches36,37 model serial correlations among different time instances. Therefore, CVA addresses a different issue from the proposed GLSA. However, both the order of the spatial structure and that of the time instances are important; how to efficiently combine CVA and LPP deserves further investigation in future work.

AUTHOR INFORMATION

Corresponding Author
*E-mail addresses: [email protected] (G.Z.), [email protected] (S.Z.).

Notes
†Part of this paper was presented at the 49th IEEE Conference on Decision and Control, December 15–17, 2010, Atlanta, GA, USA.

ACKNOWLEDGMENT
This work was supported in part by the National Natural Science Foundation of China (No. 60974056) and the National High Technology Research and Development (863) Program of China (No. 2009AA04Z154).

REFERENCES
(1) Kresta, J.; MacGregor, J. F.; Marlin, T. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35–47.
(2) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer-Verlag: London, 2001.
(3) Kano, M.; Nagao, K.; Hasebe, H.; Hashimoto, I.; Ohno, H.; Strauss, R.; Bakshi, B. R. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Comput. Chem. Eng. 2002, 26, 161–174.
(4) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K. A review of process fault detection and diagnosis. Part III: Process history based methods. Comput. Chem. Eng. 2003, 27, 327–346.
(5) Ge, Z. Q.; Yang, C. J.; Song, Z. H. Improved kernel PCA-based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64, 2245–2255.
(6) Zhao, S. J.; Zhang, J.; Xu, Y. M. Monitoring of processes with multiple operation modes through multiple principle component analysis models. Ind. Eng. Chem. Res. 2004, 43, 7025–7035.
(7) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, 480–502.
(8) Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417–441.
(9) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97–108.
(10) Ergon, R. Reduced PCR/PLSR models by subspace projections. Chemom. Intell. Lab. Syst. 2006, 81, 68–73.
(11) Kruger, U.; Dimitriadis, G. Diagnosis of process faults in chemical systems using a local partial least squares approach. AIChE J. 2008, 54, 2581–2596.
(12) He, X.; Min, W. Statistical and computational analysis of locality preserving projection. Int. Conf. Mach. Learn. 2005, 119, 281–288.
(13) Tenenbaum, J. B.; Silva, V.; Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323.
(14) Roweis, S. T.; Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.


(15) Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Neural Inform. Process. Syst. 2002, 1, 585–592.
(16) He, X.; Niyogi, P. Locality preserving projections. Neural Inform. Process. Syst. 2003, 16, 153–160.
(17) Cai, D.; He, X.; Han, J. Document clustering using locality preserving indexing. IEEE Trans. Knowledge Data Eng. 2005, 17, 1624–1637.
(18) Hu, K. L.; Yuan, J. Q. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18, 797–807.
(19) Shao, J. D.; Rong, G.; Lee, J. M. Generalized orthogonal locality preserving projections for nonlinear fault detection and diagnosis. Chemom. Intell. Lab. Syst. 2009, 96, 75–83.
(20) Bakshi, B. R. Multiscale PCA with applications to multivariate statistical process monitoring. AIChE J. 1998, 44, 1596–1610.
(21) Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.
(22) He, X.; Cai, D.; Yan, S.; Zhang, H. J. Neighborhood preserving embedding. In Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), Vol. 2; IEEE: Piscataway, NJ, 2005; pp 1208–1213.
(23) Lu, X. F. Application Basis of an Optimization Method; Tongji University Press: Shanghai, PRC, 2003.
(24) Zhao, C. H.; Gao, F. R.; Wang, F. L. An improved independent component regression modeling and quantitative calibration procedure. AIChE J. 2010, 56, 1519–1535.
(25) Liu, Q.; Tang, X.; Lu, H.; Ma, S. Face recognition using kernel scatter-difference-based discriminant analysis. IEEE Trans. Neural Networks 2006, 17, 1081–1085.
(26) Cai, D.; He, X.; Han, J.; Zhang, J. Orthogonal Laplacian faces for face recognition. IEEE Trans. Image Process. 2006, 15, 3608–3614.
(27) He, X. Locality preserving projections. Ph.D. Dissertation, The University of Chicago, Chicago, IL, USA, 2005.
(28) Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Generalized contribution plots in multivariate statistical process monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95–114.
(29) Ge, Z. Q.; Song, Z. H. Multimode process monitoring based on Bayesian method. J. Chemom. 2009, 23, 636–650.
(30) Zhang, T.; Yang, J.; Zhao, D.; Ge, X. Linear local tangent space alignment and application to face recognition. Neurocomputing 2007, 70, 1547–1553.
(31) Saul, L. K.; Roweis, S. T. Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 2003, 4, 119–155.
(32) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255.
(33) Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Comput. Chem. Eng. 1995, 19, 321–331.
(34) Maurya, M. R.; Rengaswamy, R.; Venkatasubramanian, V. Fault diagnosis by qualitative trend analysis of the principal components. Chem. Eng. Res. Des. 2005, 83, 1122–1132.
(35) Ge, Z. Q.; Song, Z. H. Process monitoring based on independent component analysis-principal component analysis (ICA-PCA) and similarity factors. Ind. Eng. Chem. Res. 2007, 46, 2054–2063.
(36) Russell, E.; Chiang, L.; Braatz, R. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. 2000, 51, 81–93.
(37) Juricek, B. L.; Seborg, D. E.; Larimore, W. E. Fault detection using canonical variate analysis. Ind. Eng. Chem. Res. 2004, 43, 458–474.
