A sparse reconstruction strategy for online fault diagnosis in


Article

A sparse reconstruction strategy for online fault diagnosis in nonstationary processes with no priori fault information He Sun, Shumei Zhang, Chunhui Zhao, and Furong Gao Ind. Eng. Chem. Res., Just Accepted Manuscript • Publication Date (Web): 01 Jun 2017 Downloaded from http://pubs.acs.org on June 2, 2017


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Industrial & Engineering Chemistry Research

A sparse reconstruction strategy for online fault diagnosis in nonstationary processes with no priori fault information He Sun,1 Shumei Zhang,1 Chunhui Zhao,1* Furong Gao2 1 State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China 2 Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong Special Administrative Region

Abstract Fault detection and diagnosis for nonstationary processes is difficult because the fault signal may be buried in the nonstationary trends of the variables. Most traditional multivariate statistical methods, such as PCA and PLS, cannot describe the relations among nonstationary variables because they assume stationarity. This paper addresses the fault diagnosis problem for nonstationary industrial processes. A sparse reconstruction strategy based on cointegration analysis (CA) is proposed to isolate and diagnose the faulty variables online without requiring any historical fault data. First, nonstationary variables are distinguished and separated from stationary variables, and cointegration models are constructed to describe the long-run equilibrium relation among the nonstationary variables. Second, a faulty-variable isolation strategy is formulated for online fault diagnosis by integrating the least absolute shrinkage and selection operator (LASSO) into cointegration analysis. The selection procedure is implemented iteratively and combines sparse variable selection with fault reconstruction. The proposed method can automatically isolate, in real time, multiple faulty variables that are responsible for the abnormal operation of nonstationary processes without using any historical fault data.


The comparison of reconstruction-based contribution shows that the proposed method can effectively isolate the major faulty variables. The performance of the proposed method is illustrated with the Tennessee Eastman process. Besides, the proposed method has been successfully applied to a real industrial process of the thermal power plant. Keywords: fault diagnosis, nonstationary process, cointegration analysis, least absolute shrinkage and selection operator, variable selection.

1. INTRODUCTION Process monitoring and fault diagnosis play a very important role in modern industrial plants in ensuring productivity, energy efficiency, product quality, and plant safety. Any abnormal situation should be recognized in time to avoid major accidents.1 When a fault is detected, fault diagnosis is employed to identify the variables that are mainly responsible for the fault. Appropriate actions are then taken to eliminate the fault effect and return the plant to normal operation. Considering the large number of process variables, multivariate statistical methods 2-10 have been widely used in process monitoring and fault diagnosis because of their simplicity. Among them, principal component analysis (PCA),11,12 partial least squares (PLS),13 and Fisher discriminant analysis (FDA) 14 are the most commonly used, and many extensions have been proposed to handle nonlinear 15-17 and multimode 18,19 problems. However, these methods assume that the process is stationary. In practice, many process variables are nonstationary: their statistical characteristics change with time because of factors such as throughput changes, equipment aging, unmeasured disturbances, and human interventions.20 Box et al.21 stated that a time series is nonstationary if its statistical characteristics, such as mean and variance, change with time. Fault detection and diagnosis for nonstationary processes is difficult because the fault signal may be buried by nonstationary trends, which results in a low fault detection rate.


The characteristics of nonstationary processes have been described in recent work.22-24 Byon et al.24 noted that the process variables and outputs change with time because of internal and external factors, so that the process has nonstationary properties. In contrast, the amplitude and spectral content of a stationary process do not vary with time.25 A time series $x_t$ is said to be nonstationary if it shows a time-varying mean, a time-varying variance, or both.26 Conventional multivariate statistical approaches may not be able to describe the relations among nonstationary variables, so fault detection and diagnosis for nonstationary processes deserves special attention. Most conventional methods difference the original nonstationary time series to deal with nonstationary behavior. After differencing, the nonstationary series become stationary ones whose statistical properties are constant.21 Although differencing turns a nonstationary time series into a stationary one, this preprocessing can discard dynamic information and fault features in the data, which leads to poor fault detection and diagnosis performance. As an effective way to investigate the relationship between nonstationary variables, the concept of cointegration analysis (CA) was developed by Engle and Granger.27 They showed that a linear combination of a set of nonstationary series may be stationary if the series are integrated of the same order. Cointegration analysis has been widely used in economics by economists and statisticians over the last three decades.28-31 The CA method was then introduced by Chen et al.20 to process monitoring for nonstationary industrial processes.
They showed that the nonstationary variables in the nonstationary processes correlate to each other, and those nonstationary variables maintain a long-run dynamic equilibrium relation governed by the physical mechanisms at the designed operating situation. The cointegration model was constructed for these nonstationary variables, and the residual variable was


obtained from the cointegration model; this residual is stationary when the process runs under normal conditions. If a fault occurs, the long-run equilibrium relationship among the nonstationary variables is broken and the residual variable becomes nonstationary. Their work shows that CA can be used to analyze nonstationary processes. However, they used only one residual variable for fault detection, which may not capture all disturbances. Li et al.32 derived multiple cointegration vectors using cointegration analysis and simultaneously calculated the orthogonal space of the cointegration vectors to detect abnormal behaviors effectively. Although fault detection has been considered for nonstationary industrial processes, isolating the faulty variables in such processes is a more challenging problem, because abnormal signals may not be clearly distinguishable from normal nonstationary trends whose statistical characteristics change with time. Fault diagnosis and isolation for nonstationary industrial processes is important for ensuring productivity and plant safety, yet few works have addressed this problem. Different methods have been proposed for fault diagnosis. When a new sample is identified as abnormal, fault classification methods such as Fisher discriminant analysis (FDA) 4,33 and support vector machine (SVM) 18 can identify which fault the sample belongs to. However, these classification methods rely on historical fault data, which are difficult to obtain in sufficient quantity in practice. The contribution plot method 8,34 has been widely used to isolate faulty variables since it does not need prior fault knowledge. However, contribution plots may give confusing results because of the smearing effect. To solve this problem, Dunia and Qin 35 proposed a reconstruction-based contribution (RBC) method for fault isolation. The RBC method reconstructs the fault along a single variable, which may not effectively isolate the faulty


variables that have caused out-of-control statistics. This paper focuses on the diagnosis problem in nonstationary processes. A sparse reconstruction strategy based on sparse cointegration analysis is proposed to isolate and diagnose the faulty variables without historical fault data by adopting the least absolute shrinkage and selection operator (LASSO). First, the nonstationary variables are distinguished from the stationary variables. Then, cointegration models are built from the nonstationary variables to describe the long-run dynamic equilibrium relationship among them. The LASSO strategy is integrated with the cointegration models to isolate the major faulty variables online. A sparse variable selection problem is formulated that recognizes the following fact: after a fault occurs, only some variables are disturbed and are thus responsible for the alarming monitoring statistics, and if they are correctly isolated and removed, the out-of-control statistics return to the normal region. Unlike previous fault diagnosis work, the proposed online method is designed for nonstationary processes. It can deal with nonstationarity and isolate multiple faulty variables simultaneously in real time without any prior historical fault data. The performance of the proposed method is illustrated with the Tennessee Eastman process, and the method has been successfully applied to a real industrial process in a thermal power plant. The remainder of this paper is organized as follows. Section 2 revisits cointegration analysis. Section 3 proposes a reconstruction strategy based on sparse cointegration analysis to isolate and diagnose the faulty variables in nonstationary processes. Section 4 demonstrates the fault diagnosis capability of the proposed method on the Tennessee Eastman process and then applies it to a real industrial process in a thermal power plant. Finally, conclusions are given in the last section.

2. REVISIT OF COINTEGRATION ANALYSIS

Cointegration analysis (CA), first developed by Engle and Granger,27 is an effective method for investigating the relationship between nonstationary variables. It has been widely used in economics over the last three decades and is briefly revisited here. If a nonstationary time series $\omega_t$ becomes stationary after being differenced $d$ times, it is said to be integrated of order $d$, denoted $\omega_t \sim I(d)$. Engle and Granger 27 showed that if a set of nonstationary time series $\mathbf{Z}(M \times N) = [\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_N]$, $\mathbf{z}_t = (z_1, z_2, \ldots, z_N)^T$, holds a long-run equilibrium relation, where $N$ is the number of nonstationary time series and $M$ is the number of samples, then there exists a vector $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_N)^T$ such that the linear combination of those series can be described as follows

$$\zeta_t = \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_N z_N = \boldsymbol{\beta}^T \mathbf{z}_t, \quad t = 1, \ldots, M \tag{1}$$

where $\zeta_t$ is the residual sequence. If $\zeta_t = \boldsymbol{\beta}^T \mathbf{z}_t \sim I(d-b)$, $d \ge b > 0$, then $\mathbf{z}_t$ is said to be cointegrated of order $(d, b)$, denoted $\mathbf{z}_t \sim CI(d, b)$; $\boldsymbol{\beta}$ is called the cointegration vector and each $z_i$ is called a cointegrated variable. The aim of cointegration analysis is therefore to find the cointegration vector $\boldsymbol{\beta}$.

Johansen 36 introduced a method to obtain the cointegration vector based on the vector autoregressive (VAR) model, which applies to a set of nonstationary variables that are integrated of order one, $I(1)$. Consider a set of nonstationary time series $\mathbf{X}(M \times N) = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$, $\mathbf{x}_t = (x_1, x_2, \ldots, x_N)^T$, $\mathbf{x}_t \sim I(1)$, where $N$ is the number of nonstationary time series and $M$ is the number of samples. The VAR model of $\mathbf{x}_t$ can be described as follows

$$\mathbf{x}_t = \boldsymbol{\Pi}_1 \mathbf{x}_{t-1} + \cdots + \boldsymbol{\Pi}_p \mathbf{x}_{t-p} + \mathbf{c} + \boldsymbol{\mu}_t \tag{2}$$

where $\boldsymbol{\Pi}_i$ $(N \times N)$ is the coefficient matrix, $\boldsymbol{\mu}_t$ $(N \times 1)$ is a white-noise vector distributed as $N(\mathbf{0}, \boldsymbol{\Xi})$, $\mathbf{c}$ $(N \times 1)$ is a constant vector, and $p$ is the order of the VAR model. Subtracting $\mathbf{x}_{t-1}$ from both sides of Eq. (2) gives the vector error-correction (VEC) model

$$\Delta\mathbf{x}_t = \sum_{i=1}^{p-1} \boldsymbol{\Omega}_i \Delta\mathbf{x}_{t-i} + \boldsymbol{\Gamma}\mathbf{x}_{t-1} + \boldsymbol{\mu}_t \tag{3}$$

where $\boldsymbol{\Gamma} = -\mathbf{I}_N + \sum_{i=1}^{p} \boldsymbol{\Pi}_i$ and $\boldsymbol{\Omega}_i = -\sum_{j=i+1}^{p} \boldsymbol{\Pi}_j$, $i = 1, 2, \ldots, p-1$.
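The step from Eq. (2) to Eq. (3) is pure algebra, so it can be checked numerically. The following is a minimal sketch, not the authors' code: the helper name `var_to_vec`, the random coefficient matrices, and the choice $\mathbf{c} = \mathbf{0}$ (the constant does not appear in Eq. (3)) are all assumptions made for illustration.

```python
import numpy as np

def var_to_vec(Pi):
    """Map VAR coefficients [Pi_1, ..., Pi_p] to (Gamma, [Omega_1, ..., Omega_{p-1}])."""
    p = len(Pi)
    N = Pi[0].shape[0]
    Gamma = -np.eye(N) + sum(Pi)
    # Omega_i = -(Pi_{i+1} + ... + Pi_p); Pi[k] stores Pi_{k+1}
    Omega = [-sum(Pi[k] for k in range(i, p)) for i in range(1, p)]
    return Gamma, Omega

rng = np.random.default_rng(0)
N, p = 3, 3
Pi = [0.3 * rng.standard_normal((N, N)) for _ in range(p)]
lags = [rng.standard_normal(N) for _ in range(p)]      # x_{t-1}, ..., x_{t-p}
mu = rng.standard_normal(N)

# Eq. (2) with c = 0 (assumption)
x_t = sum(P @ x for P, x in zip(Pi, lags)) + mu

# Eq. (3): build dx_t from lagged differences and the level x_{t-1}
Gamma, Omega = var_to_vec(Pi)
dlags = [lags[i] - lags[i + 1] for i in range(p - 1)]  # dx_{t-1}, ..., dx_{t-p+1}
dx_t = sum(O @ d for O, d in zip(Omega, dlags)) + Gamma @ lags[0] + mu

vec_matches_var = np.allclose(dx_t, x_t - lags[0])
```

Both forms describe the same process; the VEC form simply isolates the level term $\boldsymbol{\Gamma}\mathbf{x}_{t-1}$, which is where the cointegration structure lives.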

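The matrix $\boldsymbol{\Gamma}$ can be decomposed into two matrices of full column rank, $\boldsymbol{\Gamma} = \mathbf{A}\mathbf{B}^T$, where $\mathbf{A}$ and $\mathbf{B}$ are both $N \times R$. Eq. (3) then becomes

$$\Delta\mathbf{x}_t = \sum_{i=1}^{p-1} \boldsymbol{\Omega}_i \Delta\mathbf{x}_{t-i} + \mathbf{A}\mathbf{B}^T\mathbf{x}_{t-1} + \boldsymbol{\mu}_t \tag{4}$$

According to Eq. (4), the residual sequence $\boldsymbol{\gamma}_{t-1}$ is obtained as follows 20

$$\boldsymbol{\gamma}_{t-1} = \mathbf{B}^T\mathbf{x}_{t-1} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\left(\Delta\mathbf{x}_t - \sum_{i=1}^{p-1} \boldsymbol{\Omega}_i \Delta\mathbf{x}_{t-i} - \boldsymbol{\mu}_t\right) \tag{5}$$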

The components of $\mathbf{x}_t$ are integrated of order one, i.e., $\mathbf{x}_t \sim I(1)$, so $\Delta\mathbf{x}_t$ and $\Delta\mathbf{x}_{t-i}$ are stationary. The right-hand side of Eq. (5) is therefore stationary. $\mathbf{B}^T\mathbf{x}_{t-1}$ is a linear combination of the nonstationary variables, and its components are stationary according to Eq. (5). Thus, the columns of $\mathbf{B}$ are the cointegration vectors.20 To obtain the cointegration vectors, Johansen 36 proposed a maximum likelihood method to estimate the parameters in Eq. (4) using the probability density function of $\Delta\mathbf{x}_t$ at instant $t$

$$f(\Delta\mathbf{x}_t) = \prod_{i=1}^{N} f(\Delta x_i) = (2\pi)^{-N/2}\,|\boldsymbol{\Xi}|^{-1/2} \exp\left(-\frac{1}{2}\boldsymbol{\mu}_t^T \boldsymbol{\Xi}^{-1} \boldsymbol{\mu}_t\right) \tag{6}$$

The set of nonstationary time series $\mathbf{X}$ includes $M$ samples, so the probability density function of all samples can be described as follows

$$f(\Delta\mathbf{X}_M) = \prod_{t=1}^{M} f(\Delta\mathbf{x}_t) = \prod_{t=1}^{M}\prod_{i=1}^{N} f(\Delta x_i) = (2\pi)^{-MN/2}\,|\boldsymbol{\Xi}|^{-M/2} \exp\left(-\frac{1}{2}\sum_{t=1}^{M}\boldsymbol{\mu}_t^T \boldsymbol{\Xi}^{-1} \boldsymbol{\mu}_t\right) \tag{7}$$

where, from Eq. (4), $\boldsymbol{\mu}_t = \Delta\mathbf{x}_t - \sum_{i=1}^{p-1} \boldsymbol{\Omega}_i \Delta\mathbf{x}_{t-i} - \mathbf{A}\mathbf{B}^T\mathbf{x}_{t-1}$.

Applying the natural logarithm to Eq. (7) gives the likelihood function

$$L(\boldsymbol{\Omega}_1, \ldots, \boldsymbol{\Omega}_{p-1}, \mathbf{A}, \mathbf{B}, \boldsymbol{\Xi}) = -\frac{MN}{2}\ln(2\pi) - \frac{M}{2}\ln|\boldsymbol{\Xi}| - \frac{1}{2}\sum_{t=1}^{M}\boldsymbol{\mu}_t^T \boldsymbol{\Xi}^{-1} \boldsymbol{\mu}_t \tag{8}$$
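Eq. (8) is straightforward to evaluate once the residuals $\boldsymbol{\mu}_t$ are available. The sketch below is illustrative only: the function name `gaussian_log_likelihood` and the row-wise layout of the residual matrix are assumptions, not part of the original method description.

```python
import numpy as np

def gaussian_log_likelihood(residuals, Xi):
    """Evaluate Eq. (8).

    residuals : (M x N) array whose rows are the noise vectors mu_t
    Xi        : (N x N) positive-definite noise covariance
    """
    M, N = residuals.shape
    sign, logdet = np.linalg.slogdet(Xi)
    if sign <= 0:
        raise ValueError("Xi must be positive definite")
    # sum_t mu_t^T Xi^{-1} mu_t, computed without forming the inverse explicitly
    quad = np.sum(residuals * np.linalg.solve(Xi, residuals.T).T)
    return -0.5 * M * N * np.log(2 * np.pi) - 0.5 * M * logdet - 0.5 * quad

# Usage: one sample, one variable, residual 2 with variance 4
example_ll = gaussian_log_likelihood(np.array([[2.0]]), np.array([[4.0]]))
```

Using `slogdet` and `solve` instead of `det` and `inv` is a numerical-stability choice; the value computed is the same quantity as in Eq. (8).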

The parameter $\mathbf{B}$, i.e., the matrix of cointegration vectors, can be estimated by maximizing the likelihood function $L$. Johansen 36 proved that the maximum likelihood estimate of $\mathbf{B}$ can be obtained by solving the eigenvalue equation

$$\left|\lambda \mathbf{S}_{11} - \mathbf{S}_{10}\mathbf{S}_{00}^{-1}\mathbf{S}_{01}\right| = 0 \tag{9}$$

where $\mathbf{S}_{ij} = \frac{1}{M}\mathbf{e}_i\mathbf{e}_j^T$, $i, j = 0, 1$; $\mathbf{e}_0 = \Delta\mathbf{x}_t - \sum_{i=1}^{p-1} \boldsymbol{\Theta}_i \Delta\mathbf{x}_{t-i}$; and $\mathbf{e}_1 = \mathbf{x}_{t-1} - \sum_{i=1}^{p-1} \boldsymbol{\Phi}_i \Delta\mathbf{x}_{t-i}$. The coefficients $\boldsymbol{\Theta}_i$ and $\boldsymbol{\Phi}_i$ can be estimated by ordinary least squares (OLS). Solving Eq. (9) yields $N$ eigenvalues, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$, and the corresponding eigenvector matrix is denoted as $\mathbf{V}$. The cointegration vectors are contained in $\mathbf{V}$, and their number can be determined by the Johansen test.36 If the Johansen test gives $R$ cointegration vectors, the cointegration matrix $\mathbf{B}(N \times R) = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_R]$ is obtained. The residual sequence $\gamma_{ti}$ can then be obtained as follows

$$\gamma_{ti} = \boldsymbol{\beta}_i^T \mathbf{x}_t = \beta_{i1} x_1 + \beta_{i2} x_2 + \cdots + \beta_{iN} x_N, \quad i = 1, \ldots, R \tag{10}$$

Eq. (10) is the cointegration model, and the residual sequence $\gamma_{ti}$ is stationary. The cointegration model is a linear combination of the nonstationary variables, and it reflects the relationship among those variables.
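To make the eigenvalue problem of Eq. (9) concrete, the following sketch solves it for the simplest case, a VAR of order $p = 1$, where no lagged differences need to be regressed out (so $\mathbf{e}_0 = \Delta\mathbf{x}_t$ and $\mathbf{e}_1 = \mathbf{x}_{t-1}$). The function name `johansen_eigen`, the simulated two-variable data, and the symmetric reformulation via a Cholesky factor are assumptions chosen for illustration; the rank test that selects $R$ is not shown.

```python
import numpy as np

def johansen_eigen(X):
    """Solve |lambda*S11 - S10 S00^{-1} S01| = 0 for p = 1 (assumption).

    X has one sample per row. Returns eigenvalues in descending order and
    the matching eigenvectors as columns.
    """
    e0 = np.diff(X, axis=0)      # rows are dx_t
    e1 = X[:-1]                  # rows are x_{t-1}
    M = e0.shape[0]
    S00 = e0.T @ e0 / M
    S01 = e0.T @ e1 / M
    S11 = e1.T @ e1 / M
    # With S11 = L L^T, the generalized problem becomes an ordinary symmetric
    # eigenproblem of L^{-1} S10 S00^{-1} S01 L^{-T}; map back with v = L^{-T} u.
    L = np.linalg.cholesky(S11)
    A = np.linalg.solve(L, S01.T)              # L^{-1} S10
    Msym = A @ np.linalg.solve(S00, A.T)
    lam, U = np.linalg.eigh(Msym)
    order = np.argsort(lam)[::-1]
    V = np.linalg.solve(L.T, U[:, order])
    return lam[order], V

# Two I(1) variables driven by one shared random walk w_t:
# x1 = w + noise, x2 = 0.5*w + noise, so x1 - 2*x2 is stationary.
rng = np.random.default_rng(1)
M = 4000
w = np.cumsum(rng.standard_normal(M))
X = np.column_stack([w + 0.5 * rng.standard_normal(M),
                     0.5 * w + 0.5 * rng.standard_normal(M)])

lam, V = johansen_eigen(X)
beta = V[:, 0] / V[0, 0]         # normalise the first entry to 1
residual = X @ beta              # Eq. (10) for the leading vector
```

In this simulated case the leading eigenvector should be close to a multiple of $(1, -2)^T$, and the residual it produces has a far smaller variance than either trending variable.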

3. METHODOLOGY In this section, a sparse reconstruction strategy based on cointegration analysis is developed for fault detection and diagnosis of nonstationary processes. It constructs a constrained optimization problem which can be solved by LASSO algorithm for online fault diagnosis without any priori fault information. The sparse idea is used to isolate the faulty variables which may be part of measurement variables. It includes three components, the identification of nonstationary variables, CA modeling for fault detection and online faulty variable isolation.

3.1 Identification of nonstationary variables

In nonstationary processes, not all variables are nonstationary; some may be stationary, and including them may distort the accuracy of the CA model. Although Chen et al.20 used the nonstationary variables to construct the CA model, they did not analyze why the nonstationary variables should be distinguished, and few works have discussed how stationary variables affect the CA model. The stationary variables should be separated from the nonstationary ones and excluded from the subsequent CA, for the following reasons. Stationary variables do not hold a long-run equilibrium relation, so the VEC model in Eq. (3) calculated for them may not be accurate. In addition, the cointegration vectors are obtained by maximum likelihood estimation according to Eq. (8) and Eq. (9), which requires maximizing the probability density function in Eq. (7), so the inclusion of stationary variables may directly result in a poor CA model. Specifically, the stationary residual sequence given by the cointegration model can be written as

$$\gamma_{ti} = \boldsymbol{\beta}_{is}^T \mathbf{x}_t = \beta_{i1} x_{n1} + \cdots + \beta_{in} x_{nn} + \beta_{i,n+1} x_{s1} + \cdots + \beta_{iN} x_{ss}, \quad i = 1, \ldots, R \tag{11}$$

where $\boldsymbol{\beta}_{is}$ is the cointegration vector, $x_{n1}, \ldots, x_{nn}$ denote the nonstationary variables, and $x_{s1}, \ldots, x_{ss}$ denote the stationary variables. Eq. (11) shows that the residual sequence $\gamma_{ti}$ remains stationary whatever the coefficients of the stationary variables are. That is, the inclusion of stationary variables does not affect the stationarity of the residual sequence $\gamma_{ti}$, and the corresponding cointegration vectors may not exactly describe the relationship among the stationary variables. Moreover, including the stationary variables may impair the characterization of the relationships among the nonstationary variables, since the information in the nonstationary variables may be hidden by the stationary variables during maximum likelihood estimation. Therefore, the nonstationary variables should be distinguished from the stationary variables before cointegration analysis; that is, it is essential to verify in advance whether each variable is nonstationary. The Augmented Dickey-Fuller (ADF) 37 test, a very popular tool, is used to judge whether each variable is nonstationary. The nonstationary variables are selected and denoted as $\mathbf{X}(M \times N) = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$, $\mathbf{x}_t = (x_1, x_2, \ldots, x_N)^T$, where $N$ is the number of nonstationary variables and $M$ is the number of samples. Note that this paper focuses on the nonstationarity issue in industrial processes, and only the nonstationary variables are considered for fault diagnosis in this work. For the stationary variables, conventional methods such as PCA 6 and PLS 10 can be used for fault diagnosis.
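The screening step can be illustrated with a simplified unit-root test. The sketch below implements the plain Dickey-Fuller regression with a constant and no lagged differences, which is a reduced stand-in for the ADF test named above, not the ADF test itself. The function names, the simulated data, and the threshold of -2.86 (the commonly tabulated asymptotic 5% critical value for the constant-only case) are assumptions for illustration.

```python
import numpy as np

def dickey_fuller_t(y):
    """t-statistic of rho in  dy_t = a + rho * y_{t-1} + e_t  (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])

def split_by_stationarity(X, critical_value=-2.86):
    """Return (nonstationary column indices, stationary column indices, t-statistics).

    A column is treated as stationary when its t-statistic falls below the
    critical value, i.e. when the unit-root hypothesis is rejected.
    """
    t = np.array([dickey_fuller_t(X[:, j]) for j in range(X.shape[1])])
    stationary = t < critical_value
    return np.flatnonzero(~stationary), np.flatnonzero(stationary), t

rng = np.random.default_rng(2)
M = 500
X_demo = np.column_stack([np.cumsum(rng.standard_normal(M)),   # random walk
                          rng.standard_normal(M)])             # white noise
nonstat_idx, stat_idx, t_stats = split_by_stationarity(X_demo)
```

Only the columns in `nonstat_idx` would be passed on to the cointegration model. In practice a full ADF implementation with lag selection would replace `dickey_fuller_t`.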

3.2 Cointegration analysis based fault detection

In an industrial process, the cointegration model describes the long-run equilibrium of the process under normal operation. In other words, the residual sequence of the cointegration model is stationary as long as the process operates normally. If a fault occurs, the long-run equilibrium is broken and the residual sequence becomes nonstationary, so that the fault can be identified. Cointegration analysis, described in Section 2, is used here to construct the fault detection model for the $N$ nonstationary variables

$$\boldsymbol{\gamma}_t = \mathbf{B}^T \mathbf{x}_t \tag{12}$$

where $\mathbf{B}$ is the matrix of cointegration vectors. The multiple cointegration vectors can be evaluated by solving the eigenvalue problem shown in Section 2: the eigenvector matrix $\mathbf{V}$ is obtained from Eq. (9), and the cointegration vectors are contained in $\mathbf{V}$. A key issue is how many cointegration vectors should be retained. The Johansen test 36 was proposed for testing nonstationary variables in economics. However, the Johansen test is only suitable for up to twelve nonstationary variables; when there are more than twelve, the hypothesis test cannot be used.36 Here, a strategy is developed to determine the number of retained cointegration vectors. First, the Johansen test is used to determine whether the number is larger than 12. If the hypothesis that the number is 12 is rejected, there are at least 12 stationary components, and the ADF test is then applied to the remaining residuals, starting from the 13th, to determine whether each is stationary. If the hypothesis is accepted, the number is less than 12, and the Johansen test is used to find the specific number of cointegration vectors. The number of retained residuals in the cointegration model is set to $K$, which covers all residuals that have passed the stationarity test. The cointegration model $\mathbf{B}(N \times K)$ describes the relations among the nonstationary variables of the industrial process. When the process is in normal operation, the elements of $\mathbf{B}^T\mathbf{x}_t$ are stationary and keep the long-run equilibrium. This equilibrium is broken if a fault occurs. Here, the $T^2$ statistic proposed by Li et al.32 is calculated from the equilibrium residual series to construct the fault detection system. For a new sample vector $\mathbf{x}_{tn}$, the $T^2$ statistic is calculated as follows

$$\boldsymbol{\xi}_t = \mathbf{B}^T \mathbf{x}_{tn}, \qquad T^2 = \boldsymbol{\xi}_t^T \boldsymbol{\Lambda}^{-1} \boldsymbol{\xi}_t \tag{13}$$

where $\boldsymbol{\Lambda}$ is the covariance of the normal data, calculated by $\boldsymbol{\Lambda} = (\mathbf{X}\mathbf{B})^T(\mathbf{X}\mathbf{B})/(M-1)$. From Eq. (13), a fault may change the $T^2$ statistic. When the change is significant enough to push the $T^2$ statistic beyond the confidence region, the fault is detected. The confidence limit of the $T^2$ statistic 38 can be obtained from an F-distribution with significance level $\alpha$

$$T^2 \sim \frac{K(M^2 - 1)}{M(M - K)} F_{K, M-K, \alpha} \tag{14}$$
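Eqs. (13) and (14) translate directly into a few lines of code. This is a minimal sketch with hypothetical function names; the F-distribution quantile is passed in by the caller rather than computed here (one option, if SciPy is available, is `scipy.stats.f.ppf(1 - alpha, K, M - K)`).

```python
import numpy as np

def fit_t2_model(X, B):
    """Covariance of the equilibrium residuals: Lambda = (XB)^T (XB) / (M - 1)."""
    R = X @ B
    return R.T @ R / (X.shape[0] - 1)

def t2_statistic(x_new, B, Lam):
    """Eq. (13): T^2 = xi^T Lambda^{-1} xi with xi = B^T x_new."""
    xi = B.T @ x_new
    return float(xi @ np.linalg.solve(Lam, xi))

def t2_limit(K, M, f_quantile):
    """Eq. (14): scale the upper-alpha quantile of F(K, M - K) supplied by the caller."""
    return K * (M**2 - 1) / (M * (M - K)) * f_quantile

# Usage with toy numbers: identity model, so T^2 is the squared Euclidean norm
t2_demo = t2_statistic(np.array([3.0, 4.0]), np.eye(2), np.eye(2))
```

A sample is flagged as faulty when its `t2_statistic` value exceeds the value returned by `t2_limit`.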

3.3 Cointegration analysis and sparse reconstruction strategy for fault diagnosis

After abnormal behavior is detected, the faulty variables should be isolated to diagnose the cause of the fault. A sparse reconstruction strategy based on cointegration analysis (CA) is proposed here to isolate and diagnose the faulty variables that are most responsible for the abnormal operation. It transforms the online fault isolation problem into a constrained optimization problem that can be solved by the least absolute shrinkage and selection operator (LASSO) algorithm. The basic idea is as follows. When a fault occurs, the $T^2$ statistic goes beyond the confidence limit. Not all variables are responsible for the fault; only a few variables are disturbed and cause the $T^2$ statistic to alarm. In other words, the faulty variables are sparse. When the effects of the faulty variables are removed, the $T^2$ statistic returns to the normal region. Therefore, the faulty variables should be isolated for fault diagnosis, and this procedure is equivalent to variable selection. LASSO 39 is a popular tool for variable selection. In this paper, the selection of faulty variables is cast in the form of LASSO based on cointegration analysis. A fault observation $\mathbf{x}_f$ can be decomposed as follows

$$\mathbf{x}_f = \mathbf{x}_f^* + \mathbf{U}\mathbf{e} \tag{15}$$

where $\mathbf{x}_f^*$ denotes the fault-free part, $\mathbf{U}$ is an orthonormal matrix that contains the fault directions, and $\mathbf{e}$ represents the fault magnitude. As discussed above, the $T^2$ statistic returns to the normal region when the effects of the faulty variables are removed. Moreover, the faulty variables are usually sparse, since not all variables are responsible for the fault. Therefore, the faulty variable selection procedure can be formulated as the following optimization problem

$$\min_{\boldsymbol{\Psi}} \; (\mathbf{x}_f - \boldsymbol{\Psi})^T \mathbf{B}\boldsymbol{\Lambda}^{-1}\mathbf{B}^T (\mathbf{x}_f - \boldsymbol{\Psi}) \quad \text{subject to } \|\boldsymbol{\Psi}\|_1 \le \mu \tag{16}$$

where $\boldsymbol{\Psi} = \mathbf{U}\mathbf{e}$, $\|\cdot\|_1$ denotes the $\ell_1$-norm, and $\mu$ is a constant. In Eq. (16), $(\mathbf{x}_f - \boldsymbol{\Psi})$ represents the fault-free part in Eq. (15), and $(\mathbf{x}_f - \boldsymbol{\Psi})^T \mathbf{B}\boldsymbol{\Lambda}^{-1}\mathbf{B}^T (\mathbf{x}_f - \boldsymbol{\Psi})$ is the $T^2$ statistic calculated by Eq. (13). The optimization in Eq. (16) can be described as a LASSO problem. LASSO is based on the regression model

$$\mathbf{y} = \mathbf{x}\boldsymbol{\phi} + \boldsymbol{\upsilon} \tag{17}$$

where $\mathbf{x}$ is $L \times J$ and $\mathbf{y}$ is $L \times 1$, $L$ is the number of samples, $J$ is the number of variables, $\boldsymbol{\phi}$ is the vector of unknown regression coefficients, and $\boldsymbol{\upsilon}$ is the residual. LASSO estimates the regression coefficients $\boldsymbol{\phi}$ by solving the following optimization problem

$$\min_{\boldsymbol{\phi}} \; (\mathbf{y} - \mathbf{x}\boldsymbol{\phi})^T (\mathbf{y} - \mathbf{x}\boldsymbol{\phi}) \quad \text{subject to } \|\boldsymbol{\phi}\|_1 \le \mu \tag{18}$$
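The sparsity induced by the $\ell_1$ constraint in Eq. (18) can be seen in a small example. The sketch below solves the penalized (Lagrangian) counterpart, $\tfrac{1}{2}\|\mathbf{y} - \mathbf{x}\boldsymbol{\phi}\|^2 + \lambda\|\boldsymbol{\phi}\|_1$, by cyclic coordinate descent. This is a swapped-in solver chosen for brevity, not the LARS-based procedure used in the paper; the names `soft_threshold` and `lasso_cd` and the penalty weight `lam` are assumptions. A larger `lam` plays the role of a smaller bound $\mu$.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise 0.5 * ||y - X phi||^2 + lam * ||phi||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    phi = np.zeros(X.shape[1])
    col_sq = np.sum(X**2, axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            # residual with the contribution of variable j added back
            r_j = y - X @ phi + X[:, j] * phi[j]
            phi[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return phi

# With an identity design the solution is the soft-thresholded response:
# the small middle coefficient is set exactly to zero.
phi_demo = lasso_cd(np.eye(3), np.array([3.0, -0.5, 1.2]), lam=1.0)
```

The exact zero in the result is what makes LASSO usable as a variable selector: coefficients that do not earn their share of the $\ell_1$ budget drop out entirely.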

According to Eq. (18), the $\ell_1$-norm constraint lets LASSO shrink the estimated regression coefficients. The elements of $\boldsymbol{\phi}$ are therefore sparse, i.e., some coefficients may be exactly zero. In this way, LASSO achieves automatic variable selection. The smaller the value of $\mu$, the more zero elements there are in $\boldsymbol{\phi}$; in other words, fewer variables are selected. The optimization problem in Eq. (16) is similar to the LASSO form in Eq. (18), and its objective function can be rewritten in that form. Perform the Cholesky decomposition of $\boldsymbol{\Lambda}^{-1}$,

$$\boldsymbol{\Lambda}^{-1} = \mathbf{Z}\mathbf{Z}^T \tag{19}$$

where $\mathbf{Z}$ is a lower triangular matrix. Then Eq. (16) becomes

$$\min_{\boldsymbol{\Psi}} \; \left((\mathbf{B}\mathbf{Z})^T \mathbf{x}_f - (\mathbf{B}\mathbf{Z})^T \boldsymbol{\Psi}\right)^T \left((\mathbf{B}\mathbf{Z})^T \mathbf{x}_f - (\mathbf{B}\mathbf{Z})^T \boldsymbol{\Psi}\right) \quad \text{subject to } \|\boldsymbol{\Psi}\|_1 \le \mu \tag{20}$$
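The equivalence between the $T^2$ objective of Eq. (16) and the least-squares objective of Eq. (20) can be confirmed numerically. In the sketch below every matrix and vector is randomly generated purely for illustration (hypothetical $\mathbf{B}$, $\boldsymbol{\Lambda}$, $\mathbf{x}_f$, and $\boldsymbol{\Psi}$); `eta` and `y` denote $(\mathbf{B}\mathbf{Z})^T$ and $(\mathbf{B}\mathbf{Z})^T\mathbf{x}_f$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 5, 2
B = rng.standard_normal((N, K))                 # hypothetical cointegration matrix
G = rng.standard_normal((K, K))
Lam = G @ G.T + K * np.eye(K)                   # hypothetical SPD residual covariance
x_f = rng.standard_normal(N)                    # hypothetical faulty sample
Psi = rng.standard_normal(N)                    # arbitrary candidate fault vector

Lam_inv = np.linalg.inv(Lam)
Z = np.linalg.cholesky(Lam_inv)                 # Eq. (19): Lam^{-1} = Z Z^T, Z lower triangular
eta = (B @ Z).T                                 # K x N design matrix
y = eta @ x_f                                   # K x 1 response

d = x_f - Psi
t2_form = d @ B @ Lam_inv @ B.T @ d             # objective of Eq. (16)
ls_form = np.sum((y - eta @ Psi) ** 2)          # objective of Eq. (20)
objectives_match = np.isclose(t2_form, ls_form)
```

Because the two objectives agree for any $\boldsymbol{\Psi}$, any LASSO solver applied to the pair (`eta`, `y`) addresses the fault isolation problem of Eq. (16).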

Let $y = (BZ)^T x_f$ and $\eta = (BZ)^T$; Eq. (20) then takes the form of Eq. (18). Thus, the fault diagnosis problem is transformed into a variable selection problem based on the cointegration analysis. According to Eq. (20), the regression coefficients $\Psi$ change with the value of $\mu$. The larger the value of $\mu$, the more variables are selected as faulty variables, i.e., the more nonzero elements $\Psi$ contains. If $\mu$ is too small, most of the coefficients in $\Psi$ may become zero, and some faulty variables may then be missed. The least angle regression (LARS) algorithm proposed by Efron et al.40 can be used for


solving the LASSO problem. In this paper, the LARS algorithm is modified to isolate and diagnose the faulty variables. The proposed faulty variable selection algorithm proceeds as follows:

Step 1. Initialize the procedure with $i = 0$, $\Psi_0 = 0$, and the active set $A_0 = \emptyset$.

Step 2. Calculate the current correlation vector

$$c = BZZ^TB^T(x_f - \Psi_i) \tag{21}$$

The active set $A_i$ collects the variables with the greatest absolute current correlation,

$$C = \max_j\{|c_j|\} \quad \text{and} \quad A_i = \{j : |c_j| = C\} \tag{22}$$

where $c_j$ is the j-th element of $c$.

Step 3. Let $s_j = \mathrm{sign}\{c_j\}$ for $j \in A_i$, so that $s_j = \pm 1$. The equiangular vector is then calculated by

$$\mu_i = S_i\omega_i \tag{23}$$

where $S_i = (\cdots\ s_j\eta_j\ \cdots)_{j \in A_i}$, $\eta_j$ is the j-th column of $\eta = (BZ)^T$, $\omega_i = \left(1_i^T(S_i^TS_i)^{-1}1_i\right)^{-1/2}(S_i^TS_i)^{-1}1_i$, and $1_i$ is a vector of ones whose length equals the size of $A_i$.

Step 4. Update the regression coefficients as follows:

$$(BZ)^T\Psi_{i+1} = (BZ)^T\Psi_i + \gamma_i\mu_i \tag{24}$$

where

$$\gamma_i = \min^{+}_{j \in A_i^C}\left\{\frac{C - c_j}{\left(1_i^T(S_i^TS_i)^{-1}1_i\right)^{-1/2} - \alpha_j},\ \frac{C + c_j}{\left(1_i^T(S_i^TS_i)^{-1}1_i\right)^{-1/2} + \alpha_j}\right\}$$

Here $\min^{+}$ takes the minimum over positive values only, $A_i^C$ is the complement of $A_i$, and $\alpha_j$ is the j-th element of $\alpha_i = \eta^T\mu_i = (BZ)\mu_i$.

Step 5. Calculate the $T^2$ statistic as $(x_f - \Psi_i)^T B\Lambda^{-1}B^T(x_f - \Psi_i)$. If the $T^2$


statistic is below the control limit, the elements of the active set are regarded as faulty variables and the selection stops. Otherwise, repeat Steps 2 through 4 until the $T^2$ statistic falls below the control limit. The output of the algorithm is $\Psi$, whose nonzero elements indicate the selected faulty variables. In practice, the number of selected faulty variables is smaller than the total number of process variables. The proposed algorithm is applied directly to each fault sample, so no historical fault data are needed. After all fault samples have been processed, the frequency with which each variable has been selected is calculated. The higher the selection frequency, the more likely the corresponding variable is responsible for the fault. The variables, sorted by selection frequency, are removed one by one until the missing reconstruction ratio (MRR) is less than or equal to the threshold value $\alpha$. The MRR is defined as

$$R_m = \frac{N_{mf}}{N_f} \times 100\% \tag{25}$$

where $N_{mf}$ is the number of missing reconstructions, i.e., the number of statistics that remain out of control after removal of the faulty variables, and $N_f$ is the number of fault samples.

In the next section, the sparse reconstruction strategy based on cointegration analysis is applied to two nonstationary processes: the Tennessee Eastman process and a real industrial thermal power plant process.
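Steps 1 to 5 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_faulty_variables` and its arguments are hypothetical, one variable joins the active set per iteration (exact ties in Eq. (22) are not handled), the step length is additionally capped so that the update never passes the least-squares fit on the active set (the final-step rule of Efron et al.), and $\alpha_i$ is computed as $\eta^T\mu_i$.

```python
import numpy as np

def select_faulty_variables(x_f, B, Lambda_inv, ctrl_limit):
    """LARS-style selection of faulty variables for one fault sample.
    Hypothetical helper: returns (Psi, active) with active = selected indices."""
    Z = np.linalg.cholesky(Lambda_inv)        # Eq. (19)
    eta = (B @ Z).T                           # eta = (BZ)^T, shape R x J
    y = eta @ x_f
    R, J = eta.shape
    Psi = np.zeros(J)                         # Step 1
    active = []
    for _ in range(min(R, J)):
        resid = y - eta @ Psi
        if resid @ resid <= ctrl_limit:       # Step 5: T^2 of the corrected sample
            break
        c = eta.T @ resid                     # Step 2, Eq. (21)
        j_new = max((j for j in range(J) if j not in active),
                    key=lambda j: abs(c[j]))
        active.append(j_new)
        C = abs(c[j_new])
        s = np.sign(c[active])                # Step 3
        S = eta[:, active] * s                # columns s_j * eta_j
        g = np.linalg.solve(S.T @ S, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())            # normalizing constant
        omega = A * g
        u = S @ omega                         # equiangular vector, Eq. (23)
        a = eta.T @ u                         # Step 4: alpha_i
        steps = [C / A]                       # cap: least-squares fit on the active set
        for j in range(J):
            if j in active:
                continue
            for num, den in ((C - c[j], A - a[j]), (C + c[j], A + a[j])):
                if den > 1e-12 and num / den > 1e-12:
                    steps.append(num / den)   # positive candidates only (min+)
        gamma = min(steps)
        Psi[active] += gamma * s * omega      # equivalent to Eq. (24)
    return Psi, active

# Toy check: identity model with a large deviation in variable 0 only.
Psi, active = select_faulty_variables(
    x_f=np.array([5.0, 0.1, -0.2]), B=np.eye(3),
    Lambda_inv=np.eye(3), ctrl_limit=1.0)
print(active, Psi)
```

In the toy check, only variable 0 is selected, with a correction of about 4.8, after which the statistic of the corrected sample is below the assumed control limit of 1.0.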

4. APPLICATION AND RESULTS

4.1 Tennessee Eastman process (TEP)

The TEP, first introduced by Downs and Vogel,41 is a well-known process that has been widely used to test process monitoring methods.42,43 The process contains five major units: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor, and a product stripper. It has 41 measured variables and 11 manipulated variables. The sampling interval of the process variables is 3 minutes. Different types of faults are included in the


TEP. The fault set contains 960 samples for each fault, and the fault occurs after the 160th sample. In this section, two fault cases, Fault #10 and Fault #12, are used to investigate the performance of the proposed method. The process variables are listed in Table 1 and the flowchart of the TEP is shown in Fig. 1.

(Insert Figure 1 here)

(Insert Table 1 here)

The ADF test is applied to 480 samples collected under normal operation. According to the test results, nine variables are recognized as nonstationary, giving the nonstationary set $X = [x_7, x_{11}, x_{13}, x_{16}, x_{18}, x_{19}, x_{31}, x_{38}, x_{46}]$. Next, the cointegration model is constructed for the nonstationary set $X$. For the TE process, there are five cointegration vectors, denoted as $B$ ($9 \times 5$).

Fault #10 is a random variation in the C feed temperature of Stream 4. Fig. 2 shows the fault detection result of the proposed method; the fault is detected promptly. The proposed fault diagnosis procedure is then performed to isolate and diagnose the faulty variables for the first fifty fault samples, from the 161st to the 210th. Fig. 3(a) shows the selection frequency of every variable. The threshold value $\alpha$ for the MRR is set to 10% in the present work. This fault occurs in Stream 4, which connects directly to the stripper, so the stripper temperature ($x_{18}$) becomes strongly abnormal and has the largest selection frequency. When variables $x_{18}$ (stripper temperature), $x_{38}$ (Component E), $x_{11}$ (separator temperature) and $x_{31}$ (Component C) are removed, the monitoring results are as shown in Fig. 3(b), where the MRR is 4%, which is less than the threshold value $\alpha$. Thus, these four variables are mainly responsible for this fault. Fig. 4(a) shows the selection frequency of every variable in the second fifty fault samples, from the 211th to the 260th. From Fig. 4(a), it can be seen that $x_{31}$ (Component C) and $x_{11}$ (product separator


temperature) have larger selection frequencies than in the first fifty fault samples, which indicates that the fault effects may spread through the process. The monitoring results after removing the five faulty variables are shown in Fig. 4(b). The MRR is 6% when the variables $x_{18}$, $x_{38}$, $x_{11}$, $x_{31}$ and $x_{46}$ are removed, which means that forty-seven alarming monitoring statistics return to the normal region for the second fifty fault samples, and these five variables are mainly responsible for this abnormality.

(Insert Figure 2 here)

(Insert Figure 3 here)

(Insert Figure 4 here)

Fault #12 is a random variation in the condenser cooling water inlet temperature. Fig. 5 shows the fault detection result of the proposed method. As shown in Fig. 5, the $T^2$ statistics exceed the control limit as soon as the fault occurs. The proposed fault diagnosis procedure is then performed. Fig. 6(a) shows the selection frequency of every variable in the first fifty fault samples, from the 161st to the 210th. When the variables $x_{38}$ (Component E), $x_{11}$ (product separator temperature), $x_{18}$ (stripper temperature), $x_{46}$ (compressor recycle) and $x_{31}$ (Component C) are removed, the $T^2$ statistics are as plotted in Fig. 6(b). The MRR is 4%, which is less than the threshold value $\alpha$. Forty-eight $T^2$ statistics return to the normal region, which indicates that these five variables are mainly affected by this fault in the first fifty fault samples. Further, the second fifty fault samples, from the 211th to the 260th, are used to identify the faulty variables. Fig. 7(a) shows the selection frequency of every variable. When the seven variables with the largest selection frequencies are removed, the monitoring results are as shown in Fig. 7(b), where the MRR is 8%, which is less than the threshold value $\alpha$ (10% here).

(Insert Figure 5 here)

(Insert Figure 6 here)


(Insert Figure 7 here)
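The MRR values quoted in this subsection follow directly from Eq. (25). As a small sketch (the helper names `missing_reconstruction_ratio` and `rank_by_selection_frequency` and the toy numbers are illustrative, not taken from the paper), the code below computes the MRR from the $T^2$ values obtained after variable removal and ranks variables by how often they were selected.

```python
from collections import Counter

def missing_reconstruction_ratio(t2_after_removal, ctrl_limit):
    """Eq. (25): percentage of fault samples still above the control limit."""
    n_mf = sum(1 for t2 in t2_after_removal if t2 > ctrl_limit)
    return 100.0 * n_mf / len(t2_after_removal)

def rank_by_selection_frequency(selected_per_sample):
    """Sort variable indices by the number of fault samples that selected
    them, most frequently selected first."""
    counts = Counter(j for selected in selected_per_sample for j in selected)
    return [j for j, _ in counts.most_common()]

# 50 fault samples, 2 of which stay out of control after removal
t2_values = [0.5] * 48 + [3.0] * 2
print(missing_reconstruction_ratio(t2_values, ctrl_limit=1.0))        # 4.0

# Toy active sets for four fault samples
print(rank_by_selection_frequency([[18, 38], [18], [18, 11], [38]]))  # [18, 38, 11]
```

With fifty samples per window, MRR values of 4%, 6% and 8% correspond to 2, 3 and 4 samples remaining out of control, i.e., 48, 47 and 46 statistics returning to the normal region, as reported in the text.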

4.2 Thermal power plant

In this subsection, the proposed fault diagnosis method is applied to a real industrial process, a thermal power plant. The thermal system of a power plant mainly comprises two subsystems, the boiler system and the steam turbine system. The boiler system transforms the chemical energy of coal into heat energy: the water is heated into steam at high pressure and temperature, and the steam is then transported to the steam turbine system. The steam turbine transforms the heat energy into mechanical energy and drives the electrical generator, which in turn transforms the mechanical energy into electrical energy.44 Following the description of nonstationary processes by Byon et al.,24 the thermal power plant process is deemed here to be nonstationary, since the process variables and the output (power) change with time due to internal factors (e.g., wear and tear on the steam turbine) and external effects (e.g., variable load conditions). Fig. 8 plots the power recorded during actual operation of a thermal power plant. As Fig. 8 shows, the power changes with time, revealing the nonstationary characteristics. The flowchart of the thermal power plant 45 is shown in Fig. 9. Two cases from the real thermal power plant process are selected to illustrate the proposed method.

(Insert Figure 8 here)

(Insert Figure 9 here)

In Case #1, the data come from the No. 8 unit of the thermal power plant, whose power output is 1000 MW. The data contain 159 variables, covering pressure, temperature, water level, flow, etc. A total of 2880 normal samples are used to construct the cointegration models, and 960 fault samples are used to test the proposed method; in these, the pressure of the cooling water in the condenser increases from the 46th sample. The sampling interval is 60 s. After the ADF test, 51 variables


are assumed to be nonstationary, denoted as $X = [x_1, x_2, \ldots, x_{51}]$. The 51 nonstationary variables are then used to construct the cointegration models, and the cointegration vectors $B$ ($51 \times 42$) are obtained. The fault detection result of the proposed method is shown in Fig. 10, from which it can be seen that the fault is correctly detected at the 46th sample.

(Insert Figure 10 here)

After fault detection, the proposed method is used to isolate the faulty variables at each sample. Fig. 11 shows the fault diagnosis results of the proposed method for the first fifty fault samples, from the 46th to the 95th. Fig. 11(a) shows the selection frequency of every variable over the fifty samples. For a 10% MRR threshold, only nine variables are selected by the proposed method. As shown in Fig. 11(b), forty-eight monitoring statistics in the first fifty fault samples return to the normal region after these faulty variables are removed, giving an MRR as low as 4%. The nine variables are $x_{38}$ (high pressure heater B trap), $x_2$ (condenser A left chamber circulating water inlet pressure), $x_{32}$ (high pressure heater A inlet steam temperature), $x_{36}$ (high pressure heater A trap), $x_3$ (circulating water pump outlet pressure), $x_1$ (condenser A right chamber circulating water inlet pressure), $x_6$ (low pressure heater A trap), $x_{45}$ (low pressure heater outlet trap) and $x_{15}$ (deaerator trap). All of these selected variables are closely related to the disturbed condenser. Moreover, the variable $x_2$ (condenser A left chamber circulating water inlet pressure) belongs to the condenser where the fault happened and shows the second largest selection frequency. The faulty variable selection results agree well with the real case. The selection frequency obtained by the proposed method for every variable in the second fifty fault samples is plotted in Fig. 12(a). When the ten variables $x_{32}$, $x_{40}$, $x_3$, $x_{45}$, $x_2$, $x_1$, $x_{19}$, $x_{38}$, $x_{36}$ and $x_{24}$ are removed, the monitoring results are as shown in Fig. 12(b). The MRR is 8%, i.e., forty-six alarming monitoring statistics in the second fifty fault samples go


back to the normal region.

(Insert Figure 11 here)

(Insert Figure 12 here)

In Case #2, the data are obtained from the No. 3 unit of the thermal power plant, whose power output is 600 MW. The data include 154 variables, covering pressure, temperature, water level, flow, etc. A total of 2880 normal samples are used to construct the cointegration model, and 960 fault samples are used for online monitoring; in these, the outlet pressure of the circulating water pump increases from the 501st sample. The ADF test is applied to the normal data, and 76 variables are identified as nonstationary, denoted as $X = [x_1, x_2, \ldots, x_{76}]$. The fault detection result based on the cointegration analysis is shown in Fig. 13. The $T^2$ statistics stay below the control limit for the first 500 samples and exceed it as soon as the fault occurs.

(Insert Figure 13 here)

The same fault diagnosis strategy is employed for Case #2. The proposed method is applied to the first fifty fault samples, and the fault diagnosis results are shown in Fig. 14. Fig. 14(a) shows the selection frequency obtained by the proposed method for every variable in the first fifty fault samples. The variables are sorted by selection frequency and then removed one by one until the MRR is less than or equal to the threshold value $\alpha$. With $\alpha$ equal to 10%, the first twelve variables are selected by the proposed method. As shown in Fig. 14(b), forty-six monitoring statistics fall below the control limit after the first twelve variables are removed, giving an MRR of 8%. These twelve variables are associated with the condenser, high pressure heater, low pressure heater, deaerator and circulating water pump. The fault occurred in the circulating water pump, which connects with the condenser, high pressure heater, low pressure heater and deaerator, and the variable $x_{74}$ (circulating water pump outlet pressure), which has the largest selection frequency, is the location


where the fault occurred. The second fifty fault samples, from the 551st to the 600th, are used to further identify the faulty variables in Case #2. The selection frequency obtained by the proposed method for every variable is shown in Fig. 15(a). The variables are sorted by selection frequency and removed one by one from the original data set. With $\alpha$ equal to 10%, the MRR is 4% after the first thirteen variables are removed. The monitoring results after removing these faulty variables are shown in Fig. 15(b); forty-eight monitoring statistics return to the normal region. This shows that the proposed method can isolate the variables that are mainly responsible for the fault.

(Insert Figure 14 here)

(Insert Figure 15 here)

(Insert Table 2 here)

Table 2 compares the proposed approach with the reconstruction-based contribution (RBC) 35 method in terms of the MRR, for the first fifty fault samples and two fault cases. For the RBC method, the mean contribution of each variable is calculated over the first fifty fault samples, and the variables are then sorted and selected according to their contributions. To compare the performance of the two methods, different threshold values ($\alpha$) are used: 30%, 20% and 10%. When the threshold value $\alpha$ is equal to 30%, the MRR of the proposed method is 30% after the first two variables, $x_{38}$ and $x_2$, are removed. For the RBC method, however, the MRR does not fall below the threshold value $\alpha$ until the first five variables ($x_{38}$, $x_{36}$, $x_{18}$, $x_{31}$ and $x_{48}$) are removed. For the same MRR threshold $\alpha$, the number of faulty variables selected by the proposed approach is smaller than that selected by the RBC method. This indicates that the proposed approach can effectively isolate the variables that are mainly responsible for the fault. Besides, the proposed sparse reconstruction strategy can select multiple faulty variables


simultaneously, which can effectively isolate the concerned nonstationary variables that have caused out-of-control statistics.

5. CONCLUSIONS

In this paper, a sparse reconstruction strategy based on cointegration analysis is proposed to isolate and diagnose the faulty variables in nonstationary industrial processes, in which many process variables are nonstationary due to factors such as throughput changes, equipment aging, unmeasured disturbances and human interventions. The proposed method can describe the long-run equilibrium relation among the nonstationary variables and can automatically select multiple faulty variables simultaneously, without any prior fault data or fault directions. The performance of the proposed method is illustrated with the Tennessee Eastman process, and the method has also been successfully applied to a real industrial thermal power plant process. The faulty variables identified by the proposed method agree well with the real case. Compared with the reconstruction-based contribution fault diagnosis method, the proposed method isolates the faulty variables effectively with fewer selected variables. Fault diagnosis for nonstationary processes is a critical issue that demands sustained effort, in particular for locating the root-cause variables. The results of this study provide a basis for further work and improvement.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (Nos. 61422306 and 61433005) and the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China (No. ICT170334).


REFERENCES

(1) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer Verlag: London, 2001.
(2) Zhang, S. M.; Zhao, C. H.; Wang, S.; Wang, F. L. Pseudo Time-slice Construction Using Variable Moving Window-k Nearest Neighbor (VMW-kNN) Rule for Sequential Uneven Phase Division and Batch Process Monitoring. Ind. Eng. Chem. Res. 2017, 56, 728-740.
(3) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Diagnosis in Chemical Processes Using Fisher Discriminant Analysis, Discriminant Partial Least Squares, and Principal Component Analysis. Chemom. Intell. Lab. Syst. 2000, 50, 243-252.
(4) Zhao, C. H.; Sun, Y. X. Subspace Decomposition Approach of Fault Deviations and its Application to Fault Reconstruction. Ctrl. Eng. Pract. 2013, 21, 1396-1409.
(5) Hsu, C. C.; Su, C. T. An Adaptive Forecast-Based Chart for Non-Gaussian Processes Monitoring: With Application to Equipment Malfunctions Detection in a Thermal Power Plant. IEEE Trans. Ctrl. Syst. Tech. 2011, 19, 1245-1250.
(6) Alcala, C. F.; Qin, S. J. Reconstruction-based Contribution for Process Monitoring with Kernel Principal Component Analysis. Ind. Eng. Chem. Res. 2010, 49, 7849-7857.
(7) Zhao, C. H.; Gao, F. R. Critical-to-fault-degradation Variable Analysis and Direction Extraction for Online Fault Prognostic. IEEE Trans. Ctrl. Syst. Tech. 2017, 25, 842-854.
(8) Choi, S. W.; Lee, I. B. Multiblock PLS-based Localized Process Diagnosis. J. Process Control 2005, 15, 295-306.
(9) Chen, Q.; Kruger, U. Analysis of Extended Partial Least Squares for Monitoring Large-scale Processes. IEEE Trans. Ctrl. Syst. Tech. 2005, 13, 807-813.
(10) Kruger, U.; Kumar, S.; Littler, T. Improved Principal Component Monitoring Using the Local Approach. Automatica 2007, 43, 1532-1542.


(11) Zhao, C. H.; Gao, F. R. Fault-relevant Principal Component Analysis (FPCA) Method for Multivariate Statistical Modeling and Process Monitoring. Chemom. Intell. Lab. Syst. 2014, 133, 1-16.
(12) Jackson, J. E. A User’s Guide to Principal Components; Wiley: New York, 1991.
(13) Burnham, A. J.; Viveros, R.; MacGregor, J. F. Frameworks for Latent Variable Multivariate Regression. J. Chemometrics 1996, 10, 31-45.
(14) Duda, R. O.; Hart, P. E. Pattern Classification and Scene Analysis; Wiley: New York, 1973.
(15) Yu, J. Nonlinear Bioprocess Monitoring Using Multiway Kernel Localized Fisher Discriminant Analysis. Ind. Eng. Chem. Res. 2011, 50, 3390-3402.
(16) Chiang, L. H.; Kotanchek, M. E.; Kordon, A. K. Fault Diagnosis Based on Fisher Discriminant Analysis and Support Vector Machine. Comput. Chem. Eng. 2004, 28, 1389-1401.
(17) Yu, J. A New Fault Diagnosis Method of Multimode Processes Using Bayesian Inference Based Gaussian Mixture Contribution Decomposition. Eng. Appl. AI. 2013, 26, 456-466.
(18) Zhao, C. H.; Wang, W.; Qin, Y.; Gao, F. R. Comprehensive Subspace Decomposition with Analysis of Between-mode Relative Changes for Multimode Process Monitoring. Ind. Eng. Chem. Res. 2015, 54, 3154-3166.
(19) Yoo, C. K.; Villez, K.; Lee, I. B.; Rosen, C. Multi-model Statistical Process Monitoring and Diagnosis of a Sequencing Batch Reactor. Biotech. Bioeng. 2007, 96, 687-701.
(20) Chen, Q.; Kruger, U.; Leung, A. Y. T. Cointegration Testing Method for Monitoring Nonstationary Process. Ind. Eng. Chem. Res. 2009, 48, 3533-3543.
(21) Box, G. E. P.; Jenkins, G. M.; Reinsel, G. G. Time Series Analysis Forecasting and Control, 3rd ed; Prentice-Hall: Englewood Cliffs, NJ, 1994.
(22) Qiu, L. Measure of Instability. J. Control and Decision 2015, 2, 87-98.


(23) Alwan, M. S.; Liu, X. Z. Recent Results on Stochastic Hybrid Dynamical Systems. J. Control and Decision 2016, 1, 68-103.
(24) Byon, E.; Choe, Y. J.; Yampikulsakul, N. Adaptive Learning in Time-variant Processes with Application to the Wind Power Systems. IEEE Trans. Automation Science and Eng. 2016, 13, 997-1007.
(25) Huang, G. Q.; Su, Y. W.; Kareem, A.; Liao, H. L. Time-frequency Analysis of Nonstationary Process Based on Multivariate Empirical Mode Decomposition. J. Eng. Mech. 2016, 142, 04015065-1-04015065-15.
(26) Brockwell, P. J.; Davis, R. A. Time Series: Theory and Methods; Springer Science Business Media: New York, 2006.
(27) Engle, R. F.; Granger, C. W. J. Cointegration and Error-correction: Representation, Estimation and Testing. Econometrica 1987, 55, 251-276.
(28) Narayan, P. K. The Saving and Investment Nexus for China: Evidence from Cointegration Tests. Applied Economics 2005, 37, 1979-1990.
(29) Asafu-Adjaye, J. The Relationship between Energy Consumption, Energy Prices and Economic Growth: Time Series Evidence from Asian Developing Countries. Energy Economics 2000, 22, 615-625.
(30) Lee, C. C. Energy Consumption and GDP in Developing Countries: A Cointegrated Panel Analysis. Energy Economics 2005, 27, 415-427.
(31) Stern, D. I. A Multivariate Cointegration Analysis of the Role of Energy in the US Macroeconomy. Energy Economics 2000, 22, 267-283.
(32) Li, G.; Qin, S. J.; Yuan, T. Nonstationary and Cointegration Tests for Fault Detection of Dynamic Process. IFAC, 10616-10621, August 24-29, 2014.
(33) Yu, J. Localized Fisher Discriminant Analysis Based Complex Chemical Process Monitoring. AIChE J. 2011, 57, 1817-1828.


(34) Westerhuis, J.; Gurden, S.; Smilde, A. Generalized Contribution Plots in Multivariate Statistical Process Monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95-114.
(35) Alcala, C. F.; Qin, S. J. Reconstruction-based Contribution for Process Monitoring. Automatica 2009, 45, 1593-1600.
(36) Johansen, S.; Juselius, K. Maximum Likelihood Estimation and Inference on Cointegration with Applications to the Demand for Money. Oxford Bulletin of Economics and Statistics 1990, 52, 169-210.
(37) Dickey, D. A.; Fuller, W. A. Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root. Econometrica 1981, 49, 1057-1072.
(38) Lowry, C. A.; Montgomery, D. C. A Review of Multivariate Control Charts. IIE Trans. 1995, 27, 800-810.
(39) Tibshirani, R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological) 1996, 58, 267-288.
(40) Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Annals of Statistics 2004, 32, 407-499.
(41) Downs, J.; Vogel, E. A Plant-wide Industrial Process Control Problem. Comput. Chem. Eng. 1993, 17, 245-255.
(42) Chiang, L. H.; Jiang, B.; Zhu, X.; Huang, D.; Braatz, R. D. Diagnosis of Multiple and Unknown Faults Using the Causal Map and Multivariate Statistics. J. Process Control 2015, 28, 27-39.
(43) Yin, S.; Ding, S. X.; Haghani, A.; Hao, H.; Zhang, P. A Comparison Study of Basic Data-driven Fault Diagnosis and Process Monitoring Methods on the Benchmark Tennessee Eastman Process. J. Process Control 2012, 22, 1567-1581.
(44) Kong, X. B.; Liu, X. J.; Lee, K. Y. An Effective Nonlinear Multivariable HMPC for USC Power Plant Incorporating NFN-based Modeling. IEEE Trans. Industrial Informatics


2016, 12, 555-566.
(45) Chen, K. Y.; Chen, L. S.; Chen, M. C.; Lee, C. L. Using SVM Based Method for Equipment Fault Detection in a Thermal Power Plant. Computers in Industry 2011, 62, 42-50.


List of Figure Captions

Figure 1 The flowchart of TEP.
Figure 2 Fault detection results for Fault #10 of the TEP based on the cointegration analysis.
Figure 3 (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Fault #10. (b) Monitoring results evaluated by MRR after removing faulty variables.
Figure 4 (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Fault #10. (b) Monitoring results evaluated by MRR after removing faulty variables.
Figure 5 Fault detection results for Fault #12 of the TEP based on the cointegration analysis.
Figure 6 (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Fault #12. (b) Monitoring results evaluated by MRR after removing faulty variables.
Figure 7 (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Fault #12. (b) Monitoring results evaluated by MRR after removing faulty variables.
Figure 8 The plot of output power (kW).
Figure 9 Flowchart of thermal power plant.
Figure 10 Fault detection results for Case #1 in the thermal power plant based on the cointegration analysis.
Figure 11 (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Case #1. (b) Monitoring results evaluated by MRR after removing faulty variables.
Figure 12 (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Case #1. (b) Monitoring results evaluated by MRR after


removing faulty variables.
Figure 13 Fault detection results for Case #2 in the thermal power plant based on the cointegration analysis.
Figure 14 (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Case #2. (b) Monitoring results evaluated by MRR after removing faulty variables.
Figure 15 (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Case #2. (b) Monitoring results evaluated by MRR after removing faulty variables.


Fig. 1. The flowchart of TEP.

Fig. 2. Fault detection results for Fault #10 of the TEP based on the cointegration analysis (vertical axis: $T^2$; blue dotted line: monitoring statistics; red dashed line: 95% monitoring control limit).

(a)

T2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Frequencies of selection for each variable

Page 33 of 49

(b)

Fig. 3. (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Fault #10. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).

Fig. 4. (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Fault #10. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]

Fig. 5. Fault detection results for Fault #12 of the TEP based on the cointegration analysis (blue dotted line: monitoring statistics; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis label: T².]

Fig. 6. (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Fault #12. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]

Fig. 7. (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Fault #12. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]

Fig. 8. The plot of output power (kW).
[Figure image not reproduced. Axis label: Power (kW).]


Fig. 9. Flowchart of thermal power plant.

Fig. 10. Fault detection results for Case #1 in the thermal power plant based on the cointegration analysis (blue dotted line: monitoring statistics; dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis label: T².]

Fig. 11. (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Case #1. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]

Fig. 12. (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Case #1. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]

Fig. 13. Fault detection results for Case #2 in the thermal power plant based on the cointegration analysis (blue dotted line: monitoring statistics; dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis label: T².]

Fig. 14. (a) Selection frequency obtained by the proposed method for every variable in the first fifty fault samples of Case #2. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]

Fig. 15. (a) Selection frequency obtained by the proposed method for every variable in the second fifty fault samples of Case #2. (b) Monitoring results evaluated by MRR after removing faulty variables (blue dotted line: monitoring statistics after removing the faulty variables; red dashed line: 95% monitoring control limit).
[Figure image not reproduced. Axis labels: Frequencies of selection for each variable (a); T² (b).]


List of Tables
Table 1 Variables measured in the TEP
Table 2 Comparison of fault diagnosis results evaluated by MRR for the first fifty fault samples of Cases #1 and #2 at different threshold values (α) in the thermal power plant


Table 1 Variables measured in the TEP

No.  Variable Description                       No.  Variable Description
1    A feed (Stream 1)                          27   Component E (Stream 6)
2    D feed (Stream 2)                          28   Component F (Stream 6)
3    E feed (Stream 3)                          29   Component A (Stream 9)
4    A and C feed (Stream 4)                    30   Component B (Stream 9)
5    Recycle flow (Stream 8)                    31   Component C (Stream 9)
6    Reactor feed rate (Stream 6)               32   Component D (Stream 9)
7    Reactor pressure                           33   Component E (Stream 9)
8    Reactor level                              34   Component F (Stream 9)
9    Reactor temperature                        35   Component G (Stream 9)
10   Purge rate (Stream 9)                      36   Component H (Stream 9)
11   Product separator temperature              37   Component D (Stream 11)
12   Product separator level                    38   Component E (Stream 11)
13   Product separator pressure                 39   Component F (Stream 11)
14   Product separator underflow (Stream 10)    40   Component G (Stream 11)
15   Stripper level                             41   Component H (Stream 11)
16   Stripper pressure                          42   MV to D feed flow (Stream 2)
17   Stripper separator underflow (Stream 11)   43   MV to E feed flow (Stream 3)
18   Stripper temperature                       44   MV to A feed flow (Stream 1)
19   Stripper steam flow                        45   MV to total feed flow (Stream 4)
20   Compressor work                            46   Compressor recycle valve
21   Reactor cooling water outlet temperature   47   Purge valve (Stream 9)
22   Stripper cooling water outlet temperature  48   Separator pot liquid flow (Stream 10)
23   Component A (Stream 6)                     49   Stripper liquid product flow (Stream 11)
24   Component B (Stream 6)                     50   Stripper steam valve
25   Component C (Stream 6)                     51   Reactor cooling water flow
26   Component D (Stream 6)                     52   Condenser cooling water flow


Table 2 Comparison of fault diagnosis results evaluated by MRR for the first fifty fault samples of Cases #1 and #2 at different threshold values (α) in the thermal power plant

Fault Case   α     The proposed method   RBC approach
Case #1      30%   2(30%)^a              5(28%)
             20%   5(12%)                9(16%)
             10%   9(4%)                 10(8%)
Case #2      30%   12(8%)                43(2%)
             20%   12(8%)                43(2%)
             10%   12(8%)                43(2%)

^a A(B): A is the number of removed variables; B is the value of the missing reconstruction ratio (MRR).
