
Article

Estimation of disturbance propagation path using principal component analysis (PCA) and multivariate granger causality (MVGC) techniques Usama Ahmed, Daegeun Ha, Seolin Shin, Nadeem Shaukat, Umer Zahid, and Chonghun Han Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.6b02763 • Publication Date (Web): 02 Jun 2017 Downloaded from http://pubs.acs.org on June 8, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Industrial & Engineering Chemistry Research

Manuscript for Industrial & Engineering Chemistry Research

Estimation of disturbance propagation path using principal component analysis (PCA) and multivariate Granger causality (MVGC) techniques

Usama Ahmed a, Daegeun Ha a, Seolin Shin a, Nadeem Shaukat b, Umer Zahid c, Chonghun Han +a

a School of Chemical and Biological Engineering, Seoul National University, Seoul 151-744, Republic of Korea

b Department of Nuclear Engineering, Seoul National University, Seoul 151-744, Republic of Korea

c Chemical Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia

+a Correspondence: Chonghun Han ([email protected])

+ To whom correspondence should be addressed. Phone: +82-02-880-1887. Fax: +82-02-880-1887. Email: [email protected]


Abstract:

Process monitoring and fault diagnosis using multivariate statistical methodologies have been extensively used in the process and product development industries for many decades. A fault in one process variable readily affects all the other associated variables, which makes fault detection both more difficult and more time consuming. In this study, a PCA-based fault amplification algorithm is developed to detect both the root cause of a fault and the fault propagation path in the system. The developed algorithm projects the samples onto the residual subspace (RS) to determine the disturbance propagation path. Usually, the RS of the fault data is superimposed with the normal process variations, which should be minimized to amplify the fault magnitude. The RS containing the amplified fault is then converted into the covariance matrix, followed by singular value decomposition (SVD) analysis, which in turn generates the fault direction matrix corresponding to the largest eigenvalue. The fault variables are then rearranged in descending order of the absolute magnitude of their contribution towards the fault, which in turn represents the fault propagation path. Moreover, the multivariate Granger causality (MVGC) algorithm is used to analyze the causal relationship among the variables obtained from the developed algorithm. Both methodologies are tested on an LNG fractionation process train and a distillation column operation, where some fault case scenarios are assumed to estimate the fault directions. It is observed that the hierarchy of variables obtained from the fault propagation path algorithm is in good agreement with the MVGC algorithm. Therefore, the fault amplification methodology can be used in industrial systems for identifying the root cause of a fault as well as the fault propagation path.

Keywords: Principal Component Analysis (PCA), Multivariate Granger Causality (MVGC), Singular Value Decomposition (SVD), Fault detection, Fault propagation path estimation

1. Introduction

With developments in automation and control systems in the process and product industries, it is possible to collect enormous amounts of data. However, analysis and interpretation of the data is a key issue. Using various statistical tools, it is possible to quickly analyze the data to enhance process performance and reduce industrial waste, thereby improving process economics. Online and offline process monitoring techniques are widely used in industry to statistically analyze process behavior. To ensure the smooth and safe operation of chemical plants, a large number of sensors are usually used to record and analyze the data. However, with an increase in the number of sensors, the chances of sensor faults in addition to process faults have also increased. Moreover, the occurrence of any fault in a system affects all the associated variables and disturbs their normal correlations, which makes it difficult to detect the actual root cause of the fault. Therefore, prompt fault detection through analyzing the root cause and estimating the fault propagation path in the system has always remained a key issue in process monitoring.

Data-driven [1] models have been widely used during the last few decades in the semiconductor manufacturing [2], chemical [3] and steel [4] industries, which exhibit a multi-level control hierarchy. Several univariate and multivariate statistical tools have been developed that can be efficiently used for both fault detection and diagnosis. However, multivariate statistical methodologies are preferred over univariate techniques for analyzing complex industrial systems where process variables exhibit strong correlations. Principal component analysis (PCA) is one of the effective multivariate statistical techniques and finds application in process monitoring and control [5, 6], fault detection and diagnosis [7, 8], and sensor validation [9, 10] in various process industries. PCA converts higher-dimensional correlated data into lower-dimensional uncorrelated data while retaining most of the original information. It reduces the dimensionality linearly and therefore ignores nonlinearities in the process data. PCA models the process behavior in terms of process variables during normal operation and compares the variation in those variables during a fault situation. Hotelling's $T^2$ statistic and the Q-statistic (squared prediction error, SPE) are the two fault detection indices commonly used for analyzing the variation in process variables. The $T^2$ statistic represents the systematic part of the process variation in the principal component subspace (PCS), whereas SPE shows the residual part of the process variation in the residual subspace (RS). The $T^2$ statistic and SPE can be calculated for each sample and compared with the confidence limits to monitor process faults. The occurrence of any fault in the process affects either the $T^2$ statistic or the SPE of the samples, or in some cases both. Usually both indices exceed their critical values during a fault situation, whereas the process is considered normal if both indices remain under the control limits. The $T^2$ statistic is usually used for overall process monitoring as it measures the variation among samples and indicates their distance from the center of the model. As multiple sensors are simultaneously affected by a certain fault, the contribution plot of the samples shows the involvement of many variables in a fault, which makes it difficult to identify the actual fault variable. In contrast, SPE measures the difference (residual) between the sample and its projection on the model. It measures the sum of variations in the RS that is not explained by the principal components (PCs). SPE is highly sensitive to even minor process variations due to its smaller value compared to the $T^2$ statistic [11]. Qin et al. [12] also performed a detailed comparison of both indices and showed that SPE can detect a fault more readily than the $T^2$ statistic.

PCA models combined with the fault detection indices find applications in various industries [12-17] to monitor process variations and diagnose process faults. For instance, Pehna et al. [18] applied the PCA model to a nuclear power plant and used the fault detection indices to monitor the temperature variation in the nuclear reactors. Similarly, Ferrer [19] used the $T^2$ statistic to analyze process shifts in the automobile manufacturing industry. Landells et al. [20] used the PCA methodology along with the statistical indices for early fault detection in refineries and other chemical production industries.

Several studies have relied on PCA models and used fault detection indices for early fault detection in industrial systems. However, little attention has been paid to identifying the fault propagation path in addition to fault detection. As a fault in one variable affects the other associated variables, it is important to estimate the direction in which the fault propagates across the variables. Recently, Hong et al. [21-23] developed fault propagation path estimation algorithms based on progressive PCA models. The algorithm uses the SPE contribution plots to detect the variable with the highest contribution and identifies it as the fault variable. The detected variable is then eliminated and a new PCA model is developed to identify the next variable. The sequence of fault variables detected from the successive models in turn represents the order in which the variables are affected by a certain fault. This methodology can be applied to small systems with a limited number of variables; however, it becomes more complex and time consuming as the number of process variables or unit processes increases.

Granger causality (GC) algorithms [24], based on a time series hypothesis, are another tool that can evaluate the cause-and-effect relationship among variables. Developing a logical interpretation of the causal analysis along with process knowledge can help in detecting the root cause of faults. GC algorithms find application not only in the process and energy industries [25-27] but also in economics [28] due to their ease of implementation and reliable interpretation of results. GC uses a statistical hypothesis test to assess whether one time series can affect another. Additionally, PCA models can be combined with GC algorithms for efficient process monitoring. For instance, Li et al. [29] developed a framework for locating the root cause of a fault using both PCA and GC algorithms. Similarly, Ladman et al. [27] integrated process causality and topology information to develop a causal model that determines the disturbance propagation path. Moreover, Yuan et al. [26] used a multilevel GC framework with clustering techniques to determine the causal relationship among plant-wide oscillatory variables. Ladman et al. [30] also compared various statistical techniques for different industrial processes and suggested that no single method is powerful enough to predict the actual cause-and-effect relationship. Therefore, several techniques should be used together with process knowledge to obtain successful results.

PCA models usually require a training data set for developing the model and determining the confidence limits for the fault detection indices. The real plant data or fault data can then be used for detecting and diagnosing the fault. On the other hand, GC uses only the time series information of the variables, irrespective of whether the data are normal or faulty, to evaluate the causal relationship among variables. The aim of this study is to develop a robust algorithm that can predict the fault propagation path in a system while verifying the causal relationship among the fault variables. Therefore, the fault amplification methodology is first applied to the conventional PCA model to amplify the fault magnitude and identify the disturbance propagation path in terms of process variables. Then, the standalone multivariate GC (MVGC) methodology is used to identify the cause-and-effect relationship among the fault variables. The results from both models are then compared to ensure the reliability of the fault propagation path obtained and, at the same time, to verify the causal relationship among the variables. The developed models are applied to the LNG fractionation process and a distillation column operation, where some common fault case scenarios are assumed to estimate the fault directions.

The paper is divided into four major sections. The first section briefly explains the PCA model development process and the methodology for estimating the fault propagation path. The following section explains the time series multivariate Granger causality algorithm used to determine the causal relationships among variables. The third section describes the LNG fractionation process taken as an exemplary case and discusses the results for the simulated fault case scenarios to determine the fault propagation path. Finally, the last section presents the conclusions of the paper.

2. Fault detection based on Principal Component Analysis (PCA) methodology

2.1. Concept of the PCA approach

The PCA model in this study is developed using Matlab®, a commercial tool that contains a large number of built-in mathematical functions for handling large-dimensional matrices. The PCA model uses the redundant information in the data to linearly reduce the set of correlated variables into fewer principal components (PCs) that are uncorrelated. The PCA model requires a data matrix $X \in \mathbb{R}^{n \times m}$, where n and m represent the number of samples and the corresponding variables, respectively. Usually, a large number of samples are included in the training data, which are scaled to zero mean and unit variance to capture the normal process variations. The scaled data matrix X given in equation 1 is used to develop the PCA model, where $x_i$ represents the $i^{th}$ sample of m variables.

$X = [x_1 \; x_2 \; \ldots \; x_n]^{T} \in \mathbb{R}^{n \times m}$   (1)
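As an illustration of the scaling step described above, the following is a minimal Python sketch (it is not the authors' Matlab implementation) that scales data to zero mean and unit variance using statistics estimated from the training set. The function names `fit_scaler` and `autoscale` and all numerical values are hypothetical, and the sketch assumes that no variable is constant in the training data.

```python
from statistics import mean, stdev


def fit_scaler(training_rows):
    """Return per-variable means and sample standard deviations.

    training_rows is a list of n samples, each a list of m variable values.
    """
    columns = list(zip(*training_rows))  # one tuple per variable
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # assumes no constant variable
    return means, stds


def autoscale(rows, means, stds):
    """Scale every sample with the given means and standard deviations."""
    return [
        [(value - mu) / sd for value, mu, sd in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical training data: n = 3 samples of m = 2 variables.
training = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
means, stds = fit_scaler(training)
scaled = autoscale(training, means, stds)
```

In this kind of workflow, the means and standard deviations estimated from the normal training data would typically be reused to scale any new sample before it is compared with the model.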

Two alternative techniques can be used to generate the principal components. The first is the singular value decomposition (SVD) method, which directly decomposes the data matrix into a score matrix $T \in \mathbb{R}^{n \times l}$, a loading matrix $P \in \mathbb{R}^{m \times l}$ and a residual matrix E as given in equation 2, where l represents the number of PCs [13]. The principal component scores (T) and loadings (P) in turn represent the relationships among the samples and among the variables, respectively.

$X = T P^{T} + E = \hat{X} + E$   (2)

T holds the linear combination of the matrix X with the loadings P such that $T = XP$, where the matrices $\hat{X}$ and E represent the variation and the noise in the system [18], respectively. Moreover, the residual matrix can be further decomposed into residual scores and loadings, represented as $\tilde{T}$ and $\tilde{P}$, respectively. Usually, the number of PCs is equal to the number of variables (m) included in the training data set. Common methodologies available to shortlist the number of PCs include the scree test, parallel analysis, the percent variance test and the residual sum of squares statistic. As there is no fixed criterion for choosing a specific technique [13], we have used the percent variance technique to select only those PCs that explain the cumulative variance (90-99%) of the data. The alternative method is to compute the covariance of the data matrix X and then apply eigenvalue decomposition, as given in equation 3.

$\mathrm{cov}(X) = \frac{1}{n-1} X^{T} X$   (3)

The eigenvalue decomposition generates P (the principal components) and the diagonal matrix containing the eigenvalues, arranged in descending order and corresponding to the l leading eigenvectors, as represented in equation 4.

$\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_l\}$   (4)
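The percent variance technique mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `select_num_pcs`, the 95% threshold and the eigenvalues are hypothetical and are not taken from the paper, which reports a 90-99% cumulative variance range.

```python
def select_num_pcs(eigenvalues, threshold=0.95):
    """Return the smallest number of leading PCs whose cumulative share
    of the total variance reaches the given threshold (0 < threshold <= 1)."""
    ordered = sorted(eigenvalues, reverse=True)  # descending, as in equation 4
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a covariance matrix with m = 4 variables.
eigenvalues = [0.1, 2.5, 0.4, 1.0]
num_pcs = select_num_pcs(eigenvalues, threshold=0.95)
```

With these hypothetical values the first two PCs explain 87.5% of the variance and the first three explain 97.5%, so three PCs are retained at the 95% level.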

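The sample vector $x \in \mathbb{R}^{m}$ can be projected onto the principal component subspace (PCS) and the residual subspace (RS) as shown in equation 5 and equation 6, respectively.

$\hat{x} = P t \equiv P P^{T} x \;\therefore\; t = P^{T} x$   (5)

$\tilde{x} = x - \hat{x} \equiv (I - P P^{T})\, x$   (6)

$\hat{x} + \tilde{x} = x$   (7)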

2.2. Fault detection algorithm

2.2.1. Measure of variability using fault detection indices

The most important part of process monitoring is fault detection. Hotelling's $T^2$ and SPE indices measure the variability in the PCS and RS, respectively. The $T^2$ statistic measures the variation of each sample within the PCA model, whereas SPE measures the distance between a sample and its projection onto the model. Statistical limits are developed for both indices and compared with the value for each sample. Hotelling's $T^2$ statistic can be calculated for each sample x as given in equation 8, where $\Sigma_a$ represents the eigenvalues corresponding to the loading vectors in P.

$T^2 = x^{T} P\, \Sigma_a^{-1} P^{T} x$   (8)

The control limit for the $T^2$ statistic, assuming a multivariate normal distribution, can be calculated using the F-distribution as shown in equation 9.

$T^2_{a,n,\alpha} = \frac{a(n-1)}{n-a} F_{a,\,n-a,\,\alpha}$   (9)

where a, n and α represent the number of principal components, the number of samples and the level of significance, respectively. On the other hand, the SPE of each sample can be calculated using equation 10, where I represents the identity matrix.

$\mathrm{SPE} = \|\tilde{x}\|^{2} \equiv \|(I - P P^{T})\, x\|^{2}$   (10)
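To make the two indices concrete, the following minimal Python sketch evaluates $T^2$ and SPE for a single scaled sample. It is not the authors' implementation: the function name `t2_and_spe`, the one-component model and all numerical values are hypothetical, and the loading vectors are assumed to be orthonormal.

```python
def t2_and_spe(sample, loadings, eigenvalues):
    """Return (T2, SPE) for one scaled sample.

    loadings    -- the retained loading vectors (columns of P), each of length m
    eigenvalues -- the eigenvalues paired with those loading vectors
    """
    # Scores t = P^T x: one dot product per retained loading vector.
    scores = [sum(p_j * x_j for p_j, x_j in zip(p, sample)) for p in loadings]
    # T2 = x^T P Sigma_a^{-1} P^T x, i.e. the sum of t_k^2 / lambda_k.
    t2 = sum(t * t / lam for t, lam in zip(scores, eigenvalues))
    # Projection onto the PCS: x_hat = P t.
    x_hat = [
        sum(t * p[j] for t, p in zip(scores, loadings))
        for j in range(len(sample))
    ]
    # SPE is the squared norm of the residual x - x_hat.
    spe = sum((x_j - xh_j) ** 2 for x_j, xh_j in zip(sample, x_hat))
    return t2, spe


# Hypothetical model: m = 2 variables, one retained PC along (1, 1)/sqrt(2).
s = 2 ** -0.5
loadings = [[s, s]]
eigenvalues = [2.0]
t2, spe = t2_and_spe([3.0, 1.0], loadings, eigenvalues)
```

For this hypothetical sample the projection onto the PCS is approximately (2, 2) and the residual is approximately (1, -1), so $T^2$ is close to 4 and SPE is close to 2. Each value would then be compared with its control limit.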

Jackson et al. [31] developed the methodology for determining the control limits of SPE as given in equations 11-14, where α and $\delta_\alpha^{2}$ denote the level of significance and the upper control limit, respectively.

$\mathrm{SPE} \le \delta_\alpha^{2}$   (11)

ℎ: ;α