Subscriber access provided by Binghamton University | Libraries
Article
Estimation of disturbance propagation path using principal component analysis (PCA) and multivariate granger causality (MVGC) techniques Usama Ahmed, Daegeun Ha, Seolin Shin, Nadeem Shaukat, Umer Zahid, and Chonghun Han Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.6b02763 • Publication Date (Web): 02 Jun 2017 Downloaded from http://pubs.acs.org on June 8, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 47
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
Manuscript for Industrial & Engineering Chemistry Research
1
2
3
Estimation of disturbance propagation path using principal
4
component analysis (PCA) and multivariate granger causality
5
(MVGC) techniques
6
Usama Ahmed a, Daegeun Ha a, Seolin Shin a, Nadeem Shaukat b, Umer Zahid c, Chonghun
7
Han +a
8 9 10
a
School of Chemical and Biological Engineering, Seoul National University, Seoul 151-744, Republic of Korea
11 b
12
Department of Nuclear Engineering, Seoul National University, Seoul 151-744, Republic of Korea
13 c
14
Chemical Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia.
15 16 17 18 +a
19
Correspondance : Chonghun Han (
[email protected])
20 21
+
22
[email protected] To whom correspondence should be addressed. Phone: +82-02-880-1887. Fax: +82-02-880-1887. Email:
1 ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 2 of 47
Manuscript for Industrial & Engineering Chemistry Research
1
Abstract:
2
Process monitoring and fault diagnosis using the multivariate statistical methodologies has been
3
extensively used in the process and product development industries for the last many decades.
4
The fault in one process variable readily affects all the other associated variables which not only
5
makes the fault detection process more difficult but also time consuming. In this study, PCA
6
based fault amplification algorithm is developed to detect both the root cause of fault and the
7
fault propagation path in the system. The developed algorithm project the samples on the
8
residual subspace (RS) to determine the disturbance propagation path. Usually, the RS of the
9
fault data is superimposed with the normal process variations which should be minimized to
10
amplify the fault magnitude. The RS containing amplified fault is then converted into the co-
11
variance matrix followed by singular value decomposition (SVD) analysis which in turn
12
generates the fault direction matrix corresponding to the largest eigenvalue. The fault variables
13
are then re-arranged according to their magnitude of contribution towards a fault which in turn
14
represents the fault propagation path using an absolute descending order functions. Moreover,
15
the multivariate granger causality (MVGC) algorithm is used to analyze the causal relationship
16
among the variables obtained from the developed algorithm. Both the methodologies are tested
17
on the LNG fractionation process train and distillation column operation where some fault case
18
scenarios are assumed to estimate the fault directions. It is observed that the hierarchy of
19
variables obtained from fault propagation path algorithm are in good agreement with the MVGC
20
algorithm. Therefore, fault amplification methodology can be used in industrial systems for
21
identifying the root cause of fault as well as the fault propagation path.
22
Keywords: Principal Component Analysis (PCA), Multi variate granger causality (MVGC),
23
Singular Value Decomposition (SVD), Fault detection, Fault propagation path estimation 2 ACS Paragon Plus Environment
Page 3 of 47
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
Manuscript for Industrial & Engineering Chemistry Research
1
2
1. Introduction
3
With the developments in the automation and control systems in the process and product
4
industries, it is possible to collect enormous amount of data. However, analysis and interpretation
5
of data is a key issue. Using various statistical tools, it is possible to quickly analyze the data to
6
enhance both the process performance and reduce industrial waste, thereby, improving process
7
economics. Online and offline process monitoring techniques are widely used in the industries to
8
statistically analyze the process behavior. To ensure the smooth and safe operation of chemical
9
plants, large number of sensors are usually used to record and analyze the data. However, with
10
an increase in number of sensors, the chances for sensor faults in addition to the process faults
11
have also been increased. Moreover, the occurrence of any fault in a system affects all the
12
associated variables and disturbs their normal correlations which makes it difficult to detect the
13
actual root cause of fault. Therefore, instant fault detection through analyzing the root cause and
14
estimating fault propagation path in the system has always remained a key issue in process
15
monitoring.
16
Data-driven1 models have been widely used in semiconductor manufacturing2, chemical3 and
17
steel industries4 exhibiting multi-level control hierarchy during the last few decades. Several
18
univariate and multivariate statistical tools have been developed that can be efficiently used for
19
both the fault detection and diagnosis. However, multivariate statistical methodologies are more
20
preferred over the univariate techniques for analyzing the complex industrial system where
21
process variables exhibits a strong correlations. Principal component analysis (PCA) is one of the
22
effective multivariate statistical technique that finds its applications for process monitoring and
3 ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 4 of 47
Manuscript for Industrial & Engineering Chemistry Research
1
control5, 6, fault detection and diagnosis7, 8, and sensor validation9, 10 in various process industries.
2
The PCA converts the higher dimensional correlated data into lower dimensional un-correlated
3
data while retaining most of the original information. It can linearly reduce the dimensionality
4
while ignoring the nonlinearities in the process data. PCA models the process behavior in terms
5
of process variables during normal operation and compares the variation in those variables
6
during fault situation. Hotelling T2 statistics and Q-statistic (squared prediction error (SPE)) are
7
the two fault detection indices commonly used for analyzing the variation in process variables.
8
The T2 statistics represents the systematic part of process variation in principal component
9
subspace (PCS), whereas, SPE shows the residual part of the process variation in the residual
10
subspace (RS). The T2 statistics and SPE can be calculated for each sample and compared with
11
the confidence limits to monitor the process faults. The occurrence of any fault in the process
12
either affects the T2 statistic or SPE of the samples or even changes both the statistics in some
13
cases. Usually both the indices exceed their critical values during fault situation, whereas, the
14
process is considered to be normal if both the indices remain under the control limits. T2
15
statistics is usually used for overall process monitoring as it measures the variation among
16
samples and indicate their distance from the center of the model. As multiple sensors are
17
simultaneously affected by a certain fault, the contribution plot of samples shows the
18
involvement of many variables towards a fault that makes it difficult to identify the actual fault
19
variable. On the contrary, SPE measures the difference/residuals between the sample and its
20
projection on the model. It measures the sum of variations in the RS which is not explained by
21
the principle components (PC). SPE is highly sensitive to even minor process variations due to
22
its smaller value compared to the T2 statistic11. Qin etl al.12 also performed the detailed
4 ACS Paragon Plus Environment
Page 5 of 47
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
Manuscript for Industrial & Engineering Chemistry Research
1
comparison for both the indices and showed that SPE can detect the fault more readily compare
2
to the T2 statistics.
3
The PCA models combined with the fault detection indices finds it applications in various
4
industries12-17 to monitor the process variations and diagnose process faults. For an instance,
5
Pehna et al.18 applied the PCA model to the nuclear power plant and used the fault detection
6
indices to monitor the temperature variation in the nuclear reactors. Similarly, Ferrer19 used the
7
T2 statistics to analyze the process shift in the automobile manufacturing industries. Landells et
8
al.20 used the PCA methodology along with the statistical indices for early fault detection in
9
refineries and other chemical production industries.
10
Several studies have relied on PCA models and used fault detection indices for early fault
11
detection in the industrial systems. However, little attention has been paid towards identifying
12
the fault propagation path in addition to the fault detection. As the fault in one variable affects
13
the other associated variables, it is important to estimate the fault propagation path direction
14
moving across the variables. Recently, Hong et al. 21-23 developed the fault propagation path
15
estimation algorithms based on the progressive PCA models. The algorithm uses the SPE
16
contribution plots to detect the variable having high contribution and identifies it as the fault
17
variable. The detected variable is then eliminated and new PCA model is developed to identify
18
the next variable. The detection of fault variable from each model in turns represent the order of
19
variables affected by a certain fault. This methodology can be applied to the small systems with
20
limited number of variables, however, it can become more complex and time consuming with an
21
increase in number of process variables or unit processes.
22
Granger causality (GC) algorithms24 based on time series hypothesis is another tool that can
23
evaluate the cause and effect relationship among variables. Developing the logical interpretation 5 ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 47
Manuscript for Industrial & Engineering Chemistry Research
1
of causal analysis along with the process knowledge can help in detecting the root cause of the
2
faults. GC algorithms not only find its application in process and energy industries25-27 but also in
3
economics28 due to its ease of implementation and reliable interpretation of results. It uses a
4
statistical hypothesis to predict whether one time series can affect the other time series or not.
5
Additionally, the PCA models can be combined with the GC algorithms for efficient process
6
monitoring. For instance, Li et al.29 developed the framework of locating the root cause of fault
7
using both PCA and GC algorithms. Similarly, Ladman et al.27 integrates the process causality
8
and topology information to develop a causal model that determine the disturbance propagation
9
path. Moreover, Yuan et al.26 used the multilevel GC framework with clustering techniques for
10
determining the causal relationship among plant-wide oscillatory variables. Ladman et al.30 also
11
compared various statistical techniques for different industrial process and suggested that only
12
one specific method is not powerful enough to predict the actual causal and effect relationship.
13
Therefore, several techniques should be used in addition with the process knowledge to obtain
14
successful results.
15
The PCA models usually require a training data set for developing the model and to determine
16
confidence limits for the fault detection indices. Then the real plant data or fault data can be used
17
for detecting and diagnosing the fault. On the other hand, GC uses only the time series
18
information of variables irrespective of the normal or fault data to evaluate the causal
19
relationship among variables. The aim of this study is to develop a robust algorithm that can
20
predict the fault propagation path in a system while verifying the causal relationship among those
21
fault variables. Therefore, the fault amplification methodology is first applied to the conventional
22
PCA model to amplify the fault magnitude to identify the disturbance propagation path in terms
23
of process variables. Then, the standalone multivariate GC (MVGC) methodology is used to 6 ACS Paragon Plus Environment
Page 7 of 47
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
Manuscript for Industrial & Engineering Chemistry Research
1
identify the cause and effect relationship among the fault variables. The results from both the
2
models are then compared to ensure the reliability of the fault propagation path obtained and
3
verifying the causal relationship among variables at the same time. The developed models are
4
applied to the LNG fractionation process and distillation columns operation where some of the
5
common fault case scenarios are assumed to estimate the fault directions.
6
The paper is divided into four major sections. First section briefly explains the PCA model
7
development process and the methodology for estimating the fault propagation path. The
8
following section explains the time series multi-variate granger causality algorithm to determine
9
the causal analysis among variables. The third section explains the LNG fractionation process
10
taken as an exemplary case and discusses the results for the simulated fault case scenarios to
11
determine the fault propagation path. Finally, the last section embodies the conclusion of the
12
paper.
13
2. Fault detection based on Principal Component Analysis (PCA) methodology
14
2.1. Concept of the PCA approach
15
PCA model is developed in this study using Matlab® which is a commercial tool that contains
16
large number of build-in mathematical functions to handle large dimensional matrices. PCA
17
model uses the redundant information in the data to linearly reduce the set of correlated variables
18
into fewer principal components (PC’s) that are not correlated. PCA model requires a data
19
matrix X ε ℜ ×
20
where n and m represent the number of samples and corresponding variables, respectively.
21
Usually, a large number of samples are included in the training data which are scaled to zero
22
mean and unit variance to capture the normal process variations. The scaled data matrix X given
7 ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 47
Manuscript for Industrial & Engineering Chemistry Research
1 2
in equation 1 is used to develop the PCA model where represents the sample of m variables.
3
= [ … . . ] ε ℜ ×
4
Two types of alternative techniques can be used to generate the principal components. First one
5
is the singular value decomposition (SVD) method that directly decomposes the data matrix into
6 7
(1)
score matrix T ε ℜ × , loading matrix P ε ℜ × and residual matrix E as given in equation 2, where l represents the number of PC’s13. The principle component scores (T) and loadings (P) in
8
turn represents the samples and variables relationship, respectively.
9 10
= + = +
11
matrices represents the variation and noise in the system18, respectively. Moreover, the residual
12 13
(2)
T holds the linear combination of matrix X with the vector P such that T = X P where and E and , matrix can be further decomposed into residual scores and loadings represented as respectively. Usually, the number of PC’s are equal to the number of variables (m) included in
14
the training data set. Some of the common methodologies available to shortlist the number of
15
PC’s includes scree test, parallel analysis, percent variance test and residual sum of square
16
statistics. As no fix criteria is available to use any specific technique13, we have used the percent
17
variance technique to select only those PC’s that could explain the cumulative variance (90-99%)
18
of data. The alternate method is to find the covariance of the data matrix X followed by the
19
application of eigenvalue decomposition as given in equation 3.
20
=
(3)
8 ACS Paragon Plus Environment
Page 9 of 47
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
Manuscript for Industrial & Engineering Chemistry Research
1
The eigenvalue decomposition generates the P (principal components) and the diagonal matrix
2
enclosing eigenvalues arranged in the descending order corresponding to the l leading
3
eigenvectors as represented in equation 4.
4
Λ = diag {λ! , λ# , λ$ , . . . . , λ }
5 6
7
8
9
(4)
The sample vector x ε ℜ can be projected on the principal component subspace (PCS) and residual subspace (RS) as shown in equation 5 and equation 6, respectively. % = &≡ ∴ & = ' = − % ≡ (I - )
(5) (6)
%+ ' =
(7)
10
2.2 Fault detection algorithm
11
2.2.1
12
The most important part of the process monitoring is the fault detection. Hotelling’s T2 and SPE
13
indices measures the variability in the PCS and RS, respectively. T2 statistic measures the
14
variation of each sample in the PCA model, whereas, SPE measures the distance between sample
15
and its projection on to the model. The statistical limits are developed for both the indices and
16
are compared with each sample. The Hotelling’s T2 statistic can be measured for each sample x
17
as given in equation 8, where ∑* represents the eigenvalues corresponding to the P loading
18
vectors.
Measure of variability using fault detection indices
19
= +∑, )
20
The control limit for the T2 statistic exhibiting multi-variate normal distribution can be calculated
21
using frequency-distribution as shown in equation 9.
(8)
9 ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 47
Manuscript for Industrial & Engineering Chemistry Research
1
# -*,, α =
2
Where a, n and α represents the number of principal components, number of samples and level
3
of significance. On the other hand, the SPE of each sample can be calculated by using equation
4
10 where I represents the identity matrix.
*+!) *
.*,*,α
(9)
5
' ∥ ≡ ∥ +3 − ) ∥2 SPE =∥
6
Jackseon et al.31 developed the methodology for determining the control limits of SPE as given
7
in equations 11-14, where α and δα # denotes the level of significance and upper control limit,
8
respectively.
9
456 ≤ δα #
(11)
ℎ: ;α