
Process Systems Engineering

Efficient process monitoring via the integrated use of Markov random fields learning and the graphical lasso
Changsoo Kim, Hodong Lee, Kyeongsu Kim, Younggeun Lee, and Won Bo Lee
Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b02106 • Publication Date (Web): 05 Sep 2018


Industrial & Engineering Chemistry Research

Efficient process monitoring via the integrated use of Markov random fields learning and the graphical lasso

Changsoo Kim, Hodong Lee, Kyeongsu Kim, Younggeun Lee, and Won Bo Lee∗

School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea

E-mail: [email protected]

Abstract

Process monitoring is an important aspect of the safe operation of process plants. Various data-driven monitoring methods exist, but they all have certain limitations. For instance, most fault detection methods cannot identify the fault propagation path, and some require a priori knowledge of the faults or of the relationships between the monitored variables. In this study, a monitoring method for accurately detecting faults and analyzing the fault propagation path is proposed. Named the Glasso-MRF monitoring framework, this method integrates the graphical lasso algorithm (G-lasso) and the Markov random field (MRF) modeling framework to divide the monitored variables into relevant groups and then detect faults separately for each group. The graphical lasso imposes a lasso constraint on the inverse covariance matrix of the variables within the maximum likelihood estimation problem, driving it toward a sparse form. The use of the graphical lasso downsizes the process into groups of highly correlated variables, relieving the computational complexity of the MRF-based monitoring so that the
process can be efficiently monitored, and enabling the fault propagation path to be identified. MRF modeling can extensively model the variable relationships, including cyclic structures, and the MRF structure can be obtained without a priori knowledge of the relationships between variables by using the iterative graphical lasso algorithm proposed in this study. Inference of MRFs is usually complex due to the existence of partition functions, but down-sizing the system using the iterative G-lasso resolves this problem as well. The proposed method was applied to the well-known Tennessee Eastman process to evaluate its performance. The detection accuracy of the Glasso-MRF monitoring framework was higher than that of other state-of-the-art monitoring methods, including autoencoders and Bayesian networks, showing more than 95% fault detection accuracy for all of the 28 faults programmed within the Tennessee Eastman process. Also, the fault propagation path could be detected from the difference in fault detection time among the divided groups, providing enhanced analysis of the initiated fault. These results show that the proposed methodology can effectively detect a fault and show its propagation throughout the process, without any a priori knowledge of the process variables.

Introduction

Process monitoring, or fault detection and isolation (FDI), is an essential part of product quality management and safe plant operation. There exist various methods for FDI, which can be categorized into three approaches, namely the analytical, knowledge-based, and data-driven methods. 1 Of these methods, data-driven methods are the most widely used in the industry, since they do not require a priori knowledge of faults and show good performance in terms of speed and accuracy. Data-driven methods make use of historical data to develop monitoring statistics, providing limits to the operation of process variables and detecting a data point as a fault when the monitoring limit is violated. Conventional data-driven monitoring methods include the use of dimensionality reduction
techniques such as principal component analysis (PCA), partial least squares (PLS), and Fisher discriminant analysis (FDA). Although widely used, the data-driven methods have certain aspects that limit their usage in terms of FDI. PCA and PLS have limitations in dealing with nonlinearity of the data, since they reduce the dimensionality of the variables upon the assumption of linearity. Also, it is difficult to isolate a fault using these methods, since the individual characteristics of the variables are simplified into reduced dimensions. Various studies suggest different dimensionality reduction techniques to resolve this problem, such as the use of kernel PCA, independent component analysis (ICA), and autoencoders. Kernel PCA incorporates the use of kernel space when reducing the dimensionality of the variable space, so that nonlinear relationships between variables can be considered more rigorously within the kernel space. 2,3 ICA 4,5 makes use of statistically independent components that contain higher-order statistical information between variables. Autoencoders are a recently developed unsupervised machine learning technique, where a nonlinear activation function is used within encoders and decoders to effectively extract useful features from multiple variables. 6 All of these are useful in developing more appropriate monitoring statistics for nonlinear process variables, and succeed in enhancing the fault detection probabilities compared to conventional methods. However, they are not able to effectively isolate the fault, or provide a fault propagation path. Apart from the studies that apply different dimensionality reduction techniques to deal with nonlinearity of the variables, other studies suggest the use of sparse dimensionality reduction techniques or knowledge-based methods to overcome fault isolation problems. 
Sparse forms of dimensionality reduction allow better understanding of the relationships between variables, leading to good performance in fault isolation. Studies by Gajjar et al. 7 and Luo et al. 8 make use of sparse PCA for FDI. More complex forms of sparse dimensionality reduction techniques are incorporated as well. These studies include the work by Luo et al., 9 where process knowledge is used to construct a knowledge-based

sparse projection matrix, and the work by Bao et al., 10 where sparse global-local preserving projections are used for FDI. While these methods are better suited for fault isolation than conventional dimensionality reduction techniques, they show only little improvement in fault detection accuracy, because they share the limitations of conventional methods, such as linear dimensionality reduction. Studies on knowledge-based FDI include the use of neural networks, support vector machines (SVMs), and Bayesian networks. Neural networks and SVMs have become more useful in recent years owing to the boost in computational power, and studies such as the work by Chiang et al. 11 make use of these methods to achieve good monitoring performance. Bayesian networks have recently been used in FDI, such as in studies by Verron et al., 12 Gonzalez et al., 13 and Zeng et al. 14 Bayesian networks model the monitored process as a directed graphical model and use the causal relationships among variables to effectively isolate the fault and analyze the fault propagation path. While these studies have shown great progress in terms of FDI, they have a critical limitation in that they require a priori knowledge of the process faults or of the relationships between variables. Neural networks and SVMs require data points classified as "faults" to train the monitoring classifier, which are impossible to obtain before a fault actually occurs. The directed edges of Bayesian networks have to be modeled beforehand to detect the cause of a fault upon occurrence, but this is difficult as well, since calculating causal relationships is difficult in large-scale systems. Also, Bayesian networks have limited expressiveness; for example, they cannot model cyclic relationships between variables, whereas chemical processes contain numerous cyclic relationships. These limitations restrain these methods from actually being applied in industry.
Thus, a novel monitoring scheme that improves on the monitoring performance of state-of-the-art monitoring methods and is also capable of analyzing propagation paths is required. In this study, a monitoring methodology that is capable of dealing with process nonlinearity and fault propagation path analysis, while retaining good process monitoring
performance in terms of fault detection speed and accuracy, is proposed. The method, denoted as Glasso-MRF monitoring throughout the paper, integrates the graphical lasso and Markov random field modeling to efficiently perform process monitoring. Also, Glasso-MRF monitoring allows fault propagation analysis, 15,16 which provides information on the characteristics of the fault and helps prevent the fault from further affecting the process. This is an important feature of Glasso-MRF monitoring, since fault propagation path analysis is an important step for identifying the fault and minimizing process damage, as mentioned in Chiang et al. 1 To demonstrate the performance of the proposed methodology, it is applied to the well-known Tennessee Eastman process (TEP), and the results are compared with those of state-of-the-art monitoring techniques. The outline of the paper is as follows. The preliminaries required for explaining the methodology are introduced in Section 2; then the Glasso-MRF monitoring methodology, along with its two subparts, the graphical lasso and Markov random fields, is explained extensively in Section 3. The results of implementing the method on the TEP are given in Section 4, along with a discussion of the results and a comparison of monitoring performance with other monitoring algorithms. Section 5 concludes the paper and discusses related future work.

Preliminaries

Markov Random Fields

Graphical models largely consist of two types: directed graphs (Bayesian networks) and undirected graphs (Markov random fields). 17 The Bayesian network representation is a directed acyclic graph, where the nodes represent the random variables of the system, and the edges correspond to the direct influence of one node on another. Conditional probability distributions can be effectively expressed by the Bayesian network, and it has the
potential for causal analysis. Also, inference in Bayesian networks is computationally cheap compared to undirected graphical models, allowing them to be applied to various fields of study, such as molecular modeling and biometrics. However, there are certain relationships that cannot be expressed using a Bayesian network, an example of which is a cyclic relationship between nodes. This is a critical limitation for applying Bayesian networks to chemical processes, since chemical processes contain numerous cyclic relationships, such as those in recycle streams and columns. The fact that it is difficult to retrieve causal relationships from historical data is another factor limiting their application to process plants. Markov random fields (MRFs) resolve this shortcoming of Bayesian networks, since their nodes are connected by edges without any direction. This allows the expression of cyclic relationships as well as more complex graph structures, where a directional interaction between variables cannot be naturally ascribed. To express the joint probability of variables in an MRF, a symmetric parameterization, called a factor, is used. A factor, or potential function, expressed as ψi(a, b), represents the affinity between the two values a and b. Since a factor is only one contribution to the overall joint distribution, the global probability distribution of an MRF is expressed as the joint contribution of all local factors, as shown in Eq. 1,

$$f(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C) \tag{1}$$

The positive functions ψC(·) are the factors for clique potentials, where a clique is a complete subgraph, and C ranges over the set of maximal cliques. The variable Z is the partition function, defined as

$$Z = \sum_{x} \prod_{C} \psi_C(x_C) \tag{2}$$

and works as a normalization factor, so that the product of clique potentials defines a valid probability distribution. The evaluation of the partition function is computationally infeasible for large systems and is the bottleneck for training
and performing inference on the MRF. Thus, when dealing with MRFs, it is essential to devise a way of efficiently calculating the partition function. Kernel density estimation is used in this study, which is explained in detail in Section 3.2.
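To make Eqs. 1 and 2 concrete, the following minimal sketch evaluates the partition function of a toy four-node cyclic MRF by brute-force enumeration. The binary states, the pairwise potential `psi`, and the coupling value 2.0 are illustrative assumptions and are not taken from this study. The sketch only shows that the sum in Eq. 2 runs over every joint state, so its cost grows exponentially with the number of nodes.

```python
from itertools import product

# Toy 4-node cycle 0-1-2-3-0; its maximal cliques are the four edges.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
STATES = (0, 1)  # assumed binary variables, for illustration only


def psi(a, b, coupling=2.0):
    """Assumed pairwise potential: favors equal neighboring values."""
    return coupling if a == b else 1.0


def unnormalized(x):
    """Product of clique potentials, the numerator of Eq. 1."""
    value = 1.0
    for i, j in EDGES:
        value *= psi(x[i], x[j])
    return value


def partition_function(n_nodes=4):
    """Eq. 2: sum the clique product over all joint states."""
    return sum(unnormalized(x) for x in product(STATES, repeat=n_nodes))


def joint_probability(x):
    """Eq. 1: normalized joint probability of one configuration."""
    return unnormalized(x) / partition_function(len(x))


Z = partition_function()
p_all_equal = joint_probability((0, 0, 0, 0))
```

With n binary nodes the enumeration visits 2^n states, which is why a cyclic graph over dozens of process variables cannot be normalized this way.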

Graphical Lasso

The lasso is a well-known regression method first proposed by Tibshirani, 18 which makes use of the l1 norm to set irrelevant coefficients to zero while calculating regression parameters. A coefficient of zero means that the corresponding variable can be excluded from the regression model, so the lasso works as a variable selection method. Recently, the lasso has been extended to graphical models as the graphical lasso, allowing the core subset of a graphical model to be found efficiently. The graphical lasso was first proposed by Friedman et al., 19 and various methods for applying the lasso to graphical models have been proposed recently, 20–23 along with research making use of the sparse graphical models obtained with the graphical lasso. 24 Following the work of Friedman et al., 19 the graphical lasso problem can be expressed as:

$$\max_{\Theta} \; \log\det(\Theta) - \mathrm{tr}(S\Theta) - \rho \|\Theta\|_1 \tag{3}$$

where Θ is the precision matrix, i.e., the inverse of the covariance matrix Σ of the variables, S is the empirical covariance matrix, and ρ is the regularization parameter for the l1 norm. Eq. 3 is the l1-penalized Gaussian log-likelihood of the data, partially maximized with respect to the mean parameter µ. Maximizing Eq. 3 is equivalent to solving the graphical lasso problem, and Friedman et al. 19 solve this problem by first partitioning the matrices W, the current estimate of Σ, and S as:

$$W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \quad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix} \tag{4}$$

then using the fact that the solution for w12 satisfies:

$$w_{12} = \arg\min_{y} \left\{ y^T W_{11}^{-1} y : \|y - s_{12}\|_\infty \le \rho \right\} \tag{5}$$

and that the above equation is equivalent to solving the dual problem,

$$\min_{\beta} \; \frac{1}{2} \left\| W_{11}^{1/2} \beta - b \right\|^2 + \rho \|\beta\|_1 \tag{6}$$

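where $b = W_{11}^{-1/2} s_{12}$. Using the relationship WΘ = I, the sub-gradient equation for maximization of the log-likelihood in Eq. 3 is:

$$W - S - \rho \cdot \Gamma = 0 \tag{7}$$

where Γ is the matrix of sub-gradients of the l1 norm, with Γij = sign(Θij) when Θij ≠ 0 and Γij ∈ [−1, 1] otherwise.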

Thus, by iteratively solving the lasso problem in Eq. 6 and updating W until the sub-gradient condition in Eq. 7 is satisfied, the graphical lasso problem can be solved to return the essential subset of a given graph. As a result, a sparse precision matrix (Θ) is obtained, where a zero entry denotes the absence of an edge between the two corresponding nodes. In this way, the structure of an undirected graphical model can be obtained in a sparse form, so that the computational complexity of inference on the graph is mitigated. Note that the value of the regularization parameter ρ alters the structure of the resulting undirected graph, which raises the issue of selecting an appropriate value for ρ. The objective of the graphical lasso is to obtain the structure of an undirected graph, rather than to minimize the regression error as in lasso regression. Thus, the value of ρ may be chosen according to the purpose of its use, which, in this study, is to divide the entire set of process variables into the decided number of sub-groups. This process of using the graphical lasso to obtain an undirected graph structure consisting of highly inter-correlated variables is one of the core elements of this study.
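The way a sparse precision matrix encodes the graph structure can be illustrated with a short sketch. The 4×4 matrix `theta` below is a made-up example, not an estimate from process data, the tolerance `tol` is an assumed numerical threshold, and the helper names are hypothetical.

```python
def edges_from_precision(theta, tol=1e-8):
    """Return undirected edges (i, j), i < j, for nonzero off-diagonal entries."""
    n = len(theta)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(theta[i][j]) > tol]


def connected_groups(n_nodes, edges):
    """Group nodes into connected components of the undirected graph."""
    neighbors = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, groups = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbors[node] - seen)
        groups.append(sorted(component))
    return groups


# Made-up sparse precision matrix (illustrative values only).
theta = [
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, 0.5, 0.0],
    [0.0, 0.5, 1.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
edges = edges_from_precision(theta)
groups = connected_groups(len(theta), edges)
```

In this toy matrix, nodes 0, 1, and 2 form one connected subset while node 3 has no edges, which mirrors the situation where a single graphical lasso pass leaves some variables outside the selected group.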

MRF Monitoring Integrated with Graphical Lasso

The proposed monitoring scheme works in two steps. First, the entire set of variables is grouped into smaller sets via the iterative graphical lasso algorithm, which also yields the structure of the MRF. Then, using the probability density calculation procedure of the MRF, the monitoring limit of each group is calculated from the normal process data at the significance level α. During monitoring, the data points to be monitored are converted into probability values using the trained MRF model and then evaluated against the monitoring limits to determine whether each point is in a normal or faulty condition.

Step 1: Iterative Graphical Lasso

The graphical lasso serves two purposes in the proposed monitoring scheme. One is the grouping of the monitored variables into inter-correlated sets with a sparse graph structure, and the other is the mitigation of the computational complexity of MRF learning and inference. By grouping the highly correlated variables, the time delay of fault detection is reduced and the fault detection accuracy is increased; moreover, since the time it takes to detect a fault varies between groups, the grouping has the potential for fault propagation path analysis. The mitigation of computational complexity is a very important aspect as well, since the accuracy and speed of MRF learning and inference depend greatly on the number of nodes and edges. The graphical lasso usually results in one subset of nodes with all of the other nodes eliminated from the group, which is not the desired form for the MRF monitoring scheme. To effectively monitor the process, every variable should be included in some group so that it can undergo multivariate monitoring. To this end, an algorithm for iteratively grouping all of the process variables is proposed. First, the desired number of groups is decided based on experience, and the graphical lasso is applied to the entire set of monitored variables. After a subset is made, the graphical lasso is applied again to the remaining variables. This procedure is repeated until all of the variables are
included within a group. The flow chart of the algorithm is visualized in Figure 1.

Figure 1: Iterative graphical lasso algorithm for grouping the variables into relevant relationships.
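The loop in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_subset` is a hypothetical callable standing in for one graphical lasso run that returns the variables left connected in the sparse graph, and the toy selector below simply picks the most strongly correlated remaining pair from a made-up correlation table instead of running an actual graphical lasso solver. Placing all leftover variables in the final group is also an assumption about how the loop terminates.

```python
def iterative_grouping(variables, select_subset, n_groups):
    """Peel off one group per pass until every variable belongs to a group."""
    remaining = list(variables)
    groups = []
    while remaining and len(groups) < n_groups - 1:
        subset = select_subset(remaining)
        if not subset:          # nothing connected: stop peeling
            break
        groups.append(sorted(subset))
        remaining = [v for v in remaining if v not in subset]
    if remaining:               # assumed: leftovers form the last group
        groups.append(sorted(remaining))
    return groups


# Made-up absolute correlations between five variables (illustrative only).
ABS_CORR = {
    ("a", "b"): 0.9, ("a", "c"): 0.1, ("a", "d"): 0.2, ("a", "e"): 0.1,
    ("b", "c"): 0.2, ("b", "d"): 0.1, ("b", "e"): 0.1,
    ("c", "d"): 0.7, ("c", "e"): 0.3, ("d", "e"): 0.2,
}


def toy_select_subset(remaining):
    """Stand-in for one graphical lasso pass (NOT the real algorithm):
    return the endpoints of the strongest remaining pair."""
    pairs = [(w, p) for p, w in ABS_CORR.items()
             if p[0] in remaining and p[1] in remaining]
    if not pairs:
        return set()
    _, best_pair = max(pairs)
    return set(best_pair)


groups = iterative_grouping("abcde", toy_select_subset, n_groups=3)
```

Passing the selector as a parameter keeps the grouping loop independent of the solver, so a real graphical lasso routine with a chosen ρ per iteration could be substituted for the toy selector.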

Step 2: MRF Monitoring

To use an MRF for process monitoring, the parameters of the graphical model have to be trained, and then a monitoring statistic has to be calculated from normal process data. As mentioned in the Preliminaries section, training and inference of the MRF are computationally expensive, so an efficient method for calculating the probabilities of input data has to be used. In this study, kernel density estimation (KDE), a non-parametric method for estimating the probability density function of graphical models, is used to train the MRFs and infer their probabilities. KDE allows effective modelling of nonlinear, non-Gaussian variables, and allows quick inference even though the training process is
slow and computationally expensive. The calculation of an unknown probability density function using KDE is given in Eq. 8,

$$f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \tag{8}$$

where the variable f is the density of interest that is to be estimated, K is the kernel used to estimate the density function, and h is the bandwidth of the specific kernel. The most widely used kernel function is the Gaussian kernel, which is expressed as:

$$K_i(x, h) = \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{1}{2} \frac{\|x - x_i\|^2}{h^2}\right) \tag{9}$$

The Gaussian kernel is used throughout this study as well. Bandwidth is an important factor in kernel density estimation and has a great influence on the estimation results. To speed up the calculation, an empirical equation, Silverman's rule of thumb, 25 is used to select a value for the bandwidth:

$$h = \left(\frac{4}{n(d+2)}\right)^{\frac{2}{d+4}} \cdot S \tag{10}$$

where S is the sample covariance matrix of the dataset, n is the number of samples, and d is the number of variables. The overall process of MRF monitoring is to first train a separate KDE model for each of the MRFs and then obtain the monitoring limit from each trained KDE model by setting the confidence level α on the probability values. For new data points subject to monitoring, the probability of each point is inferred using the trained KDE model of each MRF group, and the values are compared with the monitoring limits. If the inferred value is above the limit, the data point is in a normal condition; if it is below the limit, the data point is in a faulty condition.
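The monitoring procedure described above can be sketched for a single scalar variable. This is a simplified illustration under stated assumptions, not the multivariate per-group implementation of the study: the training data are synthetic, the bandwidth is taken as the square root of Eq. 10 with d = 1 (an assumed reading of Eq. 10 as a squared bandwidth), and the monitoring limit is assumed to be the α-quantile of the densities of the training points.

```python
import math
import random


def silverman_bandwidth(samples):
    """1-D bandwidth: square root of Eq. 10 with d = 1 (assumed reading)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return math.sqrt((4.0 / (3.0 * n)) ** (2.0 / 5.0) * variance)


def kde_density(x, samples, h):
    """Eq. 8 with the Gaussian kernel, for scalar data."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def monitoring_limit(samples, h, alpha=0.01):
    """Assumed limit: the alpha-quantile of the training densities."""
    densities = sorted(kde_density(s, samples, h) for s in samples)
    return densities[int(alpha * len(densities))]


def is_faulty(x, samples, h, limit):
    """A point is flagged when its estimated density falls below the limit."""
    return kde_density(x, samples, h) < limit


random.seed(0)
normal_data = [random.gauss(0.0, 1.0) for _ in range(500)]  # synthetic data
h = silverman_bandwidth(normal_data)
limit = monitoring_limit(normal_data, h)
flags = [is_faulty(x, normal_data, h, limit) for x in (0.0, 6.0)]
```

Here the point at 0.0 lies in the bulk of the normal data and is not flagged, whereas the point at 6.0 lies far outside it and falls below the limit. Each evaluation sums over all training points, which is why smaller variable groups and fewer stored samples keep the inference step cheap.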

Implementation of Glasso-MRF monitoring to the Tennessee Eastman process

The proposed monitoring method is applied to the widely used Tennessee Eastman benchmark process (TEP) to test its performance. Since most recent monitoring methods have been applied to the TEP, it is well suited for performance comparison. Two widely used metrics, fault detection accuracy (FDA) and fault detection rate (FDR), are used to evaluate the performance of the Glasso-MRF monitoring method. FDA and FDR are defined based on the binary classification test of the data points, as shown in Figure 2. The TEP is explained in the following section; then the results of applying the proposed method are shown in terms of fault detection accuracy, rate, and speed, in comparison with conventional as well as state-of-the-art monitoring methods.

Figure 2: Binary classification of monitored data points (left), and the definitions of FDA and FDR 26 (right).
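As a rough illustration of metrics built on a binary classification of data points, the sketch below counts the four outcomes and derives two ratios. The formulas are the commonly used confusion-matrix definitions (accuracy as the share of correctly classified points, detection rate as the share of faulty points that raise an alarm) and are assumptions here; they may differ from the exact definitions given in Figure 2. The labels and alarms are made up.

```python
def confusion_counts(is_fault, is_alarm):
    """Count TP, FP, TN, FN from true fault labels and raised alarms."""
    tp = sum(f and a for f, a in zip(is_fault, is_alarm))
    fp = sum((not f) and a for f, a in zip(is_fault, is_alarm))
    tn = sum((not f) and (not a) for f, a in zip(is_fault, is_alarm))
    fn = sum(f and (not a) for f, a in zip(is_fault, is_alarm))
    return tp, fp, tn, fn


def detection_accuracy(tp, fp, tn, fn):
    """Assumed definition: share of all points classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def detection_rate(tp, fn):
    """Assumed definition: share of faulty points that raised an alarm."""
    return tp / (tp + fn)


# Tiny made-up example: 4 normal points followed by 6 faulty points.
is_fault = [False] * 4 + [True] * 6
is_alarm = [False, False, True, False, False, True, True, True, True, True]
tp, fp, tn, fn = confusion_counts(is_fault, is_alarm)
accuracy = detection_accuracy(tp, fp, tn, fn)
rate = detection_rate(tp, fn)
```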

Tennessee Eastman Process

The Tennessee Eastman process (TEP) is a widely used benchmark chemical process. After being first published by Downs and Vogel, 27 it has been used to test various control structures and optimization methods, and to verify the usefulness of different process monitoring techniques. It was first published in Fortran, but to enhance compatibility and usage, it has been ported to various languages, and certain modifications have been made to it. The most recent version is the MATLAB version published by Bathelt et al., 28 which resolved the numerical solver issues in the original version and comprises three different control strategy models. Compared to the version in Downs and Vogel, 27 this model implements the control strategy described in Ricker 29 and has 8 additional programmed faults, making up a total of 28 fault cases. Most previous studies use the dataset extracted from the Downs and Vogel 27 model, but since the MATLAB version shows more consistent numerical results than the original Fortran model, this version is used in this study. All 28 faults are tested to verify the performance of the Glasso-MRF monitoring framework, and then the first 21 faults are used to compare its performance with previously proposed monitoring methods. The configuration of the TEP is given in Figure 3. The TEP model consists of five process units: the reactor, condenser, compressor, vapor/liquid separator, and stripper. Four gaseous materials, A, C, D, and E, are fed into the process to produce two products, G and H, along with a by-product F. The data obtained from the TEP simulation consist of 22 continuously measured variables, 19 composition variables, and 12 manipulated variables (MVs). For monitoring purposes, only 50 of the 53 variables are used, as listed in Table 1.
This is because variables 46, 50, and 53, which are the MVs for the compressor recycle valve, the stripper steam valve, and the agitator speed, respectively, retain constant values under the implemented control strategy and thus may degrade the performance of monitoring schemes. The set of 28 programmed fault cases is listed in Table 2. It should be noted that the description of fault 21 is different from Bathelt et al., 28 and is consistent with previous studies that monitor the TEP. 6,10,30 When using the MATLAB version of the TEP, the fault initiation time and the duration of the simulation can be user-defined, along with the type of fault to be applied. In this study, all of the faults were set to be introduced at the 1000th data point, and the 21 simulations were run until the 7200th data point to observe the change in process conditions.

Figure 3: P&ID of the Tennessee Eastman process. Revised MATLAB version (Bathelt et al. 28 ). IFAC Copyright is acknowledged.
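Because the fault onset is known in these simulations, the detection delay of each variable group can be computed directly, and ordering the groups by delay gives one simple way to read off a candidate propagation path. The sketch below is an illustration only: the alarm sequences and group names are made up, the onset is treated as a 0-based index, and declaring detection at the first alarm at or after onset is an assumed rule, since the detection rule used in the study is not stated in this section.

```python
FAULT_ONSET = 1000  # index of the sample at which the fault is introduced


def detection_delay(alarms, onset=FAULT_ONSET):
    """Samples between fault onset and the first alarm at or after onset;
    None if the group never raises an alarm."""
    for index in range(onset, len(alarms)):
        if alarms[index]:
            return index - onset
    return None


def propagation_order(alarms_by_group, onset=FAULT_ONSET):
    """Sort the groups that detected the fault by increasing delay."""
    delays = {g: detection_delay(a, onset) for g, a in alarms_by_group.items()}
    detected = [(d, g) for g, d in delays.items() if d is not None]
    return [g for d, g in sorted(detected)]


def alarms_from(first_alarm, length=1200):
    """Made-up alarm sequence that switches on at `first_alarm`."""
    return [i >= first_alarm for i in range(length)]


alarms_by_group = {
    "group_1": alarms_from(1040),
    "group_2": alarms_from(1005),
    "group_3": [False] * 1200,      # never detects the fault
}
order = propagation_order(alarms_by_group)
```

In this toy case the fault is seen first in `group_2` and later in `group_1`, while `group_3` is left out of the ordering because it never raises an alarm.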

Glasso-MRF monitoring on TEP

Prior to the actual monitoring process, the iterative graphical lasso algorithm is implemented on the 50 process variables of the TEP to divide them into relevant groups. The resulting groups are visualized in Figure 4. The process variables are set to be divided into five groups. The variables included within each group, and the value of the l1 penalty factor ρ for each of the iterations, are

Table 1: Monitored variables in the Tennessee Eastman process. Numberings are based on the original paper (Downs and Vogel 27 ).

No.     Process variable
1       A feed (stream 1)
2       D feed (stream 2)
3       E feed (stream 3)
4       A & C feed (stream 4)
5       Recycle flow (stream 8)
6       Reactor feed rate (stream 6)
7       Reactor pressure
8       Reactor level
9       Reactor temperature
10      Purge rate (stream 9)
11      Product separator temperature
12      Product separator level
13      Product separator pressure
14      Product separator underflow (stream 10)
15      Stripper level
16      Stripper pressure
17      Stripper underflow (stream 11)
18      Stripper temperature
19      Stripper steam flow
20      Compressor work
21      Reactor cooling water outlet temperature
22      Separator cooling water outlet temperature
23-28   Components A, B, C, D, E, F in stream 6
29-36   Components A, B, C, D, E, F, G, H in stream 9
37-41   Components D, E, F, G, H in stream 11
42      MV for D feed flow (stream 2)
43      MV for E feed flow (stream 3)
44      MV for A feed flow (stream 1)
45      MV for total feed flow (stream 4)
47      MV for purge valve (stream 9)
48      MV for separator pot liquid flow (stream 10)
49      MV for stripper liquid prod flow (stream 11)
51      MV for reactor cooling water flow
52      MV for condenser cooling water flow

Table 2: Process faults in the TEP model. The IDV notation is used consistently with Downs and Vogel. 27

No.       Description                                                 Type
IDV(1)    A/C feed ratio, B composition constant (stream 4)           Step
IDV(2)    B composition, A/C ratio constant (stream 4)                Step
IDV(3)    D feed (stream 2)                                           Step
IDV(4)    Reactor cooling water inlet temperature                     Step
IDV(5)    Condenser cooling water inlet temperature                   Step
IDV(6)    A feed loss (stream 1)                                      Step
IDV(7)    C header pressure loss-reduced availability (stream 4)      Step
IDV(8)    A, B and C composition (stream 4)                           Random variation
IDV(9)    D feed temperature (stream 2)                               Random variation
IDV(10)   C feed temperature (stream 4)                               Random variation
IDV(11)   Reactor cooling water inlet temperature                     Random variation
IDV(12)   Condenser cooling water inlet temperature                   Random variation
IDV(13)   Reaction kinetics                                           Slow drift
IDV(14)   Reactor cooling water valve                                 Sticking
IDV(15)   Condenser cooling water valve                               Sticking
IDV(16)   Unknown
IDV(17)   Unknown
IDV(18)   Unknown
IDV(19)   Unknown
IDV(20)   Unknown
IDV(21)   The valve for stream 4                                      Constant position
IDV(22)   E feed temperature (stream 3)                               Random variation
IDV(23)   A feed pressure (stream 1)                                  Random variation
IDV(24)   D feed pressure (stream 2)                                  Random variation
IDV(25)   E feed pressure (stream 3)                                  Random variation
IDV(26)   A and C feed pressure (stream 4)                            Random variation
IDV(27)   Pressure fluctuation in reactor CW re-circulating unit      Random variation
IDV(28)   Pressure fluctuation in condenser CW re-circulating unit    Random variation

16

ACS Paragon Plus Environment

Page 17 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 4: Results of applying the iterative graphical lasso on the monitored variables of the TEP. Numberings on nodes correspond to the variable numbers.

shown in Figure 4. It can be seen that the iterative graphical lasso algorithm shows reasonable results, since the variables consisting each group are inter-correlated according to the variable attributes, the sequence of the process units, and the associated control loops. Group 1 consists of input and output flowrates (1, 10) and their corresponding MVs (42, 43, 44), and the unit pressure variables (7, 13, 16), and their corresponding MVs (47). Group 2 consists mainly of temperature (11, 18, 21, 22) and MV (48, 49, 51) variables. The MVs are connected to the temperature variables through the control loop, and the compressor work variable (20), and the composition of F (28) are naturally connected to the temperature variables through thermodynamic correlations. Group 3 consists of the input flowrates (3, 4) and its MV (45), and the compositions of the corresponding streams (25, 30, 32-36, 40). Group 4 consists of the composition variables of the input stream (23, 24, 26, 27), the purge stream (29, 31), and the product stream (37, 38, 39, 41). Group 5 consists mainly of flowrate variables (2, 5, 6, 14, 17, 19), and the level variables (8, 12, 15) that are directly related to them. The temperature variable (9) is related to the reactor level 17

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

variable (8) since they both represent reactor variables, and the MV variable (52) relates to the temperature variable since it controls the cooling water flow. This analysis shows that all of the variable members of each group are correlated according to the three criteria, variable attributes, sequence of process units, and the associated control loops. As for the regularization parameter, it can be seen that the value of ρ naturally decreases after each iteration, since the remaining number of variables decrease. The variable IDs and edge structure for each of the resulting groups, are saved to be used for the MRF monitoring process. After the separate groups subject to monitoring are defined, appropriately trained KDE models for each of the groups, as well as their monitoring limits have to be obtained. KDE for multiple variables usually suffer from the "curse of dimensionality", and a large number of data points are required to obtain a model with low error rates. To obtain the required number of data points for each of the monitored groups, the information given in Gonzalez et al. 13 is used. Upon fitting the data to an exponential equation and evaluating the model for 11 variables, approximately 70,000 data points are required to reach sufficient confidence levels. Thus the simulation under normal operating conditions is run until the 70,000th data point, then a KDE model was trained for each of the groups using this dataset. The data from Gonzalez et al., 13 and the required number of data points calculated from the fitted equation is given in Table 3. Table 3: Number of data points required for training multivariate KDE models. Values up to 6 dimensions were taken from Gonzalez et al. 13 Dimension Number of data points 1 2 3 4 5 6 11

40 84 175 366 765 1600 67,890
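The extrapolation behind Table 3 can be illustrated with a short Python sketch. It assumes a plain log-linear least-squares fit of n = a·exp(b·d) to the six tabulated values; the exact form of the exponential equation and the fitting procedure used by the authors are not given in this section, so the function name fit_log_linear and the fitting choice are illustrative assumptions. Under this assumption the estimate for 11 dimensions comes out in the tens of thousands, the same order of magnitude as the tabulated 67,890, but it should not be expected to reproduce that figure exactly.

```python
import math

# Table 3 values for 1 to 6 dimensions (taken by the paper from Gonzalez et al.).
dims = [1, 2, 3, 4, 5, 6]
points = [40, 84, 175, 366, 765, 1600]


def fit_log_linear(xs, ys):
    """Least-squares fit of ln(y) = ln(a) + b*x; returns (a, b).

    Assumed fitting choice: the paper only states that an exponential
    equation was fitted, not how.
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    x_mean = sum(xs) / n
    log_mean = sum(logs) / n
    sxy = sum((x - x_mean) * (l - log_mean) for x, l in zip(xs, logs))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(log_mean - b * x_mean)
    return a, b


a, b = fit_log_linear(dims, points)
# Extrapolate to the largest group size used in the paper (11 variables).
estimate_11 = a * math.exp(b * 11)
print(f"growth per added dimension: x{math.exp(b):.2f}")
print(f"estimated data points for 11 dimensions: {estimate_11:,.0f}")
```

The fitted growth factor shows that the required sample size roughly doubles with every added dimension, which is why the 11-variable groups dominate the training data requirement.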


Due to the relatively high dimensionality of each of the groups and the large number of data points, the KDE training process takes some time. However, since this training is required only once for each of the MRFs, before the actual monitoring starts, it does not affect the real-time monitoring performance or the speed of the proposed method. The trained KDE models are used to calculate the monitoring limit. As mentioned in Rato and Reis 30, it is important to select the value of the significance level α so that all of the groups subject to monitoring attain the same error rate in normal conditions. Thus the value of α was set so that each group showed an error rate of 5% in normal operating conditions.

The Glasso-MRF monitoring result for fault 1 is given in Figure 5 as a representative example. The monitoring results for each of the five groups are shown, where the probability density values calculated from the pre-trained KDE model are given as the black data points, and the monitoring limit for each group is given in red. Fault 1 makes a step change of the A/C ratio in stream 4 upon initiation at the 1000th sample, altering the process variables downstream. The deviation of the process variables from their normal operating values is evident in fault 1, and the fault is clearly detected in all five groups. The data points before the 1000th sample lie above the monitoring limit for all five groups, and then a large drop occurs so that the probability density values are below the monitoring limit after the 1000th data point. It is notable that some samples in group 5 appear as false negatives after the fault occurs, with group 5 showing an FDA of 93.35% (Table 4). This is because group 5 consists of variables that react quickly in terms of control, such as flow streams, unit levels, and unit temperatures. The flow, level, and temperature controllers constantly try to restore their normal operating points after the fault occurs. While the fault detection rate for group 5 may be slightly lower than for the other groups, this does not cause a problem for the overall monitoring process, since the monitoring results of the other groups identify the entire process as being in a faulty condition.
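As an illustration of how a density-based monitoring limit of this kind can be obtained, the following Python sketch fits a Gaussian-kernel KDE to synthetic two-variable data, takes the α = 0.05 quantile of the densities of the normal samples as the limit, and flags a sample as faulty when its density falls below that limit. The synthetic data, the fixed bandwidth, and the function names kde_density and monitoring_limit are assumptions made for illustration only; the KDE settings actually used for the TEP groups are not restated here.

```python
import math
import random


def kde_density(train, x, bandwidth):
    """Gaussian product-kernel density estimate at point x."""
    d = len(x)
    norm = len(train) * (math.sqrt(2.0 * math.pi) * bandwidth) ** d
    total = 0.0
    for row in train:
        sq_dist = sum((xi - ri) ** 2 for xi, ri in zip(x, row))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    return total / norm


def monitoring_limit(normal_densities, alpha):
    """Density threshold that a fraction alpha of normal samples fall below."""
    ordered = sorted(normal_densities)
    return ordered[int(alpha * len(ordered))]


rng = random.Random(0)
# Synthetic stand-in for one variable group under normal operation.
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(400)]
bandwidth = 0.5  # assumed fixed bandwidth, not taken from the paper

normal_densities = [kde_density(train, x, bandwidth) for x in train]
limit = monitoring_limit(normal_densities, alpha=0.05)

# Fraction of normal samples flagged: the error rate in normal conditions.
false_alarm_rate = sum(p < limit for p in normal_densities) / len(train)

# A sample far from the normal operating region has a very low density.
faulty_sample = (6.0, 6.0)
fault_detected = kde_density(train, faulty_sample, bandwidth) < limit
print(false_alarm_rate, fault_detected)
```

Choosing the limit as a quantile of the normal-condition densities is what lets every group be tuned to the same error rate, independently of its dimensionality.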

Figure 5: MRF monitoring results of the five groups for fault 1

To compare the performance of Glasso-MRF monitoring to other monitoring methods, faults 9 and 15 were chosen for analysis. The PCA monitoring results for the two faults are given in Figure 6, and the Glasso-MRF monitoring results for the two faults are given in Figure 7. The conventional statistics, T² and SPE, were used for PCA monitoring, and the number of PCs was set to 17, following the work by Rato and Reis 30. As in Figure 5, the black lines represent the calculated monitoring statistic values for the monitoring method implemented, and the red lines represent the monitoring limits. It should be noted that, due to the difference in monitoring methods, the fault detection condition of PCA is opposite to that of MRF monitoring: in PCA, a faulty condition is indicated when the monitoring statistic is above the monitoring limit.

Figure 6: PCA monitoring results for fault 9 ((a) T², (b) SPE) and fault 15 ((c) T², (d) SPE), plotted against sample time.

Figure 7: Glasso-MRF monitoring results for (a) fault 9 and (b) fault 15.

Fault 9 occurs as a random variation of the D feed (stream 2) temperature. As shown in Figure 6a and Figure 6b, PCA fails to detect the fault, with most of the monitoring statistic values staying below the monitoring limit. Since there is no change in the data trend of the PCA monitoring statistics, it can be assumed that PCA fails at detecting the fault. In Figure 7a, the Glasso-MRF monitoring results are shown for the separate groups. While groups 1 and 2 seem to fail in effectively detecting the fault, with many false negatives, groups 3, 4, and 5 succeed in detection. In group 4 in particular, the fault is detected with high accuracy, with an FDA of 97.47%. This shows the advantage and effectiveness of Glasso-MRF monitoring: by dividing the monitored variables into groups of similar characteristics or intercorrelation in terms of control or distance, different groups become separately sensitive to a specific fault. Whereas some groups might not react to a fault due to low correlation, groups containing variables related to the fault of interest show strong fault detection performance, increasing the overall fault detection accuracy for the process.

Fault 15 occurs as sticking of the condenser cooling water valve. As in the case of fault 9, PCA monitoring fails to detect the fault, showing no change in the data trend after the fault occurs. The Glasso-MRF monitoring results show good fault detection performance: groups 1 and 2 are insensitive to the fault, but groups 3, 4, and 5 detect it effectively. The common feature of faults 9 and 15 is that both are related to temperature maintenance of the process. Along with the fact that group 4 mainly consists of composition values of the input and output flow streams, it can be assumed that temperature-related variations result in only small changes of the monitored variables, making it difficult for the conventional process monitoring methods to detect the change. This is in accordance with the fact that temperature changes are quickly controlled, and directly result in composition disturbances in the TEP, since temperature affects the reaction rate.
The FDA and FDR of all five groups for each of the faults are presented in Table 4. The overall FDA and FDR are the values from the group showing the best monitoring performance. Since error detection in one group qualifies as a faulty condition, this value can be deemed the monitoring performance of Glasso-MRF monitoring. It can be seen that, although individual group performance varies according to the type of fault and the variables related to it, Glasso-MRF monitoring shows stable monitoring results, with over 95% FDA and over 96% FDR for all of the faults.

Table 4: FDA and FDR of the five groups for all 28 faults (FDA/FDR).

Fault No.   G1            G2            G3            G4            G5
IDV(1)      98.13/99.89   98.92/99.95   99.25/1.00    99.01/99.66   93.35/93.11
IDV(2)      97.92/99.65   98.71/99.71   99.19/99.94   99.29/99.98   98.97/99.65
IDV(3)      15.15/3.53    57.13/51.43   88.42/87.42   96.39/96.61   89.03/88.10
IDV(4)      19.80/8.93    98.94/99.98   88.82/87.89   96.61/96.87   89.53/88.68
IDV(5)      17.71/6.50    52.20/45.70   88.64/87.68   97.22/97.58   90.67/90.00
IDV(6)      98.21/99.98   98.93/99.97   99.24/99.98   99.15/99.82   99.26/99.98
IDV(7)      22.86/12.48   47.88/40.69   99.25/1.00    97.46/97.86   89.70/88.87
IDV(8)      96.35/97.82   98.13/99.03   99.07/99.79   99.29/99.98   98.56/99.16
IDV(9)      22.86/12.48   61.03/55.96   89.08/88.20   97.47/97.87   89.89/89.10
IDV(10)     25.04/15.01   97.15/97.90   89.88/89.11   97.72/98.16   91.10/90.50
IDV(11)     74.89/72.91   98.46/99.42   90.52/89.86   96.74/97.02   98.08/98.61
IDV(12)     50.99/45.15   88.20/87.50   89.86/89.10   96.28/96.48   91.77/91.28
IDV(13)     97.74/99.44   98.49/99.45   99.01/99.73   99.29/99.98   99.22/99.94
IDV(14)     23.59/13.34   98.93/99.97   88.99/88.08   97.17/97.52   98.86/99.52
IDV(15)     18.04/6.89    45.88/38.36   86.88/85.63   97.40/97.79   90.04/89.28
IDV(16)     15.23/3.63    42.69/34.66   88.79/87.86   96.53/96.77   89.20/88.29
IDV(17)     59.37/54.88   97.92/98.79   88.63/87.66   97.01/97.34   97.44/97.87
IDV(18)     28.47/19.00   95.92/96.47   89.56/88.74   97.68/98.11   91.49/90.95
IDV(19)     58.13/53.44   98.79/99.81   92.06/91.65   97.06/97.39   95.07/95.11
IDV(20)     96.58/98.10   97.50/98.31   98.33/98.94   99.01/99.66   98.76/99.40
IDV(21)     20.68/9.95    45.48/37.90   88.02/86.95   96.89/97.19   89.71/88.89
IDV(22)     22.09/11.59   63.41/58.72   87.57/86.44   96.63/96.89   90.07/89.31
IDV(23)     38.87/31.08   42.15/34.03   87.70/86.58   97.14/97.48   89.02/97.48
IDV(24)     97.00/98.58   95.43/95.90   90.65/90.02   97.89/98.36   96.17/98.58
IDV(25)     94.76/95.98   80.53/78.60   92.46/92.11   97.96/98.44   91.11/98.44
IDV(26)     54.38/49.09   63.10/58.36   98.88/99.56   96.88/97.18   89.78/99.56
IDV(27)     37.22/29.16   96.35/96.97   89.45/88.61   96.69/96.97   96.93/97.27
IDV(28)     19.55/8.64    58.46/52.98   89.47/88.65   97.28/97.65   89.82/97.65
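The way per-group alarm sequences translate into the FDA/FDR pairs of Table 4, and how an overall value is taken from the best-performing group, can be sketched in Python as follows. The sketch assumes that FDA is the fraction of all samples classified correctly and FDR is the fraction of faulty samples flagged; the paper's own definitions are given in an earlier section and may differ in detail. The toy alarm sequence and the function name fda_fdr are illustrative, while the fault 9 values are copied from Table 4.

```python
def fda_fdr(alarms, fault_start):
    """Return (FDA, FDR) in percent for one group's alarm sequence.

    alarms[i] is True when sample i is flagged as faulty; samples with
    index >= fault_start are the truly faulty ones (assumed convention).
    """
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    true_neg = sum(1 for a in normal if not a)
    true_pos = sum(1 for a in faulty if a)
    fda = 100.0 * (true_neg + true_pos) / len(alarms)
    fdr = 100.0 * true_pos / len(faulty)
    return fda, fdr


# Toy sequence: 4 normal samples followed by 6 faulty samples.
toy_alarms = [False, False, True, False, False, True, True, False, True, True]
toy_fda, toy_fdr = fda_fdr(toy_alarms, fault_start=4)

# FDA/FDR of the five groups for fault 9, copied from Table 4.
fault9 = {
    "G1": (22.86, 12.48),
    "G2": (61.03, 55.96),
    "G3": (89.08, 88.20),
    "G4": (97.47, 97.87),
    "G5": (89.89, 89.10),
}
# Overall performance: the group with the best FDA.
best_group = max(fault9, key=lambda g: fault9[g][0])
overall_fda, overall_fdr = fault9[best_group]
print(toy_fda, round(toy_fdr, 2), best_group, overall_fda, overall_fdr)
```

For fault 9 the best group is group 4, so the overall FDA is 97.47%.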


Fault detection accuracy comparison with other monitoring techniques

For the purpose of performance comparison, monitoring results for Glasso-MRF monitoring are shown in Table 5 along with those of other conventional and recently proposed monitoring techniques, for the first 21 programmed faults. The FDA values of the various methods are taken from Yan et al. 6 and the FDR values for the DPCA-DR method are taken from Rato and Reis 30.

Table 5: FDA (%) of PCA, KPCA, CAE 6, and Glasso-MRF monitoring, and the FDR (%) for the DPCA-DR method 30, for the first 21 faults within the TEP model.

            PCA             KPCA            CAE             DPCA-DR
Fault No.   T²      SPE     T²      SPE     T²      SPE     T²_PREV  T²_RES   MRF
IDV(1)      99.0    99.2    99.3    100     99.0    96.8    99.6     99.8     99.3
IDV(2)      95.7    98.3    98.3    99.3    98.8    96.4    98.5     98.3     99.3
IDV(3)      2.0     1.6     2.7     6.1     83.3    45.1    2.1      1.6      96.4
IDV(4)      0.2     97.4    30.0    100     49.2    98.3    99.8     99.9     98.9
IDV(5)      23.3    43.5    24.2    27.8    44.7    96.9    99.9     99.9     97.2
IDV(6)      98.7    100     100     99.4    99.5    100     99.9     99.9     99.3
IDV(7)      98.0    98.7    100     100     98.3    100     99.9     99.9     99.3
IDV(8)      94.2    97.5    97.9    97.3    97.8    96.6    98.5     98.1     99.3
IDV(9)      1.0     1.2     2.8     5.1     18.2    30.4    2.0      1.0      97.5
IDV(10)     4.5     72.0    45.0    45.3    65.3    89.4    95.6     93.3     97.7
IDV(11)     19.5    73.3    73.6    47.9    83.3    87.6    96.5     86.5     98.5
IDV(12)     92.1    98.1    99.0    98.5    99.8    99.4    99.8     99.8     96.3
IDV(13)     93.6    96.0    94.8    94.3    95.9    98.6    95.8     95.6     99.3
IDV(14)     89.8    98.8    100     99.5    100     97.1    99.8     99.9     98.9
IDV(15)     1.0     21.1    5.2     8.4     32.1    43.8    38.5     4.7      97.4
IDV(16)     1.2     65.8    42.0    36.4    47.9    85.8    97.6     94.5     96.5
IDV(17)     63.1    97.5    83.6    77.5    100     94.6    97.6     97.5     97.9
IDV(18)     88.2    92.1    89.9    90.5    100     97.6    90.5     90.0     97.7
IDV(19)     0.5     35.2    4.8     18.6    21.3    83.3    97.1     84.3     98.8
IDV(20)     13.2    64.5    51.5    57.6    99.9    98.6    90.8     91.6     99.0
IDV(21)     19.7    65.3    28.1    40.6    88.9    92.0    53.9     57.7     96.9

It is clear from the results in Table 5 that Glasso-MRF monitoring outperforms the other monitoring algorithms overall, showing consistently high FDA for all of the faults. FDR values of Glasso-MRF monitoring are not shown in Table 5, but it can be seen from the results in Table 4 that Glasso-MRF monitoring outperforms the DPCA-DR method as well. For certain fault cases, such as faults 12, 17, and 18, the FDA of Glasso-MRF monitoring is slightly lower than that of the CAE monitoring method proposed by Yan et al. 6, but the difference is only a few percentage points. The effectiveness of the proposed method is most apparent for faults 3, 9, 15, 16, and 19. The FDA for these faults is low when using PCA and KPCA, in several cases under 10 percent, meaning that fault detection is barely possible in these cases. The FDA increases for CAE, but still remains below 90 percent. For faults 9 and 15, even the best performing of the other methods shows an FDA lower than 50 percent. However, with the Glasso-MRF monitoring method, the FDA for all of the faults is over 90 percent, showing the consistency of the proposed method. This high FDA results from the fact that the nonlinear, non-Gaussian characteristics of the variables are accounted for by using kernel density estimation, and that there is no reduction in dimensionality during the monitoring process, allowing even small changes in variable values to be effectively detected. The strategy of monitoring the variables separately in different groups, which enables small variations to be amplified so that they can be effectively monitored, also contributes to the high FDA.

Fault detection speed & fault propagation

While the fault detection performance of Glasso-MRF monitoring is evident, it also shows very fast fault detection compared to conventional monitoring methods. The fault detection time for fault 1, using PCA and Glasso-MRF monitoring, is shown in Table 6. The difference in fault detection time between the two monitoring methods is approximately 7 data points, which is equivalent to 252 seconds of monitoring time. In an actual industrial process, such a delay could result in critical damage, so it is important that faults be detected as quickly as possible. Since Glasso-MRF monitoring divides the entire set of variables into separate groups, allowing the individual variables to be analyzed without their characteristics being ignored, fault detection is done in a more efficient manner.

Table 6: Fault detection time of PCA and Glasso-MRF monitoring for fault 1.

Method   Statistic/group   Detection time (samples)
PCA      T²                1012
PCA      SPE               1008
MRF      G1                1003
MRF      G2                1003
MRF      G3                1001
MRF      G4                1002
MRF      G5                1001

As another aspect of fault detection speed, the fault propagation path can be analyzed using the fault detection time of each of the groups, since the detection time for each group varies according to the relevance of its variables to the fault of interest. Fault cases 3 and 15 are analyzed to illustrate the fault propagation path analysis performance of Glasso-MRF monitoring.

Table 7: Detection time and variable analysis of the five groups for faults 3 and 15.

Fault case   Group   Order of detection   Initial detection time (samples)
IDV(3)       G3      1                    1001
IDV(3)       G5      1                    1001
IDV(3)       G4      3                    1002
IDV(3)       G2      4                    1008
IDV(3)       G1      5                    1045
IDV(15)      G3      1                    1001
IDV(15)      G5      1                    1001
IDV(15)      G4      3                    1002
IDV(15)      G1      4                    1003
IDV(15)      G2      5                    1012
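The order of detection listed in Table 7 follows directly from the initial detection times of the groups. A minimal Python sketch of this ranking is given below, with tied groups sharing a rank as in the table. The function name detection_order and the dictionary layout are illustrative assumptions; the detection times are copied from Table 7.

```python
def detection_order(times):
    """Rank groups by initial detection time; tied groups share a rank.

    Returns a list of (group, rank, time) tuples, fastest group first.
    """
    ordered = sorted(times.items(), key=lambda item: item[1])
    return [
        (group, 1 + sum(1 for other in times.values() if other < t), t)
        for group, t in ordered
    ]


# Initial detection times (in samples) from Table 7.
idv3 = {"G1": 1045, "G2": 1008, "G3": 1001, "G4": 1002, "G5": 1001}
idv15 = {"G1": 1003, "G2": 1012, "G3": 1001, "G4": 1002, "G5": 1001}

order3 = detection_order(idv3)
order15 = detection_order(idv15)
for group, rank, t in order3:
    print("IDV(3)", group, rank, t)
for group, rank, t in order15:
    print("IDV(15)", group, rank, t)
```

Reading the resulting sequence of groups from first to last gives the candidate propagation path that is interpreted in the discussion of Table 7.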

As shown in Table 7, the fault detection time of each group varies significantly for both of the fault cases. In fault 3, where the D feed (stream 2) is changed step-wise, the detection time of the slowest group is delayed by 44 data points compared to the group with the fastest detection time. It can be inferred that the fault propagates through the groups in the order of their fault detection times, allowing a more extensive understanding of the nature of the fault. The fastest groups, groups 3 and 5, are composed of variables directly related to feed streams and reactor operating conditions, such as D feed (2), E feed (3), Recycle flow (5), Reactor level (8), and Reactor temperature (9). The numbers in parentheses are the variable numbers from Table 1. The next group to detect the fault is group 4, which is composed mainly of composition variables of streams 6, 9, and 11. This can be seen as the change in the feed flowrate of stream 2 affecting the compositions of the purge and product streams, which is a natural propagation of the fault before the control structure starts working to mitigate the change. Group 2, which consists mostly of temperature variables and manipulated variables, starts detecting the fault at sampling point 1008. This can be seen as the manipulated variables being activated to mitigate the change caused by the fault. These manipulated variables also alter the temperatures of the separator and stripper, changing the monitoring values of group 2 even more. Group 1 consists mostly of flow and pressure variables, and the manipulated variables that alter them. It can be inferred that, after the change in the flowrate of stream 2 has affected the flowrates, compositions, and temperatures of the downstream units, the control structure for manipulating the input flowrates is activated to minimize the disturbance caused by the change in stream 2. This analysis, provided by the detection time difference of the groups, is in accordance with the decentralized control structure described in Ricker 29.

For fault 15, the detection time of the slowest group is delayed by 11 data points compared to the group with the fastest detection time. This difference may seem small compared to the case of fault 3, but it still clearly shows the fault propagation path. Fault 15 is related to the sticking of the condenser cooling water valve, leading to insufficient condensing of the reactor outlets.
This directly affects the product flowrates and compositions, resulting in immediate error detection in groups 3 and 5, followed by group 4. This is in accordance with the case of fault 3, and is natural considering that the flowrates and compositions are the first to be affected by flow phase changes. However, the order of fault detection of the final two groups is opposite to that for fault 3: here, group 1 detects the fault faster than group 2. This is due to the fact that group 1 is composed of pressure-related variables, such as Reactor pressure (7), Product separator pressure (13), Stripper pressure (16), and MV for purge valve (47). Sticking of the cooling water valve leads to a pressure increase, eventually triggering the alarm for group 1. Although the fault detection time of group 1 is slower than those of groups 3, 5, and 4, this is a natural result, since the implemented control structure 29 offers decentralized control of the variables, and flowrates are controlled to a setpoint whereas pressures are controlled to be within a certain range. As for group 2, it is the slowest to react since it mainly comprises temperature-related variables, which react to the fault after the pressure-related variables are affected. This difference in the reaction of the variable groups compared to the case of fault 3 reflects the difference in the nature of the initiated fault. As can be seen in the two examples, a quick analysis of the difference in fault detection time of the groups, together with the variables making up each group, allows the propagation of the initiated fault and the activation of the control structures to be predicted. This allows the user to quickly analyze the specifics and prevent further propagation of the fault, leading to safer plant operation.
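The detection-time differences quoted in this section can be checked with a few lines of Python. The sketch below assumes a sampling interval of 36 seconds per data point, which is the value implied by the statement that 7 data points correspond to 252 seconds; the variable names are illustrative, and the detection times are copied from Tables 6 and 7.

```python
# Assumed sampling interval, implied by "7 data points = 252 seconds".
SAMPLE_INTERVAL_S = 36

# Detection times for fault 1 (Table 6), in samples.
table6 = {
    "PCA T2": 1012, "PCA SPE": 1008,
    "MRF G1": 1003, "MRF G2": 1003, "MRF G3": 1001,
    "MRF G4": 1002, "MRF G5": 1001,
}
fastest_pca = min(t for name, t in table6.items() if name.startswith("PCA"))
fastest_mrf = min(t for name, t in table6.items() if name.startswith("MRF"))
delay_samples = fastest_pca - fastest_mrf
delay_seconds = delay_samples * SAMPLE_INTERVAL_S

# Initial detection times of groups G1..G5 for faults 3 and 15 (Table 7).
idv3_times = [1045, 1008, 1001, 1002, 1001]
idv15_times = [1003, 1012, 1001, 1002, 1001]
spread_idv3 = max(idv3_times) - min(idv3_times)
spread_idv15 = max(idv15_times) - min(idv15_times)

print(delay_samples, delay_seconds, spread_idv3, spread_idv15)
```

The same subtraction applied to any other fault case gives the lag between the first and last group to raise an alarm, which is the quantity used above to reason about the propagation path.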

Conclusions and Future Work

In this study, a novel process monitoring framework integrating the iterative graphical lasso and the MRF modelling technique is proposed. The idea is to divide the process variables into relevant groups, and to apply a monitoring method using Markov random fields so that the individual characteristics of the variables are not reduced into simplified dimensions, and nonlinear, non-Gaussian relationships can be taken into account. The monitoring performance of the proposed method was evaluated by implementing it on the TEP model, and the results showed that the FDA and FDR were higher overall than those of the other monitoring techniques considered. Also, the difference in monitoring speed among the groups provided a basis for fault propagation path analysis. The proposed monitoring scheme proved to be very efficient, and it also has the potential to be used for fault isolation using the factor graph modelling of MRFs, which is being studied as future work. Combined with the fault propagation path analysis capability displayed in the final parts of the paper, the fault isolation procedure can provide a thorough analysis of the fault, pinpointing its cause and directly connecting the fault effects propagating throughout the process.

Acknowledgement

This work was supported by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (No. 20162220100030).

References

(1) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault detection and diagnosis in industrial systems; Springer Science & Business Media, 2000.
(2) Schölkopf, B.; Smola, A.; Müller, K.-R. Kernel principal component analysis. International Conference on Artificial Neural Networks. 1997; pp 583–588.
(3) Choi, S. W.; Lee, C.; Lee, J.-M.; Park, J. H.; Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemometrics and Intelligent Laboratory Systems 2005, 75, 55–67.
(4) Lee, J.-M.; Yoo, C.; Lee, I.-B. Statistical process monitoring with independent component analysis. Journal of Process Control 2004, 14, 467–485.


(5) Ge, Z.; Song, Z. Process monitoring based on independent component analysis-principal component analysis (ICA-PCA) and similarity factors. Industrial & Engineering Chemistry Research 2007, 46, 2054–2063.
(6) Yan, W.; Guo, P.; Gong, L.; Li, Z. Nonlinear and robust statistical process monitoring based on variant autoencoders. Chemometrics and Intelligent Laboratory Systems 2016, 158, 31–40.
(7) Gajjar, S.; Kulahci, M.; Palazoglu, A. Real-time fault detection and diagnosis using sparse principal component analysis. Journal of Process Control 2018, 67, 112–128, Big Data: Data Science for Process Control and Operations.
(8) Luo, L.; Bao, S.; Mao, J.; Tang, D. Fault detection and diagnosis based on sparse PCA and two-level contribution plots. Industrial & Engineering Chemistry Research 2016, 56, 225–240.
(9) Luo, L.; Bao, S.; Mao, J.; Ding, Z. Industrial Process Monitoring Based on Knowledge–Data Integrated Sparse Model and Two-Level Deviation Magnitude Plots. Industrial & Engineering Chemistry Research 2018, 57, 611–622.
(10) Bao, S.; Luo, L.; Mao, J.; Tang, D. Improved fault detection and diagnosis using sparse global-local preserving projections. Journal of Process Control 2016, 47, 121–135.
(11) Chiang, L. H.; Kotanchek, M. E.; Kordon, A. K. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Computers & Chemical Engineering 2004, 28, 1389–1401.
(12) Verron, S.; Li, J.; Tiplica, T. Fault detection and isolation of faults in a multivariate process with Bayesian network. Journal of Process Control 2010, 20, 902–911.
(13) Gonzalez, R.; Huang, B.; Lau, E. Process monitoring using kernel density estimation and Bayesian networking with an industrial case study. ISA Transactions 2015, 58, 330–347.
(14) Zeng, J.; Luo, S.; Cai, J.; Kruger, U.; Xie, L. Nonparametric Density Estimation of Hierarchical Probabilistic Graph Models for Assumption-Free Monitoring. Industrial & Engineering Chemistry Research 2017, 56, 1278–1287.
(15) Yang, F.; Xiao, D. Progress in root cause and fault propagation analysis of large-scale industrial processes. Journal of Control Science and Engineering 2012, 2012.
(16) Ahmed, U.; Ha, D.; An, J.; Zahid, U.; Han, C. Fault propagation path estimation in NGL fractionation process using principal component analysis. Chemometrics and Intelligent Laboratory Systems 2017, 162, 73–82.
(17) Koller, D.; Friedman, N. Probabilistic graphical models: principles and techniques; MIT Press, 2009.
(18) Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 1996, 267–288.
(19) Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441.
(20) Lake, B.; Tenenbaum, J. Discovering structure by learning sparse graphs. Cognitive Science Society, Inc. 2010.
(21) Lee, S.-I.; Ganapathi, V.; Koller, D. Efficient structure learning of Markov networks using l1-regularization. Advances in Neural Information Processing Systems. 2007; pp 817–824.
(22) Friedman, J.; Hastie, T.; Tibshirani, R. Applications of the lasso and grouped lasso to the estimation of sparse graphical models; 2010.


(23) Mazumder, R.; Hastie, T. The graphical lasso: New insights and alternatives. Electronic Journal of Statistics 2012, 6, 2125.
(24) Dobra, A.; Hans, C.; Jones, B.; Nevins, J. R.; Yao, G.; West, M. Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis 2004, 90, 196–212.
(25) Silverman, B. W. Density estimation for statistics and data analysis; Routledge, 2018.
(26) Olson, D. L.; Delen, D. Advanced data mining techniques; Springer Science & Business Media, 2008.
(27) Downs, J.; Vogel, E. A plant-wide industrial process control problem. Computers & Chemical Engineering 1993, 17, 245–255, Industrial challenge problems in process control.
(28) Bathelt, A.; Ricker, N. L.; Jelali, M. Revision of the Tennessee Eastman process model. IFAC-PapersOnLine 2015, 48, 309–314.
(29) Ricker, N. L. Decentralized control of the Tennessee Eastman challenge process. Journal of Process Control 1996, 6, 205–221.
(30) Rato, T. J.; Reis, M. S. Fault detection in the Tennessee Eastman benchmark process using dynamic principal components analysis based on decorrelated residuals (DPCA-DR). Chemometrics and Intelligent Laboratory Systems 2013, 125, 101–108.

