Probability Density Estimation and Bayesian Causal Analysis Based Fault Detection and Root Identification

Xiaolu Chen, Jing Wang,* and JingLin Zhou

College of Information Science and Technology, Beijing University of Chemical Technology, 100029 Beijing, China
ABSTRACT: Fault detection and diagnosis, as an important means of ensuring industrial safety and profitability, has received much attention. The traditional Bayesian network (BN), a typical graphical model, has many applications in this area, but it has severe limitations when processing continuous variables. Based on the BN model of the system causal structure, this paper proposes the kernel density estimation (KDE) method to estimate the probability density functions, replacing the parameter learning of the traditional Bayesian network. In addition, an evaluation index of estimation quality is strictly deduced as a test standard to ensure the accuracy of the model. Anomalous process behavior can be detected and diagnosed by examining changes in the probability density. The improved method is more convenient than the traditional BN when dealing with process data, since there is no need for discretization or a Gaussian assumption. Industrial simulation experiments show that the proposed method can accurately detect system faults and trace back to the source of the faults.
1. INTRODUCTION

With the rapid development of modern industrial systems, systematic complexity increases constantly. Monitoring the process effectively and dealing with abnormal conditions in a timely manner is of great significance for ensuring the safety and stability of the industrial production process. Therefore, reliable fault detection and diagnosis has become an active and challenging research topic. In addition, there is a strong correlation between the equipment and systems of a chemical process. A fault usually appears as a disturbance somewhere in the system, which is transmitted to other parts of the process along the logistics and connections.1 Therefore, research on the transmission mechanism of process faults and on their fundamental causes has drawn increasing academic attention.

At present, the available fault detection and diagnosis methods can be divided into three categories: the analytical model-based method, the knowledge-based method, and the data-driven method.2 For complex and multivariable systems, the analytical model-based method has great limitations because it is very difficult to build an accurate analytical model. Pure knowledge-based approaches use qualitative models to obtain process monitoring metrics, but the accuracy of diagnosis depends on the richness of the knowledge base. The data-driven approach continuously monitors the system status by analyzing the process data; it can detect the occurrence of a fault in time and even isolate faults.3 The data-driven fault detection and diagnosis method therefore shows obvious advantages over the other two methods. Many data treatment and analysis techniques, such as machine learning, multivariate statistics,
and signal processing, can be used for fault detection and diagnosis. The popular multivariate statistics methods include principal component analysis (PCA),4,5 independent component analysis (ICA),6 Fisher discriminant analysis (FDA),7 and partial least-squares (PLS).8 They rely heavily on the process data but pay little attention to the intrinsic characteristics of, and connections among, the system variables. The conceptual simplicity of the multivariate statistics framework may lead to difficulties in accurately diagnosing faulty variables and operating units.

In order to avoid the above-mentioned disadvantages of the multivariate statistical methods, the graphical models used in machine learning can provide a clearer description of the system structure because they cover the cause-effect relationships between operating units and process variables. Among the various graphical models, the probabilistic graphical model has greater advantages than the deterministic graphical model because it is easily applied to uncertain systems and requires neither a database of each potential fault nor in-depth knowledge of possible fault conditions. The probabilistic graphical model uses a causal structure and probability parameters to describe the system.9 The most representative probabilistic graphical model is the Bayesian network (BN), which uses a directed acyclic graph (DAG) to describe a set of random variables and their
Industrial & Engineering Chemistry Research conditional dependencies.10 The causal structure and probability parameters of the BN are combined to give a qualitative and quantitative description of the system. The BN has successful application in various fields, such as control performance diagnosis,11 root cause identification,12 process pattern recognition, and process monitoring.13 However, the BN has some obstacles when dealing with the continuous random variables, because the estimation of highdimensional joint/conditional densities is nontrivial. Most of the existing work focuses on the stochastic process based on discrete variables. There are only two types of processing techniques for dealing with the continuous variables. The indirect method is to discretize the continuous variables, which will inevitably lead to the loss of information.14,15 The direct method should assume that the set of random variables are a multivariate Gaussian distribution. This addition of strong hypotheses makes the results less convincing.16,17 In fact, the main difficulty in establishing a complete graphical model from the continuous variables is not the causal structure but the probability dependencies. The literature18−20 has proposed different types of graphical models to construct the topology structure. There is also some progress in estimating the conditional probability of the probability graphical model. The literature21 proposes to estimate the probability density based on the kernel density estimation (KDE). The high-dimension probability density reduces byproducts of a low-dimensional probability density. This method can be applied to small-scale systems. The literature22 expands the work of the literature21 and adopts the hierarchical concept to simplify the process of estimating the highdimensional probability density. The literature23 combines the Bayesian network and the multivariate KDE method for the fault monitoring of the industrial process. However, it takes a long time to train the model due to the learning mechanism of the Bayesian network. It is worth mentioning that none of these pieces of literature consider the accurate performance of the KDE method, although the estimation accuracy of the probability dependencies is the basis for the availability of a probabilistic graphical model. The lack of evaluation index of estimation quality is not sufficient for all the relevant follow-up research. This paper expands our previous work24 from the discrete random variables to the continuous random variables. Besides using the causal analysis method instead of structure learning of the BN to establish the topological structure of the model, this paper further proposes to estimate the model probability dependencies among the process variables based on the KDE method. The conditional probability density is obtained from a combination of low-dimensional probability density and highdimensional probability density. This nonparametric estimation method can directly estimate the probability density of continuous variables and avoid the limitations of the traditional Gaussian assumption. More specifically, the evaluation index for the estimation quality of KDE is strictly deduced based on the definition and the nature of the KDE. The proposed method does not have any restrictions, such as linear, nonlinear, or distribution function. It can establish an accurate causal probability graphical model to detect faults and find the root cause of faults. This paper is organized as follows. 
Section 2 is an overall framework of the proposed method. Section 3 is the methodology which includes the establishment of the causal structure, the estimation principles of the conditional
probability density based on KDE, the evaluation criteria, and the detailed implementation procedure when applied to fault diagnosis. For simplification, the detailed derivation of the evaluation criteria is given in the Appendix in the Supporting Information. Section 4 is the verification of the proposed method on the TE benchmark industrial process. Finally, Section 5 draws the conclusions.
2. OVERALL FRAMEWORK

Most of the sampling data from complex industrial processes are continuous variables, so traditional BN structure learning is not applicable because of its limitation to discrete variables. Here, we consider establishing a system graphical model that contains structure and parameters similar to the BN while avoiding discretization or the Gaussian distribution assumption. The structure is determined by the causality between the operating units, which is the qualitative expression of the industrial system. The parameters, that is, the causality probabilities, are obtained by a nonparametric estimation method; thereby, they quantitatively represent the relationships among the process variables. An evaluation index of the probability relationship estimation is given to guarantee the accuracy of the graphical model. Then fault detection and root cause diagnosis are performed based on the proposed graphical model. It should be noted that fault detection only needs to depend on the system parameters, whereas fault diagnosis requires the complete model because the root cause of the fault is obtained by network reasoning. The overall framework of the proposed method is shown in Figure 1.

Figure 1. Overall framework.
3. METHODOLOGY

Establish Causal Structure. The key task of establishing a graphical model is to determine the system causal structure. Here we briefly introduce the multivariate post-nonlinear acyclic causal model used to determine causality among multiple variables; the details can be found in our previous work.24 Assume that a directed acyclic graph (DAG) is used to represent the relationships among the observed variables. Randomly select a pair of variables X_i and X_j, i, j ∈ {1, 2, ..., n}, from the multivariable system. If X_i is X_j's parent node, its data generating process can be described by a post-nonlinear (PNL) mixing model:

X_j = f_{j,2}(f_{j,1}(X_i) + e_j)

where f_{j,1} denotes the nonlinear effect of the cause, f_{j,2} denotes the invertible post-nonlinear distortion in variable X_j, and e_j is the independent disturbance. Consider the hypothesis test and use the nonlinear independent component analysis (ICA) method to solve this problem. In simple terms, there are two steps to examine the possible causal relationship between two variables: (1) recover the disturbance e_j corresponding to the assumed causal relation X_i → X_j based on constrained nonlinear ICA; (2) verify whether the estimated disturbance e_j is independent of the assumed cause X_i based on statistical independence tests. For any two variables, the causal assumptions are made in the forward and reverse directions, and the cause-effect direction is then determined by comparing the test statistics. After n(n − 1) rounds of hypothesis and test, the causality among the different variables can finally be determined. This multivariate post-nonlinear acyclic causal modeling method can effectively replace traditional BN structure learning to establish the causal topology of the system.
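To make the two-step test above concrete, the following Python sketch illustrates the idea under simplifying assumptions; it is our illustration, not the implementation of ref 24. The constrained nonlinear ICA of step (1) is replaced by a kernel-ridge regression whose residual plays the role of the estimated disturbance, and the statistical independence test of step (2) by an estimated mutual-information score; the names dependence_score and causal_direction are hypothetical.

```python
# Simplified stand-in for the pairwise causal-direction test:
# regression residual ~ estimated disturbance, mutual information ~ independence test.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.feature_selection import mutual_info_regression

def dependence_score(cause, effect):
    """Fit effect = f(cause) + e and return the MI between cause and the residual e."""
    model = KernelRidge(kernel="rbf", alpha=1.0)
    model.fit(cause.reshape(-1, 1), effect)
    residual = effect - model.predict(cause.reshape(-1, 1))
    # smaller score -> residual closer to independent of the assumed cause
    return mutual_info_regression(cause.reshape(-1, 1), residual)[0]

def causal_direction(x, y):
    """Test x -> y against y -> x and keep the direction whose estimated
    disturbance is more nearly independent of the assumed cause."""
    s_xy = dependence_score(x, y)   # hypothesis x -> y
    s_yx = dependence_score(y, x)   # hypothesis y -> x
    return "x -> y" if s_xy < s_yx else "y -> x"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = np.tanh(x) + 0.2 * rng.normal(size=500)   # ground truth: x -> y
    print(causal_direction(x, y))
```

Repeating this comparison over all ordered variable pairs corresponds to the n(n − 1) hypothesis tests mentioned above.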
Nonparametric Probability Density Estimation. The parameters in the probability graphical model measure the connection strength of the causality. They are expressed by the probability density function of each variable itself and the conditional probability density function among the cause-effect variables. The nonparametric probability density estimation used to determine the parameters involves one-dimensional and multidimensional probability density functions; the conditional probability density can be calculated from the joint probability density and the one-dimensional probability density. KDE is a prominent method for estimating a nonparametric probability density. The main advantage of the KDE method is that it gives an explicit form of the density function. Let X_1, X_2, X_3, ..., X_n be a set of samples of the random variable X, where X has an unknown density function f(x), x ∈ R. The density function f(x) can be derived from its corresponding cumulative distribution function F(x):

f(x) = \frac{dF(x)}{dx} \approx \frac{F(x+h) - F(x-h)}{2h}    (1)

Here, h > 0 is the window width. Use the empirical distribution function F_n(x) = \frac{1}{n}\sum_i I(X_i \le x) to estimate F(x), and substitute it into eq 1:

\hat{f}(x) = \frac{F_n(x+h) - F_n(x-h)}{2h} = \frac{1}{2nh}\sum_i I(x-h < X_i \le x+h) = \frac{1}{nh}\sum_i K_0\left(\frac{X_i - x}{h}\right)    (2)

Equation 2 gives the KDE of f(x) with a window width h and a kernel function K_0(u) = \frac{1}{2} I(|u| \le 1).
Figure 2. Flowchart for detecting and tracing faults.
Industrial & Engineering Chemistry Research f ̂ (x ) =
1 nh
i Xi − x yz zz k h {
∑ K jjj n i
n
∑ i=1
1 ijj x − Xi y − Yi yzz Kj , z h1h2 jjk h1 h2 zz{
f (x , y ) f (x )
Var{f ̂ (x)} = E[f ̂ (x)]2 − [Ef ̂ (x)]2
(7)
The probability density function of the child nodes in the causal model is multidimensional. Here taking the twodimensions function as an example, the deviation and variance of the kernel density estimate f(x,y) are defined as Bias{f ̂ (x , y)} = E[f ̂ (x , y)] − f (x , y)
(8)
Var{f ̂ (x , y)} = E[f ̂ (x , y)]2 − [Ef ̂ (x , y)]2
(9)
Introduce the mean integral square error (MISE) to evaluate the estimation quality. The MISE index can test the difference between the estimated probability density function and the true function and ensure that the kernel estimation has better fitting and smoothness. One-dimensional MISE is defined as ÄÅ ÉÑ Å Ñ MISE[f ̂ (x)] = EÅÅÅ {f ̂ (x) − f (x)}2 dx ÑÑÑ (10) ÇÅÅ ÖÑÑ
(4)
where h1 and h2 are the window width corresponding to the cause variable x and the effect variable y, respectively. According to the conditional probability calculation formula, the conditional density f(y|x) is as follows: f (y|x) =
(6)
(3)
̂ where f(x) represents the estimate of the probability density function, n is the number of samples, h is the window width, and K is the kernel function. Furthermore, in the causal graphical model, it is necessary to estimate the conditional density of random multivariate causal variable sets. Similarly, Let X1,X2,X3,···,Xn and Y1,Y2,Y3,···,Yn be a set of samples of a random cause vector X and a random effect vector Y, respectively. The joint probability density of x and y is defined as 1 f ̂ (x , y ) = n
Bias{f ̂ (x)} = E[f ̂ (x)] − f (x)
∫
(5)
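The estimators of eqs 3-5 translate directly into code. The following is a minimal numpy sketch (our illustration, not the authors' implementation) using a Gaussian kernel and a product kernel for the two-dimensional case; the window widths h, h1, and h2 are assumed to have been chosen beforehand, for example by the MISE criterion derived below.

```python
# Minimal sketch of eqs 3-5 with a Gaussian kernel.
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde_1d(x, samples, h):
    """Eq 3: f_hat(x) = (1/nh) * sum_i K((X_i - x)/h)."""
    return gauss((samples - x) / h).sum() / (len(samples) * h)

def kde_2d(x, y, xs, ys, h1, h2):
    """Eq 4 with a product Gaussian kernel K(u, v) = K(u) K(v)."""
    n = len(xs)
    return (gauss((x - xs) / h1) * gauss((y - ys) / h2)).sum() / (n * h1 * h2)

def cond_density(y, x, xs, ys, h1, h2):
    """Eq 5: f_hat(y | x) = f_hat(x, y) / f_hat(x)."""
    return kde_2d(x, y, xs, ys, h1, h2) / kde_1d(x, xs, h1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xs = rng.normal(size=960)                    # cause variable samples
    ys = 0.5 * xs + 0.1 * rng.normal(size=960)   # effect variable samples
    print(kde_1d(0.0, xs, 0.3), cond_density(0.0, 0.0, xs, ys, 0.3, 0.1))
```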
When estimating the unknown probability density function of a random variable, there are many options for the kernel function.25 Several common kernel functions are shown in Table 1. The preconditions for the selection of the kernel function are (1) non-negativity, K(x) ≥ 0, x ∈ R; (2) symmetry, K(x) = K(−x), x ∈ R; and (3) normalization, \int_{-\infty}^{+\infty} K(x)\,dx = 1.

Table 1. Common Kernel Functions
number | kernel function | expression
1 | Uniform | (1/2) I(|u| ≤ 1)
2 | Triangle | (1 − |u|) I(|u| ≤ 1)
3 | Gaussian | (1/√(2π)) exp(−u²/2)
4 | Epanechnikov | (3/4)(1 − u²) I(|u| ≤ 1)

According to the definition of KDE, \hat{f}(x) is related not only to the sample size n but also to the choice of the kernel function K and the window width h. Therefore, for a fixed sample size n, K and h are crucial to the accuracy of the system model parameters and directly affect the effectiveness of fault detection and root cause diagnosis. In order to demonstrate the estimation quality of the probability density function, we give the evaluation criteria in the next section. The literature26 has shown that the choice among different kernel functions has little effect on the results of kernel density estimation, so it is not considered further here.

Evaluation Index of Estimation Quality. Consider two cases directly from the definition of the kernel density estimate. (1) If the window width h is large, the details of the density function are submerged by the averaging compression of the transform (x − X_i)/h. This makes the density estimate curve too smooth and results in relatively low resolution; the estimation deviation is too large. (2) If, on the contrary, the window width h is small, the influence of randomness increases and the important characteristics of the density are buried. As a result, the fluctuation of the density estimate becomes larger and the stability is worse; the estimation variance is too large. An accurate estimate should not only be close to the true value but should also be stable over different observations. These two properties can be described by the estimation deviation and variance.27

The probability density function of a root variable in the causal model is one-dimensional; the deviation and variance of its kernel density estimate \hat{f}(x) are defined as

Bias\{\hat{f}(x)\} = E[\hat{f}(x)] - f(x)    (6)

Var\{\hat{f}(x)\} = E[\hat{f}(x)^2] - (E[\hat{f}(x)])^2    (7)

The probability density functions of the child nodes in the causal model are multidimensional. Taking the two-dimensional case as an example, the deviation and variance of the kernel density estimate \hat{f}(x, y) are defined as

Bias\{\hat{f}(x, y)\} = E[\hat{f}(x, y)] - f(x, y)    (8)

Var\{\hat{f}(x, y)\} = E[\hat{f}(x, y)^2] - (E[\hat{f}(x, y)])^2    (9)

The mean integrated squared error (MISE) is introduced to evaluate the estimation quality. The MISE index measures the difference between the estimated probability density function and the true one and ensures that the kernel estimate has good fit and smoothness. The one-dimensional MISE is defined as

MISE[\hat{f}(x)] = E\left[\int \{\hat{f}(x) - f(x)\}^2\, dx\right]    (10)

The two-dimensional MISE is defined as

MISE[\hat{f}(x, y)] = E\left[\int\int \{\hat{f}(x, y) - f(x, y)\}^2\, dx\, dy\right]    (11)
Making full use of the properties of the kernel function, the details of the derivation are presented in the Supporting Information. Here we give only the final results:

MISE[\hat{f}(x)] = \int Var(\hat{f}(x))\, dx + \int Bias^2(\hat{f}(x))\, dx = \frac{1}{nh}\int K^2(t)\, dt + \frac{h^4}{4}\left[\int t^2 K(t)\, dt\right]^2 \int [f''(x)]^2\, dx    (12)

MISE[\hat{f}(x, y)] = \frac{1}{n h_1 h_2}\int K^2(t)\, dt + \frac{h_1^4 h_2^4}{4}\left[\int t^2 K(t)\, dt\right]^2 \int\int (\nabla f(x, y))^2\, dx\, dy    (13)
In the simplification, infinitesimal terms are ignored and the error evaluation criterion (MISE) is finally obtained. Observing eqs 12 and 13, the values of \int K^2(t)\, dt and \int t^2 K(t)\, dt are determined directly by the kernel function; once the kernel function is given, they can be calculated. Besides, the accuracy of the KDE strongly depends on the window width under the constraint of limited samples. Therefore, eqs 12 and 13 can be used as optimization indices to find an appropriate window width: the ideal window width is obtained by minimizing the approximate MISE. For the one-dimensional probability density, let d(MISE[\hat{f}(x)])/dh = 0; then

h_{opt} = \left\{\frac{\int K^2(t)\, dt}{n\left[\int t^2 K(t)\, dt\right]^2 \int f''(x)^2\, dx}\right\}^{1/5}    (14)
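As a quick illustration of how eq 14 is applied (a standard kernel-density result stated here for reference, not derived in this article): for the Gaussian kernel, \int K^2(t)\,dt = 1/(2\sqrt{\pi}) and \int t^2 K(t)\,dt = 1, and if the unknown f is assumed to be Gaussian with standard deviation \sigma, then \int f''(x)^2\,dx = 3/(8\sqrt{\pi}\,\sigma^5), so eq 14 reduces to the familiar rule of thumb:

h_{opt} = \left\{\frac{\int K^2(t)\,dt}{n\left[\int t^2 K(t)\,dt\right]^2 \int f''(x)^2\,dx}\right\}^{1/5} = \left(\frac{1/(2\sqrt{\pi})}{n \cdot 3/(8\sqrt{\pi}\,\sigma^5)}\right)^{1/5} = \left(\frac{4}{3n}\right)^{1/5}\sigma \approx 1.06\,\sigma\, n^{-1/5}

In practice \sigma is replaced by the sample standard deviation; the present paper instead plugs the kernel estimate itself into eqs 14 and 16, as described below.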
For the two-dimensional probability density, let

\frac{\partial MISE[\hat{f}(x, y)]}{\partial h_1} = h_1^3 h_2^4 \left[\int t^2 K(t)\, dt\right]^2 \int\int (\nabla f(x, y))^2\, dx\, dy - \frac{1}{n h_1^2 h_2}\int K^2(t)\, dt = 0

\frac{\partial MISE[\hat{f}(x, y)]}{\partial h_2} = h_2^3 h_1^4 \left[\int t^2 K(t)\, dt\right]^2 \int\int (\nabla f(x, y))^2\, dx\, dy - \frac{1}{n h_2^2 h_1}\int K^2(t)\, dt = 0    (15)

So we have

h_{1,opt} = \left\{\frac{\int K^2(t)\, dt}{n h_2^5 \left[\int t^2 K(t)\, dt\right]^2 \int\int (\nabla f(x, y))^2\, dx\, dy}\right\}^{1/5}

h_{2,opt} = \left\{\frac{\int K^2(t)\, dt}{n h_1^5 \left[\int t^2 K(t)\, dt\right]^2 \int\int (\nabla f(x, y))^2\, dx\, dy}\right\}^{1/5}    (16)
Once the kernel function is predetermined, \int K^2(t)\, dt / \left[\int t^2 K(t)\, dt\right]^2 = C is a constant, and an optimization algorithm can be used to obtain the optimal window width. In the calculation, f(x) and f(x, y) are the true probability density functions to be estimated, whose specific forms are unknown. We therefore substitute eq 3 into eq 14 and eq 4 into eq 16; that is, the estimated probability density replaces the true probability density when calculating the optimal window width.
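A minimal sketch of this plug-in procedure for the one-dimensional case is given below; it is our illustration, not the authors' code, and the helper names kde_on_grid and h_opt_1d are hypothetical. A pilot estimate of f is built with a rule-of-thumb width, \int f''(x)^2\,dx is computed numerically from it, and the approximate MISE of eq 12 is then minimized over a grid of candidate window widths.

```python
# Plug-in window-width selection: minimize the approximate MISE of eq 12
# with the unknown density replaced by a pilot kernel estimate.
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde_on_grid(samples, grid, h):
    # f_hat evaluated on a grid (eq 3 with a Gaussian kernel)
    return gauss((grid[:, None] - samples[None, :]) / h).sum(axis=1) / (len(samples) * h)

def roughness_of_second_derivative(samples, h_pilot):
    # numerical estimate of int f''(x)^2 dx from the pilot KDE
    grid = np.linspace(samples.min() - 3 * h_pilot, samples.max() + 3 * h_pilot, 2000)
    f_hat = kde_on_grid(samples, grid, h_pilot)
    d2 = np.gradient(np.gradient(f_hat, grid), grid)
    return np.trapz(d2 ** 2, grid)

def h_opt_1d(samples):
    n = len(samples)
    h_pilot = 1.06 * samples.std() * n ** (-1 / 5)   # Gaussian rule of thumb as pilot width
    r = roughness_of_second_derivative(samples, h_pilot)
    rk, mu2 = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0      # int K^2 and int t^2 K for the Gaussian kernel
    candidates = np.linspace(0.1 * h_pilot, 3.0 * h_pilot, 200)
    amise = rk / (n * candidates) + 0.25 * candidates ** 4 * mu2 ** 2 * r   # eq 12
    return candidates[np.argmin(amise)]

if __name__ == "__main__":
    data = np.random.default_rng(2).normal(size=960)
    print("selected window width:", h_opt_1d(data))
```

The two-dimensional widths of eq 16 can be obtained in the same spirit by alternating updates of h1 and h2 until they stabilize.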
Dynamic Threshold for the Fault Detection. Theoretically, the KDE distribution under normal conditions is different from that under faulty operation. How can their difference be divided? One can locate a fault by setting a threshold value on the KDE likelihood. However, it is not feasible to distinguish the fault directly with a confidence interval of the normal state: even in normal operation the distribution of the actual process data is not ideal, so its confidence limit cannot be completely described by a constant horizontal line, and the real data are usually accompanied by considerable noise. When a fault occurs, a constant confidence line cannot distinguish the normal state from the fault state. The fused lasso method is useful for setting the desired rejection cutoff value for each node based on the normal data, because it provides a design for a dynamic confidence limit; by evaluating the KDE function at all the data points, one can obtain information about the typical likelihood values.

We briefly introduce the fused lasso method. The Fused Lasso Signal Approximator (FLSA)28 aims at eliminating noise and smoothing data. Given the real-valued observations y_1, ..., y_N, the FLSA finds the sequence β_1, ..., β_N that minimizes the criterion
\frac{1}{2}\sum_{k=1}^{N}(y_k - \beta_k x_k)^2 + \lambda_1 \sum_{k=1}^{N} |\beta_k| + \lambda_2 \sum_{k=2}^{N} |\beta_k - \beta_{k-1}|    (17)
where λ_1 and λ_2 are tuning parameters and x_1, ..., x_N are the feature variables. The term \frac{1}{2}\sum_{k=1}^{N}(y_k - \beta_k x_k)^2 is the traditional least-squares index; it strives for the accuracy of the model. The term weighted by λ_1 encourages sparsity in the coefficients, and the term weighted by λ_2 encourages sparsity in their differences. These two terms are added to keep a trade-off between modeling accuracy and denoising performance. When λ_1 = 0, the FLSA performs total-variation denoising of the signal.

A dynamic programming algorithm is used to solve this optimization problem. The approach begins with the observation that when λ_1 = 0, eq 17 is the complete data likelihood of a particular hidden Markov model (HMM). In this HMM, the β's take the role of the hidden states, and their most probable values may be found via the Viterbi algorithm (a dynamic programming algorithm). The HMM posits an emission probability Pr(y_k|β_k) that is a standard normal distribution and a transition probability Pr(β_{k+1}|β_k) that is double exponential with parameter λ_2 (where Pr(·) denotes probability). The dynamic programming algorithms for HMMs are well-known; excellent descriptions of the Viterbi algorithm are available in refs 29 and 30. It is helpful to rewrite the objective in eq 17 in a more general form:

\sum_{k=1}^{N} e_k(\beta_k) - \lambda_2 \sum_{k=2}^{N} d(\beta_k, \beta_{k-1})    (18)

where

e_k(b) = \sum_{i=1}^{R} y_{ik} v_i(b)    (19)

Denote a variable sequence (x_1, x_2, ..., x_k) by the shorthand x_{1:k}. The maximization of the criterion in eq 18 can be rewritten as

\max_{\beta_{1:N}}\left[\sum_{k=1}^{N} e_k(\beta_k) - \lambda_2 \sum_{k=2}^{N} d(\beta_k, \beta_{k-1})\right] = \max_{\beta_N}\left\{ e_N(\beta_N) + \max_{\beta_{1:(N-1)}}\left[\sum_{k=1}^{N-1} e_k(\beta_k) - \lambda_2 \sum_{k=2}^{N} d(\beta_k, \beta_{k-1})\right] \right\}

f_N(\beta_N) := \max_{\beta_{1:(N-1)}}\left[\sum_{k=1}^{N-1} e_k(\beta_k) - \lambda_2 \sum_{k=2}^{N} d(\beta_k, \beta_{k-1})\right] = \max_{\beta_{N-1}}\left\{ e_{N-1}(\beta_{N-1}) - \lambda_2 d(\beta_N, \beta_{N-1}) + \max_{\beta_{1:(N-2)}}\left[\sum_{k=1}^{N-2} e_k(\beta_k) - \lambda_2 \sum_{k=2}^{N-1} d(\beta_k, \beta_{k-1})\right] \right\}    (20)

The definitions of the functions f_{N-1}(\beta_{N-1}), f_{N-2}(\beta_{N-2}), ..., f_2(\beta_2) are similar to that of f_N(\beta_N), and the maximization problem is solved further iteratively. Intermediate functions are introduced to summarize the algorithm (with k ranging from 2 to N):

\delta_1(b) := e_1(b)

\psi_k(b) := \arg\max_{\tilde{b}}\left[\delta_{k-1}(\tilde{b}) - \lambda_2 |b - \tilde{b}|\right]

f_k(b) := \delta_{k-1}(\psi_k(b)) - \lambda_2 |b - \psi_k(b)|

\delta_k(b) := e_k(b) + f_k(b)    (21)
Figure 3. TE industrial process flow diagram.
The functions ψ_k(·) take part in the backward pass of the algorithm. This backward pass computes \hat{\beta}_1, ..., \hat{\beta}_N through a recursion identical to that of the Viterbi algorithm for HMMs:

\hat{\beta}_N = \arg\max_b \{\delta_N(b)\}

\hat{\beta}_k = \psi_{k+1}(\hat{\beta}_{k+1}), \quad k = N-1, N-2, ..., 1    (22)
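The recursions of eqs 18-22 can be implemented directly once the β values are restricted to a finite grid of candidate levels. The sketch below is our simplified illustration, not the authors' FLSA implementation: it takes e_k(b) = -\frac{1}{2}(y_k - b)^2 - \lambda_1|b| (a natural per-sample score when x_k = 1 in eq 17) and d(b, b') = |b - b'|, runs the forward pass of eq 21, and recovers the denoised sequence with the backward pass of eq 22.

```python
# Discretized Viterbi-style dynamic program for the fused-lasso objective (eqs 18-22).
import numpy as np

def flsa_viterbi(y, lam1, lam2, n_levels=200):
    levels = np.linspace(y.min(), y.max(), n_levels)         # candidate beta values
    pen = lam2 * np.abs(levels[:, None] - levels[None, :])   # lam2 * d(b, b') matrix

    def emit(yk):
        # e_k(b) = -0.5*(y_k - b)^2 - lam1*|b|
        return -0.5 * (yk - levels) ** 2 - lam1 * np.abs(levels)

    delta = emit(y[0])                       # delta_1(b) := e_1(b)
    psi = np.zeros((len(y), n_levels), dtype=int)
    for k in range(1, len(y)):
        # psi_k(b) = argmax_b' [delta_{k-1}(b') - lam2 |b - b'|]
        scores = delta[None, :] - pen
        psi[k] = np.argmax(scores, axis=1)
        f_k = scores[np.arange(n_levels), psi[k]]
        delta = emit(y[k]) + f_k             # delta_k(b) = e_k(b) + f_k(b)

    beta_idx = np.empty(len(y), dtype=int)
    beta_idx[-1] = np.argmax(delta)          # backward pass, eq 22
    for k in range(len(y) - 2, -1, -1):
        beta_idx[k] = psi[k + 1][beta_idx[k + 1]]
    return levels[beta_idx]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    y = np.r_[np.full(480, 1.0), np.full(480, 0.3)] + 0.05 * rng.normal(size=960)
    smoothed = flsa_viterbi(y, lam1=0.0, lam2=2.0)   # lam1 = 0 -> total-variation style denoising
    print(smoothed[:3], smoothed[-3:])
```

With λ_1 = 0 this performs the total-variation style denoising used to obtain the dynamic threshold; the grid resolution n_levels trades accuracy against the O(N·n_levels²) cost of the forward pass.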
In the process of fault detection, the probability estimates based on the KDE of the process data are fed into the fused lasso denoising algorithm to be smoothed. This removes the model noise and yields the threshold for distinguishing between normal and faulty operation.

Implementation of Fault Test and Trace Based on Causal Structure and KDEs. The following subsection explains how to combine the above methods into a well-defined framework that can be used to diagnose abnormal events. Figure 2 provides a flowchart for fault diagnosis and backtracking. The main steps are as follows:
1. Build the DAG (the cause-effect structure of the process variables) based on process knowledge and historical data.
2. List the probability densities and conditional probability densities that need to be estimated according to the established system structure.
3. Estimate the probability density of each node based on KDE. The multidimensional joint probability densities (child nodes) and one-dimensional probability densities (root nodes) are used to calculate the conditional probability densities. All of these probability densities are named the parameters.
4. Set the desired rejection threshold for each node based on the fused lasso method.
5. Collect validation data and detect, based on the threshold value, whether a fault occurs.
6. If a fault occurs, reason backward over the graphical model to find the fault roots (parent nodes) according to the causal structure: check whether the parent nodes are faulty, and repeat until the root node of the fault is found (a minimal sketch of this backtracking step is given below).
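The backtracking of steps 5 and 6 reduces to a walk up the DAG. The sketch below is our illustration: the is_faulty callable stands in for the KDE-likelihood-versus-threshold check of steps 3-5, and the node names follow the partial TE structure used in Section 4, where a fault detected at x7 is traced through x5 back to x8.

```python
# Backtracking over the causal DAG: from a flagged node, follow faulty parents upward.
from typing import Callable, Dict, List

# child -> list of parents, per the causal structure of Figure 4
PARENTS: Dict[str, List[str]] = {
    "x4": ["x2"], "x5": ["x8"], "x7": ["x5"],
    "x3": ["x5"], "x1": ["x3"], "x6": ["x3"],
    "x2": [], "x8": [],
}

def trace_root(node: str, is_faulty: Callable[[str], bool]) -> List[str]:
    """Return the propagation path from the detected node back to the root cause."""
    path = [node]
    frontier = [p for p in PARENTS[node] if is_faulty(p)]
    while frontier:
        parent = frontier[0]          # follow the faulty parent upward
        path.append(parent)
        frontier = [p for p in PARENTS[parent] if is_faulty(p)]
    return path

if __name__ == "__main__":
    # hypothetical detection results mirroring fault No. 4 (x8 is the true root)
    abnormal = {"x7": True, "x5": True, "x8": True, "x2": False}
    print(trace_root("x7", lambda v: abnormal.get(v, False)))  # ['x7', 'x5', 'x8']
```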
4. CASE SIMULATION

In order to illustrate the applicability of the proposed method to an actual industrial process, we use the Tennessee Eastman (TE) process as the experimental object. The TE platform simulates an actual chemical process, as shown in Figure 3, and is widely used as a benchmark to test process control and fault diagnosis technology. The complete TE process contains a total of 52 variables. In order to reduce the complexity, 8 variables in the reactor module are selected to build its causal structure; the physical meaning of each variable is listed in Table 2.

Table 2. Several Variables of the TE Process
Variable (symbol in Figure 3) | Physical meaning | Units
x1 (v5) | Recycle flow | km3/h
x2 (v6) | Reactor feed rate | km3/h
x3 (v7) | Reactor pressure | kPa
x4 (v8) | Reactor level | %
x5 (v9) | Reactor temperature | °C
x6 (v12) | Product separator level | %
x7 (v20) | Compressor work | kW
x8 (v21) | Reactor cooling water outlet temperature | °C

According to the modeling method described in the previous section, the causal directions among the 8 variables are determined, and the corresponding DAG is shown in Figure 4.

Figure 4. Causal structure of the partial TE process.
Figure 5. Probability of x2 for normal data.
Figure 6. Probability of x8 for normal data.
Table 3. Disturbances for the TE Process
No. | Process variable | Type
1 | A/C feed ratio, B composition constant (stream 4) | Step
2 | B composition, A/C feed ratio constant (stream 4) | Step
3 | D feed temperature (stream 2) | Step
4 | Reactor cooling water inlet temperature | Step
5 | Condenser cooling water inlet temperature | Step
6 | A feed loss (stream 1) | Step
7 | C header pressure loss, reduced availability (stream 4) | Step
8 | A, B, C feed composition (stream 4) | Random
9 | D feed temperature (stream 2) | Random
10 | C feed temperature (stream 4) | Random
11 | Reactor cooling water inlet temperature | Random
12 | Condenser cooling water inlet temperature | Random
13 | Reaction kinetics | Slow drift
14 | Reactor cooling water valve | Sticking
15 | Condenser cooling water valve | Sticking
16 | Unknown | Unknown
17 | Unknown | Unknown
18 | Unknown | Unknown
19 | Unknown | Unknown
20 | Unknown | Unknown
First, list all the nodes whose probability density or conditional probability density needs to be estimated according to the causal structure.
Figure 7. Conditional probability of x7 under x5.
Using the evaluation index given above, we can optimize the window width and obtain the optimal probability density estimates. As shown in Figure 4, there are two root nodes, x2 and x8, whose KDEs are one-dimensional. The other nodes are affected by their parent nodes, so their conditional probability densities need to be estimated.
Figure 8. Conditional probability densities of x5 under x8 and x5 under x2,x2,x8.
In total, we need to estimate f(x2), f(x8), f(x4|x2), f(x5|x8), f(x7|x5), f(x3|x5), f(x1|x3), and f(x6|x3) from top to bottom. Next, the KDE parameters of the model are trained on the normal data of 960 samples. Taking x2 and x8 as examples, the estimated probability density functions are shown in Figure 5(a) and Figure 6(a), respectively. At each sampling instant there is a corresponding value of the variable; we substitute this value into the distribution that the variable satisfies, and the obtained probability density value is defined as the probability density at this time. Figure 5(b) shows the three-dimensional relationship among time, variable, and probability density. In order to see the abnormal information directly, we plot the probability density value at each time instant; the corresponding probability density curves are shown in Figure 5(c) and Figure 6(b). The curve of Figure 5(c) is not a direct relationship between the probability density and time but implies the probability density corresponding to the current value of the variable at each moment.

Then, the fault data are used to test the fault detection capability of the proposed method based on the complete probability graphical model. The twenty process disturbances in Table 3 are used to simulate abnormal operating conditions.31 In order to check the sensitivity of the proposed method to minor faults, fault No. 4 is used as the test case; that is, there is a step change in the reactor cooling water. 480 samples of fault data are combined with 480 samples of normal data to form a data set of 960 samples. Here the variable x7 is selected as the experimental object to test whether there is a fault. According to the causal structure, it is easy to see that x7 is directly related to x5. Here x5 is the parent node of x7, so we need to calculate the conditional probability density f(x7|x5). Figure 7(a) is the conditional probability density function of x7 under the influence of x5, shown as a three-dimensional graph: the x-axis and y-axis represent the variables x5 and x7, respectively, and the z-axis represents the conditional probability density function f(x7|x5). Figure 7(b) shows the variation of the probability density with sampling time for the normal data and the fault data; the first 480 samples are normal data and the remainder are fault data. The obtained KDE estimates, as a rough signal, are denoised and recovered based on the fused lasso method, and the crossed line in Figure 7(b) represents the recovered KDE after denoising. It can be clearly seen that the conditional probability of x7 exceeds the normal limit after about 480 samples: a fault has occurred.

What is the root cause of the failure in x7? We can reason backward based on the established causal structure-parameter model. The specific operation is as follows: starting from the faulty variable, calculate the probability densities of its parent nodes from the bottom to the top, compare the probability density curves under normal and fault conditions to determine whether there is a fault, and continue this step until the root cause of the fault is found. In order to conversely infer the roots of the fault in x7, it is necessary to calculate f(x5|x8), f(x5|x2), f(x2), and f(x8) separately. Simulation results are shown in Figure 8. They show that x5 and x8 detected the occurrence of the fault, but x2 did not. It is not difficult to find that the fault is propagated from x8 to x5 and then to x7, which indicates that the root of the fault is x8. The physical meaning of the x8 variable is the temperature of the cooling water, and fault No. 4 is a step change in the cooling water temperature. This result is consistent with the industrial reality, which also proves the accuracy of the proposed method.
5. CONCLUSION

This article proposes a probability graphical model obtained directly from the continuous process variables for fault detection and fault backtracking. The model structure is determined by causality analysis, and the parameters in the model are determined by KDE. For the affected variables, the conditional probability density is calculated from the high-dimensional and low-dimensional probability densities. In order to judge the accuracy of the kernel density estimation, we have rigorously deduced an evaluation index of the estimation quality. Furthermore, the dynamic threshold for the probability density detection is constructed based on the fused lasso algorithm. Through the simulation experiments on the TE process, we not only detected the occurrence of the fault but also successfully found its root cause.
■ ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.8b03009.

Details of the MISE derivation (PDF)
■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].

ORCID
Jing Wang: 0000-0002-6847-8452
JingLin Zhou: 0000-0003-1589-7423

Notes
The authors declare no competing financial interest.
■ ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (No. 61573050, 61473025) and the open-project grant funded by the State Key Laboratory of Synthetical Automation for Process Industry at Northeastern University (No. PAL-N201702).
■ REFERENCES
(1) Wu Bin, L. Q.; Yu, C. Process industry fault diagnosis; Science Press: Beijing, 2012. (2) Zhao, S.; Huang, B.; Liu, F. Fault Detection and Diagnosis of Multiple-Model Systems With Mismodeled Transition Probabilities. IEEE Transactions on Industrial Electronics 2015, 62, 5063−5071. (3) Yin, S.; Yang, C.; Zhang, J.; Jiang, Y. A Data-Driven Learning Approach for Nonlinear Process Monitoring Based on Available Sensing Measurements. IEEE Trans. Ind. Electron. 2017, 64, 643−653. (4) Lou, Z.; Shen, D.; Wang, Y. Two-step principal component analysis for dynamic processes monitoring. Can. J. Chem. Eng. 2018, 96, 160−170. (5) Jiang, Q.; Yan, X.; Li, J. PCA-ICA Integrated with Bayesian Method for Non-Gaussian Fault Diagnosis. Ind. Eng. Chem. Res. 2016, 55, 4979−4986. (6) Ge, Z.; Xie, L.; Kruger, U.; Song, Z. Local ICA for multivariate statistical fault diagnosis in systems with unknown signal and error distributions. AIChE J. 2012, 58, 2357−2372. (7) Yu, J. Localized Fisher discriminant analysis based complex chemical process monitoring. AIChE J. 2011, 57, 1817−1828. (8) He, S.; Wang, Y.; Liu, C. Modified partial least square for diagnosing key-performance-indicator-related faults. Can. J. Chem. Eng. 2018, 96, 444−454. (9) Wan, J.; Zabaras, N. A probabilistic graphical model based stochastic input model construction. J. Comput. Phys. 2014, 272, 664−685. (10) Bensi, M.; Kiureghian, A. D.; Straub, D. Efficient Bayesian network modeling of systems. Reliability Engineering & System Safety 2013, 112, 200−213. (11) Namaki Shoushtari, O.; Huang, B. Bayesian Control Loop Diagnosis by Combining Historical Data and Process Knowledge of Fault Signatures. IEEE Trans. Ind. Electron. 2015, 62, 3696−3704. (12) Yu, J.; Rashid, M. M. A novel dynamic bayesian network-based networked process monitoring approach for fault detection, propagation identification, and root cause diagnosis. AIChE J. 2013, 59, 2348−2365. (13) Suk, H. I.; Sin, B. K.; Lee, S. W. Hand gesture recognition based on dynamic Bayesian network framework. Pattern Recognition 2010, 43, 3059−3072. (14) Song, D.; Ek, C. H.; Huebner, K.; Kragic, D. Multivariate discretization for Bayesian Network structure learning in robot grasping. IEEE International Conference on Robotics and Automation 2011, 1944−1950. (15) Zwirglmaier, K.; Straub, D. A discretization procedure for rare events in Bayesian networks. Reliability Engineering & System Safety 2016, 153, 96−109. (16) Huang, S.; Li, J.; Ye, J.; Fleisher, A.; Chen, K.; Wu, T.; Reiman, E. A sparse structure learning algorithm for Gaussian Bayesian Network identification from high-dimensional data. IEEE Transactions on Pattern Analysis & Machine Intelligence 2013, 35, 1328−1342. (17) Mcgeachie, M. J.; Chang, H. H.; Weiss, S. T. CGBayesNets: conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data. PLoS Comput. Biol. 2014, 10, e1003676. (18) Badoual, A.; Novara, P.; Romani, L.; Schmitter, D.; Unser, M. A Non-Stationary Subdivision Scheme for the Construction of Deformable Models with Sphere-Like Topology. Graphical Models 2017, 94, 38−51. (19) Yang, Z. Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey. Eprint Arxiv 2011, 94, 38−51. (20) Ben Ishak, M.; Leray, P.; Ben Amor, N. A two-way approach for Probabilistic Graphical Models structure learning and ontology enrichment. International Conference on Knowledge Engineering and Ontology Development 2011, 189−194. (21) Chen, Q.; Kruger, U.; Leung, A. T. Regularised kernel density estimation for clustered process data. Control Engineering Practice 2004, 12, 267−274. (22) Zeng, J.; Luo, S.; Cai, J.; Kruger, U.; Xie, L. Nonparametric density estimation of hierarchical probabilistic graph models for assumption-free monitoring. Ind. Eng. Chem. Res. 2017, 56, 1278−1287. (23) Gonzalez, R.; Huang, B.; Lau, E. Process monitoring using kernel density estimation and Bayesian networking with an industrial case study. ISA Trans. 2015, 58, 330−338. (24) Chen, X.; Wang, J.; Zhou, J. Process Monitoring Based on Multivariate Causality Analysis and Probability Inference. IEEE Access 2018, 6, 6360−6369. (25) Nello, C. Kernel Function; John Wiley & Sons, Inc., 2004. (26) Dehnad, K. Density Estimation for Statistics and Data Analysis. Technometrics 1987, 29, 495−495. (27) Jaynes, E. T.; Bretthorst, G. L. Probability theory: the logic of science; Cambridge University Press, 2003. (28) Friedman, J.; Hastie, T.; Höfling, H.; Tibshirani, R. Pathwise coordinate optimization. Annals of Applied Statistics 2007, 1, 302−332. (29) Rabiner, L. R. A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Readings in Speech Recognition 1990, 77, 267−296. (30) Durbin, R. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. DataBase systems and Logic Programming 1998, 10, 549−552. (31) Kulkarni, A.; Jayaraman, V. K.; Kulkarni, B. D. Knowledge incorporated support vector machines to detect faults in Tennessee Eastman Process. Comput. Chem. Eng. 2005, 29, 2128−2133.