Nonparametric Density Estimation of Hierarchical Probabilistic Graph

Jan 3, 2017 - Probabilistic graphical models, such as Bayesian networks, have recently gained attention in process monitoring and fault diagnosis. The...
0 downloads 9 Views 3MB Size
Article pubs.acs.org/IECR

Nonparametric Density Estimation of Hierarchical Probabilistic Graph Models for Assumption-Free Monitoring Jiusun Zeng,† Shihua Luo,‡ Jinhui Cai,*,† Uwe Kruger,*,¶ and Lei Xie§ †

College of Metrology & Measurement Engineering, China Jiliang University, Hangzhou 310018, China School of Statistics, Jiangxi University of Finance and Economics, Nanchang, 330013, China ¶ Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York 12180-3590, United States § National Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China ‡

ABSTRACT: Probabilistic graphical models, such as Bayesian networks, have recently gained attention in process monitoring and fault diagnosis. Their application, however, is limited to discrete or continuous Gaussian distributed variables, which results from the difficulty in efficiently estimating multivariate density functions. This article shows that decomposing the graphical model into a hierarchical structure reduces estimating a multivariate density function to the estimation of low-dimensional/conditional probabilities. These conditional density functions can be effectively estimated from data using a nonparametric kernel method and the low-dimensional densities can be estimated using a kernel density estimation (KDE). On the basis of the estimated densities, anomalous process behavior can be detected and diagnosed by examining which probability is lower than its corresponding confidence limit. Applications to simulated examples and an industrial blast furnace iron-making process show that the proposed method is more sensitive and conceptually simpler than competitive methods for monitoring nonlinear variable interrelationships and/or nonGaussian distributed process variables. and reconstruction methods have been proposed,2 they suffer from the so-called “smearing effect”3 or numerical inaccuracies.4 In recent years, graphical models5 have emerged as an alternative to MSPC, offering a more reliable fault diagnosis. A graphical model can be easily constructed using the cause− effect relationships between operating units and hence process variables. The interconnection between process units may be obtained from process flowcharts, for example piping and instrumentation or flowchart diagrams, or identified from reference data using Granger causality6 or mutual information.7 Upon constructing the topology of a graphical model, associated parameters can be estimated using the Markov Chain Monte Carlo method. The improved fault diagnosis capability of graphical models stems from the fact that they can take causal relationships among process variables into account. The literature proposed different types of graphical models. Maurya et al.8 suggested the use of a signed directed graph for the diagnosis of incipient faults. Chiang and Braatz9 used causal maps identified by the Kullback−Leibler divergence to diagnose process faults. Such deterministic graphical models work well when there is a database that encompasses each potential fault scenario or they incorporate in-depth knowledge of possible

1. INTRODUCTION Over the past few decades, process monitoring and fault diagnosis have become increasingly important aspects in the operation of processes in the chemical and petrochemical industry. This mainly results from more stringent requirements for safety and reliability, leading to the mushrooming of research activities in related subject areas during this period. Available methods can be roughly divided into model-based and data-based techniques.1 Model-based techniques usually require the availability of an accurate system model, which cannot be guaranteed for complex chemical processes. On the basis of their conceptual simplicity, multivariate statistical process control (MSPC) techniques have become popular.2 These data-based methods make use of the abundant data information in the chemical and petrochemical process industry without requiring detailed process knowledge. Popular MSPC methods include principal component analysis and partial least-squares.2 Both methods project recorded samples from the original data spaces onto low-dimensional subspaces by retaining most of the variance or covariance information. Process monitoring and fault diagnosis is then carried out on the basis of the information captured within the lowdimensional subspaces and the difference between the recorded and projected samples. Hence, MSPC methods are suitable for large-scale systems producing sizable sets of random variables. The conceptual simplicity of the MSPC framework, however, may lead to difficulties in accurately isolating and diagnosing faulty variables and operating units. Although contribution plots © 2017 American Chemical Society

Received: Revised: Accepted: Published: 1278

October 21, 2016 December 22, 2016 January 3, 2017 January 3, 2017 DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research

ironmaking process. Finally, section 7 gives a concluding summary of this article.

fault conditions. Conversely, if this is not the case, these methods may not be relied upon. Compared to deterministic graphic models, probabilistic graphical models can overcome this deficiency.10 A probabilistic graphical model uses a graph-based representation and a probabilistic structure to describe the system. It is, therefore, suitable for systems under uncertainty. The most popular methods are Bayesian networks, which use a directed acyclic graph to describe a set of random variables and their conditional dependencies. Bayesian networks have proven their usefulness through applications in various fields, including control loop performance diagnosis,11 root cause identification,12 process pattern identification,13 and process monitoring.14 Despite its reported success, probabilistic graphical models may run into difficulties in the presence of continuous random variable sets, as the estimation of high dimensional joint/ conditional densities is nontrivial. Existing work discretizes continuous variables or assumes the set of random variables has a multivariate Gaussian distribution. Recently, Gonzalez et al.15 proposed reducing the high dimensional joint probability density of Bayesian networks by products of low dimensional (one or two-dimensional) densities, which can be estimated using a KDE.16 This concept works well for simple small-scale systems. However, complex large-scale systems may produce joint probability density functions that cannot be decomposed into products of low dimensional densities. This, in turn, yields density estimates of sizable variable sets, which is, again, nontrivial. This article expands upon the work in ref 15 by incorporating a direct estimate of the joint probability density. More precisely, the article shows that decomposing the variable set into a hierarchical structure yields the estimation of the joint probability densities, which can be reduced to the estimation of several conditional probability densities and low dimensional densities. These conditional probability densities are then estimated using a nonparametric kernel conditional density estimation (KCDE) method17 and the low dimensional density using KDE. On the basis of the estimated densities, anomalous process behavior can be detected and isolated using the hierarchial structure. The main contributions and benefits of the work in this article is that it does not rely on any assumptions concerning the distribution function, nor the interrelationships among the recorded variable set. This compares favorably to competitive data-based methods, which predominantly rely on assumptions concerning the variable interrelationships, that is, linear or nonlinear, and distribution functions, Gaussian or nonGaussian. In addition to that, the proposed method provides a more reliable fault diagnosis by reducing dimensionality, which is a direct result of segmenting a large variable set into small subsets that are connected by a hierarchical structure. The article is organized as follows. Section 2 provides a brief summary of probabilistic graphical models as well as the KDE and KCDE methods. Section 3 introduces the hierarchical probabilistic graph model as a monitoring tool. Section 4 then develops monitoring statistics and a fault diagnosis method. This is followed by comparing the performance of the hierarchical probabilistic graph models with competitive methods proposed in the literature using some simulation examples in section 5. Section 6 summarizes the application of the methods utilized in section 5 to an industrial blast furnace

2. PRELIMINARIES This section gives a brief summary of probabilistic graph models, kernel density estimation, and nonparametric kernel conditional density estimation in subsections 2.1, 2.2 and 2.3, respectively. These techniques form part of the proposed hierarchical monitoring concept introduced in section 3. 2.1. Probabilistic Graphical Model. Probabilistic graphical models consist of a set of nodes and directed edges. The nodes relate to process variables and edges between nodes represent probabilistic dependencies, or cause−effect relationships, among the variable set. A probabilistic graphical model offers a confidence measure for decision making, which as this article shows does render it an appropriate tool for process monitoring. The most popular method is the Bayesian network, based on a directed acyclic graph approach.18,19 This, however, limits its application to systems described by weakly connected graphs only.20 For more complex processes, a directed cyclical graph is more appropriate.21 To accommodate more general cases, it should be noted that a probabilistic graphical model does not need to be acyclic. A specific example can be seen in Figure 1 where there is a directed cycle.

Figure 1. Example of a graphical model.

Section 3 introduces a method that incorporates cyclical relationships between process variables. More precisely, the process variables are defined as nodes or vertices and are drawn as circles with the names as their labels. The directed edges, representing dependence or causal-effect relationships, are drawn by arrows between nodes. Nodes with departing arrows are called parent nodes, while those with arrows pointing toward them are child ones. Given a network structure for a set of process variables s = {s1, ..., sn}, the joint density, p(s), can be expressed as n

p(s) =

∏ p(si|pa(si)) i=1

(1)

Here, p(·|·) represents a conditional probability density function and pa(si) is the set of parent nodes for si. For discrete variables, Dirichlet priors are generally imposed and the conditional probabilities can be determined using standard methods such as maximum likelihood or Bayesian estimation.22 For continuous variables, the joint probability can be estimated 1279

DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research

2.3. Nonparametric Kernel Conditional Density Estimation. The data-driven nonparametric estimation of conditional densities is an important issue in a variety of fields, for example regression analysis and machine learning. The associated methods include the weighted kernel density estimation26 and the mixture density network.27 More recently, Sugiyama et al.17 proposed a conceptually simple least-squares kernel method for estimating a conditional density for random multivariate cause and effect variable sets. Let {x i}iN= 1 ∈ dx and {yi}iN= 1 ∈ dy be a set of samples of a random cause vector ? and a random effect vector @ , respectively; this method utilizes the kernel trick to approximate the conditional density p(y|x) as follows:

by assuming the process variables have a multivariate Gaussian distribution23 or using KDE by separating the set into a product of a series of one or two-dimensional densities.15 It is important to note, however, that the assumption of Gaussian distributed process variables may not hold true for many industrial applications in the chemical industry.24 Moreover, separating a graphical model for a large-scale system into a product of a series of one/two-dimensional probabilities may involve estimating numerous density functions, rendering the process monitoring task difficult, as this produces numerous control charts to be inspected simultaneously. This article demonstrates that decomposing the probabilistic graphical model into a hierarchical structure can significantly simplify the monitoring task at hand. The hierarchical decomposition allows estimating the joint density as a product of several low-dimensional conditional densities. Consider a, b, c, d, e, f, g to be a variable set that possesses the cause−effect relationships described in Figure 1. A direct estimation of the multivariate probability density function for this random variable set is nontrivial and computationally expensive. Figure 1, however, shows that the graphical model can be decomposed into three layers that have the root nodes a, b, that is, the first layer, nodes c, d, e, which make up the second layer, and the leaf nodes f, g as the third layer. This allows the joint probability density function to be expressed in the following form:

p(x , y) = ϕ(x , y) p( x) ≈ βT 2(x , y) ≜ ϕ(̂ x , y) N ⎛ ∥x − x ∥2 ∥y − yi∥2 ⎞ i ⎟⎟ ≈ ∑ βi exp⎜⎜ − − σ σ ⎝ ⎠ i=1 

p(y|x) =

2i(x, y)

Here, 2i(x , y) is a kernel function that has the kernel parameter σ, and βi is the parameter to be estimated from data. The unknown parameter vector β can be obtained by minimizing the integrated squared error term:

p(a , b , c , d , e , f , g ) = p(f , g |a , b , c , d , e) × p(a , b , c , d , e) = p(f , g |a , b , c , d , e) × p(c , d , e|a , b)p(a , b)

J(β) =

(2)

1 Nh

J *(β) =

i=1

(6)

1 2

2

∬ ϕ̂ (x, y)p(x) dx dy

∬ ϕ(̂ x, y)ϕ(x, y)p(x) dx dy 2 1 = ∬ ϕ ̂ (x , y)p(x) dx dy 2 −

∬ ϕ(̂ x, y)p(x, y) dx dy

(7)

The integrals in eq 7 can be approximated by expectations over the probability density functions p(x) and p(x, y). Substituting eq 5 into eq 7 yields J *(β) =

1 T β Gβ − g Tβ 2

(8)

where (3)

G=

Here Ki(·) is the kernel function, with h > 0 as a kernel parameter that can be determined using cross validation.16,25 The most widely used kernel function is the Gaussian kernel function: ⎛ 1 ∥x − x ∥2 ⎞ 1 i ⎟ K i(x , h) = exp⎜ − h 2π h2 ⎝ 2 ⎠

∬ (ϕ(̂ x, y) − ϕ(x, y))2 p(x) dx dy



N

∑ K i(x, h)

1 2

Neglecting the constant term 1/2∬ ϕ2(x,y)p(x) dx dy, simplifies the the above objective function to

Compared to the work in ref 15 the above decomposition considerably reduces the number of densities to be estimated by separating the high-dimensional joint density into the product of two conditional densities and a low dimensional density. In addition to that, the expression in eq 2 is more general, as it allows estimating the joint density of directed cyclic graphs. Moreover, the two-dimensional density p(a, b) can be estimated using KDE, while the conditional probabilities p(f, g|a, b, c, d, e) and p(c, d, e|a, b) can be estimated using a nonparametric kernel method. The next two subsections briefly summarize both estimation methods. Finally, if the number of root nodes exceeds two, the joint density function of the root nodes can be further decomposed by a product of conditional densities and low-dimensional densities. 2.2. Kernel Density Estimation. Let {x i}iN= 1 ∈ dx be a set of samples of the random variable ? that has the density function p(x); the kernel density estimation of p, p̂, is given by p ̂ (x) =

(5)

∫ (∫ 2(x, y)2T(x, y) dy) p(x) dx

and g=

∫ (∫ 2(x, y) p(x, y) dx) dy

can be approximated by (4)

1 Ĝ = N

Here, exp(·) is the exponential function and ∥·∥ is the length of a vector. 1280

N

∑ ∫ 2(x i, y)2T(x i, y) dy i=1

(9a)

DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research 1 N

ĝ =

N

∑ 2(x i, yi)

establishing a hierarchical probabilistic graphical model requires the following steps: 1. Decompose the joint density p(a, b, c, d, e, f, g, h) into the product of the two conditional densities p1(x), p2(x) and the low-dimensional density p3(x); 2. Estimate the conditional densities p1(x), p2 (x) utilizing the N samples of the reference set ? ; 3. Calculate the probability values of p1(xj) and p2(xj) for the M samples of the second reference set ?͠ ; 4. On the basis of the significance α, for example, α = 0.05, determine the rejection region for p1(x) and p2(x) by numerical integration and apply the probability values determined under step 3 to verify whether the Type I error does not exceed α; it is also possible to determine the rejection region empirically to be the (α·100)th lower percentile of the M probability values determined under step 3; 5. Estimate the low-dimensional density p3(x) using the N reference samples of the reference set ? ; 6. Calculate the density values for the samples contained in the second reference set ?͠ , that is, p̂3(xj); and 7. On the basis of the significance α, compute the rejection region by numerical integration and apply the probability values computed under step 6 to verify whether the Type I error does not exceed α; or estimate the rejection region empirically as the (α·100)th lower percentile of the M probability values determined under step 6. The next section discusses how to utilize the probability values and the rejection regions for online process monitoring.

(9b)

i=1

On the basis of eq 5, it is straightforward to compute the integral in eq 9a. The parameter vector β can now be determined by solving the following regularized optimization problem: λ 1 β̂ = arg min βTĜ β + ĝ Tβ + βT β β 2 2

(10)

λ

where 2 βTβ is the regularization term and λ the regularization parameter. The estimate of β is given by β̂ = [Ĝ + λ I]−1 ĝ

(11)

Using cross validation, the kernel parameter σ and the regularization term λ can be determined. 17

3. HIERARCHICAL PROBABILISTIC GRAPH MODELS FOR PROCESS MONITORING This section introduces the hierarchical probabilistic graph model for monitoring complex systems in the chemical and petrochemical industry. As illustrated in Figure 1, a hierarchical graphical model may be constructed based on the variable causality without requiring detailed information about causal dependencies, including knowledge whether they are linearly or nonlinearly related and the underlying multivariate distribution functions. The experience and understanding of plant operators can assist with the identification of causal dependencies among individual operating units, for example heat exchangers, pumps, piping systems, stripping or distillation units, buffer tanks, and flash tanks to name a few. Second, pipe and instrumentation as well as process flow diagrams allow isolating individual operating units and examining how they relate to each other. A third avenue for constructing such relationships is the utilization of the Granger causality6 or the mutual information criterion.7 On the basis of the separation of the production facility into smaller, more meaningful subunits and the understanding of their interdependencies, the recorded variable set can be divided accordingly, which yields the actual hierarchical probability graphical model. Once the construction of the model is completed, a process monitoring approach can be established for diagnosing anomalous process behavior. It is important to note that a hierarchical probabilistic graphical model consists of several layers, which, in turn, circumvents estimating a typically highdimensional density function. In a similar fashion to competitive data-driven monitoring concepts, establishing a process monitoring model requires two different recorded reference sets:

4. PROCESS MONITORING STATISTICS It is important to note that, different to conventional multivariate statistical process control approaches, the probability values of the samples are used here to construct monitoring statistics. On the basis of the rejection region, which we further refer to here as confidence or control limits, the online monitoring can be performed as follows: 1. Record the kth sample xk, k > N + M; 2. Compute the probability values of p1(xk), p2(xk), and p3(xk) and add these to their corresponding time-based plots; if any probability value is below its rejection region, or outside its corresponding control limit, the process is out-of-statistical-control; otherwise, the process is instatistical-control; and 3. Set k = k + 1 and go to step 1). The following subsections discuss various scenarios to highlight that the proposed monitoring scheme does not rely on any assumptions concerning (i) the underlying distribution function of the recorded variable set and (ii) the variable interrelationships. Gaussian Distributed Process Variables. To simplify the presentation of the resulting conditional probability functions, we assume here that there is only a single root note, one second layer node, and one leaf node. It is important to note that this is not a restriction of generality but for the convenience of the derivatives below. Denoting the variable set of the root node, the second layer node, and the leaf note by x, y, and z, respectively, the conditional density functions are p(z|y, x) and p(y|x). The density function of the root node is p(x). Under the assumption that all process variables have a Gaussian

• a first set of N samples, defined here as ? = {x i}iN= 1; and • a second set containing M samples, denoted here by ?͠ = {x }M + N . j j=N+1

It is important to note that the samples have to be collected to ensure statistical independence, that is ? ∩ ?͠ = ⌀. With respect to Figure 1, we can define the random vector xT = (a b c d e f g) and have three distinct density functions, the conditional density functions p1(x) = p(g, h|a, b, c, d, e) and p2(x) = p(c, d, e|a, b) as well as the density function p3(x) = p(a, b). Using the samples stored in the reference sets ? and ?͠ , 1281

DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research

ships between x, y, and z neither influences the estimation of the conditional densities p(z|y, x) and p(y|x), nor the estimation of the density function p(x). This, in turn, implies that the estimation of the densities is not dependent upon these relationships, provided the proposed hierarchical probabilistic graph model correctly represents the cause/effect relationships. The discussion at the beginning of this section outlines how these relationships can be established. Diagnosing Abnormal Process Variables. If a fault is detected, it is essential to diagnose and isolate the faulty components. Utilizing the corresponding densities of the variable sets x, y, and z, we first simplify the notation of p(z| y, x), p(y|x) and p(x) to read p1(d0), p2(d0), and p3(d0), respectively, where dT0 = (xT yT zT) is a sample describing the process to be in-statistical-control. Now, defining df as a sample describing the process to be out-of-statistical-control, the following scenarios can be examined: 1. p3(df) is below its control limit and p1(df) and p2(df) are within their confidence regions 2. p3(df) and p2(df) are below their control limits, while p1(df) is within its confidence region 3. p1(df), p2(df) and p3(df) are below their control limits 4. p3(df) and p1(df) are within their confidence regions, p2(df) is below its control limit 5. p3(df) is within its confidence region, p2(df) and p1(df) are below their control limits 6. p3(df) and p2(df) are within their confidence regions, p1(df) is below its control limit For the first three cases, it can be concluded that some variables in the set x are affected by the fault condition, that is, it is the root node, where the fault is located. For cases 4 and 5, it can be concluded that the abnormal operating condition manifests itself in the intermediate node, that is, some variables in the set y are affected by this condition. For the last case, we can conclude that the fault condition is located in the leaf node. It is important to note here that a further and more detailed separation of the variable sets x, y, and z can yield a more detailed isolation of potential fault conditions. This, however, is beyond the scope of this article and referred to for further work.

distribution, the general multivariate distribution function of (xT yT zT)T is given by ⎧⎛ μ ⎞ ⎡ Σ x Σ xy Σ xz ⎤⎫ x ⎪ ⎛x⎞ ⎥⎪ ⎪ ⎪⎜ ⎟ ⎢ μ ⎜ ⎟ y p⎜ ⎟ ∼ 5⎨⎜ y ⎟ , ⎢ Σ yx Σ y Σ yz ⎥⎬ ⎥⎪ ⎪⎜ ⎟ ⎢ ⎝z⎠ ⎪⎝ μz ⎠ ⎢⎣ Σzx Σzy Σz ⎥⎦⎪ ⎭ ⎩

(12)

Here, 5{·} defines a normal distribution, μx and Σx are the mean vector and the covariance matrix of x, respectively, and Σxy is the cross-covariance matrix of x and y. It is straightforward to determine the conditional density function28 p(y|x) ∼ 5(μy | x , Σ y | x ), where μy | x = μy − Σ yx Σ−xx1μx

(13a)

Σ y | x = Σ yy − ΣTyx Σ−xx1Σ yx

(13b)

It should be noted that fault occurring in both x and y can cause change in the conditional density. Moreover, the covariance of conditional density is reduced compared to that of p(y), hence the monitoring statistic p(y|x) is more sensitive than that of p(y). Similarly, let ξT = (xT yT), the joint density can be described as ⎧ ⎡ Σξ Σξ z ⎤⎫ ⎪⎛ μξ ⎞ ⎪ ⎛ξ⎞ ⎥⎬ p⎜ ⎟ ∼ 5⎨⎜ ⎟ , ⎢ ⎪ μ ⎪ ⎝z ⎠ ⎩⎝ z ⎠ ⎢⎣ Σzξ Σz ⎥⎦⎭

(14)

The conditional density p(z|y, x) can be obtained as p(z|ξ) ∼ 5(μz | ξ , Σ μz |ξ)

where μz | ξ = μz + ΣzξΣ−ξξ1μξ

(15)

Σz | ξ = Σzz − ΣTzξΣ−ξξ1Σzξ

(16)

Since the variables are Gaussian, the monitoring statistic p(x) can be easily obtained as p(x) ∼ 5(μx , Σ x ). General Distribution Variables. If the variables are not Gaussian distributed, the densities p(z|y, x), p(y|x), and p(x) can be approximated using the nonparametric estimation methods discussed in section 2. The estimates of p(y|x) and p(z|ξ) are given by N

p(y|x) ≈

∑ βi ̂

xy

i=1

N

p(z | ξ ) ≈

∑ βi ̂

ξz

i=1

⎛ x−x i exp⎜⎜ − σ ⎝ xy

⎞ ⎟⎟ ⎠

(17a)

⎛ ∥ξ − ξ ∥2 ∥z − zi∥2 ⎞ i ⎟⎟ exp⎜⎜ − − σξ z σξ z ⎝ ⎠

(17b)

2



y − yi σxy

2

5. SIMULATION STUDIES To examine the performance of the hierarchical probabilistic graph models to systems producing (1) Gaussian distributed process variables that are linearly interrelated (subsection 5.1); (2) non-Gaussian distributed process variables that are linearly interrelated (subsection 5.2); and (3) non-Gaussian distributed process variables that are nonlinearly interrelated (subsection 5.3), this section compares it to the performance of the standard MSPC methods based on simulated examples. The MSPC methods we considered are principal component analysis (PCA), independent component analysis (ICA), and kernel PCA (KPCA). PCA2 and ICA24,29 are designed to construct monitoring statistics that are based on Gaussian distributed process variables and non-Gaussian distributed variables, respectively. KPCA is a generic nonlinear extension of PCA30 to handle nonlinear variable interrelationships. 5.1. Gaussian Distributed Process Variables, Linear Variable Interrelationships. The process for comparing the performance of an identified hierarchical probabilistic graph model with the competitive MSPC methods is based on three variables, denoted here as x, y, and z. The root node variable x0 had a Gaussian distribution of zero mean and unit variance. To

Here, the β̂ parameters are estimated and the σ parameters are determined as discussed in the previous section. Finally, p(x) can be estimated using a KDE: p(x) ≈

1 Nh

N

⎛ x − xi ⎞ ⎟ h ⎠

∑ K i⎜⎝ i=1

(18)

where the bandwidth h can be determined as discussed in section 2. Linear and Nonlinear Variable Interrelationships. The preceding discussion highlights that the variable interrelation1282

DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research

density p(x); the hierarchical probabilistic graphical model allowed a direct isolation of the fault condition from the corresponding monitoring statistics. 5.2. General Distribution Functions, Linear Variable Interrelationships. This subsection utilizes a more complex simulation example that produces non-Gaussian distributed process variables. The simulated process involved eight variables which are interrelated as follows. There are two root nodes that have the variables x1 and x2, that is, xT0 = (x01 x02). The variables x01 and x02 are uniformly distributed between [0, 0.5]. The measured data vector for the root nodes is then x = x0 + ϵ1 to account for measurement uncertainty. The measured variables of three intermediate nodes, stored here by yT = (y1 y2 y3) have the following relationship to the root nodes:

account for measurement uncertainty, the Gaussian distributed error variable ϵ1 was superimposed on x0, that is, the measured variable for the root note is x = x0 + ϵ1. The intermediate node y is related to x0 as y = 0.8x0 + ϵ2 = y0 + ϵ2, where ϵ2 accounts for measurement uncertainty. The variable measured at the leaf node is z = 0.3y0 + ϵ3. Each of the error variables are Gaussian distributed, that is, ϵk ∼ 5{0, 0.01}, k = 1, 2, 3. From this process, a reference set containing 1000 samples was simulated; that is, +0 = {(xi0yi z i0)T }1000 i=1 0

and a second reference set describing a fault scenario was simulated; that is, +f = {(x j yj z j )T }500 j=1 0 0

⎡ 0.1403 0.0868 0.2573 ⎤ A=⎢ ⎥ ⎣ 0.2601 0.4294 0.2976 ⎦

0

y = Ax 0 + ϵ2

The fault condition was a sensor bias with the magnitude of 0.3 on the leaf node z. Figure 2 summarizes the results of plotting

(19)

where ϵ2 represents measurement uncertainty. Finally, the variables of the three leaf nodes are stored in zT = (z1 z2 z3) are related to the intermediate nodes as follows: ⎡ 0.0762 0.7272 0.4633 ⎤ ⎥ ⎢ B = ⎢ 0.5198 0.0001 0.3619 ⎥ ⎢⎣ 0.6088 0.1698 0.9521⎥⎦

z = By0 + ϵ3

(20)

As before, ϵ3 accounts for measurement uncertainty. Each of the error vectors contains uniformly distributed variables between [0, 0.2]. Moreover, the error vectors are mutually independent, or uncorrelated, and, in addition to that, independent of x0. For comparing the performance of an identified hierarchical probabilistic graphical model with PCA and ICA, a reference data set of 1000 samples, that is,

Figure 2. Monitoring results using the probabilistic graphical model for the first simulation study.

the monitoring statistics described in section 3 for each of the 500 samples of the set +f . For comparison, Figure 3 shows the

+0 = {(x iT0 y iTz iT0)T }1000 i=1 0

and a second reference set describing a fault scenario was simulated, that is, +f = {(x Tj y Tj z Tj )T }500 j=1 0

0

0

was simulated. The fault involves a sensor bias with the magnitude of 0.1 on x2. As highlighted in section 3, the monitoring of the joint density function can be reduced to the monitoring of the two conditional probabilities p(z|y, x) and p(y|x), and the density function p(x). The control limit for each density was determined empirically to be the fifth percentile. Figure 4 plots the monitoring results for each monitoring statistic. For PCA, retaining five principal components are sufficient to capture more than 90% of the variance. The control limits for the Hotelling’s T2 and Q statistics were determined for a significance of α = 0.05. For ICA, five non-Gaussian independent components (ICs) are extracted and the monitoring statistics I2 and I2e proposed in ref 29 are considered. To guarantee a fair comparison, the control limits for the I2 and I2e statistics were determined empirically as the 95th percentile. The monitoring results of PCA and ICA are shown in Figures 5 and 6, respectively. From Figure 4 it can be seen that the sensor bias is successfully detected by all the three probabilities p(z|y, x), p(y|x) and p(x). This can be explained by the fact that the fault condition in one of the root nodes not only affects the root node but also the relationship between that root node and

Figure 3. Monitoring results using PCA for the first simulation study.

results of applying the PCA-based monitoring statistics2 to the same data set. For a significance of α = 0.05, Figure 2 highlights that a considerable number of conditional density values p(z|y, x) for +f were below the control limit. This implies that the sensor bias was successfully detected. Similarly, Figure 3 confirms a significant number of values of the PCA Q statistic2 were outside its confidence region, computed for α = 0.05. Given that both techniques could detect the sensor bias, it is important to note that a significant number of out-of-statisticalcontrol could be observed for the conditional density function p(z|y, x) but not for the conditional density p(y|x) and the 1283

DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research

5.3. Non-Gaussian Distributed Process Variables, Nonlinear Variable Interrelationships. Finally, this subsection presents a simulation example that involves a nonlinear process that produces non-Gaussian variables. As before, there are two root nodes that relate to the variables x1 and x2, stored in xT = (x1 x2). These variables are uniformly distributed variables between [0, 0.5] and corrupted by an error vector to represent measurement uncertainty, that is, x = x0 + ϵ1. The variables of three intermediate nodes are stored in yT = (y1 y2 y3), and the variables of the leaf nodes are zT = (z1 z2). The variable interrelationships between the root and intermediate nodes are as follows: Figure 4. Monitoring results using the probabilistic graphical model for the second simulation study.

⎛ y1 ⎞ ⎛ 0.4sin(x1) + 0.6cos(x 2)⎞ ⎟ ⎜ ⎟ ⎜ ⎜ y2 ⎟ = ⎜ 0.7sin(x1) + 0.3cos(x 2)⎟ + ϵ2 ⎟ ⎜ y ⎟ ⎜⎜ ⎝ 3 ⎠ ⎝ 0.6sin(x1) + 0.4cos(x 2)⎟⎠  (21)

y0

The variable interrelationships between the intermediate and the leaf nodes are given by ⎛ 0.4y + 0.3y2 + 0.3y3 ⎞ ⎛ z1 ⎞ ⎜ 1 ⎟ ⎜ z ⎟ ⎜ 0.5 sin(y ) + 0.3 cos(y ) ⎟ = + ϵ3 2 1 3 ⎜⎜ ⎟⎟ ⎜ ⎟ z ⎝ 3 ⎠ ⎜ 0.2 sin(y ) + 0.8 cos(y )⎟ ⎝ 1 2 ⎠   

Figure 5. Monitoring results using PCA for the second simulation study.

z0

(22)

Each of the variables stored in ϵk, k = 1, 2, 3 is uniformly distributed between [0, 0.2]. From this process, a reference set containing 1000 samples and a second data set of 500 samples that describes a fault condition was simulated. This time, the fault scenario involved a multiplicative fault with the magnitude of 0.9 to y1. Similar to the previous section, the hierarchical probabilistic graphical model decomposes the joint density function into the product of two conditional probabilities p(z|y, x) and p(y|x) and the density p(x). The control limits were empirically determined as the fifth percentile. Figure 7 plots the resultant monitoring statistics for the 500 samples in the second data set.

Figure 6. Monitoring results using ICA for the second simulation study.

the intermediate/leaf nodes. Conversely, the Q statistic of PCA only detects sporadically violations, while the Hotelling’s T2 statistic is insensitive to this event. Moreover, the I2 statistic is sensitive to the sensor bias. By directly comparing Figures 4 and 6, it should be noted that the hierarchical probabilistic graphical model shows a more pronounced response to this event than the ICA monitoring model. In addition to that, it can be concluded from the three probabilistic monitoring statistics of the probabilistic graphical model that the fault can be isolated to a problem in at least one of the root nodes. Both the PCAand ICA-based monitoring statistics are not able to offer such a detailed fault isolation. It is also important to note that the hierarchical probabilistic graphical model can be divided further to allow a more detailed fault diagnosis. As the main aim of this article is to introduce the new probabilistic graphical model as a competitive monitoring approach, such a detailed analysis is beyond the scope of this article.

Figure 7. Monitoring results using the probabilistic graphical model for the third simulation study.

The results of the probabilistic graphical monitoring model were compared to those obtained by constructing monitoring models based on PCA, ICA, and KPCA. For PCA, retaining five principal components is sufficient to capture more than 90% variance. For ICA, four ICs are retained to construct the I2 and I2e statistics. For KPCA, retaining four PCs captures more than 95% variance, the corresponding monitoring statistics are 1284

DOI: 10.1021/acs.iecr.6b04068 Ind. Eng. Chem. Res. 2017, 56, 1278−1287

Article

Industrial & Engineering Chemistry Research denoted by T2kpca and Qkpca. Figures 8, 9, and 10 summarize the resultant monitoring statistics for PCA, ICA, and KPCA. By

blast furnace ironmaking process. A blast furnace is a vertical furnace to produce molten iron that is part of a steel making plant. The blast furnace receives raw materials, mostly iron ore, and coke from the top layer by layer. From the bottom, hot air having a temperature exceeding 1000 °C and coal powder enters the furnace. The ascending hot air and the presence of the coal powder leads to a series of high temperature chemical reactions, where the flue gas exits the furnace from the top. Conversely, the descending material gradually softens to reduce into the molten iron and liquid slag, which leave the furnace from the bottom intermittently. Guarantee a smooth production of molten iron requires monitoring and maintaining stable gas flows into the blast furnace. From the steel making process at the Baotou steel company in Inner Mongolia, P.R. China, a number of data sets were recorded over a period of several weeks at a sampling rate of once per second. The recorded variables, listed in Table 1, were

Figure 8. Monitoring results using PCA for the second simulation study.

Table 1. Selected Variables for Gas Flow Monitoring

Figure 9. Monitoring results using ICA for the second simulation study.

no.

variable description

u1 u2 u3 u4 u5 u6 u7 u8

quantity of blast temperature of blast permeability index descending speed of burden top pressure bottom pressure CO2 concentration in the flue gas CO concentration in the flue gas

noisy and also showed serial correlation. To reduce the noise level as well as the correlation, it is common practice to average the data and use the averaged samples.31 In this application, we applied and averaged over 20 min, so a sample was analyzed by each method every 20 min. For this study, a reference set containing 1000 samples, which cover a period of almost 14 days and a second data set of 300 samples describing an abnormal process behavior, which was a fault in the instrumentation of the gas component measurement system, were available. On the basis of the process knowledge, a graphical model for the eight variables can be easily constructed. In Table 1,variables u1 and u2 are manipulated variables and hence are regarded as root nodes; variables u7 and u8 are controlled variables and regarded as leaf nodes. The remaining variables are intermediate nodes. Following from the discussion in section 3, the joint density of the random vector