Spatial Interpretive Structural Model Identification and AHP-Based

Mar 7, 2016 - model for large-scale chemical processes, data-driven techniques must be .... flow diagram, the item mij is set to be 1 when the variabl...
0 downloads 0 Views 7MB Size
Article pubs.acs.org/IECR

Spatial Interpretive Structural Model Identification and AHP-Based Multimodule Fusion for Alarm Root-Cause Diagnosis in Chemical Processes Huihui Gao,†,‡ Yuan Xu,†,‡ and Qunxiong Zhu*,†,‡ †

Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China



ABSTRACT: An alarm system plays a fundamental role in safety, quality, and economic profits of chemical processes. Alarm root-cause diagnosis is an essential part in alarm system management to monitor and locate the abnormalities. Because of the high integration and complexity in modern large scale industrial processes, a simplex structure model and monolithic monitoring methods cannot always meet the requirement of alarm root-cause diagnosis. This work introduces a framework to identify the alarm root cause and visualize the abnormality propagation path. A novel spatial interpretive structural model (SISM) is proposed to represent the hierarchical organization of space and show the causal relationships on different levels of granularity. In SISM, multiple spatial unit blocks are obtained by process decomposition based on the process flow diagram. Each block has one or more different variables. The hierarchical internal causal relationships in each individual block and external causal relationships between any two different blocks are identified by improving the traditional interpretive structural model. Considering the abnormality propagation, each spatial subblock and its child component(s) comprise one module, monitored by its corresponding extreme learning machine. The results from all the individual diagnostic modules may be inaccurate and mutually contradictory. We deploy an adaptation of analytical hierarchical process method to fuse these diagnostic results from various modules. The key benefits of our proposed framework have been demonstrated through seven fault scenarios in a Tennessee Eastman Chemical plant. All alarm root causes are successfully recognized with high diagnostic accuracy and short or even no diagnosis delay. The abnormality propagation paths are clearly visualized in SISM.

1. INTRODUCTION Alarm systems form a core element of almost all modern industrial processes. They are an important way of automatically monitoring the processes and drawing the operators’ attention to the abnormal situations that require timely action and assessment. Owing to the process dynamics, intricate variable interactions, and closed-loop control systems, abnormalities in one section of the plant can propagate to other sections. When process measurements violate the defined alarm thresholds, a large number of alarms may be triggered. Alarm overloads can distract the operators’ focus and hamper their ability to make fine adjustments.1,2 To maintain and improve plant safety, product quality, energy efficiency, and economic profits, early detection and accurate diagnosis of the root cause of alarms are essential and have been investigated quite extensively in recent years.3 The prevailing approaches to process monitoring and alarm root-cause diagnosis fall into two categories:4−6 (a) model based and (b) process history data based methods. Model-based methods can be used only if accurate mechanistic models of processes are developed by parameter identification, state estimation, or parity relation techniques. However, these analytical model-based approaches require in-depth knowledge of processes and extensive effort to build the precise firstprinciple models for large-scale complex industrial plants. By contrast, data-driven monitoring methods have attracted growing attention, because they do not require deep under© 2016 American Chemical Society

standing of process knowledge and mechanistic models but instead depend on the historical process data only.7 As a representative data-driven method, multivariate statistical process monitoring (MSPM) has become very popular and progressed quickly with the rapid development of database technologies and advanced computing. Principal component analysis,8 partial least-squares,9 and independent component analysis10 and their various extensions11−14 have been widely used in process monitoring and root-cause diagnosis applications. Nevertheless, the variable contribution methods may not explicitly identify the root causes of abnormalities and the fault propagation pathways because the abnormal events can propagate throughout the whole process along the various connections. In addition, some useful information may be lost in the low-dimensional representations, which may potentially degrade the process monitoring performance. Therefore it is necessary to capture the causality. Some qualitative models, such as signed directed graph,15,16 multilevel flow model,17 and computer aided engineering exchange model18 have been implemented to analyze the cause−effect relationship. But these constructions are too complicated and highly dependent Received: Revised: Accepted: Published: 3641

November 10, 2015 January 25, 2016 March 6, 2016 March 7, 2016 DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

simpler and requires less computational effort to realize; (d) the partial correlation analysis allows the directed circle structure and is appropriate for continuous process variables. To avoid the shortcomings of multivariate statistical process monitoring and complicated reasoning techniques presented above, artificial neural networks (ANN) have been widely used in process monitoring and analysis. The incorporation of ANN into alarm diagnostics may generate great benefits in terms of speed, knowledge, acquisition, and accuracy; especially, the generalization capability of ANNs can improve the performance of rootcause diagnostics. As a simple and fast single-hidden layer feedforward neural network, the extreme learning machine (ELM)37 has gained popularity and led to successful applications in many fields such as classification,38 function approximation,39 process modeling,40 and pattern recognition.41 The diagnosis of alarm root cause can be viewed from a pattern recognition perspective, for which the input pattern is constructed by the process variables pattern and the output pattern is just the alarm root cause; therefore, the ELM can be employed to train the input and output patterns within multiple sub-blocks separately. When process abnormality occurs, one or more alarms in different blocks will be triggered simultaneously or sequentially. Afterward, the trained ELM diagnosis model corresponding to the triggered sub-block is activated to perform the real-time alarm root cause diagnosis. The input to each special ELM model is the online process data and its output is any alarm root-cause class. Nevertheless, the monitoring results of individual subblock models may not always concur and are often in conflict with each other. An appropriate decision fusion method is therefore necessary to combine the various results and eliminate the conflicts among them. The common fusion techniques are Bayesian probability based fusion24 and Dempster−Shafer (D-S) evidence theory.42 However, the Bayesian-based fusion can generate counterintuitive results because it is unable to account for the inaccuracy of the classifier output. Moreover, the Bayesian based fusion cannot deal with the compound class; that is to say, a probability must be assigned to each atomic class. D-S based fusion may not function well in combining the highly conflicting evidence since there is no verification to check the fusion result. As a powerful tool for decision support problems, the analytical hierarchy process (AHP) is devised to solve multicriteria decision problems based on the relative priorities assigned to each criterion with the consistency check. The conventional AHP and its various modified versions have been proven useful in many aspects, such as factors prioritization,43 strategies identification and ranking,44 natural hazard assessment,45 and landslide susceptibility mapping.46 To apply the AHP method into classifier output fusion, we must make an important modification to deal with the imprecision and uncertainty of the classifier as well as the subjective construction of pairwise comparison matrix in the traditional AHP method. Motivated by the above considerations, we propose a novel alarm root-cause diagnosis framework that is able to construct the causal graph spatial interpretative structural model (SISM), monitor the process, diagnose the alarm root cause, and visualize the abnormality propagation path for a chemical process using data-driven methods and superficial process knowledge. In our hierarchical SISM, multiple sub-blocks are obtained by decomposing the entire process based on the process flow sheet, and the causality is captured by the partial correlation analysis. The SISM can not only represent the hierarchical organization of space but also show the causal relationships on different levels of granularity. In addition, not all sub-blocks will

on experience and process knowledge, which are often unavailable and insufficient. On the other hand, data-driven methods are preferred to do the causality analysis. Lag-based cross-correlation analysis,19 granger causality analysis,20 transfer entropy analysis,21,22 and Bayesian network learning23,24 are commonly used to identify the cause−effect relationships between any two variables. However, they all have their own shortcomings: (i) the cross-correlation analysis confounds direct and indirect relationships; (ii) the granger causality method needs a regression model, which can be quite restrictive; (iii) the transfer entropy is highly dependent on the estimation of probability density functions, which leads to a high computational burden and affects the computational result; (iv) the Bayesian network learning is sophisticated, and the causal structure obtained from Bayesian network learning needs to be a directed acyclic graph, which is in contradiction with the actual chemical processes. Besides, sometimes the results are not credible due to the lack of robustness to incomplete, noisy, and possibly inaccurately triggered alarm(s), especially when the faults propagate through complex variable interactions and across different process units. The strong integration and complexity of large scale chemical processes also make it difficult to apply those above simple models and centralized strategies. Distributed monitoring and diagnosis have been an appealing research topic in recent years. Instead of being supervised under monolithic monitoring methods, the entire process is decomposed into multiblocks, and sub-block models are built to monitor the sub-block status. The already existing distributed statistical process monitoring schemes25−27 can effectively reduce the complexity and improve the veracity in monitoring large-scale processes; however, the block divisions are based on purely process measurements, or neglect any kind of useful process knowledge, resulting in information loss and lack of interpretability. Besides, the fundamental cause−effect relationships cannot be extracted; therefore, the abnormality propagation paths cannot be determined. To reduce the computational cost as well as to explicitly visualize the spatial causal connections between different plant measurements and structures, the chemical processes should be decomposed in entirety and hierarchically into multiple blocks using process knowledge. Once the multiblock division has been finished, alarm diagnosis methods can be developed to monitor the process at various hierarchies. As an effective methodology for extracting interrelationships among various elements in complex systems, interpretative structural modeling (ISM) is highly advantageous for transforming vague, poorly articulated relationships into clear, hierarchical directed graphs.28 ISM-based interaction analysis techniques have gained wide popularity in the past few decades.29−33 Nevertheless, for all of the above ISM models, the cause−effect relationships are entirely determined by expert opinions. To apply ISM to construct the hierarchical causal model for large-scale chemical processes, data-driven techniques must be used to extract the cause−effect relationships. Compared to the popular data-driven methods for causality analysis mentioned above, the partial correlation analysis34−36 can avoid those shortcomings. Its superiority lies in these facts: (a) the partial correlation analysis is able to discriminate the direct and indirect causality by measuring the causality between two variables after their dependence on other variables is removed; (b) the partial correlation analysis does not need any predefined model; (c) the partial correlation analysis only needs the covariance and variance calculation, which is conceptually 3642

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

Figure 1. Illustrative diagram of establishing the spatial interpretive structural model

provide information about all root causes; some may be capable of identifying only a specific subset of the root causes. Therefore the fuzzy c-means clustering method is conducted on the collected process data in the alarm state to allocate the relevant root causes to each sub-block in advance to improve monitoring

performance. The ELM is used to realize the distributed alarm root cause diagnosis. A novel data fusion strategy integrating the basic probability assignment (BPA) function and improved AHP (BPA-IAHP) is utilized to merge the diagnostic outputs from different triggered ELM networks and obtain the potential root 3643

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

identified. Next, data-driven methods should be performed to find out all the remaining causality. To determine the true causality between any two variables Xi and Xj, we use partial correlation coefficients to quantify the direct correlation between them. Assume Ψ = {Xi|i = 1, 2, ..., V; Xi ∈ RL} is the collected historical data sets in steady state with the data length L. The correlation coefficient between variables Xi and Xj is defined as

causes. Finally, the probable alarm propagation pathway can be identified by tracking the successive alarm variables along with the causal connections in the SISM. The rest of this article is organized as follows. Section 2 concisely describes the identification of our proposed causal graph SISM. Section 3 reviews the existing ELM algorithm and defines the novel data fusion strategy BPA-IAHP. Section 4 proposes the framework for alarm root causes diagnosis in chemical processes. The presented method is evaluated using the Tennessee Eastman chemical process in section 5, and the conclusions are drawn in section 6.

r (X i , X j) =

cov(X iX j) var(X i) var(X j) L

2. SPATIAL INTERPRETIVE STRUCTURAL MODEL IDENTIFICATION The chief contributions of building a spatial interpretive structural model for alarm root-cause diagnosis lie in three aspects. First, the causal graph SISM can not only indicate the hierarchical causality within the entire process but also explicitly provide spatial distribution of different subblocks. Second, the process decomposition lays a good foundation for the multimodule monitoring and diagnosis. Each subblock in SISM and its direct descendants form a distinct diagnostic module. Third, the SISM can show the abnormality propagation path both within subblocks and among different spatial subblocks. It would help the operators to focus on the abnormal process units or unit operations and take timely corrective actions. The overall procedure of constructing the spatial interpretive structural model is illustrated in Figure 1. The steps are as follows: (1) decompose the entire process into smaller subblocks based on spatial distribution matrix (2) extract the internal and external causality (3) divide the micro- and macro-spatial hierarchy (4) determine the final causal graph SISM by adding the directed arcs. 2.1. Decompose the Entire Process Based on Spatial Distribution Matrix (SDM). To realize the distributed monitoring and diagnosis as well as incorporate the relationships between different process sections, the SDM is first created from the process flow diagram. Then, we can decompose the entire process into multiple subblocks in accordance with the SDM. The SDM is a B × V matrix [dbv] where B and V represent the number of subblocks and process variables, respectively. Each item dbv is assigned 1 or 0 by following the rule that dbv = 1 if the process variable Xv is monitored in the subblock b and dbv = 0, otherwise. With the generated SDM, we can readily create the spatial block matrix (SBM). The SBM is a square matrix [mij]V × V where V represents the number of process variables. There are total of B diagonal blocks in SBM. Each diagonal block corresponds to a process subblock or a row of SDM and contains the monitored variables whose value is 1 in that row. By analyzing the process flow diagram, the item mij is set to be 1 when the variable i has a direct and obvious influence on the variable j or i is equal to j. Apparently this procedure of process decomposition and incomplete direct causality determination needs some process knowledge, but it can be easily performed with the very basic comprehension of superficial knowledge such as process flow diagram. 2.2. Extract the External and Internal Causalities. With the generated SBM, a large set of process variables is divided into smaller subsets, and some obvious and direct causality are also

=

Xli

∑l = 1 (X li − X̅ i)(Xlj − X̅ j) L

L

∑l = 1 (X li − X̅ i)2 ∑l = 1 (Xlj − X̅ j)2

(1)

Xlj

where and are the lth values of vectors Xi and Xj, respectively. X̅i and X̅j are their average values, respectively. The pairwise correlation coefficients matrix r about total of V variables can be obtained according to eq 1. ⎡ r11 r12 ⋯ r1V ⎤ ⎢r r ⎥ 21 22 ⋯ r2V ⎥ r=⎢ ⎢⋮ ⋮ ⋱ ⋮⎥ ⎢ ⎥ ⎣ rV 1 rV 2 rV 3 rVV ⎦V ·V (2) The inverse matrix of r is represented as ⎡ u11 u12 ⋯ u1V ⎤ ⎢u u ⎥ 21 22 ⋯ u 2V ⎥ u = inv(r) = ⎢ ⎢⋮ ⋮ ⋱ ⋮ ⎥ ⎢ ⎥ ⎣ uV 1 uV 2 uV 3 uVV ⎦V ·V

(3)

Then, the partial correlation coefficient between variables Xi and Xj is calculated by eq 4: uij (i , j = 1, 2, ..., V ) P(X i , X j) = − uii ·ujj (4) Finally, the pairwise partial correlation coefficients matrix P about V variables can be obtained: ⎡ P11 P12 ⎢ ⎢ P21 P22 P=⎢ ⎢ ⋮ ⋮ ⎢⎣ PV 1 PV 2

P1V ⎤ ⎥ ⋯ P2V ⎥ ⎥ ⋱ ⋮ ⎥ PV 3 PVV ⎥⎦ V ·V ⋯

(5)

The value of P(Xi, Xj) in P ranges from −1 to 1, and it indicates the correlation intensity between variables Xi and Xj (see Table 1). For determining the causality based on the partial correlation coefficients matrix P, we should select a threshold α(0 < α < 1) to Table 1. Partial Correlation Coefficient and Its Corresponding Correlation Intensity47

3644

partial correlation coefficient

correlation intensity

0 ≤ |P(Xi, Xj)| < 0.1 0.1 ≤ |P(Xi, Xj)| < 0.3 0.3 ≤ |P(Xi, Xj)| < 0.5 0.5 ≤ |P(Xi, Xj)| < 0.8 0.8 ≤ |P(Xi, Xj)| < 1

no relationship low correlation medium correlation strong correlation extremely strong DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research construct the initial spatial block adjacent matrix A*. The value of α depends on the individual demand: if stronger correlation is preferred, the threshold can be set larger; otherwise, it can be set smaller. The formation rule of initial spatial block adjacent matrix (ISBAM) is listed in Table 2. Table 2. Formation Rule of Initial Spatial Block Adjacent Matrix i≠j

i=j

P(Xi, Xj)

A*(i, j)

A*(j, i)

P(Xi, Xj) ≥ α P(Xi, Xj) ≤ −α others

1 0 0 1

0 1 0 1

After the formation of ISBAM, the final spatial block adjacent matrix (SBAM) A can be obtained by filling in the SBM with the remaining causality information contained in A*. The internal causalities of each subblock are indicated by the 1s in the corresponding diagonal block, and the external causalities between any two different subblocks are indicated by the 1s not in the diagonal block. It should be pointed out that the causalities in SBAM are the direct causalities. The expression A(i,j) = 1 indicates the variable Xi has a direct influence on the variable Xj. 2.3. Divide the Macro- and Micro-spatial Hierarchy. The macrospatial hierarchy is partitioned to get the level of each subblock, and the microspatial hierarchy is partitioned to get the level of variables within each subblock. Before the division of spatial hierarchy, the spatial block reachability matrix (SBRM) R should be created first by adding transitivity to the SBAM described in sec 2.2. Let R* = A + A2 + A3 + Λ + AN, where N is the order of matrix A. Then set the nonzero values of R* as 1. The final matrix is just the spatial block reachability matrix R we want. The reachability matrix indicates both the direct and indirect causalities among all variables. 2.3.1. Divide the Macrospatial Hierarchy. From the reachability matrix R, the macrospatial hierarchy can be divided based on the ISM algorithm, which is shown in the following pseudocodes (Algorithm ISM). Different subblocks are possibly divided into the same macrospatial layer. One and the same subblock can be divided into different macrospatial layers because different variables in the same subblock can be divided into different layers. 2.3.2. Divide the Microspatial Hierarchy. The process of microspatial hierarchy division is almost the same as that of macrospatial hierarchy division. The only difference between them is that the variables to be layered in the macro division are all of the process variables, while in the micro division only the variables within a certain subblock are to be layered. Consequently, the micro division process should be done B times for B subblocks. 2.4. Determine the Final Causal Graph SISM by Adding the Directed Arcs. Once the macro- and the microspatial hierarchy are determined, the directed arc pointing from variable Xi to Xj should be added if the value of A(i,j) in SBAM is 1. After adding all the directed arcs, the causal graph SISM can be finally established.

3. SUBMODULE DIAGNOSIS AND MULTIMODULE DATA FUSION 3.1. Brief Reviews of Extreme Learning Machine. Unlike the traditional thinking and popular implementations that all the parameters of SLFNs need to be tuned to obtain good learning performance, ELM algorithm allows the random assignment of input weights and hidden layer biases if the activation functions in the hidden layer are infinitely differentiable, and then the output weights can be analytically determined through the smallest norm least-squares solution. The ELM method not only makes learning extremely fast but also produces good generalization performance. Given a training set Φ = {(ei, oi) | ei ∈ Rn, oi ∈ Rs, i = 1, 2,..., K}, hidden layer activation function g(·), and hidden node number Ñ , where ei is the ith input sample for ELM, oi is the ith output sample for ELM, n is the input node number, s is the output node number, and K is the samples number: Step 1: Randomly assign input weight w i and bias βi (i = 1, 2, ..., Ñ ), where wi = [wi1,wi2,...,win]T is the weight vector connecting the ith hidden node and the input nodes, and βi is the threshold of the ith hidden node. Step 2: Calculate the hidden layer output matrix H ⎡ g (w1·e1 + β ) ⋯ g (w Ñ ·e1 + β ̃ ) ⎤ 1 N ⎢ ⎥ ⎢ ⎥ ⋮ ⋯ ⋮ H= ⎢ ⎥ ⎢⎣ g (w1·eK + β1) ⋯ g (w Ñ ·eK + β Ñ )⎥⎦ (6) K × Ñ Step 3: Calculate the output weight γ ⎡γT⎤ ⎢1 ⎥ = H†O γ=⎢⋮⎥ ⎢ ⎥ ⎢⎣ γNT̃ ⎥⎦ Ñ × s

(7)

where γi = [γi1 , γi2 , ..., γis]T (i = 1,2,...,N) is the weight vector connecting the ith hidden node and the output nodes O = [o1, o2, ... oK]TK×s,, and H† is the Moore−Penrose generalized inverse of matrix H. 3645

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

represents the proportion of input samples from class Rm that are assigned to class Rn by the jth classifier. The basic probability assignment (BPA) matrix BMj of the jth classifier can be calculated from its respective confusion matrix. It is represented as

Step 4: Calculate the actual output O* of ELM O* = Hγ

(8)

3.2. Data Fusion Strategy Integrating the BPA Function with Improved AHP: BPA-IAHP. Because of the unavoidable imprecision and uncertainty of the submodule diagnostic models or classifiers, their prior performance would exert an effect on the diagnostic outputs. What’s more, when more than one classifier is activated, the outputs of different classifiers can be contradictory or even mutually exclusive. To overcome the above problems and obtain the consistent results, the fusion strategy must take the classifiers’ prior knowledge to suitably weight the results. The basic probability assignment (BPA) function is a critical element of Dempster−Shafer evidence theory since the BPAs represent all the prior knowledge of different diagnostic models or classifiers in the confusion matrix. In particular, the BPAs can reflect the prior strengths or weaknesses of each diagnostic module. After quantifying the prior knowledge of each classifier, the improved analytic hierarchy process (IAHP) will be used to combine the inconsistent results since the IAHP can handle both the atomic class and compound class. Besides, we can get the reliable solutions through the comparison matrix adjustment and consistency check. The stages of the proposed fusion strategy BPA-IAHP is shown in Figure 2.

j ⎡ Zj Z1,2 ⎢ 1,1 ⎢ j j ⎢ Z 2,1 Z 2,2 BM j = ⎢ ⋮ ⎢ ⋮ ⎢ j j ⎣ Z M j ,1 Z M j ,2

⋯ Z1,j M j ⎤ ⎥ ⎥ j ⋯ Z 2, M j ⎥ ⎥ ⋱ ⋮ ⎥ ⎥ ⋯ Z Mj j , M j ⎦

(10)

The rows stand for the classes recognized by the jth classifier, that is, the outputs of the jth classifier, while the columns indicate the j actual classes. An element Zm,n , in the BPA matrix, represents the belief of samples assigned to class Rm but actually belonging to class Rn. It can be expressed as Zmj , n

=

Cnj, m Mj

∑n = 1 Cnj, m

(11)

3.2.2. Improved Analytic Hierarchy Process. At alarm time t, we can assume that a total of Jt (Jt = 1,2,...,J) diagnostic submodules or classifiers are activated to perform alarm rootcause identification. The recognized root cause class by the jth (j = 1,2,...,Jt) classifier at alarm time t is Rtj(Rtj ∈ {R1, R2,..., RMj}), and the corresponding belief set (BS) is expressed as BStj = Rtj. It should be noted that the belief set BStj can be atomic class or compound class. The step-by-step procedures of IAHP are listed as follows. (i) Find the intersection set Et among total Jt belief sets BStj (j = 1,2,...,Jt). Each element Ef in Et must also be in each belief set BStj, that is, Et = ∪∀j,Ef ∈ BSjt Ef (j = 1,2,...,Jt). The combined probability t

(CP) of each element Ef in Et is CP(Ef) = ∏jJ = 1ZtRjj,Ef (Ef ⊆ E, f = 1,2,...,F). F is the number of elements in set Et. Then the combined probability vector about set Et is CP(Et) = [CP(E1) CP(E2) ··· CP(EF)]T. (ii) Construct the quantitative pairwise comparison matrix (PCM) for the intersection set Et. The PCM is expressed as ⎡ 1 PCM(1, 2) ⎢ ⎢ PCM(2, 1) 1 PCM = ⎢ ⋮ ⋮ ⎢ ⎢ ⎣ PCM(F , 1) PCM(F , 2)

Figure 2. Proposed fusion strategy BPA-IAHP

3.2.1. Estimation of Basic Probability Assignment Function. The confusion matrix CMj of the jth classifier is usually constructed by testing the jth classifier performance on training or separate validation data sets and is typically represented as j ⎡ Cj C1,2 ⎢ 1,1 ⎢ j j ⎢ C2,1 C2,2 j CM = ⎢ ⋮ ⎢ ⋮ ⎢ j j ⎣C M j ,1 C M j ,2

⋯ C1,j M j ⎤ ⎥ ⎥ j ⋯ C2, M j ⎥ ⎥ ⋱ ⋮ ⎥ ⎥ ⋯ C Mj j , M j ⎦

⋯ PCM(1, F )⎤ ⎥ ⋯ PCM(2, j) ⎥ ⎥ ⋱ ⋮ ⎥ ⎥ 1 ⋯ ⎦

F×F

(12)

Each element in PCM is determined by ⎧ CP(Ei) − CP(Ej) ⎪ ⎪ max(CP(Et )) − min(CP(Et )) ⎪ ⎛ max(CP(Et )) ⎞ − η⎟ + 1 ⎪ ×⎜ t ⎝ min(CP(E )) ⎠ ⎪ PCM(i , j) = ⎨ if CP(E ) ≥ CP(E ) i j ⎪ ⎪1 if i = j ⎪ ⎪ 1 if CP(Ei) < CP(Ej) ⎪ PCM (j , i ) ⎩

(9)

The rows stand for the actual classes R1, R2, ··· RMj included in the jth classifier, while the columns indicate the classes assigned by the jth classifier. An element Cjm,n, in the confusion matrix, 3646

(13)

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research Table 3. Random Consistency Index (RI) of Comparison Matrix with Different Order (n) n RI

1 0

2 0

3 0.58

4 0.90

5 1.12

6 1.24

7 1.32

8 1.41

9 1.45

10 1.49

11 1.51

Figure 3. Overall application procedure

⎡ ω (1) ⎤ ⎡ B (E ) ⎤ 1 ⎢ max ⎥ ⎢ ⎥ ⎢ ωmax (2) ⎥ ⎢ B (E 2 ) ⎥ ωmax 1 = B (E t ) = ⎢ ⎥ ⎢ ⎥= sum(ωmax ) sum(ωmax ) ⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎥ ⎢ ⎢ ⎥ ⎣ B(EF )⎦ ⎣ ωmax (F )⎦

where max(CP(Et)) is the maximum value of CP(Et), min(CP(Et)) is the minimum value of CP(Et) and ⎡ max(CP(Et )) η η ∈ ⎣0, min(CP(Et )) is an adjustable parameter. With the adjustable parameter η, the comparison matrix can be modified to meet the consistency requirement. (iii) Calculate the largest eigenvalue λmax and its corresponding eigenvector ωmax from the comparison matrix PCM. The normalized elements of eigenvector ωmax are the final combined beliefs (CB) of each element in Et. The combined belief is defined as the following equation:

(

))

(14)

(iv) Check the consistency for pairwise comparison matrix. The consistency index (CI) is defined by the formulas: 3647

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

Figure 4. Flow diagram of the TE process

CI =

λmax − F F−1

hierarchy rule; (2) construct multiple diagnostic modules using ELM algorithm; (3) evaluate the prior performance of diagnostic modules. The work of online phase is to combine the results from different activated modules using the BPA-IAHP data fusion strategy and make a final alarm root-cause diagnosis. The online proportion of our framework consists of two major steps: (1) combine the diagnosis results by use of IAHP fusion strategy; (2) rank the potential alarm root causes and determine the most probable one and the diagnosis delay.

(15)

The consistency ratio (CR) is then calculated using the formulas: CR =

CI RI

(16)

where RI is the random consistency index. Table 3 lists the different value of RI associated with the different order of comparison matrix. When CR is less than 0.1, the comparison matrix PCM is considered to meet the consistency requirement. Otherwise, the comparison matrix PCM should be modified by adjusting η in ascending order until CR is less than 0.1. Compared with the traditional analytic hierarchy process, the difference and also the advantage of our improved analytic hierarchy process is that the comparison matrix is determined by objective calculation instead of subjective decision, and the consistency of the final result can be met by simply adjusting the parameter η (see Appendix A.2).

5. APPLICATION EXAMPLE This section provides the test of our alarm root cause diagnosis framework on the Tennessee Eastman (TE) industrial challenge problem.48 The case studies show that the proposed method performs well in modeling the complex process and diagnosing the alarm root causes. 5.1. Tennessee Eastman Chemical Process. The TE process comprises six major process units: a feed mixer, a reactor, a product condenser, a vapor−liquid separator, a recycle compressor, and a product stripper. The diagram of the process with a self-optimizing control strategy49 is shown in Figure 4. This process contains 12 manipulated variables, 22 continuous process measurements, and 19 composition measurements. According to the control strategy, not all the measurements have equal importance or necessity in our study. Of these 9 process manipulated variables, 20 continuous process measurements, and 3 composition measurements are used and configured with alarm (see Table 4). The simulation contains 20 process disturbances of which 7 “step type” disturbances are used as the predefined alarm root causes in our case study (see Table 5). The variable symbols and alarm root cause symbols shown in Table 4

4. PROPOSED FRAMEWORK FOR ALARM ROOT-CAUSE DIAGNOSIS On the basis of the above analysis, the coherent framework for alarm root-cause diagnosis is illustrated in Figure 3. The systematic procedures are mainly divided into two major phases: offline stage and online stage. The offline phase consists of three major steps and covers the bulk of the work that needs to be done to implement the alarm root cause diagnosis scheme. The three major steps are listed as follows: (1) construct the spatial interpretive structural model by use of process flow sheet, partial correlation analysis, and ISM 3648

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

time instance t is collected if one or more alarms happen. For ELM model training and testing in the offline phase, one third of the above collected data sets is considered as the training data and one-third of that is considered as the testing data. 5.2. Offline Phase Preparation. 5.2.1. Spatial Directed Causal Graph Identification. The first task in the proposed framework is to decompose the process and generate the SDM from the process flow diagram and very superficial knowledge. As depicted in Figure 4, the TE process can be decomposed into six subblocks, namely, S1-feed, S2-reactor, S3-condenser, S4separator, S5-stripper, S6-compressor, and purge (C&P) and S7-product. These subblocks correspond to the rows of the SDM. Columns of the SDM correspond to the total 32 monitored variables and thus the SDM is a 7 × 32 matrix. The corresponding entry will be 1 if the variable is monitored in the subblock. Hence, the SDM shown in Table 6 can be obtained. The next step is to capture the causality between variables. On the basis of the flow diagram, one can readily notice that variables F1, F2, F3, and F4 monitored in the S1-feed and variable F5 monitored in the S6-C&P have an influence on the variable F6 monitored in the S1-feed, so the corresponding entries in the column of F6 are 1. The variable F6 has an influence on the variables P7, L8, and T9 monitored in its adjacent subblock S2reactor. So the corresponding entries in the row of F6 are 1. Similarly, some entries of the other rows or columns are assigned 1 accordingly and finally the obvious causality denoted by the bold type shown in Table 7 can be obtained. The remaining causality should be captured by the data-driven method. The threshold α is set as 0.5 (see Appendix A.1). Following the steps described in section 2.2, the final SBAM is shown in Table 7. By adding transitivity, the SBRM is shown in Table 8. With the constructed SBRM, the macrohierarchy and microhierarchy can be divided following the descriptions in section 2.3. After this division, the directed arcs denoting the direct causality are added between any two adjacent related variables based on the SBAM. In this way, all subgraphs are determined and then the final entire graph SISM is identified as shown in Figure 5. From Figure 5 we can see that the total seven subblocks is decomposed in two macrohierarchical levels, namely, the bottom level and the top level. The macrohierarchy denotes the external causality distribution. The subblocks S1, S5, S6, and S7 are only on the top level while the subblocks S2, S3, and S4 are both on the top level and the bottom level. The variables on the top level have influence on their child nodes both on the top level and the bottom level; whereas the variables on the bottom level have no children nodes and they are influenced by their parent nodes. In

Table 4. Monitored Variables of TE Process variable

description

variable

Mv1 Mv2 Mv3 Mv4 Mv6 Mv7

L8 T9 F10 T11 L12 P13

reactor level reactor temperature purge rate (stream 9) separator temperature separator level separator pressure

F14

separator underflow (stream 10) stripper level stripper pressure

F1

D feed flow (stream 2) E feed flow (stream 3) A feed flow (stream 1) total feed flow (stream 4) purge valve (stream 9) separator pot liquid flow (stream 10) stripper liquid product flow (stream 11) reactor cooling water flow condenser cooling water flow A feed (stream 1)

F2 F3

D feed (stream 2) E feed (stream 3)

T18 T21

F4

A and C feed (stream 4)

T22

F5 F6

recycle flow (stream 8) reactor feed rate (stream 6) reactor pressure

RA RC

Mv8 Mv10 Mv11

P7

L15 P16 F17

SG

description

stripper underflow (stream 11) stripper temperature reactor cooling water outlet temperature condenser cooling water outlet temperature composition of A (stream 6) composition of C (stream 6) composition of G (stream 11)

Table 5. Predefined Alarm Root Causes of TE Process alarm root-cause ID

description

r1 r2 r3 r4 r5 r6 r7

A/C feed ratio, B composition constant (stream 4) B composition, A/C ration constant (stream 4) D feed temperature (stream 2) reactor cooling water inlet temperature condenser cooling water inlet temperature A feed loss (stream 1) C header pressure lossreduced availability (stream 4)

and Table 5 will be used later in this section. The variables are sampled every 0.01 h. The high alarm limits and low alarm limits are set using the common three sigma method. For partial correlation analysis, the process data set consisting of 10000 samples generated under normal operations at mode 1 is used. For alarm root causes grouping and clustering, in all the seven cases, the process begins with normal operating conditions from the first through the 5000th samples, then the faulty operation occurs from the 5001st through the 10000th samples. In each faulty case under the faulty condition, the data point at Table 6. Spatial Distribution Matrix of TE Process

3649

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research Table 7. Spatial Block Adjacent Matrix of TE Process

Table 8. Spatial Block Reachability Matrix of TE Process

5.2.2. Construction of Multiple Diagnostic Modules and Estimation of BPAs. Once the subblocks are divided, an equal number of the diagnostic modules comprising the variables in each subblock and its adjacent child nodes are constructed

each subblock, the microhierarchy denotes the internal causality distribution. With the help of SISM, we can readily recognize the hierarchical causal links and possible alarm propagation paths. 3650

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

Figure 5. Spatial interpretive structural model of the TE process

are thus different from the offline training and testing data in terms of data length and time of fault introduction. As a comparison, the Dempster−Shafer evidence fusion based fault detection method42 is used to examine the effectiveness of our proposed framework. Besides, to indicate the advantage of process decomposition and alarm root causes grouping in our method, the singular ELM diagnostic model is used for comparison. In the case of r1, the abnormal operating event is the step change of A/C feed ratio in stream 4 from the 3001st to the 8000th samples. For simplicity, we consider the fault introduction time as t = 0. The alarms first occur within submodule M5 at t = 9, thus the diagnostic model ELM 5 is activated to perform diagnosis. The output of ELM 5 is the compound class r3/r4/r5. Then, at t = 11, there are alarms occurring within submodules M2 and M5. M2 identifies this fault as the compound class r1/r3/r7, and M5 also identifies it as r3/ r4/r5. Until at t = 31, the submodules M1, M2, M4, M5, and M6 are all in an alarm state and their corresponding diagnostic models are activated. The outputs of these five models are r4/r4/ r5, r1/r3/r7, r3/r4/r5/r7, r1, and r1/r5/r7, respectively. Conflicts arise when different diagnostics models yield different diagnosis results. As can be seen, in this scenario, the individual submodules are in congruence only until t = 9, when all the models identify the process as normal. From t = 9 forth the outputs conflict; this is resolved through fusion. Table 10 shows the BPA values of different submodules at various times. At t = 9, only M5 identifies the data sample as r3/ r4/ r5 while the rest classify the process as normal. On the basis of the BPAs estimation of ELM 5 in Figure 6, the BPAs assigned to r1, r2, r3/r4/r5, r6, and r7 are just the values in the row of r3/r4/ r5 of BM5. The combined belief is equal to the BPAs of M5. The term with the maximum belief is regarded as the alarm root cause, namely r3/r4/r5. At t = 11, two modules M2 and M5 are activated. The recognized classes are r1/r3/r7 and r3/r4/r5,

(shown in Table 9). Because there are few or no alarms in M2 under fault r5, the assigned alarm root causes to M2 are r1, r2, r3, Table 9. Variables and Alarm Root Causes Assigned to Each Submodule variables in submodule submodule ID M1

M2 M3 M4 M5

M6 M7

variables in subblock Mv1, MV2, MV3, MV4, F1, F2, F3, F4, F6, RA, RC MV10, P7, L8, T9, T21 MV11, T22 MV7, T11, L12, P13, F14 MV8, L15, P16, F17, T18 MV6, F5, F10 SG

adjacent child nodes P7, L8, T9, L12, L15, T18 Mv6, Mv11, T11, L12, P13, T22 T11 Mv11, L15, T18 Mv1, Mv2, Mv3, Mv4, Mv6, Mv7, P7, SG F6, P13 Mv1, Mv2

alarm root causes atomic

compound

r1, r2, r6, r7 r2, r4, r6 r5, r6 r1, r2, r6 r1, r2, r6, r7 r2, r6 r6

r3/r4/r5

r1/r3/r7 r1/r2/r7 r3/r4/r5/r7 r3/r4/r5

r1/r5/r7

r4, r6, and r7. In the same way, the assigned alarm root causes to all these seven submodules are listed in Table 9. By performing fuzzy c-means algorithm on the collected process data, the grouped results of alarm root causes are shown in Table 9. The ELM models corresponding to the seven submodules are then trained and tested. The BPA estimation based on the testing results for each submodule is shown in Figure 6. Next, we discuss the online alarm root-cause diagnosis and the performance of our proposed framework. 5.3. Online Alarm Root-Cause Diagnosis and Results Comparison. To generate the online testing data, seven runs are performed. Each testing run simulates 80 h of operations, with the fault introduced at 30 h. The signals for each online test run 3651

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

Figure 6. BPAs estimation for seven diagnostic ELM models

Table 10. BPA-IAHP Based Fusion Strategy for Case r1 in the TE Process time 9

11

12

19

31

M1

M2

M3

M4

M5

recognized class individual belief

M7

combined belief

r3\r4\r5

B2(1\3\7),(1\3\7) = 0.88 B2(1\3\7),2 = 0.06 B2(1/3/7),6 = 0.06

recognized class individual belief

recognized class individual belief

η = 1014 B(r1) = 0.0047

B5(3\4\5),2 = 0.01 B5(3\4\5),(3\4\5) = 0.96 B5(3\4\5),6 = 0.01 B5(3\4\5),7 = 0.01

B(r2) = 0.0013 B(r3) = 0.9879

r1\r3\r7

r3\r4\r5\r7

r3\r4\r5

B2(1\3\7),(1\3\7) = 0.88 B2(1\3\7),2 = 0.06 B2(1\3\7),6 = 0.06

B4(3\4\5\7),1 = 0.01

B2(3\4\5),1 = 0.01

B2(3\4\5\7),2 = 0.06 B2(1\3\4\5\7),(3\4\5\7) = 0.85 B4(3\4\5\7),6 = 0.08

r1\r3\r7

r3\r4\r5\r7

B4(3\4\5),2 = 0.01 B5(3\4\5),(3\4\5) = 0.96 B5(3\4\5),6 = 0.01 B5(3\4\5),7 = 0.01 r3\r4\r5

B2(1\3\7),(1\3\7) = 0.88 B2(1\3\7),2 = 0.06 B2(1\3\7),6 = 0.06

B4(3\4\5\7),1 = 0.01

B5(3\4\5),1 = 0.01

B4(3\4\5\7),2 = 0.06 B4(3\4\5\7),(3\4\5\7) = 0.85 B4(3\4\5\7),6 = 0.08

r3\r4\r5

r1\r3\r7

r3\r4\r5\r7

B5(3\4\5),1 = 0.01

B2(1\3\7),(1\3\7) = 0.88 B2(1\3\7),2 = 0.06 B2(1\3\7),6 = 0.06

B4(3\4\5\7),1 = 0.01

B51.1 = 1.00

B4(3\4\5\7),2 = 0.06 B4(3\4\5\7),(3\4\5\7) = 0.85 B4(3\4\5\7),6 = 0.08

B(r7) = 0.01

B5(3\4\5),1 = 0.01

B5(3\4\5),2 = 0.01 B5(3\4\5),(3\4\5) = 0.96 B5(3\4\5),6 = 0.01 B5(3\4\5),7 = 0.01 r1

B5(3\4\5),2 = 0.03 B5(3\4\5),(3\4\5) = 0.96

B(r1) = 0.01 B(r2) = 0.01 B(r3\ r4\ r5) = 0.96 B(r6) = 0.01

B5(3\4\5),2 = 0.01 B5(3\4\5),(3\4\5) = 0.96 B5(3\4\5),6 = 0.01 B5(3\4\5),7 = 0.01 r3\r4\r5

r1\r3\r7

fusion result r3/r4/r5

B5(3\4\5),1 = 0.01

recognized class individual belief

recognized class individual belief

M6

B(r6) = 0.0013 B(r7) = 0.0047 CR = 0.09 η = 19466 B(r1) = 0.0015 B(r 2) = 0.0014

r3

r3

B(r3) = 0.9897 B(r7) = 0.0060 CR = 0.09 r1\r5\r7

η = 4413

B6(1\5\7),(1\5\7) = 0.90 B2(1\5\7),2 = 0.04 B5(2\5\7),6 = 0.06

B(r1) = 0.0075 B(r2) = 0.0026 B(r6) = 0.0027

r1\r5\r7

B(r7) = 0.9871 CR = 0.09 B(r1) = 1.00

r7

r1

B6(1\5\7),(1\5\7) = 0.90 B2(1\3\7),2 = 0.04 B4(3\4\5\7),2 = 0.06

respectively. Once the individual BPAs of these two submodules are assigned, the IAHP is used to calculate the combined belief. 3652

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research Table 11. Performance Comparison of Various Submodules and BPA-IAHP Based Fusion Strategya individual diagnostic submodule

M1

a

M2

M3

M4

M5

M6

BPAIAHP based fusion

M7

fault case

I

DD

I

DD

I

DD

I

DD

I

DD

I

DD

rl r2 r3 r4 r5 r6 r7

r1 r2 r3\r4\r5 r3\r4\r5 r3\r4\r5 r6 r7

52 298 900 926 899 0 1

r1\r3\r7 r2 r1\r3\r7 r4

10 104 1647 1

rl\r2\r7 rl\r2\r7

38 308

r5 r6 rl\r2\r7

308 95 11

r1 r2 r3\r4\r5 r3\r4\r5 r3\r4\r5 r6 r7

31 98 625 31 7 9 1

18 111

37 1

39 133 90 30 6 50 1

r1\r5\r7 r2

r6 r1\r3\r7

r1 r2 r3\r4\r5\r7 r3\r4\\r5\r r3\r4\\r5\r r6 r3\r4\\r5\r

r1\r5\r7 r6 r1\r5\r7

11 58 8

I

r6

DD

451

I

DD

r1 r2 r3 r4 r5 r6 r7

31 98 90 1 7 0 0

I = identified as; DD = diagnosis delay (sample no.)

can only recognize a subset of root causes. For a fair comparison, we deploy three metrics to evaluate the recognition rate. The local recognition rate RRLOC of the kth submodule is defined as k the percentage of samples correctly diagnosed from those samples that the kth submodule is triggered, that is,

The intersections of the possible alarm root causes from M2 and M5 are r1, r2, r3, r6, and r7. The combined probability of each intersection is calculated as follows: 2 5 CP(r1) = B(1\3\7),(1\3\7) × B(3\4\5),1 = 0.88 × 0.01 = 0.0088 2 5 CP(r 2) = B(1\3\7),2 × B(3\4\5),2 = 0.06 × 0.01 = 0.0006

CP(r 3) =

2 B(1\3\7),(1\3\7)

×

5 B(3\4\5),(3,4,5)

RR kLOC% =

= 0.88 × 0.96 = 0.8448

(19)

where TDk represents the number of samples correctly diagnosed by the kth submodule and FDk represents the number of samples wrongly diagnosed. The capable recognition rate RRCAP of the kth submodule is defined as the percentage of k samples correctly diagnosed from those samples that the kth submodule is capable of diagnosing, that is,

2 5 CP(r 6) = B(1\3\7),6 × B(3\4\5),6 = 0.06 × 0.01 = 0.0006 2 5 CP(r 7) = B(1\3\7),(1\3\7) × B(3\4\5),7 = 0.88 × 0.01 = 0.0088

(17)

The combined belief is determined through eq 11, 12, and 13 as B(r1) = 0.0047

TDk × 100 % Θk TDk = × 100% TDk + FDk + MDk

RR CAP k % =

B(r 2) = 0.0013 B(r 3) = 0.9879 B(r 6) = 0.0013 B(r 7) = 0.0047

TDk × 100% TDk + FDk

(20)

where Θk denotes the total number of samples assigned to the kth submodule and MDk is the number of samples missed by the kth submodule because it is not triggered. The cumulative recognition rate RRCUM of the kth submodule is defined as the k percentage of samples correctly diagnosed from the total samples, that is,

(18)

The adjustable parameter η is set to the value of 1014. The consistency ratio CR is calculated as 0.09 smaller than 0.1. Thus the comparison matrix meets the consistency check. Until at t = 31, the submodule M5 has complete belief that the root cause is r1. The intersection has only one element r1. So the combined belief of r1 is 1.00. The alarm root cause is detected correctly with the diagnosis delay of 32. Similarly, the remaining six cases are examined and to maintain brevity the detailed process descriptions are not given here. Table 11 summarizes the performance of the different submodules as well as the BPA-IAHP based fusion strategy for the seven fault cases. In all cases, the proposed approach is capable of identifying all the alarm root causes quickly and correctly even though no individual submodule can diagnose all of them separately. Another benefit of this strategy is that it does not need confirmation from other submodules and flags an alarm root cause when the root cause is first successfully identified by a submodule with very high belief. Furthermore, the fusion scheme takes account into the prior knowledge of each diagnostic submodule, therefore the diagnosis delay is decreased. Generally, the percentage of samples correctly identified is called the “recognition rate”. In our work, only relevant alarm root causes are assigned to the submodules, thus some of them

RR CUM %= k

TDk × 100% Θ

(21)

where Θ is the total number of testing samples. The recognition rates of each submodule and BPA-IAHPbased fusion scheme are presented in Figure 7. The local recognition rate is reasonably high for all submodules (∼98%), but the capable and cumulative recognition rates of these seven submodules are extremely poor, varying from 1% to 59%. The recognition rates for BPA-IAHP based fusion are all very high (>95%). This fact demonstrates that the fusion strategy is superior to the individual diagnostic models, and the significant improvement in diagnosis performance can be obtained through fusion of the disparate submodules even if the outputs of some submodules are wrong. To demonstrate the superiority of our proposed framework over the singular diagnostic model, we additionally deployed the ELM algorithm to monitor the whole process and make the final diagnosis. The comparative results of diagnosis delay and 3653

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

changed to 0.01 h. Only the delay of root cause r2 of our proposed framework being 98 is longer than that of D-S based scheme being 60. For the remaining six scenarios, our proposed framework can identify them more quickly than the D-S based scheme. Especially for the cases of r6 and r7, our proposed framework can identify them very quickly without any delay. In addition, the recognition rates of our proposed framework are extremely high (>99%), while that of D-S based scheme are relatively low (∼96%). Once the alarm root cause is determined, the alarm propagation pathway is searched and identified based on the SISM. We assume that as long as one variable goes into the alarm state, the fault has spread to it for certain. Two test scenarios discussion are presented next, and the probabilistic graphical network based method24 is used to demonstrate the effectiveness of our proposed framework. In the test case of r1, the abnormal operation is the step change of A/C feed ratio in stream 4. The stripper pressure (P16) in stripper (S5) goes into alarm first at t = 9 (considering t = 0 as the fault introduction time). From t = 9 to t = 31, the subblocks S5stripper, S2-reactor, S4-separator, S6-recycle&purge and S1-feed fire alarms successively. From t = 32 on, more variables in all the subblocks but S7-product turn into alarm state. By highlighting the 23 out of 32 variables ever in alarm state, the overall probable alarm propagation paths are identified as the causal links from the root cause r1 to the highlighted variables (see Figure 10). The bold dash lines staring form the node r1 represent the most adjacent pathways between alarm root cause and the first influenced variables. The bold solid lines represent the most probable and meaningful external propagation pathways from the upstream subblock to the downstream subblock. It should be noted that some intermediate variables on the propagation path are not in alarm state. This does not mean that the variables are not influenced by the abnormal event. On the contrary, the abnormal extent might be not significant enough to fire an alarm. Because of the alarm thresholds setting, some child nodes fire alarms earlier than their parent nodes. It is obvious that the abnormal event or fault r1 spreads broadly into the whole process following the complicated external and internal links among various variables. Because of the function of control system, only the variables MV3, F1, MV4, and F4 in S1 and T18 in S5 on the macro-top-layer of the whole process are in an alarm state. The remainder 17 variables are back to normal. As for the probabilistic graphical network-based method, it is difficult to select the root-cause variable under fault r1, and the fault propagation path cannot be identified. While using our proposed framework, the alarm root cause r1 is diagnosed correctly and the probable alarm propagation path can be identified. In the test case of r4, the abnormal operation is the step change of reactor cooling water inlet temperature. The reactor cooling water flow (MV10), reactor temperature (T9), and reactor cooling water outlet temperature (T21) in reactor (S2) goes into alarm first at t = 2 (considering t = 0 as the fault introduction time). Then, the reactor pressure (P7), condenser cooling water outlet temperature (T22), separator temperature (T11), and stripper temperature (T18) turn into an alarm state successively. By highlighting the 7 out of 32 variables ever in an alarm state, the overall probable alarm propagation paths are identified as the causal links from the root cause r4 to the highlighted variables (see Figure 11). The bold dash lines starting from the node r4 represent the most adjacent pathways between alarm root cause and the first influenced variables. The bold solid lines represent the most probable and meaningful external propagation

Figure 7. Local, capable, and cumulative recognition rates in the case study

diagnosis accuracy are illustrated in Figure 8. It is evident that our proposed framework exhibits great benefits both in accuracy and

Figure 8. Performance comparison between proposed framework and ELM

speed for alarm root cause diagnosis. This fact is ascribed to the process decomposition and data fusion in our framework. Each submodule is responsible to its own relevant and meaningful alarm root causes, and the contradictions among different submodules are effectively resolved through the BPA-IAHP fusion scheme. Figure 9 provides a performance comparison between our proposed framework and the D-S based scheme in ref 42. The sampling interval in ref 42 is 3 min while that in our program is 0.01 h. To make the comparison consistent, the unit of time is

Figure 9. Performance comparison between proposed framework and scheme in ref 42. 3654

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

Figure 10. Identified alarm propagation pathway in the case of r1.

Figure 11. Identified alarm propagation pathway in the case of r4. 3655

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research

atomic or compound classes by fuzzy c-means method. Then, the extreme learning machine algorithm is deployed to conduct the diagnosis. The other key contribution of this paper is to integrate the basic probability assignment into the improved analytical hierarchical process to coherently fuse the outputs of multiple diagnostic submodules. This fusion strategy offers flexibility to represent the prior knowledge of each submodule and has the ability to combine both the atomic and compound class. In addition, when the alarm root cause is recognized, the fault propagation pathway can be visualized and explicated with the help of SISM. The results of the TE case study clearly demonstrate the effectiveness of our proposed framework: (i) all of the known alarm root causes can be detected and diagnosed with short diagnosis delays and high recognition rate (>99%); (ii) fault propagation pathways can be determined accurately and clearly. Since the framework may not be effective for a time-varying process, future work will explore the probabilistic temporalspatial causal networks to take into account of the dynamic nature and probabilistic feature. Furthermore, we intend to modify the current diagnostic algorithm to deal with the unknown and multiple root-causes problems.

pathways from the upstream subblock to the downstream subblock. It is obvious that the abnormal event or fault r1 spreads from the reactor to the condenser, separator, and stripper following the external links between variables. The reactor temperature (T9) and the reactor cooling water outlet temperature (T21) are the most adjacent variables to the root cause r4. Because of the function of control system, only the variables MV10 in the reactor on the macro-top-layer of the whole process are in an alarm state. The remainder 31 variables are back to normal. As for the probabilistic graphical network based method, the identified root cause variable is only the reactor cooling water outlet temperature (T21), and the identified fault propagation pathway is shown in Figure 12.

Figure 12. Identified fault propagation path in the case of r4 using the method in ref 24.



APPENDIX

A.1. Determination of Threshold α

Actually, the true fault is located at the reactor cooling water inlet temperature. This fault influences both the reactor temperature (T9) and the reactor cooling water outlet temperature (T21) directly. It means that these two variables T9 and T21 can be both considered as the root variables influenced by fault r4 rather than the T21 only. Besides, the T21 has almost no influence on T9. Because of the alarm threshold design in our method, the propagation pathway in Figure 10 is a little different from that in Figure 11. From the perspective of alarm system management, the variables in an alarm state are the focus. Therefore, the propagation path connecting the variables in an alarm state is preferable and meaningful.

The well-accepted rule listed in Table 1 may help users to decide the threshold α practically. The process may be still a little subjective; therefore, we deeply explore the effects of partial correlation coefficients and α on the cumulative recognition rate of alarm root causes as well as on the accuracy of the final causal graph SISM. The threshold α is finally decided by balancing these two kinds of effects. Following the application procedure depicted in Figure 3, the effect of different values of α on the cumulative recognition rate (see eq 21) of our proposed framework is shown in Figure A1.

6. CONCLUSION In this study, we propose a framework for alarm root cause diagnosis in a large scale industrial process. One key contribution of our research is the spatial interpretive structural model identification using the historical process data as well as the superficial process knowledge. On the basis of the process flow diagram, the entire process is decomposed into multiple subblocks, and a set of process variables is allocated into subblocks in accordance with the spatial distribution matrix. Some obvious internal causal links within subblocks and external causal links between subblocks can also be determined. Then the remaining causality between variables is extracted from the historical process data based on partial correlation analysis. By combining the causality captured from process knowledge and correlation calculation, the spatial block adjacent matrix and spatial block reachability matrix are constructed. After performing hierarchical partition on the spatial block reachability matrix using the rules of interpretive structural model, the subblocks are placed on different macrolevels, and the variables in each subblock are placed on different microlevels. The final spatial interpretive structural model is identified by adding the causal links between any two correlated variables. The same amount of diagnostic submodules composed of the subblock and its child nodes are created to monitor the process at varying levels of granularity. For more concentrated and specific diagnosis, the predefined atomic alarm root causes are clustered into either

Figure A1. Effect of threshold α on the cumulative recognition rate

From this figure, we can learn that when α is between 0.4 and 0.6 the cumulative recognition rate can be more than 90%. Especially when α is 0.5, the cumulative recognition rate can reach the highest value of 99.26%. Actually, the cumulative recognition rate depends on not only the causal graph but also the performance of submodule diagnosis and multimodule data fusion. In some cases, some values of alpha could contribute to the high recognition rate of alarm root causes while leading to a bad causal graph. Therefore, 3656

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research ⎛ max(CP(Et )) ⎞ min(CR) = min(ϕ(η)) = ϕ⎜ ⎟=0 ⎝ min(CP(Et )) ⎠

it is also necessary to explore the effect of threshold α on the accuracy of the final causal graph. But there is still no absolutely right causal model of TE process to do comparison, here we just take the probabilistic graphical network model in ref 24 as the reference. The false positive rate (FPR) and the false negative rate (FNR) are calculated at different threshold values. The FPR is expressed as the number of wrongly predicted causality divided by the total number of true non-causality, and the FNR is defined as the number of wrongly predicted non-causality as a fraction of the total number of true causality. Figure A2 depicts the effect of threshold α on the FPR and FNR. It is evident that the FPR (Type I error) and the FNR are in

(A-3)

Furthermore, the function ϕ(η) is monotonically decreasing. Here we don’t provide the proof because of the complexity. Therefore, by adjusting η in ascending order on the interval ⎡ max(CP(Et )) ⎣0, min(CP(Et )) we can ensure CR = ϕ(η) < 0.1 and the PCM meets the consistency requirement. Once the η* is found, the adjustment is done. Consequently, the combined belief is credible being used to rank the alarm root-causes candidates.

)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work has been supported by the National Natural Science Foundation of China (61473026, 61573051).



Figure A2. Effect of threshold α on the FPR and FNR

trade-offs. By increasing α from 0.0 to 0.1, the FPR gets lower but the FNR gets higher. The perfect value of α should ensure both the FRP and the FNR are low. When α is between 0.45 and 0.55, both the FPR and the FNR are below 30%. Considering these two kinds of effects, the threshold α is set as 0.5 with the highest cumulative recognition rate being 99.26%, the balanced FPR of 16%, and FNR of 25%. A.2. Determination of Parameter η

In our proposed IAHP procedure (see section 3.2.2), the adjustable parameter η is introduced to ensure that the pair-wise comparison matrix PCM meets the consistency requirement. If CR is less than 0.1, the PCM is considered to meet the consistency requirement. Otherwise, the PCM should be modified by adjusting η in ascending order till CR is less than 0.1 (see eq 13). ⎡ max(CP(Et )) Assume CR = ϕ(η) η ∈ ⎣0, min(CP(Et )) , there must be η* making CR* = ϕ(η*) less than 0.1. This fact can be proven by eq (A-1), (A-2), and (A-3).

(

))

the function ϕ(η) is continuous on the closed interval:

⎡ max(CP(Et )) ⎤ ⎢0, ⎥ ⎣ min(CP(Et )) ⎦

(A-1)

⎛ ⎡ max(CP(Et )) ⎞⎞ CR = ϕ(η) > 0⎜η ∈ ⎢0, ⎟⎟ ⎣ min(CP(Et )) ⎠⎠ ⎝

(A-2)

REFERENCES

(1) EEUMA. Alarm SystemsA Guide to Design, Management and Procurement, 3rd ed.; EEMUA: London, 2013. (2) Management of Alarm Systems for the Process Industries. ANSI/ISA18.2-2009; International Society of Automation: 2009. (3) Yang, F.; Xiao, D. Progress in root cause and fault propagation analysis of large-scale industrial processes. J. Control Sci. Eng. 2012, 2012, 1. (4) Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S. N. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27 (3), 293−311. (5) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27 (3), 313−326. (6) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K. A review of process fault detection and diagnosis: Part III: Process history based methods. Comput. Chem. Eng. 2003, 27 (3), 327−346. (7) Ge, Z.; Song, Z.; Gao, F. Review of Recent Research on Data-Based Process Monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543−3562. (8) Zadakbar, O.; Imtiaz, S.; Khan, F. Dynamic Risk Assessment and Fault Detection Using Principal Component Analysis. Ind. Eng. Chem. Res. 2013, 52 (2), 809−816. (9) Wang, X.; Kruger, U.; Lennox, B. Recursive partial least squares algorithms for monitoring complex industrial processes. Control Engineering Practice 2003, 11 (6), 613−632. (10) Jiang, Q.; Yan, X.; Tong, C. Double-weighted independent component analysis for non-Gaussian chemical process monitoring. Ind. Eng. Chem. Res. 2013, 52 (40), 14396−14405. (11) Rato, T. J.; Reis, M. S. Fault detection in the Tennessee Eastman benchmark process using dynamic principal components analysis based on decorrelated residuals (DPCA-DR). Chemom. Intell. Lab. Syst. 2013, 125, 101−108. (12) Li, N.; Yang, Y. Ensemble Kernel Principal Component Analysis for Improved Nonlinear Process Monitoring. Ind. Eng. Chem. Res. 2015, 54 (1), 318−329. (13) Zhang, Y.; An, J.; Zhang, H. Monitoring of time-varying processes using kernel independent component analysis. Chem. Eng. Sci. 2013, 88, 23−32. (14) Fan, J.; Wang, Y. Fault detection and diagnosis of non-linear nonGaussian dynamic processes using kernel dynamic independent component analysis. Inf. Sci. 2014, 259, 369−379. (15) Maurya, M. R.; Rengaswamy, R.; Venkatasubramanian, V. Application of signed digraphs-based analysis for fault diagnosis of 3657

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658

Article

Industrial & Engineering Chemistry Research chemical process flowsheets. Eng. Appl. Artificial Intell. 2004, 17 (5), 501−518. (16) He, B.; Chen, T.; Yang, X. Root cause analysis in multivariate statistical process monitoring: Integrating reconstruction-based multivariate contribution analysis with fuzzy-signed directed graphs. Comput. Chem. Eng. 2014, 64 (0), 167−177. (17) Dahlstrand, F. Consequence analysis theory for alarm analysis. Knowledge-Based Systems 2002, 15 (1), 27−36. (18) Schleburg, M.; Christiansen, L.; Thornhill, N. F.; Fay, A. A combined analysis of plant connectivity and alarm logs to reduce the number of alerts in an automation system. J. Process Control 2013, 23 (6), 839−851. (19) Yang, F.; Shah, S. L.; Xiao, D.; Chen, T. Improved correlation analysis and visualization of industrial alarm data. ISA Trans. 2012, 51 (4), 499−506. (20) Yuan, T.; Qin, S. J. Root cause diagnosis of plant-wide oscillations using Granger causality. J. Process Control 2014, 24 (2), 450−459. (21) Bauer, M.; Cox, J. W.; Caveness, M. H.; Downs, J. J. Finding the Direction of Disturbance Propagation in a Chemical Process Using Transfer Entropy. Control Systems Technology, IEEE Transactions on 2007, 15 (1), 12−21. (22) Ping, D.; Fan, Y.; Tongwen, C.; Shah, S. L. Detection of direct causality based on process data. American Control Conference (ACC), 2012 2012, 50 (6), 3522−3527. (23) Yu, J.; Rashid, M. M. A novel dynamic bayesian network-based networked process monitoring approach for fault detection, propagation identification, and root cause diagnosis. AIChE J. 2013, 59 (7), 2348− 2365. (24) Mori, J.; Mahalec, V.; Yu, J. Identification of probabilistic graphical network model for root-cause diagnosis in industrial processes. Comput. Chem. Eng. 2014, 71, 171−209. (25) Ge, Z.; Song, Z. Distributed PCA Model for Plant-Wide Process Monitoring. Ind. Eng. Chem. Res. 2013, 52 (5), 1947−1957. (26) Tong, C.; Song, Y.; Yan, X. Distributed Statistical Process Monitoring Based on Four-Subspace Construction and Bayesian Inference. Ind. Eng. Chem. Res. 2013, 52 (29), 9897−9907. (27) Jiang, Q.; Wang, B.; Yan, X. Multiblock Independent Component Analysis Integrated with Hellinger Distance and Bayesian Inference for Non-Gaussian Plant-Wide Process Monitoring. Ind. Eng. Chem. Res. 2015, 54 (9), 2497−2508. (28) Warfield, J. N. Toward interpretation of complex structural models. Systems, Man and Cybernetics, IEEE Transactions on 1974, 4 (5), 405−417. (29) Mathiyazhagan, K.; Haq, A. N. Analysis of the influential pressures for green supply chain management adoptionan Indian perspective using interpretive structural modeling. The International Journal of Advanced Manufacturing Technology 2013, 68 (1−4), 817−833. (30) Mangla, S.; Madaan, J.; Sarma, P. R. S.; Gupta, M. P. Multiobjective decision modelling using interpretive structural modelling for green supply chains. International Journal of Logistics Systems and Management 2014, 17 (2), 125−142. (31) Govindan, K.; Azevedo, S. G.; Carvalho, H.; Cruz-Machado, V. Lean, green and resilient practices influence on supply chain performance: interpretive structural modeling approach. Int. J. Environ. Sci. Technol. 2015, 12 (1), 15−34. (32) Bouzon, M.; Govindan, K.; Rodriguez, C. M. T. Reverse Logistics Barriers: An Analysis Using Interpretive Structural Modeling. In Enhancing Synergies in a Collaborative Environment; Springer: 2015; pp 95−103. (33) Govindan, K.; Palaniappan, M.; Zhu, Q.; Kannan, D. Analysis of third party reverse logistics provider using interpretive structural modeling. International Journal of Production Economics 2012, 140 (1), 204−211. (34) De La Fuente, A.; Bing, N.; Hoeschele, I.; Mendes, P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 2004, 20 (18), 3565−3574. (35) Zuo, Y.; Yu, G.; Tadesse, M. G.; Ressom, H. W. Biological network inference using low order partial correlation. Methods 2014, 69 (3), 266−273.

(36) Kenett, D. Y.; Huang, X.; Vodenska, I.; Havlin, S.; Stanley, H. E. Partial correlation analysis: applications for financial markets. Quantitative Finance 2015, 15 (4), 569−578. (37) Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: theory and applications. Neurocomputing 2006, 70 (1), 489−501. (38) Cao, F.; Liu, B.; Park, D. S. Image classification based on effective extreme learning machine. Neurocomputing 2013, 102, 90−97. (39) Han, F.; Huang, D.-S. Improved extreme learning machine for function approximation by encoding a priori information. Neurocomputing 2006, 69 (16), 2369−2373. (40) He, Y.-L.; Geng, Z.-Q.; Xu, Y.; Zhu, Q.-X. A hierarchical structure of extreme learning machine (HELM) for high-dimensional datasets with noise. Neurocomputing 2014, 128, 407−414. (41) Iosifidis, A.; Tefas, A.; Pitas, I. Dynamic action recognition based on dynemes and extreme learning machine. Pattern Recognition Letters 2013, 34 (15), 1890−1898. (42) Ghosh, K.; Natarajan, S.; Srinivasan, R. Hierarchically Distributed Fault Detection and Identification through Dempster−Shafer Evidence Fusion. Ind. Eng. Chem. Res. 2011, 50 (15), 9249−9269. (43) Kazemi, M.; Kariznoee, A.; Hosseini Moghadam Ghouchan Kohneh, S. M. R. Prioritizing factors affecting bank customers using kano model and analytical hierarchy process. Int. J. Account. Financ. Manage.-IJAFM 2013, 6, 105. (44) Luthra, S.; Garg, D.; Haleem, A. Identifying and ranking of strategies to implement green supply chain management in Indian manufacturing industry using Analytical Hierarchy Process. J. Ind. Eng. Manage. 2013, 6 (4), 930−962. (45) Nefeslioglu, H. A.; Sezer, E. A.; Gokceoglu, C.; Ayas, Z. A modified analytical hierarchy process (M-AHP) approach for decision support systems in natural hazard assessments. Comput. Geosci. 2013, 59, 1−8. (46) Althuwaynee, O. F.; Pradhan, B.; Park, H.-J.; Lee, J. H. A novel ensemble bivariate statistical evidential belief function with knowledgebased analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. Catena 2014, 114, 21− 36. (47) Reprinted from Chinese Journal of Chemical Engineering, 23(12), Gao, H., Xu, Y., Gu, X., et al. Systematic rationalization approach for multivariate correlated alarms based on interpretive structural modeling and Likert scale, 1987−1996, Copyright 2015, with permission from Elsevier. (48) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17 (3), 245−255. (49) Yu, H.; Khan, F.; Garaniya, V.; Ahmad, A. Self-Organizing Map Based Fault Diagnosis Technique for Non-Gaussian Processes. Ind. Eng. Chem. Res. 2014, 53 (21), 8831−8843.

3658

DOI: 10.1021/acs.iecr.5b04268 Ind. Eng. Chem. Res. 2016, 55, 3641−3658