
ARTICLE pubs.acs.org/IECR

Integrated Framework of Probabilistic Signed Digraph Based Fault Diagnosis Approach to a Gas Fractionation Unit

Ning Lü,† Zhihua Xiong,*,† Xiong Wang,† and Changrui Ren‡

† Department of Automation, Tsinghua University, Beijing 100084, China
‡ Supply Chain/Logistics and Customer Relationship Management, IBM China Research Laboratory, Beijing 100094, China

ABSTRACT: An integrated implementation solution and theoretical framework for a fault diagnosis approach based on the probabilistic signed digraph (PSDG) is proposed and applied to a gas fractionation unit. On the basis of the primary PSDG method presented in our previous work, a complete PSDG framework is constructed, from its definition and reasoning to its implementation. Nodes and branches in PSDG carry uncertain information, and their a priori conditional probabilistic parameters are determined using only limited knowledge of the studied plant; thus, a PSDG model can be built properly. After cycle processing and model simplification, PSDG reasoning can be conducted approximately on the basis of the consistent rule. In the implementation of PSDG, the probabilities of candidate faults are computed and ranked, and the most probable fault is found, so that the real fault cause can be confirmed reasonably. Compared with the conventional qualitative SDG, PSDG reduces the qualitative ambiguities to some extent. The proposed method is applied to a gas fractionation unit, and experimental results on real operation data show the validity and advantages of the PSDG framework.

Received: January 4, 2011. Revised: May 8, 2011. Accepted: June 6, 2011. Published: June 6, 2011.
© 2011 American Chemical Society
Industrial & Engineering Chemistry Research
dx.doi.org/10.1021/ie200016t | Ind. Eng. Chem. Res. 2011, 50, 10062–10073

1. INTRODUCTION

The idea of signed digraph (SDG) based fault diagnosis was first presented by Iri in 1979.1 As a qualitative causal model, an SDG describes a target system as a qualitative network of nodes and directed arcs, giving supervisors a direct, visual representation of the system. Unlike conventional quantitative model based approaches, the SDG approach does not require complete a priori knowledge or precise mathematical descriptions of the system, which are often unavailable in practice. In addition, owing to its good diagnostic completeness and in-depth reasoning ability, the SDG approach has been widely applied in the chemical industry.

Over the past three decades, SDG based fault diagnosis has been extended greatly. In the 1980s, studies focused mainly on the entirely qualitative aspect, using only the qualitative causal information of the system.2,3 The concept of fault reveal time was also introduced into the SDG.4 In the 1990s, many researchers paid attention to combining fuzzy sets with SDG.5–7 Since 1999, other approaches have been developed by combining SDG with multivariable statistical process monitoring (MSPM), such as PCA-SDG,8 PLS-SDG,9–12 and KPLS-SVRSDG.13 These methods can utilize the advantages of both the model based approach and the data driven approach.

Essentially, all of the above-mentioned approaches are based on the model structure and inference mechanism of the conventional qualitative SDG. Very little quantitative information of the system is used, either to quantify the weight values of the directed arcs in SDG (as in fuzzy-SDG) or to detect qualitative fault symptoms that trigger the subsequent SDG reasoning (as in PCA-SDG and PLS-SDG). However, SDG reasoning usually results in many spurious interpretations of faults because it lacks adequate quantitative information about the system. For example, several fault causes that result in the same qualitative symptoms cannot be distinguished by SDG. In this case, the diagnostic conclusion drawn from the SDG approach alone is a candidate fault set including all of these possible fault causes, which cannot be distinguished further. In other words, the SDG approach tends to treat each fault cause in the candidate fault set as equally likely to occur and as contributing equally to the qualitative symptoms. From the perspective of probability, the SDG approach is equivalent to assigning an equal posterior probability to each candidate fault cause. Although the qualitative character of the SDG approach makes fault diagnosis easier to carry out, the qualitative ambiguities inevitably reduce the fault resolution. Therefore, the problem of qualitative ambiguities should be solved to improve diagnostic performance.

To address this issue, and considering the large amount of uncertain information contained in large-scale systems, the idea of the probabilistic signed digraph (PSDG) was proposed.14–16 Two different kinds of implementation solutions have been derived. One solution builds a PSDG model directly as a directed acyclic graph by adding new self-defined cause nodes to the corresponding SDG.14,15 In this sense, the PSDG model is more like a kind of Bayesian network (BN), and the related conclusions of BN can be applied directly to this kind of PSDG. However, adding new nodes increases the workload and difficulty of PSDG modeling, and more a priori knowledge of these new nodes is required. In addition, some particular properties of SDG that are very useful for fault diagnosis are not preserved in this kind of PSDG. The other solution preserves the original model structure of the traditional SDG and introduces probabilistic parameters. This solution was proposed in our previous work,16 and Song17 presented a similar solution. Compared with the former, this method obtains the PSDG model from its corresponding SDG model only by introducing probabilistic parameters, without any modification to the model structure. All particular properties of the traditional SDG are preserved for the subsequent diagnostic reasoning in the PSDG.

In both solutions, in addition to qualitative information, conditional probabilities are introduced as quantitative information to describe the relationships between system variables, so that the probabilities of candidate faults can be determined. Furthermore, the probabilities of fault occurrence and sensor failure can be used in conjunction with SDG for sensor location.18–20 However, these studies14–17 on PSDG still focused on elementary concepts and primary methods and are far from forming a complete framework. Many theoretical and methodological issues of PSDG modeling and reasoning have not been discussed in detail. Therefore, on the basis of the PSDG concept proposed in our previous work,16 this study suggests an integrated theoretical and methodological framework that constitutes a basically complete procedure for PSDG based fault diagnosis.

The paper is organized as follows. In section 2, the definitions of PSDG and of the conventional SDG are reviewed briefly. In section 3, the integrated framework of the PSDG approach is described in detail. First, in section 3.1, a series of approaches is proposed for evaluating the a priori conditional probabilistic parameters for PSDG modeling on the basis of limited knowledge of the target system. The basic ideas of PSDG reasoning are then described in section 3.2; the consistent rule in SDG is extended to PSDG, and the approximate calculation based on this rule is discussed, in section 3.3; and cycle processing and model simplification for PSDG reasoning are presented in section 3.4. In section 3.5, the whole procedure of the integrated PSDG framework is illustrated. In section 4, the application of the PSDG framework to a gas fractionation unit is presented. The comparison of PSDG with BN is discussed, and the influence of the a priori probabilistic parameters on the PSDG reasoning results is also analyzed. Finally, conclusions and some directions for future work are discussed.

2. DEFINITION OF PSDG

PSDG can be considered as the inheritance and development of the traditional SDG. Therefore, some basic concepts of SDG are reviewed briefly before introducing PSDG.

Definition 1 (ref 21). An SDG model γ = (G, φ) is a combination of a directed graph G and a function φ. The directed graph G is composed of four portions (V, E, ∂+, ∂−), where V = {v1, v2, ..., vm} is the node set, E = {e1, e2, ..., en} is the branch set, ∂+: E → V gives the initial node of a branch, and ∂−: E → V gives the terminal node of a branch. The function φ is defined as φ: E → {+, −}, and φ(ek) (ek ∈ E) is the sign of the branch ek.

In SDG, the nodes denote variables in the target system and the directed branches denote the relationships from the initial node (cause variable) to the terminal node (effect variable). In a conventional three-state SDG model, a node has three states, “0”, “+”, and “−”; a directed branch can describe two kinds of relationship, i.e., positive effect and negative effect. More details can be found in refs 1 and 21.


A PSDG model is obtained from the SDG model by introducing probabilistic information on the nodes and branches. The model structure of PSDG is quite similar to that of SDG, but PSDG can contain more information than SDG and describe the system more accurately. The PSDG model was proposed in our previous work,16 and its definition is reviewed briefly here.

Definition 2 (ref 16). A PSDG model γ = (G, ψ, p) is a combination of a directed graph G, a function ψ, and a probability distribution p. The directed graph G is defined as in SDG and is also composed of four portions (V, E, ∂+, ∂−). The function ψ denotes the state values classified qualitatively, where ψ(vk) is the state value of the node vk (vk ∈ V) and ψ(ek) is the state value of the branch ek (ek ∈ E). The probability distribution is p: V (or E) → [0,1], where p(vk) is the probability of the node vk (vk ∈ V) and p(ek) is the probability of the branch ek (ek ∈ E).

A node in PSDG denotes an element of the target system and has qualitative state values, which are its attributes classified qualitatively. A directed branch in PSDG connects two nodes, and its direction shows the propagation direction of variation. To describe a complex target system, PSDG can use different integer values to denote different state patterns of the nodes. For example, different fault patterns of a fault cause node can be described by different state values. Similarly, besides the fixed qualitative relationship between the two connected nodes, the state value of a branch can also denote the propagation rule applied by the branch.

The most remarkable difference between PSDG and SDG is the introduction of probabilistic information on the nodes and branches. In PSDG, each node has a probability distribution, which gives the probabilities of the different states that the node can be in. For example, p(vk=0) = 0.7 indicates that the node vk will be normal (namely, in the state vk = 0) with probability 0.7. For a root node vk, its probability distribution is the marginal distribution p(vk); for a nonroot node vk, it is the conditional probability distribution (CPD) p(vk|Pa(vk)), where Pa(vk) denotes the parents of vk. The CPD is often expressed as a conditional probability table (CPT) when all of the nodes in PSDG take discrete values.

In PSDG, the probability distribution of a branch gives the probability of each rule being applied by that branch. For example, p(ek=+1|∂+ek=+1) = 1 and p(ek=+1|∂+ek=other) = 0 indicate that a branch with qualitative state +1 exists between the initial node and the terminal node only when the qualitative state of the initial node is +1. Furthermore, a stochastic time delay of the variation propagation from the initial node of a branch to its terminal node can be represented: for example, p(ek=+1) = 0.5 and p(ek=+2) = 0.5 denote that, when the state of the initial node changes, the terminal node will change in the same direction one time unit later with probability 0.5 and two time units later with the same probability.

These probability distributions of nodes and branches can be obtained by statistical analysis of historical data, by simulation, and from practical experience. For example, the marginal distributions of root nodes can be determined by statistical analysis of the malfunction frequencies in historical operation data of the target process. The CPD of a nonroot node can be decided according to the degree of impact of each of its parent nodes, as provided by the practical experience of the operators. The stochastic time delay mentioned above can be obtained by simulation to determine the probability distribution of the propagation rule applied by a branch.

Figure 1. Converging connection structure of PSDG.

More details about nodes, branches, and probability distributions can be found in our previous work.16 PSDG can also be used for hazard and operability study (HAZOP) and stochastic qualitative simulation.
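As an illustration of Definition 2, the following Python sketch shows one possible way to hold the probabilistic information of nodes and branches. It is not the authors' implementation: the class names `Node` and `Branch`, the helper `check_distribution`, and all numbers other than p(vk=0) = 0.7 and the two delay probabilities of 0.5 are assumptions made for the example.

```python
from dataclasses import dataclass, field


def check_distribution(dist, tol=1e-9):
    """Raise ValueError unless dist is a valid discrete probability distribution."""
    if any(p < 0.0 or p > 1.0 for p in dist.values()):
        raise ValueError("probabilities must lie in [0, 1]")
    if abs(sum(dist.values()) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")


@dataclass
class Node:
    name: str
    parents: tuple = ()
    # Maps a tuple of parent states to a distribution over this node's states.
    # A root node has the single key () holding its marginal distribution.
    cpt: dict = field(default_factory=dict)

    def validate(self):
        for dist in self.cpt.values():
            check_distribution(dist)


@dataclass
class Branch:
    source: str
    target: str
    # Maps the state of the initial node to a distribution over branch states.
    rule: dict = field(default_factory=dict)

    def validate(self):
        for dist in self.rule.values():
            check_distribution(dist)


# Root node that is normal with probability 0.7; how the remaining 0.3 is
# split between +1 and -1 is an illustrative assumption.
v1 = Node("v1", cpt={(): {0: 0.7, +1: 0.2, -1: 0.1}})

# Branch whose state encodes a stochastic delay of one or two time units,
# each with probability 0.5, when the initial node deviates positively.
e12 = Branch("v1", "v2", rule={+1: {+1: 0.5, +2: 0.5}})

v1.validate()
e12.validate()
```

Storing the marginal of a root node under the empty parent tuple keeps root and nonroot nodes in the same table format, which is a design choice of this sketch rather than something prescribed by the definition.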

3. INTEGRATED FRAMEWORK OF PSDG APPROACH

The purpose of the PSDG approach is to calculate and rank the probabilities of candidate faults and then confirm the real fault cause. Like other model based fault diagnosis methods, the PSDG approach can be implemented in two steps: off-line modeling and online diagnostic inference. In practice, accurate absolute values of the probabilities are not necessary for diagnosis; the relative values of the probabilities are of greater concern. Therefore, appropriate approximation of the probabilities can make the PSDG approach more convenient without influencing the final diagnostic conclusion. The idea of approximation runs through the whole PSDG framework, including both parameter evaluation in off-line modeling and calculation in online inference. It should be noted that the approximation in each step must be fair to every node and branch, so that the final conclusions remain valid. The integrated framework of PSDG and its theoretical analysis are introduced in detail as follows.

3.1. Evaluation of a Priori Conditional Probabilistic Parameters. As mentioned above, a PSDG model is obtained by introducing the probabilistic parameters of the nodes and branches into the corresponding SDG model. Therefore, the model structure of PSDG can be obtained directly by SDG modeling methods, after which the a priori conditional probabilistic parameters of the nodes and branches are determined.

A basic converging connection structure of PSDG is considered, as illustrated in Figure 1. For the sake of convenience and without loss of generality, it is supposed that the structure is a simple three-state model with positive branches; i.e., each node vk has only three states {0, +1, −1}, and the state of each branch is only +1. Node v3 has two parents, so v3 is influenced by two branches. When the PSDG model is built, the number of a priori conditional probabilistic parameters that must be attached to node v3 is 3 × 3 × 3 = 27. If a node has n parents, 3^(n+1) parameters are needed to specify the conditional probability distribution. For a sufficiently large n, it is very cumbersome to elicit 3^(n+1) parameters. On the other hand, in many chemical processes, the influences of different branches on the same node are often independent; i.e., the influence mode is often “OR”. Thus, an a priori conditional probability of a node is often elicited from the influence of the variation of one single parent while the other parents remain normal. In some cases, without sufficient a priori knowledge about the process, the influence on a child node when all of its parent nodes vary cannot be known exactly. Therefore, on the basis of the incomplete knowledge available in practice, an approach must be found to construct complete CPTs for PSDG modeling and the subsequent PSDG reasoning.

Similar issues arise in the Bayesian network (BN). To deal with incomplete knowledge, BN often chooses compact representations, such as noisy-OR,22 context-specific independence,23 and so on. However, it has been proved that those compact representations in BN are presented for single-valued cases and are not applicable in multivalued cases.24 Since PSDG is precisely a kind of multivalued network, the solutions in BN cannot be applied to PSDG directly. As is well-known, linearity effectively reduces the data requirements inherent in modeling a large PSDG. Considering the particular properties of SDG and of the chemical process, an efficient approach is proposed in this study to evaluate all of the unknown parameters.

Here, for the sake of convenience, some propositions are presented by taking the converging connection structure in Figure 1 as an example. As a precondition for the evaluation, it is supposed that the influence of the variation of one single parent can be obtained in advance by statistical analysis of historical data or from practical experience. In Figure 1, as mentioned above, the number of all necessary parameters is 27. Before evaluation in PSDG, it is supposed that five sets of probabilistic parameters (such as p(v3=+1/0/−1|v1=+1,v2=0), p(v3=+1/0/−1|v1=0,v2=−1), p(v3=+1/0/−1|v1=0,v2=0), and so on) can be predefined by analyzing the variations of v1 and v2, respectively. Then, the rest of the unknown parameters can be evaluated according to the following propositions.

Proposition 1. In Figure 1, suppose the parameters p(v3=+1|v1=+1,v2=0) and p(v3=+1|v1=0,v2=+1) are predefined; then the parameter p(v3=+1|v1=+1,v2=+1) can be evaluated by extending the idea of the noisy-OR gate.
$$ p(v_3=+1 \mid v_1=+1, v_2=+1) = 1 - \bigl(1 - p(v_3=+1 \mid v_1=+1, v_2=0)\bigr)\bigl(1 - p(v_3=+1 \mid v_1=0, v_2=+1)\bigr) \tag{1} $$

Similarly, suppose the parameters p(v3=−1|v1=−1,v2=0) and p(v3=−1|v1=0,v2=−1) are predefined; then

$$ p(v_3=-1 \mid v_1=-1, v_2=-1) = 1 - \bigl(1 - p(v_3=-1 \mid v_1=-1, v_2=0)\bigr)\bigl(1 - p(v_3=-1 \mid v_1=0, v_2=-1)\bigr) \tag{2} $$
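The noisy-OR extension in eqs 1 and 2 can be sketched in a few lines of Python. The function name `noisy_or` and the two input values are hypothetical; accepting a list of more than two single-parent probabilities is an assumption of this sketch, since the equations above are stated for two parents only.

```python
from math import prod


def noisy_or(single_parent_probs):
    """Combine single-parent deviation probabilities as in eqs 1 and 2.

    Each entry is p(child deviates | only that parent deviates, others normal),
    with all deviations taken in the same direction.
    """
    for p in single_parent_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in single_parent_probs)


# Hypothetical predefined parameters for the structure in Figure 1.
p_v1_only = 0.8  # p(v3=+1 | v1=+1, v2=0)
p_v2_only = 0.6  # p(v3=+1 | v1=0, v2=+1)

# p(v3=+1 | v1=+1, v2=+1) according to eq 1.
p_both = noisy_or([p_v1_only, p_v2_only])
print(round(p_both, 4))
```

With these inputs the combined value is 1 − 0.2 × 0.4 = 0.92, which is larger than either single-parent probability, as expected for an “OR” influence mode.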

Proposition 2. In Figure 1, if there is no further a priori knowledge about node v3 when nodes v1 and v2 vary in opposite directions, then the six parameters p(v3=+1|v1=+1,v2=−1), p(v3=0|v1=+1,v2=−1), p(v3=−1|v1=+1,v2=−1), p(v3=+1|v1=−1,v2=+1), p(v3=0|v1=−1,v2=+1), and p(v3=−1|v1=−1,v2=+1) can be evaluated arbitrarily on the basis of experience. Their precise values are not necessary for PSDG reasoning. Even in extreme circumstances, they can be estimated approximately as follows.

$$ p(v_3=+1 \mid v_1=+1, v_2=-1) \approx p(v_3=+1 \mid v_1=-1, v_2=+1) = 0 \tag{3} $$

$$ p(v_3=0 \mid v_1=+1, v_2=-1) \approx p(v_3=0 \mid v_1=-1, v_2=+1) = 1 \tag{4} $$

$$ p(v_3=-1 \mid v_1=+1, v_2=-1) \approx p(v_3=-1 \mid v_1=-1, v_2=+1) = 0 \tag{5} $$

The proof of proposition 2 is given in the Appendix section A.1. According to these propositions, if a node has n parents, only 6n + 3 parameters need to be predefined by statistics or experience: three for each of the two deviated states of each parent, plus three for the case in which all parents are normal. Furthermore, propositions 1 and 2 can be extended to the more complex situations of multiple parents. In those situations, the a priori probabilistic parameters can be evaluated by the same methods, with correspondingly more complex forms of the above equations.

3.2. Basic Ideas of PSDG Reasoning. After the PSDG model of the target system is built, the reasoning mechanism is applied to find the fault causes from the fault symptoms. In general, PSDG reasoning can be divided into two functions: determination of the candidate faults and calculation of their probabilities. Therefore, in our PSDG framework, PSDG reasoning is composed of two parts. First, a reasoning algorithm, which is the same as that in SDG, is used to obtain all of the possible candidate fault causes and consistent paths. This determines the range of the probability calculation. Second, the relative values of the probabilities of all candidate faults are calculated from the online measured evidence. The candidate fault with the maximal probability is then taken as the most probable fault cause.

Up to now, no algorithm designed specially for PSDG reasoning has been reported in the literature. A Bayesian reasoning method can usually be chosen to calculate the probabilities of candidate faults in PSDG. However, our PSDG model contains many cycles, which denote the necessary feedback control loops in the target system, and Bayesian reasoning cannot be applied directly because of these cycles. If Bayesian reasoning is to be used, the PSDG model must be modified in advance to obtain the primary acyclic structures. The modification methods in our PSDG framework are discussed in detail in a later section. Instead of Bayesian reasoning, the particular properties of SDG, such as the consistency characteristic, can also be used to calculate the probabilities approximately.

3.3. Consistent Characteristic and Approximate Calculation in PSDG. The consistent rule is widely used in SDG and can also be extended to PSDG. On the basis of the consistent rule, an approximate calculation of the posterior probability is obtained. A basic serial connection structure of PSDG is considered, as illustrated in Figure 2, and it is again supposed to be a simple three-state model structure with positive branches.

Figure 2. Serial connection structure of PSDG.

Without loss of generality, ψ(v3)=+1 is assumed to be the only evidence. Then the posterior probability p(v1=+1|v3=+1) is computed as

$$ \begin{aligned} p(v_1=+1 \mid v_3=+1) &= \sum_{v_2} p(v_1=+1 \mid v_2, v_3=+1)\, p(v_2 \mid v_3=+1) \\ &= p(v_1=+1 \mid v_2=+1)\, p(v_2=+1 \mid v_3=+1) \\ &\quad + p(v_1=+1 \mid v_2=0)\, p(v_2=0 \mid v_3=+1) \\ &\quad + p(v_1=+1 \mid v_2=-1)\, p(v_2=-1 \mid v_3=+1) \end{aligned} \tag{6} $$

According to the consistency characteristic,21 we have

$$ p(v_1=+1 \mid v_2=+1)\, p(v_2=+1 \mid v_3=+1) \gg p(v_1=+1 \mid v_2=0)\, p(v_2=0 \mid v_3=+1) \tag{7} $$

Proof of eq 7. The left side term of eq 7 can be transformed as

$$ p(v_1=+1 \mid v_2=+1)\, p(v_2=+1 \mid v_3=+1) = \frac{p(v_1=+1)\, p(v_2=+1 \mid v_1=+1)\, p(v_3=+1 \mid v_2=+1)}{p(v_3=+1)} \tag{8} $$

The right side term is also transformed as

$$ p(v_1=+1 \mid v_2=0)\, p(v_2=0 \mid v_3=+1) = \frac{p(v_1=+1)\, p(v_2=0 \mid v_1=+1)\, p(v_3=+1 \mid v_2=0)}{p(v_3=+1)} \tag{9} $$

According to the consistency characteristic,

$$ p(v_2=+1 \mid v_1=+1)\, p(v_3=+1 \mid v_2=+1) \gg p(v_2=0 \mid v_1=+1)\, p(v_3=+1 \mid v_2=0) \tag{10} $$

Therefore, according to eqs 8–10, eq 7 is proved. Similarly, p(v1=+1|v2=−1) p(v2=−1|v3=+1) = 0 approximately according to the consistent rule. Therefore, eq 6 can be calculated approximately as follows.

$$ p(v_1=+1 \mid v_3=+1) \approx p(v_1=+1 \mid v_2=+1)\, p(v_2=+1 \mid v_3=+1) \tag{11} $$
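A small numerical check can make the size of the approximation in eq 11 concrete. The Python sketch below compares the exact posterior of eq 6 with the single consistent term of eq 11, written in the form of eq 8, for the chain v1 → v2 → v3. All probability values are invented for illustration, and reusing one conditional table for both branches is a simplifying assumption.

```python
STATES = (-1, 0, +1)

# Hypothetical marginal distribution of the root node v1.
p_v1 = {-1: 0.05, 0: 0.90, +1: 0.05}

# p_child[parent_state][child_state]; the same hypothetical table is reused
# for both positive branches v1 -> v2 and v2 -> v3.
p_child = {
    +1: {-1: 0.01, 0: 0.09, +1: 0.90},
    0: {-1: 0.02, 0: 0.96, +1: 0.02},
    -1: {-1: 0.90, 0: 0.09, +1: 0.01},
}


def joint(a, b, c):
    """p(v1=a, v2=b, v3=c) for the serial connection structure."""
    return p_v1[a] * p_child[a][b] * p_child[b][c]


# p(v3=+1), the denominator appearing in eqs 8 and 9.
p_evidence = sum(joint(a, b, +1) for a in STATES for b in STATES)

# Exact posterior p(v1=+1 | v3=+1): all three terms of eq 6.
exact = sum(joint(+1, b, +1) for b in STATES) / p_evidence

# Approximation of eq 11: only the consistent term v2=+1, in the form of eq 8.
approx = joint(+1, +1, +1) / p_evidence

print(f"exact = {exact:.4f}, approx = {approx:.4f}")
```

Because the dropped terms are nonnegative, the approximate value can never exceed the exact one; with the numbers assumed here the two differ by well under 1%.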

Extending the conclusion of eq 11 to a more complicated serial connection structure, the following theorem holds.

Theorem 1. A serial connection structure of PSDG is considered, in which the node set is V = {v1, v2, ..., vm} and the ordered branch set is E = {e(v1→v2), e(v2→v3), ..., e(vm−1→vm)}. It is supposed that this is a simple three-state model structure; i.e., each node has only three states, 0, +1, and −1, and each branch has two states, +1 and −1. The node vm is the only evidence node, and its state ψ(vm) is known. Then, generally, the posterior probability can be calculated approximately as follows.

$$ p\bigl(v_k=\psi(v_k) \mid v_m=\psi(v_m)\bigr) \approx \prod_{i=1}^{m-k} p\bigl(v_{m-i}=\psi(v_{m-i}) \mid v_{m-i+1}=\psi(v_{m-i+1})\bigr), \quad k \in [1, m-1] \tag{12} $$

where the left side term of eq 12 should meet the condition

$$ \psi(v_k) = \psi(v_m) \prod_{i=1}^{m-k} \psi\bigl(e_{v_{m-i} \to v_{m-i+1}}\bigr), \quad k \in [1, m-1] \tag{13} $$

and the right side term of eq 12 should meet the condition

$$ \psi(v_t) = \psi(v_{t+1})\, \psi\bigl(e_{v_t \to v_{t+1}}\bigr), \quad t \in [1, m-1] \tag{14} $$
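The conditions in eqs 13 and 14 and the product in eq 12 can be sketched as two short Python functions. The names `consistent_states` and `path_probability`, the 0-based indexing, and the example numbers are all assumptions of this sketch; the backward conditional probabilities are taken as given inputs rather than derived from a model.

```python
from math import prod


def consistent_states(evidence_state, branch_signs):
    """Back-propagate the evidence state along a serial path (eq 14).

    branch_signs[t] is the sign (+1 or -1) of the branch from node t to
    node t+1 (0-based); the last node of the path is the evidence node.
    """
    if evidence_state not in (+1, -1):
        raise ValueError("the evidence node must be deviated (+1 or -1)")
    if any(s not in (+1, -1) for s in branch_signs):
        raise ValueError("branch signs must be +1 or -1")
    states = [evidence_state]
    for sign in reversed(branch_signs):
        states.insert(0, states[0] * sign)
    return states


def path_probability(backward_probs, k):
    """Approximate p(v_k = psi(v_k) | v_m = psi(v_m)) as in eq 12.

    backward_probs[t] holds p(v_t = psi(v_t) | v_{t+1} = psi(v_{t+1})) for
    the consistent states (0-based); k is the 0-based index of the queried node.
    """
    if not 0 <= k < len(backward_probs):
        raise ValueError("k must index a node upstream of the evidence node")
    return prod(backward_probs[k:])


# Hypothetical four-node path with branch signs +, -, + and evidence v4 = +1.
states = consistent_states(+1, [+1, -1, +1])
p_first = path_probability([0.9, 0.8, 0.7], k=0)
print(states, round(p_first, 3))
```

For this example the consistent states along the path are −1, −1, +1, +1, and the propagation probability back to the first node is 0.9 × 0.8 × 0.7 = 0.504.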

The proof of theorem 1 is given in the Appendix section A.2. The characteristic reflected by theorem 1 can be called the propagation probability of a consistent path. Therefore, the posterior probability of a node in a serial connection structure of PSDG can be calculated approximately using this characteristic instead of Bayesian reasoning.

3.4. Cycle Processing and Model Simplification. 3.4.1. Cycle Processing. As mentioned above, if Bayesian reasoning is used to calculate the probabilities, the cycles in the PSDG model must be processed in advance to obtain the primary acyclic structure. In our framework, a method of dealing with cycles is proposed on the basis of the concept of a supernode.3 A supernode regards a cycle as a strongly connected component and treats the cycle, together with its source and target nodes on the consistent paths, as a whole. Therefore, the cycle can be replaced by a corresponding supernode.

The main idea of our cycle processing method is to regard cycles as supernodes at first, so that a global reasoning can be carried out in an acyclic structure. If the measured value of the controlled variable (CV) in a cycle has deviated, the corresponding supernode is considered an evidence node; otherwise, it is considered a hidden node. Then, if necessary, a local reasoning enters the cycle to find the potential fault inside the loop. In general, there are many different fault scenarios in a cycle, which can be classified mainly into imperfect control and perfect control. In each fault scenario, the nodes in the cycle generally show different states.25–27 On the basis of the symptoms detected inside a cycle, a consistent path with a definite direction can be obtained by conventional SDG reasoning, which disconnects the cycle for the subsequent probability calculation.

Some details of local reasoning in a cycle are discussed here. Maurya et al. have proposed two unified SDG models for control loops on the basis of the corresponding algebraic equations and differential algebraic equations, respectively.26,27 In those two SDG models, the different effects of both disturbances and structural faults can be modeled easily. Eight different fault scenarios were then analyzed with respect to perfect control (including external disturbances, sensor bias, controller signal bias, and valve bias) and imperfect control (including large external disturbances, sensor failure, controller failure, and valve failure). In our local reasoning step, a simple unified SDG model is adopted, illustrated in Figure 3, which can be obtained on the basis of the transfer functions25 of a single-loop control system. In Figure 3, [Kc], [Kv], and [Kp] denote the signs of the proportional gain of the controller, the steady-state gain of the valve, and the steady-state gain of the process, respectively. The fault scenarios mentioned previously can also be analyzed using this model.

Figure 3. SDG model of a single-loop control system.

Because external disturbances in both perfect and imperfect control originate from real fault causes outside the control loop, those causes can be found by the global reasoning step of our cycle processing method. Therefore, the local reasoning step only handles the other six fault scenarios. In addition to the measured value of the controlled variable (CVm), it is supposed that the controller signal (CS) can also be measured in the process. In each fault scenario, the initial responses and steady-state responses of CVm and CS generally show different states. Therefore, each fault scenario can be distinguished according to the corresponding state combination of CVm and CS, so that a consistent path with a definite direction can be obtained. Because of limited space, only the cases of sensor bias in perfect control and controller failure in imperfect control are discussed here.

(i) Sensor bias. The node CVm,bias in Figure 3 denotes this fault cause. Suppose that there is a positive sensor bias, namely, sign(ψ(CVm,bias)) = +. Then, the initial response of CVm is positive, and the initial response of CS is negative. After feedback regulation by the control loop, the steady-state response of CVm returns to normal, and the steady-state response of CS is still negative. This means that the response of CVm is a compensatory response. Therefore, the information flow is as follows:

CV(−) ← VP(−) ← CS(−) ← CVm(+) ← CVm,bias(+)

It is also the consistent path for sensor bias.

(ii) Controller failure. The node CSfail in Figure 3 denotes this fault cause. This fault scenario is equivalent to cutting the branch from CVm to CS. The state of CVm is then always decided by the state of CS, namely, sign(ψ(CVm)) = sign(ψ(CS)) = sign(ψ(CSfail)) at any time. Therefore, the information flow is as follows:

CVm ← CV ← VP ← CS ← CSfail

It is also the consistent path for controller failure.

3.4.2. Model Simplification. As in a BN, a PSDG usually contains some nodes that are irrelevant to reasoning with a given set of evidence nodes. These nodes can be eliminated in advance without influencing the final result. The simplification reduces the computational complexity of real-time fault diagnosis. According to similar simplification principles in BN,28 the following conclusions are presented without proof.

Theorem 2. Let V be an ancestral set in a PSDG γ, i.e., an(v) ⊆ V for all v ∈ V, where an(v) denotes the ancestors of node v. Let γ′ be obtained by eliminating from γ all of the nodes not belonging to V. Then, pγ(V) = pγ′(V).


Theorem 3. Let VE be the subset of evidence nodes and VF be the subset of possible candidate fault nodes in a PSDG γ. Let γ′ be obtained by eliminating from γ all of the nodes not belonging to an(VE), where an(VE) denotes the smallest ancestral set containing VE. Then, pγ(VF|VE) = pγ′(VF|VE).

According to theorem 3, the following rule for simplifying a PSDG can be obtained.

Rule 1. Let VE be the subset of evidence (monitoring) nodes in the PSDG γ of the target system whose states have deviated when reasoning starts. Then, only the smallest ancestral set an(VE) containing VE, together with all of the other normal monitoring nodes connected to the nodes in an(VE), is retained to obtain the reduced PSDG γ′ for further probability calculation.

3.5. Integrated Procedure of PSDG Framework. As discussed above, the integrated procedure of the PSDG framework is illustrated in Figure 4. The procedure is composed of two parts: off-line analysis and online diagnosis.

Figure 4. Procedure of PSDG approach.

The main purpose of the off-line stage is to build a PSDG model of the target system, which includes the following steps:
(1) build the traditional SDG model of the target system;
(2) determine the a priori probabilistic parameters by statistical analysis of historical data and by the evaluation approach discussed above;
(3) build the PSDG model by introducing the probabilistic parameters into the SDG model.

The main purpose of the online stage is to obtain the diagnostic results of the PSDG framework, which includes the following steps:
(1) determine the fault symptoms, i.e., the deviated evidence nodes, by fault detection;
(2) simplify the PSDG model according to the deviated evidence nodes;
(3) transform the cycles in the simplified PSDG model into supernodes (the result of steps 2 and 3 is the contracted acyclic PSDG, called the probabilistic cause and effect graph (PCEG), on which the probabilities are calculated);
(4) calculate the relative values of the probabilities of all candidate faults in the PCEG and display the diagnostic results.
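Online steps 2 and 3 can be sketched on a plain parent-list representation of the graph. The Python code below is only an illustration under stated assumptions: the function names and the example graph are hypothetical, the reduction keeps just the smallest ancestral set of theorem 3 (the additional normal monitoring nodes retained by rule 1 are not handled), and cycles are found by a simple mutual-ancestor test rather than by any algorithm named in the paper.

```python
def ancestral_set(parents, start):
    """Return the start nodes together with all of their ancestors."""
    keep, stack = set(), list(start)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    return keep


def reduce_model(parents, evidence):
    """Keep only the smallest ancestral set containing the evidence nodes."""
    keep = ancestral_set(parents, evidence)
    return {node: list(parents.get(node, ())) for node in keep}


def find_supernodes(parents):
    """Group nodes that lie on a common cycle (each is an ancestor of the other)."""
    proper = {n: ancestral_set(parents, parents[n]) for n in parents}
    groups, seen = [], set()
    for n in parents:
        if n in seen:
            continue
        group = {m for m in proper[n] if n in proper.get(m, ())}
        if group:  # non-empty only when n itself lies on a cycle
            groups.append(group)
            seen |= group
    return groups


# Hypothetical graph: node -> list of parents. A and B form a control loop.
parents = {
    "F1": [], "F2": [],
    "A": ["F1", "B"], "B": ["A"],
    "C": ["F2"],
    "E1": ["B"], "E2": ["C"],
}

reduced = reduce_model(parents, ["E1"])   # step 2 with deviated evidence E1
supernodes = find_supernodes(reduced)     # candidates for contraction in step 3
print(sorted(reduced), supernodes)
```

In this example the branch through F2, C, and E2 is dropped because none of those nodes is an ancestor of the deviated evidence node E1, and the loop formed by A and B is reported as a single group that would be contracted into a supernode.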

4. PSDG BASED FAULT DIAGNOSIS OF GAS FRACTIONATION UNIT

In our previous work,16 some simulation results of PSDG based fault diagnosis were illustrated on a simulation platform of synthetic ammonia inversion process, in which cases of multiple fault origins were also discussed. In all of those simulation cases, the posterior probability of each real fault cause is dominant (greater than 0.5) among all the candidate faults; that is to say, for multiple fault cases there is more than one fault whose posterior probability is dominant. Therefore, our PSDG approach is effective in handling both single and multiple fault cases. Here, real operation data from a gas fractionation unit in a petrochemical company are used to further verify the performance of the proposed PSDG framework.

4.1. Introduction of the Process. Gas fractionation (i.e., fractional distillation) is a separation process in which a certain quantity of a mixture (for example, liquefied petroleum gas, LPG) is divided into a number of smaller fractions (such as propane, propylene, and isobutane). Gas fractionation is typically performed in a large distillation tower, and liquid outlets along the height of the tower allow for the withdrawal of different fractions or products with different boiling ranges. The lighter fractions exit from the top of the tower, and the heavier ones exit from the bottom. A distillation tower often uses reflux to achieve more complete separation of fractions; the reflux is the portion of the condensed overhead liquid product that is returned to the tower.

The process of the experimental gas fractionation unit is illustrated in Figure 5. The whole unit consists of four towers in series, and the structure of each tower is very similar. LPG from an upstream refining unit is fed to a depropanizer (tower 1). Products at the top of tower 1 mainly include C-2 and C-3 fractions. One part of the top fractions is returned as reflux, and the other part is fed to a deethanizer (tower 2). The bottom fractions of tower 1 are fed to a deisobutanizer (tower 4). As for tower 2, all of the fractions from the top are returned as reflux, and the bottom fractions are fed to a propylene tower (tower 3). As for tower 3, a part of the fractions from the top is returned as reflux, and the other part flows out as final product.

Figure 5. Diagram of the gas fractionation unit.
The propane fraction from the bottom of tower 3 is sent to the downstream gas system for further processing. As for tower 4, a part of the fractions from the top is returned as reflux, and both the other part and the bottom fractions are fed to the downstream gas processing system.

4.2. SDG and PSDG Model. To build the PSDG model of the process, the entire SDG model must be built first. According to the four-tower structure of the unit, the entire process can be decomposed into four subsystems, one per tower, and each tower can be further decomposed into several secondary subsystems according to its feedback control loops. The entire SDG model can then be built in a hierarchical structure. Due to limited space, the model is not illustrated in full here; Figure 6 shows only the subsystem tower 1 of the SDG model. In Figure 6, parts I–VI denote the following secondary subsystem SDG models of tower 1, respectively: feed flow control system, feed temperature control system, bottom temperature control system, bottom liquid level control system, reflux control system, and overhead pressure control system. The local SDG model of secondary subsystem I is illustrated in Figure 7. In Figures 6 and 7, a solid line denotes a positive effect of the branch, and a dashed line denotes a negative effect. The SDG model of subsystem tower 1 contains 109 nodes and 19 control loops. The SDG models of towers 2 to 4 contain 51, 73, and 65 nodes and 11, 20, and 11 control loops, respectively.

Figure 6. Subsystem tower 1 of SDG model.

From the SDG model, the PSDG model can be obtained by introducing probabilistic information. The probabilistic parameters are obtained here by statistical analysis of historical data, using the statistical table of malfunctions provided by the petrochemical company. The table includes all of the fault data collected by the unit operators over a period of 900 days. Therefore, the a priori probability of each fault can be determined approximately according to its occurrence frequency in the table. For example, during the 900 running days, the LC2105 sensor in tower 2 had broken down 15 times.
Thus, the a priori fault probability of the corresponding node should be 15/900 = 0.0167, and the a priori normal probability is 1 − 0.0167 = 0.9833. For a fault that never occurred in the table, if its a priori fault probability were set to zero, its posterior probability would remain at zero whatever the fault symptoms are. This would mean that a fault which has never occurred before can never occur, which is obviously not reasonable. Thus, a small probability of 0.0005 is assigned to these fault nodes. In addition, the a priori fault probabilities can be updated gradually according to new statistical fault data as running time goes on.

The a priori conditional probability between a parent node and a child node is decided according to the degree of impact from the parent to the child, on the basis of the experience of the operators. Here, a larger probability of 0.9 is assigned to a stronger impact, and a smaller value of 0.7 is assigned to a weaker impact. The other unknown a priori probabilities can be evaluated by the approach discussed above. Take the converging connection structure in Figure 1 as an example. If the probabilities p(v3=+1|v1=+1,v2=0) and p(v3=+1|v1=0,v2=+1) are set to 0.9 and 0.7 according to their impact degrees, then the probability p(v3=+1|v1=+1,v2=+1) can be evaluated as 0.97 according to eq 1. The other probabilities can also be evaluated according to propositions 1 and 2.

Figure 7. Local SDG model of secondary subsystem I.

4.3. Experimental Results. The experimental operation data are collected from the real-time database of the company. Fault symptoms are detected using the dynamic kernel partial least-squares-support vector regression (DKPLS-SVR) method proposed in our previous work.13
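The parameter assignments of section 4.2 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and because eq 1 is not repeated in this section, the combination rule is assumed to have the noisy-OR form, which does reproduce the quoted value of 0.97 from 0.9 and 0.7.

```python
def prior_from_counts(fault_count, running_days, floor=0.0005):
    """A priori fault probability from an occurrence count.

    Faults never seen in the malfunction table get the small floor value
    instead of zero, so their posterior probability is not forced to zero.
    """
    if fault_count == 0:
        return floor
    return fault_count / running_days


def noisy_or(single_cause_probs):
    """Combine single-parent conditional probabilities (assumed form of eq 1)."""
    none_acts = 1.0
    for p in single_cause_probs:
        none_acts *= 1.0 - p
    return 1.0 - none_acts


p_fault = prior_from_counts(15, 900)    # LC2105 sensor: 15 failures in 900 days
p_normal = 1.0 - p_fault
p_unseen = prior_from_counts(0, 900)    # a fault absent from the table
p_both = noisy_or([0.9, 0.7])           # strong and weak parent both deviated

print(round(p_fault, 4), round(p_normal, 4), p_unseen, round(p_both, 2))
```

Keeping the floor as an explicit argument makes it easy to revisit the 0.0005 choice when the fault statistics are updated.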


Figure 9. Cascade control system of case 2.

Figure 8. PCEG of Case 1.

Table 1. Final Diagnostic Results of Case 1 by PSDG(a)

| fault no. | candidate faults | probabilities |
| --- | --- | --- |
| **1** | **pump P2001A shut down** | **0.7588** |
| 2 | pipe of FC2101 stuck | 0.2071 |
| 3 | pump P2007A shut down | 0.0011 |
| 4 | pipe of FC2104 leakage | 0.0001 |
| 5 | pipe of FC2123 leakage | 0.0001 |
| 6 | TC2101 water temperature high | 0.0001 |

(a) The boldface is used to emphasize the most possible fault found by the PSDG approach, whose possibility is maximal.

Case 1. Feed Pump P2001A of Tower 1 Shut Down. By the DKPLS-SVR method, when the fault arises, the following symptoms are detected successively: LC2101(+), LC2102(−), LC2108(−), FC2101(−), FC2102(−), FC2104(−), FC2123(−), and TC2101(+), where L denotes level, F denotes flow, and T denotes temperature. Through simplifying the PSDG model and transforming the cycles into supernodes, the PCEG for probability calculation is obtained, as illustrated in Figure 8. In this case, the deviated monitoring nodes can be assigned to four secondary subsystems, and each subsystem is a cascade control system. They are the feed flow control system of tower 1 (including LC2101 and FC2101), the feed temperature control system of tower 1 (including TC2101 and FC2102), the bottom liquid level control system of tower 1 (including LC2102 and FC2104), and the bottom liquid level control system of tower 4 (including LC2108 and FC2123). Therefore, these four cascade control systems are transformed into eight corresponding supernodes in the global reasoning step, which are the nodes with positive or negative states in Figure 8.

Under our PSDG framework, the final diagnostic result, involving all possible candidate faults and their probabilities, is shown in Table 1. There are six candidate faults that may result in the same qualitative symptoms that have been detected. If only the conventional SDG based fault diagnosis approach is used, the candidate fault set listed in Table 1 can also be obtained, but these candidate faults cannot be distinguished further by SDG, and the most possible fault cause cannot be found directly and reasonably. The final diagnostic conclusion of the SDG approach will be that the possibilities of all fault causes listed in Table 1 are equal. By PSDG, however, the probabilities of these candidate faults are calculated further, as shown in Table 1. Thus, the conclusion is obtained that the most possible fault is the pump P2001A shut down; its possibility is 0.7588, much higher than those of the other candidate faults. The consistent paths found by reasoning are as follows:

path 1: FC2123(−) ← LC2108(−) ← FC2104(−) ← LC2102(−) ← TC2101(+) ← FC2101(−) ← P2001A(−)

path 2: LC2101(+) ← FC2101(−) ← P2001A(−)

path 3: FC2102(−) ← TC2101(+) ← FC2101(−) ← P2001A(−)
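A consistent path can be checked mechanically. The sketch below uses the node states of path 1 of case 1, written from cause to effect, and assumes the usual SDG consistency condition (the sign of a child equals the sign of its parent multiplied by the sign of the connecting branch). The function names are hypothetical, and the branch signs are only those implied by the node states, not values read from Figure 8.

```python
def implied_branch_signs(path):
    """Branch signs that make a cause-to-effect path consistent.

    path is a list of (node, sign) pairs with sign in {+1, -1}.
    """
    return [(a, b, sa * sb) for (a, sa), (b, sb) in zip(path, path[1:])]


def is_consistent(path, branch_signs):
    """True if every child sign equals parent sign times branch sign."""
    return all(
        sb == sa * branch_signs[(a, b)]
        for (a, sa), (b, sb) in zip(path, path[1:])
    )


# Path 1 of case 1, ordered from the fault origin to the last symptom.
path1 = [
    ("P2001A", -1), ("FC2101", -1), ("TC2101", +1), ("LC2102", -1),
    ("FC2104", -1), ("LC2108", -1), ("FC2123", -1),
]

implied = implied_branch_signs(path1)
signs = {(a, b): s for a, b, s in implied}
print(implied)
print(is_consistent(path1, signs))
```

Under this assumption, only the branches into and out of TC2101 need to be negative for path 1 to be consistent; flipping any single node state breaks the check.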

Case 2. LC2105 Sensor Bias. LC2105 is the liquid level of the reflux tank of tower 2. When the fault arises, the following symptoms are detected successively by the DKPLS-SVR method: LC2105(−) and FC2111(−). These two nodes belong to the reflux control system of tower 2, which is a cascade control system, as illustrated in Figure 9. Then, the local reasoning step inside a cycle is implemented. As mentioned in section 3.4, the state combination of the monitoring nodes determines the information flow in the cycle, which gives a consistent path with a definite direction. For example, if controller bias or valve bias arises, the state of LC2105 should be opposite to that of FC2111. Here the two states are the same; therefore, controller bias or valve bias cannot be a real fault cause, and neither is the origin of the information flow in this fault scenario. Through simplification, the PCEG of case 2 is obtained, as illustrated in Figure 10. The final diagnostic result, involving all possible candidate faults and their probabilities, is shown in Table 2. The conclusion can also be confirmed that the most possible fault is the LC2105 sensor negative bias. The consistent path found by reasoning is

FC2111(−) ← LC2105(−) ← LC2105 sensor(−)


Table 3. Results of Changing the Proportion of a Priori Probabilities in PSDG(a)

| | proportion fault 1:fault 2 | diagnostic result: fault 1 | diagnostic result: fault 2 |
| --- | --- | --- | --- |
| case 1: from real data | 2:1 | **0.7588** | 0.2071 |
| change 1 | 3:1 | **0.7915** | 0.1728 |
| change 2 | 1:1 | **0.6080** | 0.3651 |
| change 3 | 1:2 | 0.4456 | **0.5352** |
| change 4 | 1:3 | 0.3517 | **0.6336** |

(a) The boldface is used to emphasize the most possible fault found by the PSDG approach, whose possibility is maximal.

Figure 10. PCEG of case 2.

Table 2. Final Diagnostic Results of Case 2 by PSDG(a)

| fault no. | candidate faults | probabilities |
| --- | --- | --- |
| **1** | **LC2105 sensor negative bias** | **0.5810** |
| 2 | pipe of FC2111 leakage | 0.1588 |
| 3 | FC2111 sensor negative bias | 0.0120 |

(a) The boldface is used to emphasize the most possible fault found by the PSDG approach, whose possibility is maximal.

4.4. Discussions. In this section, the relationship between PSDG and BN is discussed, and the influence of the a priori fault probabilities on PSDG reasoning results is analyzed further.

From Figure 8 and Figure 10, it can be seen that, if Bayesian reasoning is used to calculate the probabilities of candidate faults, the final PCEG for calculation can be considered as a BN model. Therefore, standard BN techniques can be used in the PCEG. However, our PSDG approach is not a straightforward application of BN. Usually, when used for fault diagnosis, a BN model involves fault cause nodes and the symptom nodes induced by certain fault cause nodes. Therefore, in BN modeling, sufficient a priori knowledge of the target system, especially a priori knowledge of abnormal conditions, is necessary to identify all possible fault causes in advance. This dependence on complete prior knowledge of the system increases the modeling difficulty and limits the ability to diagnose novel faults, so the BN approach is usually applied to fault diagnosis of instruments and equipment rather than of a complex chemical process. In contrast, as mentioned above, our PSDG model retains the original model structure of SDG; i.e., it reflects the causalities among the process variables, which are easy to obtain with only a little knowledge of the normal condition. Our PSDG approach also inherits all the advantages of SDG, so it is more suitable for application in the chemical industry.

Moreover, as discussed in this work, our PSDG approach has an integrated theoretical and methodological framework, from its definition to its implementation, which is independent of BN. Only when Bayesian reasoning is used to calculate the probabilities must the PSDG be transformed into a PCEG, which can be considered as a kind of BN. If other kinds of reasoning mechanisms for PSDG are developed, the transformation step will be unnecessary. Besides, the SDG reasoning method is used in our PSDG reasoning mechanism, and it determines the range of the subsequent probability calculation, so that, unlike the BN approach, it does not query every cause node. In addition to the probabilities of candidate faults, our PSDG approach can also reveal the procedure of fault propagation and evolution through consistent paths. Therefore, BN only provides a calculation tool for our PSDG reasoning. From another point of view, our PSDG framework can also provide a modeling method for BN.

In the probability calculation of candidate faults, the posterior probability of a candidate fault node is influenced by its initial a priori fault probability. Here, the converging connection structure in Figure 1 is taken as an example again. It is assumed that ψ(v3) = +1 is the evidence, and the posterior probabilities of the two candidate cause nodes, p(v1=+1|v3=+1) and p(v2=+1|v3=+1), should be determined. Then, the relative values of p(v1=+1|v3=+1) and p(v2=+1|v3=+1) depend on the values of p(v1=+1) and p(v2=+1). When the influence from node v1 to node v3 is equal to that from node v2, if p(v1=+1) > p(v2=+1), then p(v1=+1|v3=+1) > p(v2=+1|v3=+1); of course, if p(v1=+1) = p(v2=+1), then p(v1=+1|v3=+1) = p(v2=+1|v3=+1). This is in accord with common sense, and it is confirmed by the diagnostic results obtained when changing the proportion of the a priori fault probabilities of fault 1 ("pump P2001A shut down") and fault 2 ("pipe of FC2101 stuck") in case 1. As listed in Table 3, when the proportion of the a priori fault probabilities between fault 1 and fault 2 is changed, the relative values of the posterior probabilities change correspondingly, and the diagnostic conclusion changes too. Therefore, learning the initial a priori fault probabilities of candidate fault nodes from operation data is very important to the diagnostic conclusion of our PSDG approach. A similar result also occurs in other probabilistic approaches, including BN.

Because of this property, if multiple causes connect to the same evidence, the posterior probability of a cause with a small initial a priori fault probability will usually not be dominant, even if it is the real fault cause. So, considering only the cause with the maximal posterior probability is not enough for a reliable diagnosis. Several candidate causes should be checked in the order of their posterior probabilities to reach a complete conclusion.
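The dependence of the posterior ranking on the a priori fault probabilities can be illustrated on a stripped-down converging structure. The sketch below is a simplification under stated assumptions, not the calculation behind Table 3: each node has only two states (0 and +1), the conditional probability of v3 is assumed to follow a noisy-OR form with no leak, and all numbers and names are hypothetical.

```python
from itertools import product


def posteriors(prior1, prior2, c1, c2):
    """Return p(v1=+1 | v3=+1) and p(v2=+1 | v3=+1) for v1 -> v3 <- v2.

    prior1, prior2: a priori fault probabilities of v1 and v2.
    c1, c2: p(v3=+1) when only v1 (respectively only v2) has deviated.
    """
    def p_v3(v1, v2):
        # Assumed noisy-OR combination; equals 0 when both parents are normal.
        return 1.0 - (1.0 - c1) ** v1 * (1.0 - c2) ** v2

    joint = {}
    for v1, v2 in product((0, 1), repeat=2):
        p1 = prior1 if v1 else 1.0 - prior1
        p2 = prior2 if v2 else 1.0 - prior2
        joint[(v1, v2)] = p1 * p2 * p_v3(v1, v2)

    evidence = sum(joint.values())
    post1 = (joint[(1, 0)] + joint[(1, 1)]) / evidence
    post2 = (joint[(0, 1)] + joint[(1, 1)]) / evidence
    return post1, post2


# Equal influence (c1 == c2), different prior proportions.
for prior1, prior2 in [(0.02, 0.01), (0.01, 0.01), (0.01, 0.02)]:
    post1, post2 = posteriors(prior1, prior2, c1=0.9, c2=0.9)
    print(prior1, prior2, round(post1, 4), round(post2, 4))
```

With equal influence, the cause with the larger prior always obtains the larger posterior, and swapping the priors swaps the ranking, which mirrors the qualitative trend reported in Table 3.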

5. CONCLUSIONS

On the basis of the PSDG concept proposed in our previous work, an integrated theoretical and methodological framework is proposed to construct a basically complete system of the PSDG based fault diagnosis approach. The definition, reasoning mechanism, and theoretical analysis of PSDG are discussed in detail. In addition to retaining the advantages of SDG, the PSDG framework can calculate and rank the probabilities of all candidate faults given the online measured evidence. Experimental results show the validity and advantages of the PSDG framework. There is, however, still further work to be done on the PSDG system, for example, how to model a PSDG more conveniently, how to learn and tune the probabilistic parameters from historical experimental data, and how to combine PSDG with other fault diagnosis approaches.

APPENDIX

A.1. Proof of Proposition 2. Due to the limited space, in this section we only take the two parameters p(v3=+1|v1=+1,v2=−1) and p(v3=+1|v1=−1,v2=+1) as an example to prove the approximate evaluation of proposition 2. The proof can proceed in two steps: (1) these two parameters can be evaluated arbitrarily on the basis of experience; (2) in extreme circumstances, they can be further evaluated as in eq 3.

(1) The final purpose of PSDG diagnostic reasoning is to compute the posterior possibility of the influence from any parent node. Here, the fault symptom is assumed to be ψ(v3) = +1. It has been supposed that the structure is a simple three-state model with positive branches. Therefore, the reasoning for explaining the symptom can be v1 = +1 or v2 = +1 or both. Without loss of generality, the posterior probability p(v1=+1|v3=+1) can be computed as follows.

$$p(v_1=+1 \mid v_3=+1) = \frac{p(v_1=+1,\, v_3=+1)}{p(v_3=+1)} = \frac{\sum_{v_2} p(v_1=+1,\, v_2,\, v_3=+1)}{\sum_{v_1, v_2} p(v_1,\, v_2,\, v_3=+1)} = \frac{\Sigma_1 + \Delta_1}{\Sigma_2 + \Delta_1 + \Delta_2} \tag{A1}$$

Equation A1 represents how to calculate the posterior probability that the state of cause node v1 is +1 after observing the fault symptom, where

$$\Sigma_1 = p(v_1=+1)\, p(v_2=0)\, p(v_3=+1 \mid v_1=+1, v_2=0) + p(v_1=+1)\, p(v_2=+1)\, p(v_3=+1 \mid v_1=+1, v_2=+1) \tag{A2}$$

$$\begin{aligned} \Sigma_2 = \Sigma_1 &+ p(v_1=0)\, p(v_2=0)\, p(v_3=+1 \mid v_1=0, v_2=0) \\ &+ p(v_1=0)\, p(v_2=+1)\, p(v_3=+1 \mid v_1=0, v_2=+1) \\ &+ p(v_1=-1)\, p(v_2=0)\, p(v_3=+1 \mid v_1=-1, v_2=0) \\ &+ p(v_1=0)\, p(v_2=-1)\, p(v_3=+1 \mid v_1=0, v_2=-1) \\ &+ p(v_1=-1)\, p(v_2=-1)\, p(v_3=+1 \mid v_1=-1, v_2=-1) \end{aligned} \tag{A3}$$

$$\Delta_1 = p(v_1=+1)\, p(v_2=-1)\, p(v_3=+1 \mid v_1=+1, v_2=-1) \tag{A4}$$

$$\Delta_2 = p(v_1=-1)\, p(v_2=+1)\, p(v_3=+1 \mid v_1=-1, v_2=+1) \tag{A5}$$

Parameters are defined as X1 = p(v3=+1|v1=+1,v2=−1) and X2 = p(v3=+1|v1=−1,v2=+1). Then, in eq A1, all parameters other than the a priori parameters X1 and X2 are known in advance. Therefore, in eq A1, p(v1=+1|v3=+1) can be regarded as a function of the two variables X1 and X2. Furthermore, the function is obviously monotone in each variable according to eq A1. When the conditions X1 = 1 and X2 = 0 are met simultaneously, the theoretical maximum is obtained as

$$p(v_1=+1 \mid v_3=+1)_{\max} = \frac{\Sigma_1 + p(v_1=+1)\, p(v_2=-1)}{\Sigma_2 + p(v_1=+1)\, p(v_2=-1)} \tag{A6}$$

On the other hand, when the conditions X1 = 0 and X2 = 1 are met simultaneously, the theoretical minimum is obtained as

$$p(v_1=+1 \mid v_3=+1)_{\min} = \frac{\Sigma_1}{\Sigma_2 + p(v_1=-1)\, p(v_2=+1)} \tag{A7}$$

As is well known, the probability of a fault condition is far less than that of a normal condition in the process. Therefore, the following conditions are met.

$$\Sigma_1 \gg p(v_1=+1)\, p(v_2=-1), \qquad \Sigma_2 \gg p(v_1=-1)\, p(v_2=+1), \qquad \Sigma_2 \gg p(v_1=+1)\, p(v_2=-1) \tag{A8}$$

According to eq A8, the maximum value in eq A6 and the minimum value in eq A7 are approximately equal. Therefore, the parameters X1 and X2 can be evaluated approximately and arbitrarily on the basis of experience, and the imprecise values will not influence the final conclusion.

(2) In addition, X1 and X2 are often considered to be approximately equal when the influence from each parent to the same child is similar in the process. If it is assumed that X1 ≈ X2, then, in eq A1, p(v1=+1|v3=+1) can be regarded as a function of a single variable. Another weighting parameter R is defined as

$$R = 1 + \frac{p(v_1=-1)\, p(v_2=+1)}{p(v_1=+1)\, p(v_2=-1)} \tag{A9}$$

Therefore, when Σ2 < RΣ1, the function is monotone decreasing; when Σ2 > RΣ1, the function is monotone increasing; when Σ2 = RΣ1, the function is a constant. Generally, Σ2 < RΣ1 is often met in the process. So, when X1 ≈ X2 = 0, the maximum can be obtained as follows.

$$p(v_1=+1 \mid v_3=+1)_{\max} = \frac{\Sigma_1}{\Sigma_2} \tag{A10}$$

When X1 ≈ X2 = 1, the minimum can be obtained as follows.

$$p(v_1=+1 \mid v_3=+1)_{\min} = \frac{\Sigma_1 + p(v_1=+1)\, p(v_2=-1)}{\Sigma_2 + p(v_1=-1)\, p(v_2=+1) + p(v_1=+1)\, p(v_2=-1)} \tag{A11}$$

Furthermore, PSDG fault diagnosis is concerned more with the maximal possibility of each candidate fault. Thus, in this extreme circumstance, the parameters can be evaluated as X1 ≈ X2 = 0 in eq 3.
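The monotonicity statement in step (2) follows from a one-line derivative. In the sketch below, a, b, and X are shorthand introduced here only for illustration, with a = p(v1=+1)p(v2=−1), b = p(v1=−1)p(v2=+1), and X = X1 = X2, so that eq A1 becomes a function f(X) of a single variable.

```latex
f(X) = \frac{\Sigma_1 + aX}{\Sigma_2 + (a+b)X},
\qquad
f'(X) = \frac{a\,\Sigma_2 - (a+b)\,\Sigma_1}{\bigl(\Sigma_2 + (a+b)X\bigr)^{2}}
      = \frac{a\,\bigl(\Sigma_2 - R\,\Sigma_1\bigr)}{\bigl(\Sigma_2 + (a+b)X\bigr)^{2}},
\qquad
R = 1 + \frac{b}{a}.
```

Since a > 0 and the denominator is positive, the sign of f′(X) is the sign of Σ2 − RΣ1, which gives the decreasing, increasing, and constant cases. Evaluating f at X = 0 and X = 1 reproduces eqs A10 and A11, respectively.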


Similarly, we can prove that the other four parameters can be evaluated arbitrarily on the basis of experience, and then eqs 4 and 5 can also be proved.

A.2. Proof of Theorem 1. The proof proceeds by mathematical induction in two steps: the initial step and the inductive step.

(1) Initial Step. When k = m − 1, substituting k into eq 12, the left side term can be transformed as

$$p\bigl(v_{m-1} = \psi(v_m)\,\psi(e_{v_{m-1} \to v_m}) \mid v_m = \psi(v_m)\bigr) \tag{A12}$$

The right side term is the same as eq A12 when k = m − 1. So, eq 12 is clearly true for k = m − 1. When k = m − 2, it is the same case as the serial connection structure involving three nodes in Figure 2. Substituting k into eq 12, the left side term can be transformed as

$$p\bigl(v_{m-2} = \psi(v_m)\,\psi(e_{v_{m-1} \to v_m})\,\psi(e_{v_{m-2} \to v_{m-1}}) \mid v_m = \psi(v_m)\bigr) \tag{A13}$$

The right side term is

$$p\bigl(v_{m-2} = \psi(v_{m-1})\,\psi(e_{v_{m-2} \to v_{m-1}}) \mid v_{m-1} = \psi(v_{m-1})\bigr)\; p\bigl(v_{m-1} = \psi(v_m)\,\psi(e_{v_{m-1} \to v_m}) \mid v_m = \psi(v_m)\bigr) \tag{A14}$$

According to the conclusion of eq 11, eq 12 can be verified for k = m − 2.

(2) Inductive Step. It is assumed that eq 12 is true when k = l (l ∈ [1, m − 1]). That is the case of a serial connection structure involving (m − l + 1) nodes. Then, eq 12 must be proved true for k = l − 1. Substituting k = l − 1 into eq 12, the left side term is

$$\begin{aligned} & p\Bigl(v_{l-1} = \psi(v_m) \prod_{i=1}^{m-l+1} \psi(e_{v_{m-i} \to v_{m-i+1}}) \Bigm| v_m = \psi(v_m)\Bigr) \\ &\quad = p\bigl(v_{l-1} = \psi(v_m)\,\psi(e_{v_{m-1} \to v_m})\,\psi(e_{v_{m-2} \to v_{m-1}}) \cdots \psi(e_{v_l \to v_{l+1}})\,\psi(e_{v_{l-1} \to v_l}) \mid v_m = \psi(v_m)\bigr) \\ &\quad = \sum_{v_l} p\bigl(v_{l-1} = \psi(v_m)\,\psi(e_{v_{m-1} \to v_m})\,\psi(e_{v_{m-2} \to v_{m-1}}) \cdots \psi(e_{v_l \to v_{l+1}})\,\psi(e_{v_{l-1} \to v_l}) \mid v_m = \psi(v_m),\, v_l\bigr)\; p\bigl(v_l \mid v_m = \psi(v_m)\bigr) \end{aligned} \tag{A15}$$

According to the consistency characteristic and the proof procedure of eq 11, only the consistent state $\psi(v_l) = \psi(v_m) \prod_{i=1}^{m-l} \psi(e_{v_{m-i} \to v_{m-i+1}})$ of node v_l should be retained. Then, eq A15 can be transformed as

$$p\Bigl(v_{l-1} = \psi(v_m) \prod_{i=1}^{m-(l-1)} \psi(e_{v_{m-i} \to v_{m-i+1}}) \Bigm| v_m = \psi(v_m)\Bigr) = p\bigl(v_{l-1} = \psi(v_l)\,\psi(e_{v_{l-1} \to v_l}) \mid v_l = \psi(v_l)\bigr)\; p\bigl(v_l = \psi(v_l) \mid v_m = \psi(v_m)\bigr) \tag{A16}$$

According to the assumption, eq 12 is true for k = l; then, in the right side term of eq A16,

$$p\bigl(v_l = \psi(v_l) \mid v_m = \psi(v_m)\bigr) = \prod_{i=1}^{m-l} p\bigl(v_{m-i} = \psi(v_{m-i}) \mid v_{m-i+1} = \psi(v_{m-i+1})\bigr) \tag{A17}$$

Substituting eq A17 into eq A16, the following eq A18 is obtained.

$$\begin{aligned} p\Bigl(v_{l-1} = \psi(v_m) \prod_{i=1}^{m-l+1} \psi(e_{v_{m-i} \to v_{m-i+1}}) \Bigm| v_m = \psi(v_m)\Bigr) &= p\bigl(v_{l-1} = \psi(v_l)\,\psi(e_{v_{l-1} \to v_l}) \mid v_l = \psi(v_l)\bigr) \prod_{i=1}^{m-l} p\bigl(v_{m-i} = \psi(v_{m-i}) \mid v_{m-i+1} = \psi(v_{m-i+1})\bigr) \\ &= \prod_{i=1}^{m-(l-1)} p\bigl(v_{m-i} = \psi(v_{m-i}) \mid v_{m-i+1} = \psi(v_{m-i+1})\bigr) \end{aligned} \tag{A18}$$

Therefore, eq 12 is proved true for k = l − 1 under the assumption. This completes the inductive step. The procedure can be repeated successively down to the case k = 1 according to the inductive step. That is the case of a serial connection structure involving m nodes. Therefore, theorem 1 has been proved.

AUTHOR INFORMATION

Corresponding Author
*Tel.: +86-10-6278-5845. Fax: +86-10-6278-6911. E-mail: zhxiong@tsinghua.edu.cn.

ACKNOWLEDGMENT

This research is partially supported by the National Natural Science Foundation of China (Grant NFSC60874049), the National High Technology Research and Development (863) Program of China (Grant No. 2007AA04Z193), and IBM China Research Laboratory 2010 UR-Program.

REFERENCES

(1) Iri, M.; Aoki, K.; O'Shima, E.; Matsuyama, H. An Algorithm for Diagnosis of System Failures in the Chemical Process. Comput. Chem. Eng. 1979, 3, 489.
(2) Shiozaki, J.; Matsuyama, H.; Tano, K.; O'Shima, E. Fault Diagnosis of Chemical Processes by the Use of Signed Directed Graphs Extension to Five-Range Patterns of Abnormality. Int. Chem. Eng. 1985, 25, 651.
(3) Kramer, M. A.; Palowitch, B. L. A Rule-Based Approach to Fault Diagnosis Using the Signed Directed Graph. AIChE J. 1987, 33, 1067.
(4) Shiozaki, J.; Shibata, B.; Matsuyama, H.; O'Shima, E. Fault Diagnosis of Chemical Processes Utilizing Signed Directed Graphs-Improvement by Using Temporal Information. IEEE Trans. Ind. Electron. 1989, 36, 469.
(5) Yu, C. C.; Lee, C. Fault Diagnosis Based on Qualitative/Quantitative Process Knowledge. AIChE J. 1991, 37, 617.
(6) Wang, X. Z.; Yang, S. A.; Yang, S. H.; McGreavy, C. The Application of Fuzzy Qualitative Simulation in Safety and Operability Assessment of Process Plants. Comput. Chem. Eng. 1996, 20, 671.
(7) Tarifa, E. E.; Scenna, N. J. Fault Diagnosis, Direct Graphs, and Fuzzy Logic. Comput. Chem. Eng. 1997, 21, 649.
(8) Vedam, H.; Venkatasubramanian, V. PCA-SDG Based Process Monitoring and Fault Diagnosis. Control Eng. Pract. 1999, 7, 903.
(9) Lee, G.; Song, S. O.; Yoon, E. S. Multiple-Fault Diagnosis Based on System Decomposition and Dynamic PLS. Ind. Eng. Chem. Res. 2003, 43, 6145.
(10) Lee, G.; Han, C.; Yoon, E. S. Multiple-Fault Diagnosis of the Tennessee Eastman Process Based on System Decomposition and Dynamic PLS. Ind. Eng. Chem. Res. 2004, 43, 8037.
(11) Lee, G.; Tosukhowong, T.; Lee, J. H.; Han, C. Fault Diagnosis Using the Hybrid Method of Signed Digraph and Partial Least Squares with Time Delay: The Pulp Mill Process. Ind. Eng. Chem. Res. 2006, 45, 9061.
(12) Ahn, S. J.; Lee, C. J.; Jung, Y.; Han, C.; Yoon, E. S.; Lee, G. Fault Diagnosis of the Multi-stage Flash Desalination Process Based on Signed Digraph and Dynamic Partial Least Square. Desalination 2008, 228, 68.
(13) Lü, N.; Wang, X. Diagnosis Based on Signed Digraph Combined with Dynamic Kernel PLS and SVR. Ind. Eng. Chem. Res. 2008, 47, 9447.
(14) Yang, F.; Xiao, D. Y. Probabilistic SDG Model Description and Fault Inference for Large-Scale Complex Systems. High Technol. Lett. 2006, 12, 239.
(15) Yang, F.; Xiao, D. Y. Probabilistic SDG Model and Approach to Inference for Fault Analysis. Kongzhi yu Juece (Control Decis.) 2006, 21, 487.
(16) Lü, N.; Wang, X. A Probabilistic SDG Approach to Fault Diagnosis of Industrial Systems. Proceedings of the 2007 International Conference on Intelligent Computing, Qingdao, China; Springer-Verlag: Berlin Heidelberg, 2007; p 517.
(17) Song, Q. J.; Xu, M. Q.; Wang, R. X. Fault Diagnosis Approach Based on Fuzzy Probability SDG Model and Reasoning. Kongzhi yu Juece (Control Decis.) 2009, 24, 692.
(18) Bhushan, M.; Rengaswamy, R. Design of Sensor Location Based on Various Fault Diagnostic Observability and Reliability Criteria. Comput. Chem. Eng. 2000, 24, 735.
(19) Bhushan, M.; Rengaswamy, R. Comprehensive Design of a Sensor Network for Chemical Plants Based on Various Diagnosability and Reliability Criteria. 1. Framework. Ind. Eng. Chem. Res. 2002, 41, 1826.
(20) Bhushan, M.; Rengaswamy, R. Comprehensive Design of a Sensor Network for Chemical Plants Based on Various Diagnosability and Reliability Criteria. 2. Applications. Ind. Eng. Chem. Res. 2002, 41, 1840.
(21) Umeda, T.; Kuriyama, T.; O'Shima, E.; Matsuyama, H. A Graphical Approach to Cause and Effect Analysis of Chemical Processing Systems. Chem. Eng. Sci. 1980, 35, 2379.
(22) Onisko, A.; Druzdzel, M. J.; Waysylu, K. H. Learning Bayesian Network Parameters from Data Sets: Application of Noisy-OR Gates. Int. J. Approximate Reasoning 2001, 27, 165.
(23) Boutilier, C.; Friedman, N.; Goldszmidt, M.; Koller, D. Context-Specific Independence in Bayesian Networks. Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann: San Francisco, CA, 1996; p 115.
(24) Zhang, Q. A New Methodology to Deal with Dynamical Uncertain Causalities (I): The Static Discrete DAG Case. Chin. J. Comput. 2010, 33, 625.
(25) Chen, J.; Howell, J. A Self-validating Control System Based Approach to Plant Fault Detection and Diagnosis. Comput. Chem. Eng. 2001, 25, 337.
(26) Maurya, M. R.; Rengaswamy, R.; Venkatasubramanian, V. A Systematic Framework for the Development and Analysis of Signed Digraphs for Chemical Processes. 2. Control Loops and Flowsheet Analysis. Ind. Eng. Chem. Res. 2003, 42, 4811.
(27) Maurya, M. R.; Rengaswamy, R.; Venkatasubramanian, V. A Signed Directed Graph-based Systematic Framework for Steady-state Malfunction Diagnosis inside Control Loops. Chem. Eng. Sci. 2006, 61, 1790.
(28) Zhang, N. L.; Poole, D. A Simple Approach to Bayesian Network Computations. Proceedings of the Tenth Canadian Conference on Artificial Intelligence; Banff, Canada, 1994; p 171.

dx.doi.org/10.1021/ie200016t |Ind. Eng. Chem. Res. 2011, 50, 10062–10073