Ind. Eng. Chem. Res. 2009, 48, 827–836


Rough Set-Based Fuzzy Rule Acquisition and Its Application for Fault Diagnosis in Petrochemical Process

Zhiqiang Geng* and Qunxiong Zhu

College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, P. R. China

Data mining techniques can discover experience, knowledge, and operational rules from large industrial data sets in order to recognize abnormal process situations or faults, improve production levels, and optimize operating conditions. In this paper, a rough set-based fuzzy rule acquisition approach and a fault diagnosis scheme for industrial processes are studied in detail. A new heuristic reduct algorithm is proposed to obtain the optimum reduct of a decision information system. Moreover, a fuzzy discretization model for continuous data, based on the normal distribution of process variables, is put forward to overcome the subjectivity of selecting fuzzy membership functions and to decrease the sensitivity to noise signals. Furthermore, the proposed data mining algorithm and fault diagnosis scheme are applied to a petrochemical process. The validity of the proposed strategy is verified by application to a practical ethylene cracking furnace system, where it can discover abnormal process situations and improve plant safety in the petrochemical industry.

1. Introduction

Because of the highly complex and integrated nature of petrochemical processes, abnormal situations or faults occur due to sensor drifts, equipment failures, or changes in measured process variables. These abnormalities have significant safety and environmental impact. One estimate puts the annual loss to the U.S. petrochemical industry from inadequate abnormal situation management and control at $20 billion.1 Early and accurate fault monitoring and diagnosis of industrial processes can minimize downtime, increase the safety of plant operations, and reduce manufacturing costs.
Distributed control systems (DCS), which monitor, control, and diagnose process variables such as pressure, flow, and temperature, have been implemented for various large-scale process systems that generate many process variable values.2 Operators often find it difficult to effectively monitor the process variables, analyze current states, detect and diagnose process abnormalities, and take appropriate actions to control the processes. To assist plant operators, process operational information must be analyzed and presented in a manner that reflects the important underlying trends or events in the process.3 Intelligent decision support systems (IDSS) that incorporate a variety of artificial intelligence (AI) and non-AI techniques can support and carry out this task.

* To whom correspondence should be addressed. E-mail: [email protected]. Tel.: +86-10-6442-6960. Fax: +86-10-6443-7805.

Generally speaking, fault diagnosis techniques include data-driven, analytical, and knowledge-based approaches.4 The most popular data-driven process monitoring approaches include principal component analysis (PCA), Fisher discriminant analysis, partial least-squares analysis (PLS), and canonical variate analysis. Analytical methods can be categorized into two common classes: parameter estimation methods and observer-based methods.5 In the parameter estimation method, a residual is defined as the difference between the nominal and the estimated model parameters, and deviations in the model parameters serve as the basis for detecting and isolating faults. In the observer-based method, the output of the system is reconstructed from the measured values, or a subset of the measurements, with the aid of observers. Knowledge-based approaches, as implemented in automated reasoning systems, incorporate heuristics and reasoning that involve uncertain, conflicting, and nonquantifiable information.6 The AI technologies associated with knowledge-based approaches and adopted for monitoring, control, and diagnosis in industrial processes include expert systems, fuzzy logic, machine learning, and pattern recognition.

Fuzzy logic, as a mechanism for representing uncertain knowledge, has been widely adopted in many engineering applications in recent years.7 It is useful for representing process descriptions such as “high” or “low”, which are inherently fuzzy and involve qualitative conceptualizations of numerical values that are meaningful to operators. Fuzzy logic supports the representation of variables and relationships in linguistic terms. A linguistic variable is a variable with linguistic meaning that takes fuzzy values, and it is often based on a quantitative variable in the process. For example, the linguistic variable “pipe temperature” can take the fuzzy values “low”, “normal”, “high”, and “very high”, and each fuzzy value can be modeled. Complex process behaviors can thus be described in general terms without precisely defining the complex phenomena involved. However, it is difficult to determine the correct set of rules and membership functions for a reasonably complex system, and fine-tuning a fuzzy solution can be time-consuming.

Rough set theory (RST) is a relatively new mathematical and AI technique introduced by Z. Pawlak8,9 to cope with imprecise or vague concepts.
Interest in rough set theory and its applications has recently grown in many fields, for example, machine learning, pattern recognition, expert systems, and fault diagnosis.10-12 The technique is particularly suited to reasoning about imprecise or incomplete data and to discovering relationships in large databases. The main advantage of RST is that it does not require any preliminary or additional information about the data, such as probability distributions in statistics, basic probability assignments in D-S theory, or possibility values in fuzzy set theory. It has been successfully applied in many fields and integrated with other AI techniques to resolve industrial problems in recent years.13-15

10.1021/ie071171g CCC: $40.75 © 2009 American Chemical Society. Published on Web 12/04/2008


In this paper, a rough set-based fuzzy rule acquisition approach and a fault diagnosis scheme for industrial processes are studied in detail. A new heuristic reduct algorithm is proposed to obtain the optimum reduct of a decision information system. At the same time, a fuzzy discretization model for continuous data, based on the normal distribution of process variables, is put forward to overcome the subjectivity of selecting fuzzy membership functions and to reduce the sensitivity to noise signals. Furthermore, the proposed data mining algorithm and the fault diagnosis scheme are applied to a petrochemical process. The validity of the proposed solutions is verified by application to a practical ethylene cracking furnace system.

2. Rough Set Preliminaries

RST is a relatively new mathematical tool for dealing with imprecise, incomplete, vague, and uncertain information.16,17 Every object of the universe is associated with some information in an information system. Objects that are characterized by the same information are indiscernible, that is, similar, in view of the available information about them. The philosophy of RST is founded on the assumption that every object in a universe of discourse can be associated with some information such as data or knowledge. For instance, when diagnosing a manufacturing system (universe of discourse), the conditions of system malfunction would provide some insights or information about the performance of the system components (objects).

Let $IS = (U, A)$ be an information system, where $U$ is the universe, a nonempty finite set of objects, and $A$ is a nonempty finite set of attributes. Every $a \in A$ determines a function $f_a: U \to V_a$, where $V_a$ is the set of values of attribute $a$. If $R \subseteq A$, there is an associated equivalence relation

$$\mathrm{IND}(R) = \{(x, y) \in U \times U \mid \forall a \in R,\ f_a(x) = f_a(y)\} \tag{1}$$

The partition of $U$ generated by $\mathrm{IND}(R)$ is denoted $U/R$. If $(x, y) \in \mathrm{IND}(R)$, then $x$ and $y$ are indiscernible by the attributes from $R$. The equivalence classes of the $R$-indiscernibility relation are denoted $[x]_R$. The indiscernibility relation is the mathematical basis of RST.

A rough set is the approximation of a vague concept by a pair of precise concepts, called the lower and upper approximations. The lower approximation is a description of the domain objects which are known with certainty to belong to the subset of interest, whereas the upper approximation is a description of the objects which possibly belong to the subset. Relative to a given set of attributes, a set is rough if its lower and upper approximations are not equal. Let $X \subseteq U$; the $R$-lower approximation $R_*(X)$ and the $R$-upper approximation $R^*(X)$ of the set $X$ are defined as

$$R_*(X) = \{x \in U \mid [x]_R \subseteq X\} \tag{2}$$

$$R^*(X) = \{x \in U \mid [x]_R \cap X \neq \emptyset\} \tag{3}$$

Let $P, Q \subseteq A$ be equivalence relations over $U$; then the positive, negative, and boundary regions are defined as

$$\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} P_*(X) \tag{4}$$

$$\mathrm{NEG}_P(Q) = U - \bigcup_{X \in U/Q} P^*(X) \tag{5}$$

$$\mathrm{BND}_P(Q) = \bigcup_{X \in U/Q} P^*(X) - \bigcup_{X \in U/Q} P_*(X) \tag{6}$$

RST requires the data or information to be organized into a so-called information table. The rows of an information table are the objects or records that are considered, and the columns are the attributes for each of the objects.

An important issue in data analysis is discovering dependencies between attributes. Dependency can be defined in the following way. For $P, Q \subseteq A$, $Q$ depends totally on $P$ if and only if $\mathrm{IND}(P) \subseteq \mathrm{IND}(Q)$. That means that the partition generated by $P$ is finer than the partition generated by $Q$. We say that $Q$ depends on $P$ in a degree $K$ ($0 \le K \le 1$), defined as

$$K = \gamma_P(Q) = \frac{|\mathrm{POS}_P(Q)|}{|U|} \tag{7}$$

If $K = 1$, $Q$ depends totally on $P$; if $0 < K < 1$, $Q$ depends partially on $P$; and if $K = 0$, $Q$ does not depend on $P$.

The goal of attribute reduction is to remove redundant attributes so that the reduced set provides the same quality of classification as the original. The reduct and the core are built upon the equivalence relation defined in RST as follows: (i) $Q \subseteq P$ is a reduct of $P$ if $Q$ is independent and $\mathrm{IND}(Q) = \mathrm{IND}(P)$; it is denoted by $\mathrm{RED}(P)$; (ii) the core of $P$ is the set of all indispensable relations in $P$ and is denoted by $\mathrm{CORE}(P)$. For an information system, a reduct is the essential part of the knowledge. Using the reduct, all the basic concepts occurring in the considered knowledge can be defined.

Let $P$ and $Q$ be two equivalence relations on the universe $U$. The $P$-positive region of $Q$, denoted by $\mathrm{POS}_P(Q)$, is the set of objects of $U$ that can be properly classified into the classes of $Q$ by employing the knowledge expressed by the classification $P$. According to the definition of the positive region, $R \in P$ is said to be $Q$-dispensable in $P$ if and only if $\mathrm{POS}_{\mathrm{IND}(P)}(\mathrm{IND}(Q)) = \mathrm{POS}_{\mathrm{IND}(P - \{R\})}(\mathrm{IND}(Q))$. Otherwise, $R$ is $Q$-indispensable in $P$. If every $R$ in $P$ is $Q$-indispensable, then $P$ is $Q$-independent. Thus, $S \subseteq P$ is a $Q$-reduct of $P$ if $S$ is $Q$-independent and $\mathrm{POS}_S(Q) = \mathrm{POS}_P(Q)$. As a result, $\mathrm{CORE}_Q(P) = \bigcap \mathrm{RED}_Q(P)$, where $\mathrm{RED}_Q(P)$ is the set of all $Q$-reducts of $P$. In other words, a reduct is a minimal attribute subset preserving the above condition. However, finding a minimal reduct is an NP-hard problem.

Table 1. A Decision Information Table

| U  | A   | B   | C         | D   |
|----|-----|-----|-----------|-----|
| x1 | yes | yes | normal    | no  |
| x2 | yes | yes | high      | yes |
| x3 | yes | yes | very high | yes |
| x4 | no  | yes | normal    | no  |
| x5 | no  | no  | high      | no  |
| x6 | no  | yes | very high | yes |

For example, Table 1 is a simple information system, and we can use the basic concepts described above to obtain the approximations of given sets. Here $U$ is the universe, $A$, $B$, and $C$ are the condition attributes, and $D$ is the decision attribute. Assume the relation $R = \{A, B\}$ and the set $X = \{x_2, x_3, x_5\}$. Then we can obtain the upper approximation, lower approximation, positive region, and boundary region of $X$.

$U/\mathrm{IND}(R) = \{\{x_1, x_2, x_3\}, \{x_4, x_6\}, \{x_5\}\}$. Let $R_1$, $R_2$, and $R_3$ be the elementary sets of $R$, with $R_1 = \{x_1, x_2, x_3\}$, $R_2 = \{x_4, x_6\}$, and $R_3 = \{x_5\}$; then the relation between $X$ and $R$ is as follows:

$$X \cap R_1 = \{x_2, x_3\} \neq \emptyset, \quad X \cap R_2 = \emptyset, \quad X \cap R_3 = R_3 = \{x_5\} \neq \emptyset$$

So the upper approximation, lower approximation, positive region, and boundary region of the set $X$ are as follows:

$$R^*(X) = R_1 \cup R_3 = \{x_1, x_2, x_3, x_5\}, \quad R_*(X) = R_3 = \{x_5\}, \quad \mathrm{POS}_R(X) = R_*(X) = \{x_5\}, \quad \mathrm{BN}_R(X) = R_1 = \{x_1, x_2, x_3\}$$
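The approximations in the worked example on Table 1 can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation; the names `TABLE_1`, `partition`, `lower_approx`, and `upper_approx` are illustrative assumptions. It builds the equivalence classes of IND(R) for R = {A, B} and applies eqs 2 and 3 to X = {x2, x3, x5}.

```python
from collections import defaultdict

# Table 1 of the text: condition attributes A, B, C and decision attribute D.
TABLE_1 = {
    "x1": {"A": "yes", "B": "yes", "C": "normal", "D": "no"},
    "x2": {"A": "yes", "B": "yes", "C": "high", "D": "yes"},
    "x3": {"A": "yes", "B": "yes", "C": "very high", "D": "yes"},
    "x4": {"A": "no", "B": "yes", "C": "normal", "D": "no"},
    "x5": {"A": "no", "B": "no", "C": "high", "D": "no"},
    "x6": {"A": "no", "B": "yes", "C": "very high", "D": "yes"},
}


def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects with equal values on attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_approx(table, attrs, target):
    """Eq 2: union of the classes that are entirely contained in target."""
    return {x for cls in partition(table, attrs) if cls <= target for x in cls}


def upper_approx(table, attrs, target):
    """Eq 3: union of the classes that intersect target."""
    return {x for cls in partition(table, attrs) if cls & target for x in cls}


X = {"x2", "x3", "x5"}
lower = lower_approx(TABLE_1, ["A", "B"], X)
upper = upper_approx(TABLE_1, ["A", "B"], X)
boundary = upper - lower

print(sorted(lower))     # ['x5']
print(sorted(upper))     # ['x1', 'x2', 'x3', 'x5']
print(sorted(boundary))  # ['x1', 'x2', 'x3']
```

Representing each class as a Python set keeps the two definitions close to their set-theoretic form: `cls <= target` is the inclusion test of eq 2, and `cls & target` is the nonempty-intersection test of eq 3.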


For this information system, we can obtain the core and the reducts of the attributes on the basis of the dependence relationship between the condition attributes and the decision attribute.

$U/\mathrm{IND}(D) = \{\{x_1, x_4, x_5\}, \{x_2, x_3, x_6\}\}$

$U/\mathrm{IND}(A) = \{\{x_1, x_2, x_3\}, \{x_4, x_5, x_6\}\}$

$U/\mathrm{IND}(B) = \{\{x_1, x_2, x_3, x_4, x_6\}, \{x_5\}\}$

$U/\mathrm{IND}(C) = \{\{x_1, x_4\}, \{x_2, x_5\}, \{x_3, x_6\}\}$

$U/\mathrm{IND}(A, B) = \{\{x_1, x_2, x_3\}, \{x_4, x_6\}, \{x_5\}\}$

$U/\mathrm{IND}(A, C) = \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}, \{x_6\}\}$

$U/\mathrm{IND}(B, C) = \{\{x_1, x_4\}, \{x_2\}, \{x_3, x_6\}, \{x_5\}\}$

$U/\mathrm{IND}(A, B, C) = \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}, \{x_6\}\}$

$\mathrm{POS}_{\{A,B,C\}}(D) = \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}, \{x_6\}\}$

$\mathrm{POS}_{\{A,B,C\}-\{A\}}(D) = \mathrm{POS}_{\{B,C\}}(D) = \{\{x_1, x_4\}, \{x_2\}, \{x_3, x_6\}, \{x_5\}\} \neq \mathrm{POS}_{\{A,B,C\}}(D)$

$\mathrm{POS}_{\{A,B,C\}-\{B\}}(D) = \mathrm{POS}_{\{A,C\}}(D) = \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}, \{x_6\}\} = \mathrm{POS}_{\{A,B,C\}}(D)$

$\mathrm{POS}_{\{A,B,C\}-\{C\}}(D) = \mathrm{POS}_{\{A,B\}}(D) = \{x_5\} \neq \mathrm{POS}_{\{A,B,C\}}(D)$

So the attribute B can be removed from Table 1. The relations A and C are independent. {A, C} is a reduct of the condition attributes. As there is only one reduct of Table 1, the core of Table 1 is also {A, C}.

3. Fuzzy Discretization Based on Normal Distribution of Process Variable

3.1. Linguistic Variables and Linguistic Terms. Frequently, real-world decision-making problems are ill defined; that is, their objectives and parameters are not precisely known. Probability theory has often been used to deal with these obstacles, which are due to lack of precision. However, because its requirements on the data and on the environment are very high, and because many real-world problems are fuzzy by nature rather than random, probabilistic approaches have not been very satisfactory in many cases. On the other hand, the application of fuzzy set theory to real-world decision-making problems has given very good results. Its main feature is that it provides a more flexible framework in which many of the obstacles due to lack of precision can be satisfactorily redressed. The traditional system and the fuzzy system are compatible.
Recently, Zadeh promoted “soft computing”18,19 and tried to integrate fuzzy theory with other systems and new technologies. A fuzzy system can directly convert experience that has already proved successful in an exact system into fuzzy knowledge through fuzzy numbers; such experience can also be converted into fuzzy knowledge through linguistic variables. This greatly widens the channels for acquiring fuzzy system knowledge. In such domains, imprecise numeric information, upon which linguistic variables are defined, often coexists with purely linguistic variables for which there is no formal measurement scale. For example, a furnace temperature of 1000 °C may take the fuzzy value “normal” with a membership degree of 0.9 and the fuzzy value “high” with a membership degree of 0.1. A linguistic variable can also be qualitative: the linguistic variable “certainty” can take fuzzy values such as “highly certain” or “not very certain”. The process of representing a linguistic variable as a set of fuzzy variables is called fuzzy quantification. Fuzzy logic systems handle the imprecision of input and output variables directly by defining them with fuzzy memberships and sets that can be expressed in linguistic terms.

3.2. Process Variables Fuzzization Based on Normal Distribution. The relationships among operational data are clearer when process variables are described by fuzzy linguistic terms. The sampled process variable values are discrete and vague; they follow a statistical law, and their approximate precision can only be described by mathematical expectations. The process variable values sampled by the DCS or by instruments follow a normal distribution, especially their error values. So we can use the formula $r(x, u) = e^{-(x-u)^2/2\sigma^2}$ to describe them, where $r(x, u)$ is the normal distribution density function, $u$ is the mathematical expectation (it can be substituted by the set-point of a process variable), and $\sigma$ is the standard deviation ($\sigma^2$ is the variance). Each variable has its normal operational range, and the different set-points are used to describe the process characteristics. For monitoring and diagnosing process faults, the deviation of each variable from its set-point represents the process behavior and should be monitored carefully. Because the variable values are normally distributed, the deviation of each variable is also normally distributed.

The fuzzization method first defines the fuzzy sets and the integration intervals for the deviation of every process variable and then integrates the normal distribution density function to obtain the membership degree of every variable's deviation. A membership degree of 0 indicates no membership, while a membership degree of 1 indicates full membership in a set A. A set defined in classical logic (commonly referred to as a crisp set) is the special case of a fuzzy set in which only the two membership degrees 0 and 1 are allowed. The proposed fuzzization method is easy to model and operate for process variables. The method is described in detail as follows.
Let $p$ be the measured value of a process variable and $E$ be its dynamic range relative to the set point, where $p_{max} > 0$, $E = [-p_{max}, p_{max}]$, and $p \in E$. In our method, we classify the measured value $p$ into seven ranks: positive big (PB), positive medium (PM), positive small (PS), zero (ZO), negative small (NS), negative medium (NM), and negative big (NB). That is to say, the fuzzy set is $V$ = (PB, PM, PS, ZO, NS, NM, NB), and for every $p \in E$ the membership function of $V$ is $\mu_j(p)$ ($j = 1, 2, \ldots, 7$). We use the formula $u = f(p) = (p - p_{min})/(p_{max} - p_{min})$, with $p_{min} = -p_{max}$, to map $p \in E$ into $u \in [0, 1]$. For a fuzzy set $V = (V_j,\ j = 1, 2, \ldots, n)$, the integration interval of its fuzzy subset $V_j$ is defined by $[x_j - V_j, x_j + V_j]$. The integration intervals may partition the domain with equal spacing, or the spacing may be unequal. If the membership degree of $u$ in the $j$th subset is $\mu_j(u)$, then the fuzzization model of the process variables is given by

$$\mu_j(u) = \frac{1}{2V_j}\int_{x_j - V_j}^{x_j + V_j} r(x, u)\,dx = \frac{1}{2V_j}\int_{x_j - V_j}^{x_j + V_j} \exp\left(-\frac{(x - u)^2}{2\sigma^2}\right)dx, \quad j = 1, 2, \ldots, n \tag{8}$$
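Equation 8 has a closed form in terms of the error function, which makes the membership degrees easy to evaluate. The Python sketch below is an illustration under stated assumptions, not the authors' code: the seven intervals are assumed to be equally spaced on [0, 1], `SIGMA = 0.05` is an arbitrary illustrative value, and the names `membership` and `fuzzify` are hypothetical. The integrand is used exactly as written in eq 8, without a normalization constant.

```python
import math


def membership(u, center, half_width, sigma):
    """Eq 8: mean of exp(-(x - u)^2 / (2 sigma^2)) over [center - hw, center + hw]."""
    a, b = center - half_width, center + half_width
    scale = sigma * math.sqrt(2.0)
    # Closed form of the Gaussian integral, expressed with the error function.
    integral = sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((b - u) / scale) - math.erf((a - u) / scale)
    )
    return integral / (2.0 * half_width)


# Assumption: seven equally spaced intervals covering the normalized range [0, 1].
LABELS = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]
HALF_WIDTH = 1.0 / 14.0
CENTERS = [(2 * j + 1) * HALF_WIDTH for j in range(7)]
SIGMA = 0.05  # assumed spread of the normalized deviation


def fuzzify(p, p_max):
    """Map a deviation p in [-p_max, p_max] to its best-matching linguistic term."""
    u = (p + p_max) / (2.0 * p_max)  # u = (p - p_min) / (p_max - p_min), p_min = -p_max
    degrees = {
        label: membership(u, center, HALF_WIDTH, SIGMA)
        for label, center in zip(LABELS, CENTERS)
    }
    return max(degrees, key=degrees.get), degrees


label, degrees = fuzzify(0.0, 100.0)
print(label)  # ZO
```

Because the integrand never exceeds 1 and the integral is divided by the interval length, every membership degree computed this way stays between 0 and 1.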

4. Fuzzy Rule Acquisition Based on Rough Set

4.1. Attribute Reduct Based on Rough Set. The rough set approach to attribute (or feature) selection can be based on the minimal description length principle and on tuning the parameters of the approximation spaces to obtain high-quality classifiers based on the selected attributes. One can distinguish two main steps in this approach. In the first step, relevant kinds of reducts are extracted from the given data tables by Boolean reasoning. These reducts preserve the discernibility constraints exactly. In the second step, reduct approximations are extracted by means of parameter tuning. These reduct approximations allow for shorter concept descriptions than the exact reducts while still preserving the constraints to a degree sufficient to guarantee the results.20,21

Using rough sets for attribute selection was proposed in several contributions.22 The simplest approach is based on calculating a core for a discrete-attribute data set, containing the strongly relevant attributes, and reducts containing the core plus additional weakly relevant attributes, such that each reduct determines the concepts in the data set satisfactorily. On the basis of the set of reducts for a data set, some criteria for feature selection can be formed, for instance, choosing a reduct that contains a minimal set of attributes.

Selecting an optimal reduct R from all subsets of attributes is a difficult task, since selecting the optimal reduct from all possible reducts is known to be NP-hard. The problem of attribute subset selection thus becomes how to select attributes from the dispensable attributes to form the best reduct together with the CORE. We use the CORE as the initial reduct subset, then add the other attributes one by one, comparing them by means of the selection algorithm to form the best reduct.

4.2. A Proposed Heuristic Reduct Algorithm. To select an attribute subset from a large database with many attributes, we select the best attributes one by one until an optimum reduct is found. Many evaluation criteria can be used to obtain the minimal reduct, such as relevance of attributes, criteria based on mutual information, inconsistency count, interclass, minimum concept description, and so on. In this paper, we use the inconsistency count and the gain of mutual information (MI) as criteria to remove redundant attributes. The significance of an attribute can be evaluated by measuring the effect of removing the attribute from an information table.
The number $\gamma(C, D)$ expresses the degree of dependency between the attribute sets $C$ and $D$, or the accuracy of the approximation of $U/D$ by $C$. So the significance of attribute $\{a\}$ is defined by

$$\sigma_{(C,D)}(a) = \frac{\gamma(C, D) - \gamma(C - \{a\}, D)}{\gamma(C, D)} = 1 - \frac{\gamma(C - \{a\}, D)}{\gamma(C, D)} \tag{9}$$

The coefficient $\sigma_{(C,D)}(a)$ can be understood as the classification error which occurs when attribute $\{a\}$ is dropped. The significance coefficient can be extended to sets of attributes as follows:

$$\sigma_{(C,D)}(B) = \frac{\gamma(C, D) - \gamma(C - B, D)}{\gamma(C, D)} = 1 - \frac{\gamma(C - B, D)}{\gamma(C, D)} \tag{10}$$

Another possibility is to consider as relevant the features that come from approximate reducts of sufficiently high quality. Any subset $B$ of $C$ is called an approximate reduct of $C$, and the number

$$\varepsilon_{(C,D)}(B) = \frac{\gamma(C, D) - \gamma(B, D)}{\gamma(C, D)} = 1 - \frac{\gamma(B, D)}{\gamma(C, D)} \tag{11}$$

where $\varepsilon_{(C,D)}(B)$ is called the error of reduct approximation. It expresses how exactly the set of attributes $B$ approximates the set of condition attributes $C$ with respect to determining $D$.

Information entropy theory supplies a different measure of variability for a probability distribution. The entropy is defined by

$$\mathrm{info}(U) = I(P) = -\sum_{i=1}^{k} p_i \log p_i \tag{12}$$

where $U$ is the set of objects. If $U$ is partitioned into disjoint exhaustive classes $\{Y_1, Y_2, \ldots, Y_k\}$ on the basis of the value of the decision attribute, then $\mathrm{info}(U)$ is the information needed to identify the class of an element of $U$, and $P$ is the probability distribution of the partition $\{Y_1, Y_2, \ldots, Y_k\}$:

$$P = \left(\frac{|Y_1|}{|U|}, \frac{|Y_2|}{|U|}, \ldots, \frac{|Y_k|}{|U|}\right), \quad p_i = \frac{|Y_i|}{|U|} \tag{13}$$

where the $p_i$ are the probabilities of the different values that the random variable can take and $|\cdot|$ denotes the cardinality of a set. We assume a discrete random variable in order to avoid the technicalities associated with the calculation of entropies for continuous distributions.

The MI is a measure of the strength of association between two random variables. The MI between the stimuli $S$ and the neural responses $R$ is defined in terms of their joint distribution $p(S, R)$. When this distribution is known exactly, the MI can be calculated as

$$\mathrm{MI}(S, R) = \sum_{s \in S,\ r \in R} p(s, r) \log \frac{p(s, r)}{p(s)\,p(r)} \tag{14}$$

where $p(s) = \sum_{r \in R} p(s, r)$ and $p(r) = \sum_{s \in S} p(s, r)$ are the marginal distributions over the stimuli and responses, respectively. The MI makes it easy to test for a significant association between two variables. For a decision table, we only need to consider how significant each condition attribute is to the decision attribute, so we can use the gain of MI to judge the significance of one attribute to another. Assume $\mathit{inform} = \langle U, C \cup D, V, f \rangle$ is a decision table, where $C$ is the set of condition attributes, $D$ is the decision attribute, and $R \subset C$. If one attribute $a \in C$ is added to $R$, then the gain of MI is

$$\mathrm{gain} = \mathrm{MI}(R \cup \{a\}; D) - \mathrm{MI}(R; D) \tag{15}$$

The condition attribute with the greatest information gain is the most important to the decision attribute.

Selected attributes should have a large cover rate and a high occupancy; that is, they should cover as many instances as possible and, when they cover the same number of instances, should comprise as few features as possible.23 Through our research, we have found that the maximal size of the elements of $\mathrm{POS}_R(D)/\mathrm{IND}(\{R, D\})$ is related to the cover rate and to the significance of an attribute. Therefore, an indiscernibility class with the maximal size is extracted from $\mathrm{POS}_R(D)$ during attribute selection. The more values an attribute has, the more subsets are usually obtained. Consider the positive region: $\mathrm{card}(\mathrm{POS}_R(D))$ is equal to the number of consistent instances, and $\mathrm{max\_card}(\mathrm{POS}_R(D)/\mathrm{IND}(\{R, D\}))$ denotes the maximal size of the indiscernibility classes included in the positive region. The attribute selection method follows from the above discussion: if, when a given attribute $\{a\}$ is added to the subset $R$ of attributes, $\mathrm{card}(\mathrm{POS}_{R \cup \{a\}}(D))$ increases faster and $\mathrm{max\_card}(\mathrm{POS}_{R \cup \{a\}}(D)/\mathrm{IND}(\{R \cup \{a\}, D\}))$ is larger than for any other attribute, then the attribute $\{a\}$ is added to the reduct subset.
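The gain of MI in eq 15 can be computed from entropies of attribute-value tuples, using the identity MI(X; D) = H(X) + H(D) - H(X, D), which is equivalent to eq 14. The Python sketch below is illustrative only: the function names and the four-row table are hypothetical, and base-2 logarithms are an assumption, since the text does not fix the base.

```python
import math
from collections import Counter


def entropy(values):
    """Eq 12 with base-2 logarithms, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(rows, attrs, decision):
    """MI(attrs; decision) = H(attrs) + H(decision) - H(attrs, decision)."""
    xs = [tuple(row[a] for a in attrs) for row in rows]
    ds = [row[decision] for row in rows]
    return entropy(xs) + entropy(ds) - entropy(list(zip(xs, ds)))


def mi_gain(rows, selected, candidate, decision):
    """Eq 15: gain of MI when candidate is added to the selected attributes."""
    return (mutual_information(rows, selected + [candidate], decision)
            - mutual_information(rows, selected, decision))


# Hypothetical data: attribute p determines d completely, q is unrelated to d.
rows = [
    {"p": 0, "q": 0, "d": 0},
    {"p": 0, "q": 1, "d": 0},
    {"p": 1, "q": 0, "d": 1},
    {"p": 1, "q": 1, "d": 1},
]

gain_p = mi_gain(rows, [], "p", "d")
gain_q = mi_gain(rows, [], "q", "d")
print(gain_p, gain_q)  # 1.0 0.0
```

In this toy table, adding `p` yields a gain of one bit while adding `q` yields none, so `p` would be preferred when two candidates tie on the other criteria.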
Reference 23 proposes a heuristic algorithm that uses the cover rate and the inconsistency count as heuristic information to remove redundant attributes, but it cannot deal with attributes that have the same max_card values. However, if the gain of MI of each candidate attribute with respect to the decision attribute is also considered, the question of which attribute to select is easily resolved. The proposed algorithm is described in detail as follows.


A heuristic algorithm: Let $R$ be the set of selected condition attributes, $C$ the condition attribute set with $R \subseteq C$, and $D$ the decision attribute. $L$ denotes the set of unselected condition attributes, $T(L)$ is the power set of $L$, and $T_i(L)$ ($i = 1, 2, \ldots, m$) is the $i$th subset in the power set of $L$. $U$ denotes the set of all instances, $X$ denotes the set of contradictory instances, and EXPECT denotes an accuracy threshold, also called a quality function, set according to some user-defined criteria. RED denotes the set of attribute reducts. First, assume $R = \mathrm{CORE}(C, D)$, $L = C - \mathrm{CORE}(C, D)$, and $k = 0$.

Step 1. Let $X = U - \mathrm{POS}_R(D)$.

Step 2. If $k \ge$ EXPECT, where $k = \gamma(R, D) = |\mathrm{POS}_R(D)|/|U|$, then stop. Else if $\mathrm{POS}_R(D) = \mathrm{POS}_C(D)$, then stop.

Step 3. Let $i = 1$, flag $= 0$, $Z = A = \emptyset$, $Y = T_i(L)$.

Step 4. Arbitrarily select $y \in Y$ and let $A = R \cup y$. If $\mathrm{POS}_A(D) = \mathrm{POS}_C(D)$, then {if flag $= 0$, then {$Z = A$, flag $= 1$}; else if $\mathrm{card}(U/Z) > \mathrm{card}(U/A)$, then $Z = A$}.

Step 5. If flag $= 1$, then calculate $V_y = \mathrm{card}(\mathrm{POS}_{R \cup y}(D))$ and $M_y = \mathrm{max\_card}(\mathrm{POS}_{R \cup y}(D)/\mathrm{IND}(\{R \cup y, D\}))$.

Step 6. Choose the best attribute $y$, that is, the one with the largest value of $V_y \times M_y$; let $Y = Y - y$, $R = R \cup y$, and then let RED $= R$.

Step 7. If different attributes have the same max_card value, then $\mathrm{MI}(D \mid \{a_j\})$ is calculated. Let $M_j = \max(\mathrm{MI}(D \mid \{a_j\}))$, so that the attribute $\{a_j\}$ with the maximal mutual information with respect to $D$ is selected, and then let RED $=$ RED $\cup \{a_j\}$.

Step 8. Let $i = i + 1$, then go to Step 1.

Generally speaking, searching for the minimal reduct is NP-hard. The complexity of a reduct algorithm is determined by the number of attribute combinations and the number of objects in the information system. When the number of attributes and the number of objects are small, the search space may not be very large, but it grows exponentially as either number increases. In general, the more of a search space is examined, the better the subset that can be found. However, computing resources are limited, so some optimality of the selected subsets has to be sacrificed. We must therefore keep the attribute subset as close to optimal as possible while spending as little search time as possible. Heuristic search is much faster than exhaustive search since it examines only part of the subsets and finds a near-optimal subset. It has two characteristics: it is not necessary to wait until the search ends, and it is unknown when the optimal set will show up, although a better one is recognized when it appears.

The time complexity of the proposed algorithm comprises four parts: (1) calculating $\mathrm{CORE}(C, D)$ requires $|C|$ computations, so the time complexity of this part is $O(|C|)$; (2) calculating the positive regions requires $|C| + (|C| - 1) + \cdots + 1 = |C|(|C| + 1)/2$ computations, so the complexity of this part is $O(|C|^2)$; (3) calculating $\gamma(C, D)$ (the significance of attributes) requires $|C||U|^2$ operations, so the time complexity of this part is $O(|C||U|^2)$; (4) calculating the gain of MI requires $|P||U|^2$ operations, where $P$ is the set of attributes that remain after reduction of the decision table and cannot be distinguished by their max_card values with respect to the decision attribute. In the worst case $|P| \approx |C|$, so the worst-case time complexity of this part is $O(|C||U|^2)$. So in the worst case the overall time complexity of the algorithm is $O(|C|^2) \times O(|C||U|^2) = O(|C|^3 |U|^2)$.

Now we illustrate the algorithm with an example. From Table 2 we can get the following equivalence classes.

Table 2. An Information Decision Table

| U  | a | b | c | d | e | f |
|----|---|---|---|---|---|---|
| x1 | 1 | 0 | 2 | 1 | 0 | 1 |
| x2 | 1 | 0 | 2 | 0 | 0 | 1 |
| x3 | 1 | 2 | 0 | 0 | 0 | 2 |
| x4 | 1 | 2 | 2 | 1 | 0 | 0 |
| x5 | 2 | 1 | 0 | 0 | 1 | 2 |
| x6 | 2 | 1 | 1 | 0 | 0 | 2 |
| x7 | 2 | 1 | 2 | 1 | 0 | 1 |
| x8 | 1 | 2 | 1 | 0 | 0 | 2 |
| x9 | 2 | 2 | 2 | 2 | 2 | 1 |

In Table 2, {a, b, c, d, e} are the condition attributes and {f} is the decision attribute.

$U/\{b\} = \{\{x_1, x_2\}, \{x_5, x_6, x_7\}, \{x_3, x_4, x_8, x_9\}\}$

$U/\{f\} = \{\{x_4\}, \{x_1, x_2, x_7, x_9\}, \{x_3, x_5, x_6, x_8\}\}$

The positive region of {f} with respect to attribute {b} is $\mathrm{POS}_b(f) = \{x_1, x_2\}$. Assume $R = \mathrm{CORE} = \{b\}$, $L = \{a, c, d, e\}$, and $X = \{x_3, x_4, x_5, x_6, x_7, x_8, x_9\}$. Assume EXPECT $= 1$, so the termination condition is $k \ge 1$. Since $k = \gamma(R, D) = |\mathrm{POS}_R(D)|/|U| = 2/9 < 1$, $R$ is not a reduct, so we must continue to select condition attributes. The next candidates are {a}, {c}, {d}, or {e}. Tables 3, 4, 5, and 6 give the results of adding {a}, {c}, {d}, and {e} to $R$, respectively.

Table 3. Select {a}

| U  | a | b | f |
|----|---|---|---|
| x3 | 1 | 2 | 2 |
| x4 | 1 | 2 | 0 |
| x5 | 2 | 1 | 2 |
| x6 | 2 | 1 | 2 |
| x7 | 2 | 1 | 1 |
| x8 | 1 | 2 | 2 |
| x9 | 2 | 2 | 1 |

Table 4. Select {c}

| U  | b | c | f |
|----|---|---|---|
| x3 | 2 | 0 | 2 |
| x4 | 2 | 2 | 0 |
| x5 | 1 | 0 | 2 |
| x6 | 1 | 1 | 2 |
| x7 | 1 | 2 | 1 |
| x8 | 2 | 1 | 2 |
| x9 | 2 | 2 | 1 |

Table 5. Select {d}

| U  | b | d | f |
|----|---|---|---|
| x3 | 2 | 0 | 2 |
| x4 | 2 | 1 | 0 |
| x5 | 1 | 0 | 2 |
| x6 | 1 | 0 | 2 |
| x7 | 1 | 1 | 1 |
| x8 | 2 | 0 | 2 |
| x9 | 2 | 2 | 1 |

Table 6. Select {e}

| U  | b | e | f |
|----|---|---|---|
| x3 | 2 | 0 | 2 |
| x4 | 2 | 0 | 0 |
| x5 | 1 | 1 | 2 |
| x6 | 1 | 0 | 2 |
| x7 | 1 | 0 | 1 |
| x8 | 2 | 0 | 2 |
| x9 | 2 | 2 | 1 |
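The core computation and the greedy selection step can be sketched in Python on the data of Table 2. This is a simplified illustration rather than the authors' algorithm: it scores single candidate attributes by card(POS) × max_card only, it evaluates the positive region over the whole universe instead of the remaining contradictory instances X, it omits the MI tie-break of Step 7, and all function names are hypothetical.

```python
from collections import defaultdict

# Table 2 of the text: condition attributes a to e, decision attribute f.
COLUMNS = ["a", "b", "c", "d", "e", "f"]
ROWS = [
    (1, 0, 2, 1, 0, 1),  # x1
    (1, 0, 2, 0, 0, 1),  # x2
    (1, 2, 0, 0, 0, 2),  # x3
    (1, 2, 2, 1, 0, 0),  # x4
    (2, 1, 0, 0, 1, 2),  # x5
    (2, 1, 1, 0, 0, 2),  # x6
    (2, 1, 2, 1, 0, 1),  # x7
    (1, 2, 1, 0, 0, 2),  # x8
    (2, 2, 2, 2, 2, 1),  # x9
]
TABLE = [dict(zip(COLUMNS, row)) for row in ROWS]


def classes(table, attrs):
    """Equivalence classes of IND(attrs), as lists of row indices."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def positive_classes(table, attrs, decision):
    """Classes of IND(attrs) whose members all share one decision value."""
    return [g for g in classes(table, attrs)
            if len({table[i][decision] for i in g}) == 1]


def gamma(table, attrs, decision):
    """Dependency degree |POS_attrs(decision)| / |U|."""
    pos = positive_classes(table, attrs, decision)
    return sum(len(g) for g in pos) / len(table)


def core(table, cond, decision):
    """Attributes whose removal lowers the dependency degree (significance > 0)."""
    full = gamma(table, cond, decision)
    return [a for a in cond
            if gamma(table, [b for b in cond if b != a], decision) < full]


def greedy_reduct(table, cond, decision, expect=1.0):
    """Start from the core and add the attribute with the largest V * M score."""
    selected = core(table, cond, decision)
    remaining = [a for a in cond if a not in selected]
    target = min(expect, gamma(table, cond, decision))

    def score(attr):
        pos = positive_classes(table, selected + [attr], decision)
        covered = sum(len(g) for g in pos)           # card(POS)
        largest = max((len(g) for g in pos), default=0)  # max_card
        return covered * largest

    while remaining and gamma(table, selected, decision) < target:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


table_core = core(TABLE, ["a", "b", "c", "d", "e"], "f")
reduct = greedy_reduct(TABLE, ["a", "b", "c", "d", "e"], "f")
print(table_core, reduct)  # ['b'] ['b', 'd']
```

On this table the sketch starts from the core {b} and adds {d}, after which the dependency degree reaches 1 and the loop stops. The scores it computes over the full universe are not meant to reproduce the intermediate values of the worked example, which are evaluated on the reduced tables.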

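Table 7. Comparison of Different Reduct Algorithms for Five Databases

| data set       | sample_n | attribute_n | selected attributes_n: genetic algorithm | selected attributes_n: dynamic reducts | selected attributes_n: our algorithm |
|----------------|----------|-------------|------------------------------------------|----------------------------------------|--------------------------------------|
| Monk1          | 124      | 6           | 3 or 4                                   | 3 or 4                                 | 3                                    |
| Heart Disease  | 294      | 14          | 3 or 4 or 5                              | 3 or 4 or 5 or 6                       | 3                                    |
| Mushroom       | 8124     | 22          | 5 or 6 or 7                              | 5 or 6 or 7 or 8                       | 5                                    |
| Breast Cancer  | 699      | 10          | 5 or 6 or 7                              | 5 or 6 or 7                            | 5                                    |
| Slope Collapse | 3436     | 24          | 9 or 10 or 11                            | 9 or 10 or 11 or 12                    | 9                                    |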

Table 8. The Key Process Measured Variables of Ethylene Cracking Furnace System

| ID | measured variable | description                   | medium            | units | normal range (min/normal/max) |
|----|-------------------|-------------------------------|-------------------|-------|-------------------------------|
| 1  | FICA1061          | flow-rate of A group oil inlet | NAP               | kg/h  | 4988/7125/7838                |
| 2  | FICA1062          | flow-rate of B group oil inlet | NAP               | kg/h  | 4988/7125/7838                |
| 3  | FICA1063          | flow-rate of C group oil inlet | NAP               | kg/h  | 4988/7125/7838                |
| 4  | FICA1064          | flow-rate of D group oil inlet | NAP               | kg/h  | 4988/7125/7838                |
| 5  | FICA1065          | steam flow-rate of A group    | DS                | kg/h  | 2500/3563/4275                |
| 6  | FICA1066          | steam flow-rate of B group    | DS                | kg/h  | 2500/3563/4275                |
| 7  | FICA1067          | steam flow-rate of C group    | DS                | kg/h  | 2500/3563/4275                |
| 8  | FICA1068          | steam flow-rate of D group    | DS                | kg/h  | 2500/3563/4275                |
| 9  | FI1069            | fuel flow-rate of sidewall    | FG                | Nm3/h | -/2080/2500                   |
| 10 | FI10613           | fuel flow-rate of bottom      | FG                | Nm3/h | -/3130/3760                   |
| 11 | TI10624A          | COT of A group                | vapor of cracking | °C    | -/837∼842/855                 |

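Table 9. The Fuzzification Table of Process Measured Variables

| ID | FICA1061 | FICA1062 | FICA1063 | FICA1064 | FICA1065 | FICA1066 | FICA1067 | FICA1068 | FI1069 | FI10613 | COT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PS | NS | NS | NS | NS | NM | NM | NS | PS | NS | PB |
| 2 | PS | NS | NS | NS | NM | NM | NM | NS | PS | NS | PB |
| 3 | PS | NS | NS | NS | NS | NM | NM | NS | PS | NS | PB |
| 4 | PS | PS | NS | NS | NM | NM | NM | NS | ZO | NM | PB |
| 5 | ZO | PS | NS | PS | NM | NM | NM | NS | ZO | NM | PB |
| 6 | PS | NS | NS | NS | NS | NM | NM | NS | ZO | NB | PB |
| 7 | PS | NS | NS | NS | NM | NM | NM | NS | NS | NM | NB |
| 8 | PS | NS | NS | NS | NS | NM | NM | NS | ZO | NS | NB |
| 9 | PS | NS | NS | NS | NS | NM | NM | NS | ZO | NS | NB |
| 10 | PS | NS | NS | NS | NM | NM | NM | NS | ZO | NM | NB |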

From Tables 3-6, we obtain the following families of equivalence classes:

U/{f} = {{x3, x5, x6, x8}, {x4}, {x7, x9}}
U/{a, b} = {{x3, x4, x8}, {x5, x6, x7}, {x9}}
U/{b, c} = {{x3}, {x4, x9}, {x5}, {x6}, {x7}, {x8}}
U/{b, d} = {{x3, x8}, {x4}, {x5, x6}, {x7}, {x9}}
U/{b, e} = {{x3, x4}, {x5}, {x6, x7}, {x8}, {x9}}

POS{a,b}(f) = {x9}, max_card(POS{a,b}(f)/{a, b, f}) = 1
POS{b,c}(f) = {x3, x5, x6, x7, x8}, max_card(POS{b,c}(f)/{b, c, f}) = 1
POS{b,d}(f) = {x3, x4, x5, x6, x7, x8, x9}, max_card(POS{b,d}(f)/{b, d, f}) = |{x5, x6}| = |{x3, x8}| = 2
POS{b,e}(f) = {x3, x4, x5, x8, x9}, max_card(POS{b,e}(f)/{b, e, f}) = |{x3, x8}| = 2
MI({b, d}; f) = 0.258, MI({b, e}; f) = 0.172

The gain from adding {d} is larger than that from adding {e}, so {d} is selected into RED. The attribute values in Table 2 are already discrete and need not be discretized. The reduct result, that is, Table 5, can be considered an approximation of Table 2. Using the attribute value reduct algorithm proposed in the literature,24,25 we can then obtain the decision rules. Generally speaking, a decision rule with two parameters, the confidence degree and the coverage degree, is defined by φi (i = 1, 2, ..., n) → ψj (j = 1, 2, ..., m) (con, cov), where con = |φi∧ψj|/|φi| and cov = |φi∧ψj|/|ψj|. The rules are easy to read off because the decision table is very simple. They are as follows.

(b,0) → (f,1) (1, 0.5)
(b,2)∧(d,0) → (f,2) (1, 0.5)
(b,2)∧(d,1) → (f,0) (1, 1)
(b,1)∧(d,0) → (f,2) (1, 0.5)
(b,1)∧(d,1) → (f,1) (1, 0.25)
(d,2) → (f,1) (1, 0.25)

From Tables 3-6, we can see that choosing attribute {a} cannot reduce the number of contradictory instances, but if {c}, {d}, or {e} is selected, then all instances become consistent. According to our algorithm, the maximal set is in U/{b, d, f} and U/{b, e, f}. The MI gains of {d} and {e} are then computed, and the attribute with the larger gain, {d}, is selected. Finally, the selected attribute subset is {b, d}, and at the same time |POS{b,d}(f)|/|U| = 1.

To verify the algorithm's validity, we select several databases, namely, Monk1, Heart Disease, Mushroom, Breast Cancer, and Slope Collapse, from the well-known UCI machine learning repository (available from http://www.ics.uci.edu/~ml/). We compare our algorithm with two reduct algorithms, namely, dynamic reducts26,27 and the standard genetic algorithm.28,29 The genetic algorithm's fitness function f is defined by

$$f(B) = (1-\alpha)\,\frac{\mathrm{cost}(A)-\mathrm{cost}(B)}{\mathrm{cost}(A)} + \alpha\,\min\left\{\varepsilon,\ \frac{\left|\,[\,S \in \mathcal{S} \mid S \cap B \neq \emptyset\,]\,\right|}{|\mathcal{S}|}\right\} \qquad (16)$$

where $\mathcal{S}$ is the set of sets corresponding to the discernibility function. The parameter α defines a weighting between subset cost and hitting fraction, while ε is relevant in the case of approximate solutions. The two algorithms are used in the

software Rosetta, which can be downloaded from the Rosetta home page: http://www.idi.ntnu.no/~rosetta/. The reduct results of the two algorithms were obtained by running Rosetta. The comparisons are shown in Table 7. In these numerical experiments, the standard genetic algorithm and the dynamic reducts algorithm each return several reduct sets of different sizes, so one cannot decide which is optimal, whereas the proposed algorithm returns a single reduct whose size equals the smallest size found by the other two methods. The proposed attribute selection method therefore appears to be an effective approach with potential application in intelligent information processing.

5. Applying the Fault Diagnosis of Ethylene Cracking Furnace

5.1. Fault Diagnosis Approach Based on Rough Set. In this section, we use the proposed method to monitor and diagnose faults and propose a diagnosis scheme for the actual petrochemical process industry. The developed diagnosis system for the ethylene cracking furnace has been used in a refinery in China. The diagnosis scheme for the petrochemical industry based on a rough set is shown in Figure 1. The detection and diagnosis process mainly includes four parts: data sampling, the rough set method, fault detecting and diagnosis, and fault maintaining or restoration.

Figure 1. Diagnosis process diagram based on a rough set.

(1) Data sampling. The data include process variables of the industrial process, the history database, the distributed control system (DCS), operational experience, expert knowledge, etc. To describe the industrial process exhaustively, we should collect as much relevant data as possible, including monitoring variables and related variables, over a suitable sampling period.

(2) Rough set method. By using the heuristic algorithm to reduce the decision information table, we can obtain the monitoring and diagnosis fuzzy rule set and build a knowledge-based diagnosis repository. This part mainly includes fuzzy discretization of process variables, fuzzy rule acquisition, and rule analysis. The fuzzy rules acquisition process diagram is shown in detail in Figure 2.
The acquisition process mainly includes two parts, data preprocessing and the mining algorithm. The details are explained step by step as follows:

Step 1. Set up an information (condition-decision) table using domain knowledge, that is, by data sampling.
Step 2. Select the conditional attributes and decision attributes and then construct the decision table.
Step 3. Discretize the attribute values by fuzzy numbers if the values are continuous and then rebuild the decision table.
Step 4. Partition the decision table into two decision tables, one for training and the other for testing.
Step 5. Determine the reducts of the condition attributes with respect to the decision attribute by the reduct algorithm.
Step 6. Compute the value reduct of each decision fuzzy rule and eliminate redundant rules.
Step 7. Evaluate the mining results and generate the minimal fuzzy rule sets.

Figure 2. The fuzzy rules acquisition process diagram.

(3) Fault detecting and diagnosis. This part includes online data preprocessing, incremental rule acquisition, and rule learning in the knowledge repository. Online data preprocessing handles the process variable values, including data filtering and data rectification, to ensure their reliability and validity. Incremental rule acquisition finds diagnosis rules that are outside the knowledge repository, and the rule learning method adds this knowledge to the repository for future diagnosis of the industrial process.

(4) Faults maintaining or restoration. If faults are detected and their causes are found, operators can handle the malfunctions with maintenance or restoration techniques to guarantee safe production in the plant.

5.2. Process Description of Cracking Furnace System. The cracking furnace is a key unit of olefin processing, which involves endothermic pyrolysis reactions.30,31 Cracking is carried out in large, gas-fired furnaces containing parallel tubular reactor coils. The furnaces consist of three major sections: the convection section, the cracking reactor in the radiant box, and the transfer line exchange (TLE) section. The basic structure of a classical ethylene cracking furnace is shown schematically in Figure 3.

Figure 3. The ethylene cracking furnace system.

The cracking furnace reactor system consists of a number of parallel tubular reactors, residing in and sharing a common furnace. The product is combined at a manifold before being chilled by an exit heat exchanger. Multiple small-bore reactors facilitate better heat transfer than a single large-bore reactor and allow for better process control, resulting in better overall product quality. However, because of the physical connections at the common header and manifold, significant process interactions are also present. The primary reaction in the reactor is endothermic, with many byproducts in the tube reactors. The cracking of hydrocarbons within the temperature range 800-850 °C produces a mixture of hydrogen, methane, ethylene, propylene, butenes, butadiene, aromatics, acetylenes, etc. The hydrocarbon feedstock and its diluent steam are preheated in the convection section to 600-650 °C and then fed to the radiant section, where the main cracking reactions occur. The effluent is cooled in a few milliseconds in the TLE in order to stop the secondary reactions.

Figure 4. The basic control system.
The least desirable byproducts deposit on the inside of the reactor tubes and form a solid layer. This deposition impedes the flow of process gases and reduces heat transfer efficiency. Various abnormal states often accelerate solid layer deposition, sometimes even resulting in an early abnormal shutdown of the system. Whether the operation is stable affects the quality and quantity of total ethylene production as well as the stable operation of downstream equipment (the aromatic hydrocarbon units). Therefore, it is important to detect faults of the ethylene cracking furnace early, so as to give operators and managers valid information, reduce the number of alarms, and ensure that the cracking furnace runs stably. Early prediction and diagnosis of abnormal states are thus very important for industrial process safety.

5.3. Process Control Strategy. In the endothermic reaction system, the feed rate is manipulated to regulate the reaction temperature. Feed also serves as a heat sink; heat entering the reactors in excess of that required for reaction is carried off. Each reactor in the furnace box is controlled independently of the others. The product composition is analyzed, and the analyzers are part of the closed loop. It is very important that only correctly analyzed and validated estimates of the compositions are used for model parameter adjustment. These estimates are used to determine the degree of reaction, which in turn is used to set the reaction temperature set point. The furnace box has a single fuel control system. It is normally set to maintain a constant heat input to the furnace. The basic control strategy is shown in Figure 4.

5.4. Process Monitoring and Diagnosis of Key Variables. The key variables in the cracking furnace system consist of the total charge of raw oil, the flow-rate of oil inlet for four groups, the steam flow-rate for four groups, the coil outlet temperature (COT) for four groups, the average temperature of the four COTs, the flow-rate of fuel, etc. The key process measured variables to be monitored are listed in Table 8. COTs are important key variables for identifying the states of a cracking furnace. In this case, we detect only the faults of variables that affect COT in an actual cracking furnace system. We use the proposed fuzzy discretization method to discretize the continuous variable values and adopt seven fuzzy number ranks to describe each process variable's deviation from its set point; that is, every process variable's situation can be expressed by the fuzzy set V = (PB, PM, PS, ZO, NS, NM, NB).
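As an illustration of the seven-rank description, the sketch below assigns a rank to a measured value from its deviation from the set point, expressed in standard deviations. It is not the discretization model defined earlier in the paper. The Gaussian membership shape, the rank centers at whole multiples of the standard deviation, the `width` parameter, and the standard deviation of 300 kg/h are all assumptions made here for illustration; only the seven labels and the FICA1061 normal value of 7125 kg/h (Table 8) come from the text. The labels are read in the usual way (PB/PM/PS = positive big/medium/small, ZO = zero, NS/NM/NB = negative small/medium/big).

```python
import math

# Seven ranks, ordered from most negative to most positive deviation.
LABELS = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]
CENTERS = range(-3, 4)  # assumed rank centers, in units of standard deviation


def memberships(value, set_point, std, width=0.5):
    """Gaussian membership of each rank for one measurement (assumed shape)."""
    z = (value - set_point) / std
    z = max(-3.0, min(3.0, z))  # saturate so extreme values map to NB or PB
    return {
        label: math.exp(-((z - center) ** 2) / (2 * width ** 2))
        for label, center in zip(LABELS, CENTERS)
    }


def fuzzify(value, set_point, std):
    """Return the rank with the largest membership."""
    mu = memberships(value, set_point, std)
    return max(mu, key=mu.get)


# FICA1061 normal value from Table 8; the standard deviation is hypothetical.
SET_POINT, STD = 7125.0, 300.0
for reading in (7125.0, 7425.0, 6500.0, 20000.0):
    print(reading, fuzzify(reading, SET_POINT, STD))
```

Because every rank uses the same width, picking the largest membership is equivalent to picking the nearest rank center; keeping the memberships explicit makes it easy to swap in a different membership function.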
The operating data from one period were selected to analyze the process variables; when COT is NB or PB, the cracking furnace is considered to be in an abnormal state. The abnormal situations were selected to compose the fault information table. From one data period, we find 256 original records as the data mining source; they are fuzzy discretized using the seven fuzzy ranks, and only the two situations NB and PB are considered, as shown in Table 9. The proposed rough set-based integration scheme is used to reduce the information table, mine the diagnosis decision rules, and build the diagnosis knowledge repository. For one operating period, we find the 14 rules below. Only rules whose degree of confidence (strength of rule) is greater than 0.5 and whose coverage is simultaneously greater than 0.3 are kept; that is, when the "con" of a rule is less than 0.5 or its "cov" is less than 0.3, the fuzzy rule is neglected.

Rule 1: If FICA1067 is NS and FI1069 is NS and FI10613 is NM then COT is NB
Rule 2: If FICA1064 is NM and FICA1066 is NS and FI10613 is NM then COT is NB
Rule 3: If FICA1062 is PS and FICA1065 is NS and FI10613 is NM then COT is NB
Rule 4: If FICA1062 is PS and FICA1066 is NS and FI10613 is NM then COT is NB
Rule 5: If FICA1061 is PS and FICA1066 is PS and FI10613 is NM then COT is NB
Rule 6: If FICA1061 is PS and FICA1067 is NS and FI10613 is NS then COT is NB
Rule 7: If FICA1062 is PS and FICA1067 is NM and FI10613 is NB then COT is NB
Rule 8: If FICA1063 is NM and FI1069 is PS then COT is PB
Rule 9: If FICA1067 is NS and FI10613 is PS then COT is PB
Rule 10: If FICA1065 is NS and FI1069 is PS then COT is PB
Rule 11: If FICA1066 is NS and FI1069 is PS then COT is PB
Rule 12: If FICA1067 is NS and FI1069 is PS then COT is PB
Rule 13: If FI1069 is PS and FI10613 is PS then COT is PB
Rule 14: If FICA1061 is NM and FI10613 is PS then COT is PB

In practice, we first select a moving data window to monitor the process variables and apply fuzzy discretization to the variable values. When the average COT is PB or NB, the diagnosis rules are matched against the operational conditions. In this way, variable faults can be found early and appropriate actions taken to reduce abnormal situations or alarms.

Process monitoring and safety control are very important to modern plants. Many monitoring techniques and fault diagnosis and isolation (FDI) systems have been studied and developed recently. Statistical process monitoring methods such as principal component analysis (PCA), Fisher discriminant analysis, and partial least-squares analysis (PLS) are popular multivariate statistical process control (MSPC) techniques. They can detect malfunctions efficiently, but it is difficult for them to find the cause of an abnormal state, so they have to be combined with other techniques, for example, expert systems or artificial neural networks, to diagnose the faults. At the same time, these techniques are very sensitive to noise in the process data, because they are data-driven. The proposed fuzzy discretization method, in contrast, is less sensitive to process noise and can detect abnormal states early. The model-based monitoring method is widely used in industrial processes, but it requires an extensive understanding of the process, and it is very difficult to build a precise mathematical model of a complex system.
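The rule filtering and rule matching described for Rules 1-14 can be sketched as follows. The sketch computes con and cov on the ten sample records of Table 9 only, not on the 256 original records, so its numbers are illustrative and the two candidate rules it evaluates are not among the paper's 14 rules. The function names and the online record passed to `diagnose` are hypothetical.

```python
# Ten sample records of Table 9, one row per record ID, columns in table order.
COLUMNS = ["FICA1061", "FICA1062", "FICA1063", "FICA1064", "FICA1065", "FICA1066",
           "FICA1067", "FICA1068", "FI1069", "FI10613", "COT"]
ROWS = """
PS NS NS NS NS NM NM NS PS NS PB
PS NS NS NS NM NM NM NS PS NS PB
PS NS NS NS NS NM NM NS PS NS PB
PS PS NS NS NM NM NM NS ZO NM PB
ZO PS NS PS NM NM NM NS ZO NM PB
PS NS NS NS NS NM NM NS ZO NB PB
PS NS NS NS NM NM NM NS NS NM NB
PS NS NS NS NS NM NM NS ZO NS NB
PS NS NS NS NS NM NM NS ZO NS NB
PS NS NS NS NM NM NM NS ZO NM NB
"""
RECORDS = [dict(zip(COLUMNS, line.split())) for line in ROWS.strip().splitlines()]


def matches(record, conditions):
    """True if the record carries every (variable, fuzzy label) pair."""
    return all(record.get(var) == label for var, label in conditions.items())


def con_cov(records, conditions, decision):
    """Confidence |phi and psi| / |phi| and coverage |phi and psi| / |psi|."""
    phi = [r for r in records if matches(r, conditions)]
    psi = [r for r in records if matches(r, decision)]
    both = [r for r in phi if matches(r, decision)]
    con = len(both) / len(phi) if phi else 0.0
    cov = len(both) / len(psi) if psi else 0.0
    return con, cov


def keep_rule(con, cov, min_con=0.5, min_cov=0.3):
    """Thresholds stated in the text: con > 0.5 and cov > 0.3."""
    return con > min_con and cov > min_cov


# Rules 9, 12, and 13 from the list above, written as (conditions, decision).
RULES = {
    9: ({"FICA1067": "NS", "FI10613": "PS"}, {"COT": "PB"}),
    12: ({"FICA1067": "NS", "FI1069": "PS"}, {"COT": "PB"}),
    13: ({"FI1069": "PS", "FI10613": "PS"}, {"COT": "PB"}),
}


def diagnose(record, rules=RULES):
    """Return the numbers of the rules whose conditions and decision both match."""
    if record.get("COT") not in ("PB", "NB"):
        return []  # only abnormal COT states trigger rule matching
    return [n for n, (cond, dec) in rules.items()
            if matches(record, cond) and matches(record, dec)]


# Candidate rules evaluated on the Table 9 sample (illustrative only).
print(con_cov(RECORDS, {"FI1069": "PS"}, {"COT": "PB"}))    # kept by the thresholds
print(con_cov(RECORDS, {"FICA1065": "NM"}, {"COT": "NB"}))  # rejected: con below 0.5

# Hypothetical fuzzified online record from the moving data window.
online = {"FICA1067": "NS", "FI1069": "PS", "FI10613": "PS", "COT": "PB"}
print(diagnose(online))
```

Storing each rule as a pair of dictionaries keeps matching to a simple equality test per variable, so adding the remaining rules is a matter of extending `RULES`.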
Expert systems and signed directed graph (SDG) techniques are qualitative modeling techniques for complex processes, but they need thorough knowledge of the process systems. Acquiring the diagnosis knowledge of a complex system is very difficult and time-consuming, and reasoning over that knowledge can lead to combinatorial explosion in large complex systems. Data mining techniques, in contrast, do not need a complex mathematical model and can automatically put forward diagnosis knowledge extracted from data, which overcomes the knowledge acquisition "bottleneck" of expert systems. Therefore, the proposed diagnosis scheme and fuzzy rule acquisition algorithms based on rough sets are effective in monitoring and diagnosing a petrochemical process for safe plant production.

6. Conclusion

This paper studies a fuzzy rule acquisition method and a process diagnosis scheme, which together form an efficient technique to monitor and diagnose malfunctions in modern petrochemical processes. Rough sets and fuzzy sets have been used successfully in different fields. Although the proposed technique has been tested on practical industrial processes, how to improve the diagnosis precision for process variables and how to improve the diagnosis speed need much further research. Moreover, how to integrate rough sets efficiently with other AI techniques and how to apply them in the process industry to improve plant safety need to be studied in depth, not only in theory but also in applications.

Acknowledgment

We would like to acknowledge the generous financial support of the Hi-Tech Research and Development Program of China (No. 2007AA04Z170), the National Natural Science Foundation of China (No. 60774079), and the Qingnian Natural Science Foundation of BUCT (No. QN0626).

Literature Cited

(1) Ian, N. Adequately address abnormal situation operations. Chem. Eng. Prog. 1995, 91 (9), 36–45.
(2) Varanon, U.; Chan, C. W.; Paitoon, T. Artificial intelligence for monitoring and supervisory control of process systems. Eng. Appl. Artif. Intell. 2007, 20, 115–131.
(3) Rengaswamy, R.; Venkatasubramanian, V. An Integrated Framework for Process Monitoring, Diagnosis, and Control using Knowledge-based Systems and Neural Networks; IFAC Workshop, Newark, Delaware, 1992; IFAC: Oxford, U.K., 1992; pp 49–54.
(4) Chiang, H. L.; Russell, L. E.; Braatz, D. R. Fault Detection and Diagnosis in Industrial Systems; Springer: London, 2001.
(5) Garcia, A. E.; Frank, M. P. On the relationship between observer and parameter identification based approaches to fault detection. Proceedings of the 13th IFAC World Congress; Piscataway, NJ, 1996; IFAC: Oxford, U.K., 1996; Vol. N, 2529.
(6) Luo, X.; Zhang, C.; Jennings, R. N. A hybrid model for sharing information between fuzzy, uncertain and default reasoning models in multiagent systems. Int. J. Uncertainty, Fuzziness Knowledge-Based Syst. 2002, 10 (4), 401–450.
(7) Aggarwal, K. R.; Xuan, Y. Q.; Johns, T. A.; Li, F.; Bennett, A. Novel approach to fault diagnosis in multicircuit transmission lines using fuzzy ART map neural network. IEEE Trans. Neural Networks 1999, 10, 1214–1221.
(8) Zdzislaw, P.; Andrzej, S. Rough sets and Boolean reasoning. Inf. Sci. 2007, 177, 41–73.
(9) Zdzislaw, P.; Andrzej, S. Rough sets: Some extensions. Inf. Sci. 2007, 177, 28–40.
(10) Jaroslaw, S.; Katarzyna, K. Hybrid classifier based on rough sets and neural networks. Proc. Electron. Notes Theor. Comput. Sci. 2003, 82, No. 4.
(11) Lisxiang, S.; Francis; Tay, E. H.; Liangshen, Q.; Yudi, S. Fault Diagnosis Using Rough Sets Theory. Comput. Ind. 2000, 43, 67–72.
(12) He, Y.; Hu, S. A decision analysis method based on rough-fuzzy sets integration model. Control Decis. 2004, 19 (3), 315–318.
(13) Irizarry, R. Fuzzy classification with an artificial chemical process. Chem. Eng. Sci. 2005, 60, 399–412.
(14) Tzu-Liang, Bill; Tseng, Yongjin; Kwon, Yalcin; Ertekin, M. Feature-based rule induction in machining operation using rough set theory for quality assurance. Robot. Comput.-Integrat. Manuf. 2005, 21, 559–567.
(15) Mario, R. R. Integrating rough sets and situation-based qualitative models for processes monitoring considering vagueness and uncertainty. Eng. Appl. Artif. Intell. 2005, 18, 617–632.
(16) Wang, Q.; Li, J. A rough set-based fault ranking prototype system for fault diagnosis. Eng. Appl. Artif. Intell. 2004, 17, 909–917.
(17) Xiangyang, Wang; Yang, Jie; Xiaolong, Teng; Weijun, Xia; Jensen, R. Feature selection based on rough sets and particle swarm optimization. Pattern Recognit. Lett. 2007, 28, 459–471.
(18) Zadeh, L. A. A new direction in AI: Toward a computational theory of perceptions. AI Mag. 2001, 22, 73–84.
(19) Zadeh, L. A. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 1997, 19, 111–127.
(20) Ying-Chieh, T.; Ching-Hsue, C.; Jing-Rong, C. Entropy-based fuzzy rough classification approach for extracting classification rules. Expert Syst. Appl. 2006, 31, 436–443.
(21) Li, T.; Ruan, D.; Geert, W.; Song, J.; Xu, Y. A Rough Sets based Characteristic Relation Approach for Dynamic Attribute Generalization in Data Mining. Knowledge-Based Syst. 2007, 20, 485–494.
(22) Tseng, T. L.; Huang, C. C. Rough set-based approach to feature selection in customer relationship management. Omega 2007, 35, 365–383.
(23) Ning, Z.; Juzhen, D.; Setsuo, O. Using Rough sets with Heuristics for Feature selection. J. Intell. Inf. Syst. 2001, 16, 199–214.
(24) Wang, G. Y.; Fisher, P. S. Rule generation based on rough set theory. Data mining and knowledge discovery: theory, tools and technology. Proc. SPIE 2000, 4057, 181–189.
(25) Wang, G. Y.; Wu, Y.; Liu, F. Generating rules and reasoning under inconsistencies. IEEE International Conference on Industrial Electronics, Control and Instrumentation, IECON'2000: Nagoya, Japan, 2000; pp 2536–2541.
(26) Bazan, J. G. A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In Rough Set in Knowledge Discovery: Methodology and Applications; Physica-Verlag: Heidelberg, Germany, 1998; Chapter 17, pp 321–365.
(27) Bazan, J. G.; Skowron, A.; Synk, P. Dynamic reducts as a tool for extracting laws from decision tables. Methodologies for Intelligence Systems in Lecture Notes in Artificial Intelligence; Springer-Verlag: Heidelberg, Germany, 1994.
(28) Vinterbo, S.; Øhrn, A. Minimal approximate hitting sets and rule templates. Int. J. Approx. Reason. 2000, 25 (2), 123–143.
(29) Wroblewski, J. Finding minimal reducts using genetic algorithms. Proc. Int. Joint Conf. Inf. Sci., 2nd 1995, 186–189.
(30) Misra, M.; Henry, H.; Yue, S.; Joe, Q.; Ling, C. Multivariate process monitoring and fault diagnosis by multi-scale PCA. Comput. Chem. Eng. 2002, 26, 1281–1293.
(31) Kampjarvi, P.; Sourander, M.; Tiina, K. Fault detection and isolation of an on-line analyzer for an ethylene cracking process. Control Eng. Pract. 2008, 16, 1–13.

Received for review August 29, 2007. Revised manuscript received January 25, 2008. Accepted October 27, 2008.

IE071171G