Ind. Eng. Chem. Res. 2008, 47, 4209–4219


Self-Organizing Self-Clustering Network: A Strategy for Unsupervised Pattern Classification with Its Application to Fault Diagnosis

Bharat Bhushan and Jose A. Romagnoli*

Gordon A. and Mary Cain Department of Chemical Engineering, 110 Chemical Engineering, South Stadium Road, Louisiana State University, Baton Rouge, Louisiana 70803

* To whom correspondence should be addressed. Tel.: (225) 578-1377. Fax: (225) 578-1476. E-mail: [email protected].

In this work, we propose a method for unsupervised pattern classification called the self-organizing self-clustering network. This method incorporates the concept of fuzzy clustering into the learning strategy of the self-organizing map. The number of nodes in the network is determined incrementally during the training. The advantage of the proposed strategy over other existing clustering techniques is its ability to determine the network size and the number of clusters in data sets automatically. Since the methodology is based on learning, it is computationally less expensive, and the result is not affected by the initial guess. A data set with Gaussian distribution is used to illustrate this method, and results are compared with fuzzy C-mean clustering. Furthermore, the proposed strategy is applied to the fault detection and diagnosis of a twin continuous stirred tank reactor virtual plant. The result shows that this strategy can be used as a process-monitoring tool in an industrial environment.

1. Introduction

The concept of pattern is universal in nature, and we perceive such patterns constantly in our day-to-day life. Pattern classification is the task of grouping objects together into classes according to their perceived likenesses or similarities. It is expected that objects within a class are more similar to each other than objects belonging to different classes. Depending on the training strategy, pattern classification techniques are categorized in two groups: supervised and unsupervised. In supervised pattern classification, a priori information about the classes of the training patterns is available, and the system is trained on the basis of this information. Once trained with this external help, the system can classify new patterns on its own. However, it happens quite frequently that a priori information about the classes is not available and only the input/feature patterns are available. Hence, the pattern classification system needs to learn without any external help. This type of pattern classification is termed unsupervised pattern classification.

The area of unsupervised pattern classification techniques that is most closely related to Kohonen's self-organizing algorithm is cluster analysis. Because of its importance, cluster analysis is an area of active research, and some of the texts related to different approaches to the problem of classification are by Kohonen,1,2 Looney,3 Bezdek,4 and Dubes and Jain.5 Although Kohonen's self-organizing map (SOM) is one of the most used methods, several limitations of this method are reported in the literature.6 One of the main limitations of Kohonen's model is the predefined size of the network, and no specific guidelines are available that would allow choice of a suitable network size in advance.6 The result is highly dependent on the initial conditions,7 and the final weight vectors usually depend on the order of the input sequence.8 Several attempts have been made to overcome these limitations. In the growing cell structures algorithm proposed by Fritzke,6 the network size and structure are determined automatically. There are other frequently used clustering techniques such as C-mean clustering,9 distance threshold clustering,10 the maximin-distance self-organizing algorithm,11 ISODATA,12 mean minimum distance clustering,13 and the dog-rabbit strategy.14

These techniques are quite useful for disjoint partitioning of the data into a known number of classes. Also, many methodologies for classification have been proposed by combining the fuzzy concept with classical clustering algorithms, such as the fuzzy Kohonen clustering network15 and the fuzzy SOM.16-18 Baraldi and Blonda19,20 present a good review of existing fuzzy clustering algorithms.

An ideal unsupervised pattern classification technique should be (i) independent of the order of the input sequence, (ii) independent of initial conditions, (iii) able to determine the network size automatically, (iv) able to determine the number of classes/clusters automatically, and (v) able to consider the partial belongingness of patterns to the classes. We present a technique called the self-organizing self-clustering network, which addresses all of the above-mentioned issues. We have combined the concepts of fuzzy sets with some of the existing techniques, together with novel methods for learning and for determination of the number of classes.

It can be easily seen that fault diagnosis is also a pattern classification and recognition task. It is quite possible in a real plant that a large historical data set is available but the information about the fault classes is not known. Under these circumstances, the task is to identify the groups/classes of input vectors that belong to different fault classes. This kind of fault diagnosis is part of unsupervised pattern classification. The self-organizing self-clustering network (SOSCN) discussed in this work can accomplish such tasks very well. This network can learn to detect regularities and correlations in its input and adapt its future responses to that input accordingly. Apart from this, the network calculates the degree of membership of an incoming input vector with respect to the defined classes.

The remaining part of the paper is organized as follows. We discuss the basics of unsupervised pattern classification in section 2, together with different measures for performing unsupervised pattern classification. Section 3 is dedicated to a detailed description of the proposed strategy. We demonstrate the efficiency of the proposed strategy with an example and apply the strategy to fault diagnosis of a twin continuous stirred tank reactor in section 4, followed by a summary in section 5.



Figure 1. Automated pattern recognition and classification system in (a) training mode and (b) operational mode.

Figure 2. (a) Two-dimensional Gaussian membership function and (b) contour of the membership function.

Figure 3. Learning function for different values of iterations (iteration = 1, 2, 3, 4, 5, 10, and 50).

2. Unsupervised Pattern Classification Scheme

In unsupervised pattern classification, a system performs both training and operational processes only on input feature vectors. The sample feature vectors are presented to the system in a sequence. Based on certain criteria, the system tries to accommodate the presented sample vector into a class already learned. If it does not succeed, it establishes a new class with a unique class identifier and adjusts its parameters to map the feature vector into that identifier. An automated pattern recognition and classification system therefore determines the classes itself: in a training mode it contains (i) an input subsystem that accepts sample pattern vectors and (ii) a classification subsystem based on an algorithm that learns a set of populations from a sample of training pattern vectors, and in an operational mode it contains a recognition subsystem that decides the classes to which an input pattern vector belongs. In other words, it partitions the population into subpopulations, which are the classes (Figure 1). It can be easily seen that the gap between clustering and unsupervised pattern classification is very narrow, and in this work they are considered the same unless otherwise stated.

There are two main types of unsupervised classification approaches: (1) statistical classification and (2) neural net-based classification.10,20,21 Statistical classification is an established and classic approach to unsupervised pattern classification. It is essentially based on probabilistic models for the feature vector distributions in the classes to derive classifying functions. This approach is suitable if the patterns are distributed in the feature space according to simple topologies and is preferable with known probability distributions. Contrary to this approach, neural net-based classifiers are model free and are capable of adjusting to any desired output topology of classes in the feature space.22

Algorithms developed for classification need to classify patterns such that patterns in the same cluster are as alike as possible and patterns in different clusters are as dissimilar as possible. Hence, we need some kind of similarity measure (or dissimilarity measure). The similarity measure, in its numerical form, indicates the natural association or degree of resemblance between patterns in a class, between a pattern and a class, and between classes. Many different functions have been suggested as similarity measures; we present a few which are the most commonly used.

Euclidean Distance. The Euclidean distance is the simplest and most frequently used measure. Let Z = [z^(1), z^(2), ..., z^(n)] be a set of n sample pattern vectors in k-dimensional feature space. The ith pattern, z^(i), is denoted as a column vector

Figure 4. Data points generated in two-dimensional feature space using nngenc.

Figure 5. Generation of nodes among data points.

z^{(i)} = (z_1^{(i)}, z_2^{(i)}, \ldots, z_k^{(i)})^{T}, \quad i = 1, 2, \ldots, n \qquad (1)

The Euclidean distance between patterns z^(i) and z^(j) is represented as

d(i, j) = \left[ \sum_{m=1}^{k} (z_m^{(i)} - z_m^{(j)})^2 \right]^{1/2} = \left[ (z^{(i)} - z^{(j)})^{T} (z^{(i)} - z^{(j)}) \right]^{1/2} \qquad (2)

A weighted Euclidean distance is used when the dimensions are not of equal significance. It is defined as

d_w(i, j) = \left[ \sum_{m=1}^{k} \alpha_m (z_m^{(i)} - z_m^{(j)})^2 \right]^{1/2} \qquad (3)

where \alpha_m is the mth weighting coefficient.

Mahalanobis Distance. The squared Mahalanobis distance is another distance measure used in cluster analysis. It incorporates the correlation between feature vectors and standardizes each feature to zero mean and unit variance.


Figure 6. Plot of measure d versus epoch.

Figure 7. Classification of data points in six different classes.

The squared Mahalanobis distance between patterns z^(i) and z^(j) is represented as

d(i, j) = (z^{(i)} - z^{(j)})^{T} \Sigma^{-1} (z^{(i)} - z^{(j)}) \qquad (4)

where Σ is the pooled sample covariance matrix. If Σ is the identity matrix, the squared Mahalanobis distance is the same as the squared Euclidean distance.

Classical clustering algorithms generate classes such that each pattern belongs to only one class. However, patterns often cannot adequately be assigned strictly to one class. In these cases, fuzzy classification methods provide a better tool. The distance-based similarity measures are not suitable for fuzzy classification; rather, we need to use membership-based similarity measures. A Gaussian membership function can be considered as one such measure for fuzzy classification and is discussed in the next section.
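As a concrete illustration of the similarity measures in eqs 2-4, the following minimal Python/NumPy sketch computes the Euclidean, weighted Euclidean, and squared Mahalanobis distances. The function names and the test vectors are ours, not part of the original work.

```python
import numpy as np

def euclidean(z_i, z_j):
    """Euclidean distance, eq 2."""
    diff = np.asarray(z_i, dtype=float) - np.asarray(z_j, dtype=float)
    return float(np.sqrt(diff @ diff))

def weighted_euclidean(z_i, z_j, alpha):
    """Weighted Euclidean distance, eq 3; alpha holds the weighting coefficients."""
    diff = np.asarray(z_i, dtype=float) - np.asarray(z_j, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(alpha, dtype=float) * diff ** 2)))

def mahalanobis_sq(z_i, z_j, cov):
    """Squared Mahalanobis distance, eq 4, with pooled covariance matrix cov."""
    diff = np.asarray(z_i, dtype=float) - np.asarray(z_j, dtype=float)
    return float(diff @ np.linalg.inv(cov) @ diff)

if __name__ == "__main__":
    a, b = [1.0, 2.0], [2.0, 0.0]
    print(euclidean(a, b))                       # 2.236...
    print(weighted_euclidean(a, b, [0.5, 1.0]))  # weights the first coordinate less
    print(mahalanobis_sq(a, b, np.eye(2)))       # equals squared Euclidean for identity covariance
```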

3. Proposed Strategy

Gaussian Membership Function. A one-dimensional Gaussian membership function is represented as

F(u) = \exp\left[ -\frac{(u - c)^2}{2\sigma^2} \right] \qquad (5)

where F(u) is the membership value of the point u and c and σ are the center and spread, respectively. For a k-dimensional feature space with a feature vector z = (z_1, z_2, ..., z_k), the membership value can be defined as

\mu(z) = \prod_{i=1}^{k} F_i(z_i) = \exp\left[ -\sum_{i=1}^{k} \frac{(z_i - c_i)^2}{2\sigma_i^2} \right] \qquad (6)

where c = (c_1, c_2, ..., c_k) is the center and σ = (σ_1, σ_2, ..., σ_k) is the spread of the k-dimensional Gaussian membership function. It is the product of the k one-dimensional Gaussian membership functions, one for each coordinate of the feature space. Figure 2 shows a two-dimensional Gaussian membership function and a contour of the membership value.
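To make eqs 5 and 6 concrete, here is a small Python/NumPy sketch of the k-dimensional Gaussian membership value computed as the product of the one-dimensional memberships; the function name and the numerical values are illustrative only.

```python
import numpy as np

def membership(z, c, sigma):
    """k-dimensional Gaussian membership (eq 6): product of the
    one-dimensional Gaussian memberships (eq 5) along each coordinate."""
    z, c, sigma = (np.asarray(x, dtype=float) for x in (z, c, sigma))
    return float(np.exp(-np.sum((z - c) ** 2 / (2.0 * sigma ** 2))))

if __name__ == "__main__":
    c = np.array([0.0, 0.0])
    sigma = np.array([0.05, 0.05])
    print(membership([0.0, 0.0], c, sigma))    # 1.0 at the center
    print(membership([0.05, 0.05], c, sigma))  # exp(-1) ~ 0.37 one spread away in each coordinate
```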


Figure 8. Result of fuzzy C mean clustering for eight clusters.

Figure 9. Result of fuzzy C mean clustering for six clusters.

3.1. Step 1. Creation of Nodes. Let S = {z^(1), z^(2), ..., z^(n)} be a set of n sample feature vectors in a k-dimensional feature space. These are the feature vectors that need to be classified (number of classes unknown). The process begins by randomly selecting a feature vector, say z^(p). The first node, characterized by a Gaussian membership function, is generated. The values of the mean and spread of this node are assigned as follows:

c^{N_1} = z^{(p)}, \qquad \sigma^{N_1} = (\sigma_1^{(0)}, \sigma_2^{(0)}, \ldots, \sigma_k^{(0)}) \qquad (7)

where σ_j^(0), j = 1, 2, ..., k, are prespecified constants.

Figure 10. Process flow diagram of the twin CSTR virtual plant.

Figure 11. Software and hardware architecture for implementation.

Movement of Nodes. Each feature vector is presented sequentially to the existing nodes. Let us say we present pattern p to the existing nodes (there will be only one node, denoted N_1, in the first iteration). The membership value, also called the firing strength or degree of inclusion, of pattern p with node N_m is calculated using eq 6 as follows:

\mu_{p,N_m}(z^{(p)}) = \exp\left[ -\sum_{j=1}^{k} \frac{(z_j^{(p)} - c_j^{N_m})^2}{2(\sigma_j^{N_m})^2} \right], \quad m = 1, 2, \ldots, c \qquad (8)

where c is the number of nodes in the current iteration. A new node is generated using eq 7 if μ_{p,N_m} < μ_0, p = 1, 2, ..., q; m = 1, 2, ..., c, where μ_0 is the threshold that decides the condition to generate a new node. Otherwise, the center and spread of all the nodes are learned. This approach to learning is different from Kohonen's SOM,1,2 in which only the weights of the winning node and its neighbors are learned. In addition, in the Kohonen SOM only the centers of the clusters of the input vectors are learned, whereas in this case both the center and the spread are learned. This learning is based on the evaluation functions proposed by Nomura and Miyoshi17,18 and is expressed as

E_1(z^{(p)}, c^{N_m}) = \frac{1}{2} \sum_{j=1}^{k} (c_j^{N_m} - z_j^{(p)})^2 \qquad (9)

E_2(z^{(p)}, \sigma^{N_m}) = \frac{1}{2} \sum_{j=1}^{k} \left[ (\sigma_j^{N_m})^2 - (c_j^{N_m} - z_j^{(p)})^2 \right]^2, \quad m = 1, 2, \ldots, c \qquad (10)

Using eqs 9 and 10, the following update rules are obtained for the parameters by gradient descent:

\Delta c_j^{N_m} = \gamma (z_j^{(p)} - c_j^{N_m}) \qquad (11)

\Delta \sigma_j^{N_m} = 2\gamma \sigma_j^{N_m} \left[ (c_j^{N_m} - z_j^{(p)})^2 - (\sigma_j^{N_m})^2 \right], \quad m = 1, 2, \ldots, c \qquad (12)
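A minimal sketch of how one presentation of a pattern could be handled at this stage is given below: a new node is created by eq 7 when no existing node fires above the threshold μ_0 (eq 8); otherwise every node is moved by the plain gradient-descent rules of eqs 11 and 12 (the epoch-dependent learning function and lateral inhibition introduced next are not yet included). The constants MU0, SIGMA0, and GAMMA are illustrative values, not those of the paper.

```python
import numpy as np

MU0 = 0.1     # threshold for creating a new node (illustrative value)
SIGMA0 = 0.05  # prespecified initial spread, eq 7 (illustrative value)
GAMMA = 0.3   # learning parameter used in eqs 11 and 12 (illustrative value)

def membership(z, c, sigma):
    """Firing strength of pattern z with a node (eq 8)."""
    return float(np.exp(-np.sum((z - c) ** 2 / (2.0 * sigma ** 2))))

def present_pattern(z, centers, spreads):
    """Create a new node (eq 7) if no node fires above MU0; otherwise move
    every node by the gradient-descent rules of eqs 11 and 12."""
    z = np.asarray(z, dtype=float)
    mu = [membership(z, c, s) for c, s in zip(centers, spreads)]
    if not mu or max(mu) < MU0:
        centers.append(z.copy())                 # c^{N} = z^{(p)}
        spreads.append(np.full_like(z, SIGMA0))  # sigma^{N} = sigma^{(0)}
        return
    for c, s in zip(centers, spreads):
        delta_c = GAMMA * (z - c)                               # eq 11
        delta_s = 2.0 * GAMMA * s * ((c - z) ** 2 - s ** 2)     # eq 12
        c += delta_c
        s += delta_s

if __name__ == "__main__":
    centers, spreads = [], []
    for z in np.random.default_rng(0).normal(0.5, 0.05, size=(50, 2)):
        present_pattern(z, centers, spreads)
    print(len(centers), "nodes created")
```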

where γ is the learning parameter. The selection of the learning function is quite critical for effective classification and proper convergence of the algorithm. McKenzie and Alder14 proposed a strategy to move the nodes toward each data point according to the dynamics, using the idea of lateral inhibition to move one node significantly more than the others and the idea of habituation to stabilize the nodes when they move near clusters in the data.

Table 1. Nominal Value and Standard Deviation of Measured Variables

variable number | measured variable | nominal value | standard deviation
1 | cooling water flow rate (Fc,in) | 0.11161 L/min | 0.06
2 | feed flow rate (Fin) | 0.8468 L/min | 0.008
3 | outlet flow rate (Fout) | 0.8472 L/min | 0.0008
4 | liquid level in reactor (Lvl) | 10% | 0.017
5 | inlet cooling water temperature (Tc,in) | 298.01 K | 0.093
6 | outlet cooling water temperature (Tc,out) | 343.12 K | 0.025
7 | reactor temperature (T) | 320.82 K | 0.066
8 | input liquid temperature (Tin) | 323.30 K | 0.034
9 | concentration of component A in reactor (C_A) | 0.8 mol/L | 0.0003

Figure 12. Plot of measure d versus epoch.

Table 2. Operating Conditions and Faults

ID | operating condition | description
N | normal operating condition | operation at the normal operating conditions, no disturbances
F1 | catalyst deactivation | activation energy increases
F2 | level set point change (+ve) | setpoint change for the level of the reactor
F_2 | level set point change (-ve) | setpoint change for the level of the reactor
F3 | feed flow valve stiction (+ve) | dead band for the feed flow valve span
F_3 | feed flow valve stiction (-ve) | dead band for the feed flow valve span
F4 | bias in reactor temp measurement (+ve) | the reactor temperature measurement has a bias
F_4 | bias in reactor temp measurement (-ve) | the reactor temperature measurement has a bias
F5 | level controller failure | the signal from the level controller stays at its last value
F6 | cooling coil fouling | heat transfer coefficient decreases

Table 3. Change in Associated Variable for Different Levels of Deterioration

fault ID | variable | level 1
F1 | E/R (%) | +2.0
F2 | Lvl (SP) | +2.0
F_2 | Lvl (SP) | -2.0
F3 | dead band (%) | +2.0
F_3 | dead band (%) | -2.0
F4 | T | +2.0
F_4 | T | -2.0

Table 4. Input Parameters to the Self-Organizing Self-Clustering Network

parameter | first network
no. of epochs | 50
initial standard deviation (σ0) | 0.03
threshold for creation of new node (τ0) | 0.1
dmin | 0.001

Different learning parameters were used for the closest node and for all other neighboring nodes. The function they proposed is distance based and is intended for classical (or crisp) classification only. In our approach, we need to formulate a learning function that can support fuzzy classification. The function describing the learning parameter should (i) be nonnegative, (ii) tend to zero when the membership value of a pattern with a node tends to zero, and (iii) reach its maximum value of 1 when the membership value is 1. Additionally, as the iterations increase, the movement of the nodes should be retarded so that the nodes have a tendency to stay close to their existing, more familiar neighborhood. We formulate a function that encompasses all the properties mentioned above. It is represented as

\gamma_{p,N_m}(z^{(p)}) = \frac{\mu_{p,N_m}(z^{(p)})}{1 + \exp(-\mu_{p,N_m}(z^{(p)})) \cdot \mathrm{epoch}}, \quad \mathrm{epoch} = 0, 1, \ldots, \mathrm{iter} - 1 \qquad (13)

Here iter is the iteration number. Figure 3 depicts the dynamics of this function with respect to the iteration number. It can be noticed that the function value is low for a pattern that has a low membership value. In addition, the movements of the nodes are retarded as the iterations increase, which also helps the convergence of the algorithm. Furthermore, no additional parameter is needed in this function, which is one of the advantages of the proposed strategy over existing learning functions.

Lateral inhibition is another well-known phenomenon of biological neurons in which the movement of the nodes is inhibited except for that of the closest node. A function is required that maps the membership value to a degree of inhibition, say α(μ_{p,N_m}). If the node N_m is close to the data point p but is not the closest one, then two nodes are attracted toward the same data point and hence N_m should not move far, implying that α(μ_{p,N_m}) should be small, that is, α(μ_{p,N_m}) → 0 as μ_{p,N_m} → 1. On the other hand, an outlying node should be attracted toward the data, that is, α(μ_{p,N_m}) → 1 as μ_{p,N_m} → 0. The Z membership function with parameters (0, 1) fulfills these criteria. It can be represented as

\alpha(\mu_{p,N_m}) = \begin{cases} 1 - 2(\mu_{p,N_m})^2, & 0 \le \mu_{p,N_m} \le 0.5 \\ 2(1 - \mu_{p,N_m})^2, & 0.5 \le \mu_{p,N_m} \le 1 \end{cases} \qquad (14)
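The two ingredients just defined are simple enough to state directly in code. The short Python sketch below implements the learning parameter of eq 13 and the Z-shaped inhibition function of eq 14 and prints their values for a few membership values and epochs, mirroring the behavior plotted in Figure 3; the function names are ours.

```python
import numpy as np

def gamma(mu, epoch):
    """Learning parameter of eq 13: equals mu at epoch 0 and is
    progressively retarded as the epoch counter grows."""
    return mu / (1.0 + np.exp(-mu) * epoch)

def alpha(mu):
    """Z membership function with parameters (0, 1), eq 14: degree of
    inhibition applied to nodes other than the closest one."""
    return 1.0 - 2.0 * mu ** 2 if mu <= 0.5 else 2.0 * (1.0 - mu) ** 2

if __name__ == "__main__":
    for epoch in (0, 1, 5, 50):
        print(epoch, [round(gamma(mu, epoch), 3) for mu in (0.1, 0.5, 1.0)])
    print([round(alpha(mu), 2) for mu in (0.0, 0.25, 0.5, 0.75, 1.0)])  # 1.0, 0.88, 0.5, 0.12, 0.0
```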

Finally, presented with a pattern z^(p), the new center and spread of the node N_m, m = 1, 2, ..., c, can be expressed as follows. For the closest node,

c_j^{N_m,\mathrm{new}} = c_j^{N_m} + \frac{\mu_{p,N_m}(z^{(p)})}{1 + \exp(-\mu_{p,N_m}(z^{(p)})) \cdot \mathrm{epoch}} (z_j^{(p)} - c_j^{N_m})

\sigma_j^{N_m,\mathrm{new}} = \sigma_j^{N_m} + 2\sigma_j^{N_m} \frac{\mu_{p,N_m}(z^{(p)})}{1 + \exp(-\mu_{p,N_m}(z^{(p)})) \cdot \mathrm{epoch}} \left[ (c_j^{N_m} - z_j^{(p)})^2 - (\sigma_j^{N_m})^2 \right]

j = 1, 2, \ldots, k \qquad (15)
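Putting the pieces together, the following sketch applies one presentation of a pattern: the closest node (taken here as the one with the highest membership) is moved by eq 15, while, anticipating eq 16 below, every other node receives the same step scaled by the inhibition factor α of eq 14. The helper functions and the example values are illustrative assumptions, not code from the paper.

```python
import numpy as np

def membership(z, c, sigma):                       # eq 8
    return float(np.exp(-np.sum((z - c) ** 2 / (2.0 * sigma ** 2))))

def alpha(mu):                                     # eq 14, Z function
    return 1.0 - 2.0 * mu ** 2 if mu <= 0.5 else 2.0 * (1.0 - mu) ** 2

def update_nodes(z, centers, spreads, epoch):
    """One presentation of pattern z: the closest node moves by eq 15,
    all other nodes move by the same step scaled by alpha (eq 16)."""
    z = np.asarray(z, dtype=float)
    mu = np.array([membership(z, c, s) for c, s in zip(centers, spreads)])
    closest = int(np.argmax(mu))
    for m, (c, s) in enumerate(zip(centers, spreads)):
        gamma = mu[m] / (1.0 + np.exp(-mu[m]) * epoch)   # eq 13
        scale = 1.0 if m == closest else alpha(mu[m])    # lateral inhibition
        delta_c = scale * gamma * (z - c)
        delta_s = 2.0 * scale * gamma * s * ((c - z) ** 2 - s ** 2)
        c += delta_c
        s += delta_s

if __name__ == "__main__":
    centers = [np.array([0.2, 0.2]), np.array([0.8, 0.8])]
    spreads = [np.array([0.05, 0.05]), np.array([0.05, 0.05])]
    update_nodes([0.22, 0.19], centers, spreads, epoch=0)
    print(centers[0], centers[1])   # the first node moves noticeably, the second barely at all
```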


Figure 13. Clustering of feature vectors in 10 different classes.

Table 5. Center and Standard Deviation of the Classes

class | center comp 1 | center comp 2 | center comp 3 | std dev comp 1 | std dev comp 2 | std dev comp 3
1 | -0.7792 | -0.7821 | -0.8141 | 0.0222 | 0.0215 | 0.0189
2 | -0.1160 | -0.2296 | -0.0749 | 0.0255 | 0.0244 | 0.0252
3 | -0.9618 | -0.9177 | -0.9865 | 0.0578 | 0.0980 | 0.0264
4 | -0.8772 | -0.8702 | -0.9149 | 0.0419 | 0.0569 | 0.0287
5 | -0.0380 | 0.0417 | -0.1113 | 0.0198 | 0.0186 | 0.0202
6 | -0.3584 | -0.2743 | -0.4409 | 0.0061 | 0.0061 | 0.0050
7 | 0.6779 | 0.6796 | 0.6249 | 0.0571 | 0.0528 | 0.0587
8 | 0.0669 | 0.0423 | 0.0860 | 0.1316 | 0.1227 | 0.1317
9 | 0.6634 | 0.5228 | 0.6924 | 0.0804 | 0.0770 | 0.0777
10 | 0.9777 | 0.9801 | 0.9769 | 0.0376 | 0.0342 | 0.0384

For nodes other than the closest node,

c_j^{N_m,\mathrm{new}} = c_j^{N_m} + \alpha(\mu_{p,N_m}) \frac{\mu_{p,N_m}(z^{(p)})}{1 + \exp(-\mu_{p,N_m}(z^{(p)})) \cdot \mathrm{epoch}} (z_j^{(p)} - c_j^{N_m})

\sigma_j^{N_m,\mathrm{new}} = \sigma_j^{N_m} + 2\alpha(\mu_{p,N_m}) \sigma_j^{N_m} \frac{\mu_{p,N_m}(z^{(p)})}{1 + \exp(-\mu_{p,N_m}(z^{(p)})) \cdot \mathrm{epoch}} \left[ (c_j^{N_m} - z_j^{(p)})^2 - (\sigma_j^{N_m})^2 \right]

j = 1, 2, \ldots, k \qquad (16)

Termination Criterion. There are two termination criteria for the algorithm: (i) the maximum number of iterations, max_iter, is reached, or (ii) the total movement of the nodes is less than a user-defined value, dmin. At the end, we obtain a set of nodes distributed among the patterns.

3.2. Step 2. Determination of Classes. From the algorithm in step 1, we obtain the membership value of each pattern with respect to each node. In this part of the algorithm, a graph similarity matrix M is created based on the following criterion: if μ_{j,N_i}(N_j) > τ, then m(i, j) = 1; otherwise, m(i, j) = 0. Thus, the similarity matrix contains 0s and 1s, which depend upon the threshold τ. It is clear that the number of rows and columns of the similarity matrix will be the same as the number of nodes, and the diagonal entries are all 1s. Once the similarity matrix is generated, we use column searching to find the number of classes and the nodes belonging to the same class. Column searching is a recursive process that comes back to the remaining rows in each column after it has run out of rows for the current column. When no more columns remain for searching, all nodes reached have been assigned to the current class number. Next, the class number is incremented by one, a node that has not yet been assigned is found, and the column searching begins again on the column for that node. The algorithm terminates when there are no more unassigned nodes. If P classes have been found, the center and spread of the classes can be calculated as

C_j^{p} = \frac{1}{n(p)} \sum_{m=1}^{n(p)} c_j^{m} \qquad (17)

\xi_j^{p} = \left[ \sum_{m=1}^{n(p)} (\sigma_j^{m})^2 \right]^{1/2}, \quad j = 1, 2, \ldots, k; \; p = 1, 2, \ldots, P \qquad (18)

Here, n(p) is the number of nodes in class p, C^p and ξ^p are the center and spread of class p, respectively, and c^m and σ^m represent the center and spread of the mth node in class p.
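A compact sketch of this class-determination step is given below. It builds the similarity matrix from pairwise node memberships and then groups mutually similar nodes; a standard connected-components sweep is used here as a stand-in for the recursive column searching described above, and eq 18 is implemented as reconstructed (without averaging over the nodes of a class). Names and thresholds are illustrative.

```python
import numpy as np

def determine_classes(centers, spreads, tau):
    """Step 2 sketch: build the similarity matrix M from pairwise node
    memberships, group connected nodes into classes, and return the class
    centers (eq 17) and spreads (eq 18, as reconstructed)."""
    centers = np.asarray(centers, dtype=float)
    spreads = np.asarray(spreads, dtype=float)
    n = len(centers)

    def membership(z, c, sigma):                                   # eq 6 / eq 8
        return float(np.exp(-np.sum((z - c) ** 2 / (2.0 * sigma ** 2))))

    # m(i, j) = 1 if the center of node j has membership > tau in node i
    M = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            M[i, j] = int(membership(centers[j], centers[i], spreads[i]) > tau)

    # group nodes connected through M (symmetrised) into classes
    labels, current = -np.ones(n, dtype=int), 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        stack = [start]
        while stack:
            i = stack.pop()
            if labels[i] >= 0:
                continue
            labels[i] = current
            stack.extend(j for j in range(n) if (M[i, j] or M[j, i]) and labels[j] < 0)
        current += 1

    class_centers = [centers[labels == p].mean(axis=0) for p in range(current)]                   # eq 17
    class_spreads = [np.sqrt((spreads[labels == p] ** 2).sum(axis=0)) for p in range(current)]    # eq 18
    return labels, class_centers, class_spreads

if __name__ == "__main__":
    cs = [[0.0, 0.0], [0.02, 0.01], [1.0, 1.0]]
    ss = [[0.05, 0.05]] * 3
    print(determine_classes(cs, ss, tau=0.25)[0])   # the first two nodes share a class
```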

4. Results and Discussion

4.1. Example. To test the efficiency of the proposed method, a set of randomly clustered data points in two-dimensional feature space was generated using the function "nngenc" of MATLAB. The input parameters to this function are a matrix of cluster bounds, the number of clusters, the number of data points in each cluster, and the standard deviation of the clusters. We selected these values as {(0, 1), (0, 1)}, 8, 10, and 0.05, respectively. Figure 4 shows the data points generated under these conditions.
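MATLAB's nngenc is not reproduced here; as a rough, hedged stand-in, the following NumPy snippet generates clustered two-dimensional test data from the same kind of inputs (cluster bounds, number of clusters, points per cluster, and cluster standard deviation), using the parameter values quoted above.

```python
import numpy as np

def gen_clusters(bounds, n_clusters, pts_per_cluster, std, seed=0):
    """Rough stand-in for MATLAB's nngenc: draw cluster centres uniformly
    inside the given bounds and scatter Gaussian points around each of them."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds)[:, 0], np.array(bounds)[:, 1]
    centres = rng.uniform(lo, hi, size=(n_clusters, len(bounds)))
    return np.vstack([rng.normal(c, std, size=(pts_per_cluster, len(bounds))) for c in centres])

if __name__ == "__main__":
    # parameter values used in the example: bounds {(0,1),(0,1)}, 8 clusters,
    # 10 points per cluster, standard deviation 0.05
    data = gen_clusters([(0.0, 1.0), (0.0, 1.0)], 8, 10, 0.05)
    print(data.shape)   # (80, 2)
```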


Figure 14. Validation data in feature space (scenario one).

Figure 15. Degree of membership of the validation data with respect to different classes (scenario one).

By visual inspection, we can notice that only six distinct clusters are generated and that two clusters overlap. The proposed strategy started by assigning the following values to the user-defined parameters:

\sigma^{(0)} = (0.05, 0.05); \quad \mu_0 = 0.1; \quad \tau = 0.25; \quad \mathrm{max\_iter} = 50; \quad d_{\min} = 0.05

Figure 5 shows the distribution of the nodes among the data points in the feature space after the completion of the first part of the algorithm, where 23 nodes were generated and properly distributed in the data clouds. The termination criteria were (i) epochs = 50 or (ii) d_min < 0.05, whichever came first. A plot of the measure d with respect to the epoch number is shown in Figure 6. It should be noted that d is continuously decreasing, which ensures the convergence of the algorithm. Next, part 2 of the algorithm was applied, and the result is shown in Figure 7. The proposed algorithm classified the data points into six classes. All the points have partial membership in the classes; however, for display purposes, each is shown in the class for which its membership value is highest. The quality of the classification is similar to that done by a human being. The algorithm was run 20 times with random ordering of the feature vectors, and the result converged to similar classes every time. Therefore, initialization has no effect on this algorithm. The classification result does depend on parameters like σ^(0), μ_0, and τ, which need to be selected based on the requirements, but this is also true of other clustering methods like the graph-theoretic algorithm, ISODATA, and so forth.

To carry out a comparative study, the same data were fed to a fuzzy C-mean clustering algorithm. In this technique, the number of clusters needs to be specified in advance. We generated the classification results both for eight clusters (the value used for generating the clusters with nngenc) and for six clusters (from visual inspection). The result for eight clusters is shown in Figure 8. It is clear that the fuzzy C-mean algorithm failed to separate the classes completely. The fuzzy C-mean clustering for six clusters was run repeatedly on the same data points four times, and the results are shown in Figure 9a-d. Two points need to be noted. First, the FCM algorithm failed to classify the classes correctly in all cases, and second, the fact that the algorithm converged to a different result every time shows that it is highly dependent on the initial conditions. In both these respects, the proposed algorithm shows superior results compared to the fuzzy C-mean algorithm.

4.2. Twin Continuous Stirred Tank Reactor. To test the overall strategy for process monitoring, a twin continuous stirred tank reactor (CSTR) virtual plant was developed. It contains two CSTRs, a mixer, a feed tank, and a number of heat exchangers. Material from the feed tank is heated before being fed to the first reactor and the mixer. The effluent from the first reactor is then mixed with the material in the mixer before being fed to the second reactor (Figure 10).


Figure 16. Validation data in feature space (scenario two).

Figure 17. Degree of membership of the validation data with respect to different classes (scenario two).

The level and concentration in the reactors are controlled using feedback controllers. The virtual plant is controlled using Schneider Concept PLCSIM32. This is a 32-bit simulator which can simulate any PLC unit (Quantum, Compact, Momentum, and Atrium) and its signal states. Since the equations governing the operation of the reactors, mixer, feed tank, heat exchangers, splitters, and junctions are nonlinear differential and algebraic equations, they are implemented in the Concept Programming Unit. The user interface for this plant is developed in the Citect environment. It resides on another process computer and uses the TCP/IP protocol to communicate with Schneider Concept PLCSim32. The algorithm for pattern classification and process monitoring is developed in Matlab, running on a separate computer. Dynamic data exchange (DDE) facilitates real-time data communication between Citect and Matlab, and hence the code written in Matlab is executed to do real-time monitoring. The complete setup is shown in Figure 11.

Each CSTR consists of a reaction vessel, a cooling coil, and a stirrer. The measured variables consist of the volume flow rates, concentrations, and temperatures of the streams and the temperature, concentration, and level of the vessels. To make the virtual plant more realistic, random noise is added to all the measurements. The plant was run continuously, and samples of nine measured variables related to the first CSTR (Fin (feed flow rate), Tin (temperature), Fc (volume flow rate of the incoming cooling stream), Fout (flow rate of the output stream), T (temperature of the reactor), C_A (concentration of component A in the reactor), Lvl (level), Tci (temperature of the incoming cooling stream), and Tco (temperature of the outgoing cooling stream)) were recorded at 5 s intervals. The nominal value and standard deviation of each measured variable under normal operating conditions are shown in Table 1.

The historical database generated using the virtual plant for the present case study is designed to include both normal operating periods and a wide variety of abnormal situations or "faults". The 10 operating conditions (normal and faults) are described in Table 2. The operating conditions in Table 2 include a wide range of disturbance and fault types that are normally encountered in a typical historical database.23 The fault conditions include disturbances (step change in Tc,in), instrumentation faults (bias in the reactor temperature measurement), equipment failure (valve stiction, catalyst deactivation, cooling coil fouling), process upsets (level set point changes, concentration set point changes), and controller failure (level controller failure, concentration controller failure). Table 3 shows the amount of change induced in a variable to simulate the faults.

These 1000 input training vectors (100 vectors for each of the 10 operating conditions) were considered for training of the SOSCN. It is assumed that the fault classes are not known in advance and only the input vectors are available. These vectors were projected to a lower dimensional feature space using the RBFPL algorithm.24 Three components could explain more than 88% of the variance of the input vectors.
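The RBFPL projection is the authors' own algorithm (ref 24) and is not reproduced here; as a hedged stand-in, the sketch below uses an ordinary PCA projection to illustrate the step of reducing the nine measured variables to a three-component feature space and reporting the fraction of variance captured.

```python
import numpy as np

def project_pca(X, n_components=3):
    """Stand-in for the feature extraction step: the paper uses its RBF-PL
    algorithm (ref 24); here a plain PCA projection illustrates reducing the
    data to n_components and reporting the variance fraction retained."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return Xc @ Vt[:n_components].T, explained

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 9))            # placeholder for the 9 measured variables
    features, frac = project_pca(X)
    print(features.shape, round(frac, 2))     # (1000, 3) and the variance fraction retained
```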


Therefore, feature vectors corresponding to each of the input vectors were calculated in the three-dimensional feature space. These feature vectors were presented to the SOSCN for training. Table 4 shows the input parameters provided for the training of the SOSCN. After training, 70 nodes were created and properly distributed in the data cloud. The termination criteria were (1) epoch = 50 or (2) d_min ≤ 0.001. A plot of the measure d with respect to the epoch number is shown in Figure 12. It should be noted that the value of d is continuously decreasing, hence ensuring the convergence of the algorithm. Next, part 2 of the algorithm was applied, and the result is shown in Figure 13. The proposed algorithm classified the feature vectors into 10 classes. The centers and standard deviations of all these classes are shown in Table 5. The network is now ready to be used for the fault diagnosis task.

We carried out simulations of the plant in two different scenarios. In each of these scenarios, the plant was run for 500 s (100 samples at 5 s intervals) under normal conditions, followed by the introduction of an abnormality in the process. The data under abnormal conditions were collected for 1500 s (300 samples at 5 s intervals). These data were projected to the feature space, which also contains the classes determined by the SOSCN. The result for scenario one is shown in Figure 14, where the pink points represent the validation data. The degree of membership of these data with respect to the different classes is shown in Figure 15. Classes for which the degree of membership is negligible are not shown in the figure. Figures 16 and 17 present the validation data in the feature space and the degree of membership of these data with respect to the different classes in scenario two, respectively. The simulated conditions in scenarios one and two were similar to the conditions of the data in class 4 and class 8, respectively. These results demonstrate that the network correctly diagnosed the cause of the faults. The classes determined by the methodology can be labeled based on the experience of the operator and on the conditions of the plant when the historical data were collected.

5. Conclusion

In this paper, a novel methodology for pattern classification based on unsupervised learning was proposed. The size of the network was determined incrementally during the training. Each node of the network was described by a multidimensional Gaussian function. Each data point in the feature space had a membership degree between zero and one with respect to the nodes; similar feature vectors had a high membership value for nodes that were closer, and vice versa. The distribution of the nodes in the classification space was closely related to the probability density of the feature space: regions where the probability density was high had a higher number of nodes than lower density regions. Further, the nodes were classified using a novel fuzzy approach. The advantage of the proposed method over existing clustering techniques is its ability to determine the number of nodes and the number of clusters automatically. Since the methodology is based on learning, it is computationally less expensive, and the result is not affected by the initial guesses. The proposed strategy was compared with the fuzzy C-mean algorithm, and the results showed that the proposed algorithm was superior for classification.

The methodology is quite generic in nature and can be used for pattern classification, clustering, and data visualization. Furthermore, the strategy was applied successfully for fault diagnosis of a chemical virtual plant. It was demonstrated that the proposed methodology can be used as a tool for process monitoring in a process plant.

Literature Cited

(1) Kohonen, T. Self-organization and associative memory; Springer: Berlin, 1989.
(2) Kohonen, T. Self-organizing maps; Springer: Berlin, 1997.
(3) Looney, C. G. Pattern recognition using neural networks; Oxford University Press: New York, 1996.
(4) Bezdek, J. C. Pattern recognition with fuzzy objective function algorithms; Plenum: New York, 1981.
(5) Dubes, R.; Jain, A. Algorithms that cluster data; Prentice-Hall: Englewood Cliffs, NJ, 1988.
(6) Fritzke, B. Growing cell structures - A self-organizing network for unsupervised and supervised learning. Neural Networks 1994, 7, 1441-1460.
(7) Tou, J.; Gonzalez, R. Pattern recognition principles; Addison-Wesley: Reading, MA, 1974.
(8) Duda, R.; Hart, P. Pattern classification and scene analysis; Wiley: New York, 1973.
(9) MacQueen, J. Some methods for classification and analysis of multivariate data. 5th Berkeley Symposium on Probability and Statistics; University of California Press: Berkeley, 1967.
(10) Bow, S.-T. Pattern recognition: Application to large data-set problems; Marcel Dekker: New York, 1984.
(11) Batchelor, B. G.; Wilkins, B. R. Methods for location of clusters of patterns to initialize a learning machine. Electron. Lett. 1969, 5, 481-483.
(12) Ball, G. H.; Hall, D. J. ISODATA: an iterative method of multivariate data analysis and pattern classification. Presented at the IEEE International Communications Conference, Philadelphia, 1966.
(13) Yin, P.-Y.; Chen, L.-H. A new non-iterative approach for clustering. Pattern Recognit. Lett. 1994, 15, 125-133.
(14) McKenzie, P.; Alder, M. Unsupervised learning: the dog-rabbit strategy. Int. Conf. Neural Networks 1994, 2, 616-621.
(15) Tsao, E. C.-K.; Bezdek, J. C.; Pal, N. R. Fuzzy Kohonen clustering networks. Pattern Recognit. 1994, 22, 757-764.
(16) Vuorimaa, P. Fuzzy self-organizing map. Fuzzy Sets and Systems 1994, 66, 223-231.
(17) Nomura, T.; Miyoshi, T. An adaptive rule extraction with the fuzzy self-organizing map and a comparison with other methods. Proc. ISUMA-NAFIPS '95, Maryland, 1995, 311-316.
(18) Nomura, T.; Miyoshi, T. An adaptive fuzzy rule extraction using hybrid model of the fuzzy self-organizing map and the genetic algorithm with numerical chromosomes. Journal of Intelligent and Fuzzy Systems 1998, 6, 39-52.
(19) Baraldi, A.; Blonda, P. A survey of fuzzy clustering algorithms for pattern recognition - part I. IEEE Transactions on Systems, Man and Cybernetics, Part B 1999, 29, 778-785.
(20) Baraldi, A.; Blonda, P. A survey of fuzzy clustering algorithms for pattern recognition - part II. IEEE Transactions on Systems, Man and Cybernetics, Part B 1999, 29, 786-801.
(21) Fu, K. S.; Yu, T. S. Statistical pattern classification using contextual information; Research Studies Press: Chichester, U.K., 1980.
(22) Fausett, L. V. Fundamentals of neural networks: architectures, algorithms, and applications; Prentice-Hall: Englewood Cliffs, NJ, 2002.
(23) Johannesmeyer, M. C.; Singhal, A.; Seborg, D. E. Pattern matching in historical data. AIChE J. 2002, 48, 2022-2038.
(24) Bhushan, B.; Romagnoli, J. A. Strategy for process monitoring based on Radial Basis Function network and Polygonal Line Algorithm. Ind. Eng. Chem. Res. 2007, 46, 5131-5140.

Received for review November 14, 2007. Revised manuscript received March 2, 2008. Accepted March 3, 2008.

IE071549A