Ind. Eng. Chem. Res. 1999, 38, 4345-4358
Combining Conceptual Clustering and Principal Component Analysis for State Space Based Process Monitoring
X. Z. Wang* and R. F. Li
Department of Chemical Engineering, The University of Leeds, Leeds LS2 9JT, U.K.
Multivariate statistics and unsupervised machine learning have recently been studied by many researchers for process monitoring and fault diagnosis. These approaches often depend on calculating a similarity or distance measure to group data sets into clusters. Apart from giving predictions, they are not able to give causal explanations on why a specific set of data is assigned to a particular cluster. In this work, a conceptual clustering approach is presented for designing state space based monitoring systems, which is able to generate conceptual knowledge on the major variables which are responsible for clustering, as well as projecting the operation to a specific operational state. A critical issue in this approach is how to conceptually represent dynamic trend signals. For this purpose, principal component analysis is used for concept extraction from real-time dynamic trend signals. The method is introduced using a continuous stirred tank reactor as a case study. Application of the approach to a refinery methyl tert-butyl ether process is also presented.
Introduction
Being able to collect and display to operators a large amount of information has been regarded as one of the major advances provided in distributed control systems (DCS) over earlier analogue and direct digital control.1 For example, a typical olefin plant has over 5000 measurements to be monitored including 600 trend diagrams.2 As a consequence, supervisors and operators, as part of the overall control system, are responsible not only for many feedback control tasks that are not automated, such as switching from one feedback to another, but probably more importantly also for developing an understanding of the plant operations. This understanding can be used to3 identify problems in the current operation; identify deteriorating performance in instruments, energy usage, equipment, or catalysts; and identify better operating regions leading to improved product or operating efficiency.
In other words, on many occasions operators are more concerned with the current status and evolving patterns of behavior than with the values of specific variables. This requires operators not only to access the data in a timely manner but, more importantly, to assimilate the large volume of data quickly and correctly, especially when abnormal operations occur. This is a very challenging task for a number of reasons. First, the data volume is too large. Second, the data are multidimensional, and it is well-known that human beings are not good at analyzing problems involving more than three dimensions. Third, the variables are interrelated and therefore need to be considered simultaneously in the analysis. Other factors include noise and uncertainty as well as dynamics in the data.
* Corresponding author. E-mail: [email protected].
Recently there has been significant progress in automating data analysis for process monitoring and fault diagnosis. The methods can be roughly divided into two categories: supervised and unsupervised. Supervised techniques are associated with assignment or mapping of a set of data to previously known classes
according to a distance or similarity measure. Supervised methods need a large number of data sets with known classes as training data to train the models. A typical example of supervised learning is the feedforward neural network, which has been widely studied. Though supervised methods can generally give accurate results, they are not applicable when training data are not available. In this situation, unsupervised approaches have to be used. Unlike supervised approaches, which learn from the known to predict the unknown, unsupervised approaches can learn from the unknown, automatically grouping data sets into classes in such a way that intraclass similarity is high and interclass similarity is low. Representative unsupervised learning methods which have been studied for operational state identification include linear and nonlinear principal component analysis,4-8,11 adaptive resonance theory (ART2),9-11 and Bayesian automatic classification (AutoClass).11
In the above-mentioned supervised and unsupervised approaches, the notion of similarity is fundamental and is often determined by calculating a distance or similarity measure; therefore they are often called similarity- or distance-based clustering. A major limitation of similarity- or distance-based clustering is that it gives predictions but not causal and qualitative explanations. This means that in process operational decision support it is not able to indicate to operators what variables are responsible for the observed operational state, and so provides no clues for operational adjustment.
Conceptual vs Distance- or Similarity-Based Clustering. In this work we present a conceptual clustering approach for designing process monitoring systems, which is able not only to project a set of data concerned with the operation of the process over a specific period of time to a point in the operational state space, but also to generate causal knowledge indicating the variables that are responsible for the projection.
The causal knowledge can be used by operators to adjust the relevant variables. Designing process monitoring systems using conceptual clustering can be illustrated
10.1021/ie990144q CCC: $18.00 © 1999 American Chemical Society Published on Web 10/14/1999
using Figure 1. It indicates that a conceptual clustering system for process monitoring has the following functions. First, it is able to automatically assimilate the real-time measurements, project the process to a specific state, and indicate the move of the operating point in real time. It is known that a process or unit operation can be operated at multiple steady modes and abnormal states. The best-known example of a process that can be operated in multiple steady states is the exothermic continuous stirred tank reactor (CSTR).12,13 Multiple-steady-state behavior has also been found in distillation columns,14 reactive distillation processes,15 and refinery fluid catalytic cracking processes.16 These steady-state operational states are not obvious without careful data analysis. Abnormal operations can be in various forms and are more difficult to predict. Nevertheless, with the accumulation of knowledge we now know more about the operational behavior of various unit operations than before. Second, a conceptual clustering system for process monitoring is able to identify various new operational states, either normal or abnormal. Third, it should be able to give information about the most important variables that are responsible for the movement of operational points and in such a way provide guidelines for operators to adjust the process. It is this last capability that distinguishes conceptual clustering from other approaches that depend on calculating distance or similarity measures.
Figure 1. Illustration of state space based operational plane.
The rest of the paper is organized as follows. The conceptual clustering approach used in this study is first briefly introduced. In applying the approach, a critical step is in dealing with dynamic trend signals because the conceptual clustering technique can only handle discrete attributes effectively. An approach is therefore proposed in the subsequent section for concept formation from dynamic trend signals using principal component analysis (PCA). The overall procedure is then illustrated by reference to a case study of a small-scale process, the continuous stirred tank reactor (CSTR). In a separate section, application of the approach to a refinery methyl tert-butyl ether (MTBE) process is presented to demonstrate the applicability of the approach to more complex processes.
The Conceptual Clustering Approach
The conceptual clustering approach used here is based on inductive learning, which attempts to acquire a conceptual language for describing an object by drawing inductive inference from observations. The focus is on deriving rules or decision trees from unordered sets of examples, especially attribute-based induction, a formalism where examples are described in terms of a fixed collection of attributes. Inductive learning differs from learning methods such as feedforward neural networks, which develop implicit rather than explicit and transparent rules or decision trees. It is relatively easier for human experts to document cases than for them to articulate their expertise explicitly and clearly. Several approaches to inductive learning have been proposed, and the most successful one was developed by Quinlan.17,18 Given a database of objects (or in other words data sets) that are described in terms of a collection of attributes, where each attribute measures some important feature of an object and each object belongs to one of a set of mutually exclusive classes, the task is to develop a classification rule that can determine the class of any object from its values of the attributes. The decision tree generated can be used for conceptual clustering. The procedure is iterative and can be summarized as follows:17,18
1. Select a random subset of the given training examples (called the window).
2. Repeat
(a) develop a decision tree which correctly classifies all objects in the window;
(b) find exceptions to this decision tree in the remaining examples;
(c) form a new window by adding the incorrectly classified objects to the window
until there are no exceptions to the decision tree.
The crux of the problem is how to develop a decision tree for an arbitrary collection of objects in the window. To form a decision tree is to select the root attribute.
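The windowing loop in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `induce_with_window` and the `build_tree` and `classify` callables are hypothetical placeholders for whatever tree learner is used, the example attribute values are made up, and the memorizing "learner" in the usage lines is a stand-in chosen purely so that the example runs. It is not the ID3 or C5.0 algorithm.

```python
import random

def induce_with_window(examples, build_tree, classify, window_size, seed=0):
    """Grow the window until the tree has no exceptions among the examples.

    examples: list of (attributes, class_label) pairs.
    build_tree(list_of_examples) -> tree; classify(tree, attributes) -> label.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    # Step 1: a random subset of the training examples forms the window.
    window = set(rng.sample(indices, min(window_size, len(indices))))
    while True:
        # Step 2(a): build a tree from the objects in the window.
        tree = build_tree([examples[i] for i in sorted(window)])
        # Step 2(b): find exceptions among the remaining examples.
        exceptions = [i for i in indices
                      if i not in window
                      and classify(tree, examples[i][0]) != examples[i][1]]
        if not exceptions:
            return tree
        # Step 2(c): add the misclassified objects to the window.
        window.update(exceptions)

# Stand-in learner (an assumption for illustration): memorize the window.
def memorize(window_examples):
    return dict(window_examples)

def lookup(tree, attributes):
    return tree.get(attributes)

# Made-up examples: (attribute values, class label).
examples = [(("A", "D"), "NOR1"), (("C", "A"), "NOR3"),
            (("C", "D"), "ABN1"), (("B", "A"), "NOR2")]
tree = induce_with_window(examples, memorize, lookup, window_size=1)
```

The loop always terminates, because in the worst case the window grows to contain every example and no remaining examples are left to be exceptions.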
Assume that there are only two classes in all the data, P and N (although extension to any number of classes is not difficult). The method to find the root attribute is adopted from an information-based method that depends on two assumptions. Suppose the window C contains p objects of class P and n objects of class N. The assumptions are the following: 1. Any correct decision tree for the window C will classify objects in the same proportion as their representation in C. An arbitrary object will be determined to belong to class P with probability p/(p + n) and to class N with probability n/(p + n). 2. When a decision tree is used to classify an object, it returns a class. A decision tree can thus be regarded as a source of message “P” or “N” with the expected information needed to generate this message given by
$$I(p,n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} \tag{1}$$
If attribute A with values {A1, A2, ..., Av} is used for the root of the decision tree, it will partition the window C into {C1, C2, ..., Cv}, where Ci contains those objects in C that have values Ai of A. Suppose Ci contains pi objects of class P and ni of class N. The expected information required for the subtree for Ci is I(pi,ni). The expected information required for the tree with A as root is then obtained as the weighted average
$$E(A) = \sum_{i=1}^{v} \frac{p_i+n_i}{p+n}\, I(p_i,n_i) \tag{2}$$
where the weight for the ith branch is the proportion of the objects in C that belong to Ci. The information gained by branching on A is therefore
$$\mathrm{gain}(A) = I(p,n) - E(A) \tag{3}$$
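To make eqs 1-3 concrete, the following minimal Python sketch evaluates the expected information and the gain for one candidate attribute. The function names `info` and `gain` and the class counts in the usage lines are illustrative assumptions and are not taken from C5.0 or from the case studies in this paper.

```python
import math

def info(p, n):
    """Expected information I(p, n) of eq 1, in bits."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # a class with no objects contributes nothing
            frac = count / total
            result -= frac * math.log2(frac)
    return result

def gain(partition):
    """Information gain of eq 3 for one attribute.

    partition: list of (p_i, n_i) class counts, one pair per attribute value.
    """
    p = sum(p_i for p_i, _ in partition)
    n = sum(n_i for _, n_i in partition)
    # Eq 2: weighted average of the information needed for each subtree.
    expected = sum((p_i + n_i) / (p + n) * info(p_i, n_i)
                   for p_i, n_i in partition)
    return info(p, n) - expected

# Hypothetical window: 9 objects of class P and 5 of class N, split by a
# three-valued attribute into (p_i, n_i) counts.
partition = [(2, 3), (4, 0), (3, 2)]
root_gain = gain(partition)
```

Repeating the `gain` call for every attribute and keeping the largest value corresponds to the root selection described in the text.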
The approach calculates the gain for all attributes and chooses the attribute having the biggest gain as the root node. The root node will have as many branches as it has values. The branches divide the database into a number of subsets. For each subset, a root node is developed following the same criterion. The approach has been implemented in a commercial software package, C5.0,19 which has evolved from its early versions C4.5 (ref 17) and ID3 (ref 18). A major limitation of ID3 was that it assumed that the values of all attributes are discrete, for instance color is red or green. Though C4.5 was claimed to be able to deal with continuous-valued attributes, it handles them less well than discrete-valued attributes, as indicated by Quinlan.20 Though Quinlan20 made a further effort to improve the method for dealing with continuous-valued attributes, our study showed that it is still not very satisfactory. Nevertheless C5.0 has become one of the most well-known tools for data mining and knowledge discovery, especially in domains only concerning discrete values. In process operational state identification, we need to deal with variables whose values are dynamic trend signals. This means that the value of a variable for a specific set of data is a dynamic trend. When a variable takes such dynamic trends as its values, no approach is available for transforming a trend into a concept. Existing approaches for concept formation are based on discretization and are therefore not able to deal with trend signals.
Concept Formation from Dynamic Trend Signals Using PCA
In DCS systems, nearly all important process variables are recorded as dynamic trends. A dynamic trend is the visualization of a variable's changing trajectory over time and consists of several tens of sampling values. However, to make effective use of trends in a computer system, it is required to compress the dynamic trend data and to use reduced dimensions to represent the trend features.
The earliest attempt to deal with trends, in the real-time expert system G2 (ref 21), was to use qualitative expressions such as increase and decrease. Later various approaches were developed, including the episode approach,22,23 neural networks,9 and more recently wavelets.11,24-26 Though these approaches are able to capture the features of a dynamic trend in a reduced dimensional space and remove noise, none can be used for concept formation.
Figure 2. The CSTR reactor.
In this work we propose to use principal component analysis for concept formation. The method of PCA was developed about 100 years ago,27,28 but has now reemerged as an important technique in data analysis. The recent interest in PCA in process industries has been on developing multivariate process monitoring systems.7,8,11,25,29-33 The central idea of PCA is to reduce the dimensionality of a data set which consists of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated and which are ordered so that the first few retain most of the variation present in all of the original variables. Given a vector X of n dimensions,
$$X = [x_1, x_2, \ldots, x_n]^T \tag{4}$$
whose mean vector M and covariance C are described by
$$M = E(X) = [m_1, m_2, \ldots, m_n]^T \tag{5}$$
$$C = E[(X - M)(X - M)^T] \tag{6}$$
Calculate the eigenvalues λ1, λ2, ..., λn and eigenvectors P1, P2, ..., Pn of C, and arrange them in order of decreasing eigenvalue:
λ1 ≥ λ2 ≥ ... ≥ λn
Select d eigenvectors to represent the n variables, d < n. Then P1, P2, ..., Pd are called the principal components. PCA is used here for data preprocessing with the following purposes: concept formation for subsequent conceptual clustering, noise removal, and reduction in data dimensionality. These purposes are achieved by processing the dynamic trend signals of a variable for all data sets using PCA and plotting the first two PCs in a two-dimensional plane.
To illustrate the approach more clearly, we introduce a case study of a CSTR. A single, nonisothermal continuous stirred-tank chemical reactor (CSTR) is shown in Figure 2. A single reaction A → B takes place in the reactor. A detailed description and parameter values can be found in the book by Marlin13 and therefore are not given here. A dynamic simulator was developed for the CSTR, which includes three controllers as shown in Figure 2. To generate a data set or data case, the simulator is run at steady state, a disturbance or fault is introduced, and at the same time recording of the dynamic responses is started. Eighty-five data sets were generated, which are summarized in Table 1. For each data set, the eight variables shown in Table 2 were recorded, including Fi, Ti, Ci, Twi, Fw, TR, Fo, and L. In each data set, each variable was recorded as a dynamic trend consisting of 150 sampling points. Therefore, for each variable the data size is a matrix of 85 (the number of data sets) × 150 (the number of data points representing a dynamic trend).
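As a rough illustration of this preprocessing step, the sketch below reduces each dynamic trend (one row of a data-sets × sampling-points matrix) to its first two PC scores. It is a sketch under stated assumptions rather than the authors' implementation: the function name `first_two_pc_scores`, the synthetic rising and falling trends, and the use of power iteration on the Gram matrix are all choices made here to keep the example dependency-free. In practice a numerical library's eigenvalue or SVD routine would normally be used.

```python
import math

def first_two_pc_scores(trends, iters=200):
    """Return [PC-1, PC-2] scores for each row (one dynamic trend per row)."""
    n = len(trends)
    t = len(trends[0])
    # Center every sampling instant (column) across the data sets.
    means = [sum(row[j] for row in trends) / n for j in range(t)]
    xc = [[row[j] - means[j] for j in range(t)] for row in trends]
    # Gram matrix G = Xc Xc^T (n x n). If G u = lam u with |u| = 1, the scores
    # of the rows on that principal component are sqrt(lam) * u.
    g = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(n)]
         for i in range(n)]
    scores = [[0.0, 0.0] for _ in range(n)]
    for pc in range(2):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary starting vector
        lam = 0.0
        for _ in range(iters):  # power iteration for the dominant eigenpair
            w = [sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variation left to explain
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm
        for i in range(n):
            scores[i][pc] = math.sqrt(lam) * v[i]
        # Deflate so that the next pass finds the following component.
        for i in range(n):
            for k in range(n):
                g[i][k] -= lam * v[i] * v[k]
    return scores

# Synthetic trends (assumption): three rising and three falling ramps.
samples = 10
rising = [k / (samples - 1) for k in range(samples)]
falling = [1.0 - x for x in rising]
trends = ([[s * x for x in rising] for s in (1.0, 1.1, 0.9)]
          + [[s * x for x in falling] for s in (1.0, 1.1, 0.9)])
scores = first_two_pc_scores(trends)
```

Plotting the two columns of `scores` against each other gives the kind of two-dimensional plane on which similar trends fall close together. The sign of each component is arbitrary, so only the relative positions of the points are meaningful.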
Table 1. Data Used for the CSTR Case Study
data sets | data detail
1-11 | all control loops at AUTO and S.P. of TR = 350 K. Change Ti (K).
12-15 | all control loops at AUTO and S.P. of TR = 350 K. Change Ci (kmol/m3).
16-20 | all control loops at AUTO and S.P. of TR = 350 K. Change Fi (m3/min).
21-24 | all control loops at AUTO and S.P. of TR = 350 K. Change L (%).
25-31 | all control loops at AUTO and S.P. of TR = 405 K. Change Ti (K).
32-36 | all control loops at AUTO and S.P. of TR = 405 K. Change Twi (K).
37-39 | all control loops at AUTO and S.P. of TR = 405 K. Change Ci (kmol/m3).
40-43 | all control loops at AUTO and S.P. of TR = 405 K. Change Fi (m3/min).
44-46 | all control loops at AUTO and S.P. of TR = 405 K. Change L (%).
47-52 | all control loops at AUTO and S.P. of TR = 380 K. Change Ti (K).
53-56 | all control loops at AUTO and S.P. of TR = 380 K. Change Twi (K).
57-60 | all control loops at AUTO and S.P. of TR = 380 K. Change Ci (kmol/m3).
61-66 | all control loops at AUTO and S.P. of TR = 380 K. Change Fi (m3/min).
67-70 | all control loops at AUTO and S.P. of TR = 380 K. Change the S.P. of L (%).
71-80 | all other control loops at AUTO and S.P. of TR = 380 K. Change the output of the CSTR level controller (%).
81-85 | all other control loops at AUTO. Change the output of the controller TR (%).
Table 2. First Five Principal Components of the Variables
variables | PC-1 | PC-2 | PC-3 | PC-4 | PC-5
Fi | 71.9486 | 1.5598 | 0.6416 | 0.6014 | 0.5937
Ti | 93.3967 | 0.6440 | 0.1880 | 0.1344 | 0.1323
Ci | 88.5149 | 0.5818 | 0.2853 | 0.2517 | 0.2386
Twi | 86.1071 | 0.7656 | 0.3325 | 0.3085 | 0.2890
Fw | 97.8797 | 1.2681 | 0.4335 | 0.1216 | 0.0690
TR | 94.8269 | 3.5246 | 0.4958 | 0.0958 | 0.0559
Fo | 98.9227 | 0.4522 | 0.2581 | 0.0363 | 0.0180
L | 98.9645 | 0.1589 | 0.0490 | 0.0399 | 0.0189
Figure 4. PCA two-dimensional plot of Fo.
Figure 3. PCA two-dimensional plot of TR.
PCA is applied to such a matrix of each variable for concept extraction. Table 2 gives the eigenvalues of the first five PCs for each variable and shows that the first two PCs can capture most of the information. Therefore, we can use only two components to replace the dynamic trends for subsequent conceptual clustering. Figure 3 plots the first two principal components of the reactor temperature TR. It clearly shows that the dynamic trends of TR for the 85 data sets fall into four groups, namely A, B, C, and D. The plotting of the first two PCs in a two-dimensional plane, such as Figure 3, is called concept formation. If the dynamic trend of TR falls within the region B of Figure 3, we define TR = B. This means that the dynamic trends of the variable TR are conceptualized into a value space of four, A, B, C, and D. The advantage of the conceptualization becomes clearer when the conceptual clustering procedure is illustrated in the next section. For some variables the grouping is clear, such as in Figures 3 and 4. For others it is not so clear, but it is still possible to make the grouping, as demonstrated in Figure 5 for the cooling water flow rate Fw. The plots for the other variables, Fi, Ti, Ci, Twi, and L, are shown in Figure 6.
Figure 5. PCA two-dimensional plot of Fw.
Figure 6. PCA two-dimensional plots of variables for the CSTR.
Identification of Operational States. The next step is identification of operational states. In this case study, PCA is used again because there are only eight variables. For more complex processes, more sophisticated approaches need to be used, as will be described later. The first two PCs of the eight variables, i.e., TR, Fo, Fw, Fi, Ti, Ci, Twi, and L, are used for further analysis using PCA. The result is shown in Figure 7, which plots the first two PCs. The five groups imply that the 85 data cases can be classified into 5 classes corresponding to 5 different operational modes. Examination of the clusters found that they are reasonable.
Application to the CSTR Case Study
Once the dynamic trend signals are conceptualized and operational states identified, the next step is to learn to generate knowledge correlating variables and operational states. This requires generating a file as
shown in Table 3. In fact, each data set in Table 3 can be interpreted as a production rule. For example, the first case is equivalent to the following rule:
IF PC-L = C in Figure 6e
AND PC-TR = D in Figure 3
AND PC-Fo = A in Figure 4
AND PC-Fw = D in Figure 5
AND PC-Twi = B in Figure 6d
AND PC-Ci = A in Figure 6c
AND PC-Ti = D in Figure 6b
AND PC-Fi = B in Figure 6a
THEN states = NOR1 in Figure 7
Obviously, a set of rules of this kind is simply a restatement of the database, and a decision tree built directly from it would be very complex. The C5.0 approach is able to develop a simpler tree. A simple tree is preferable because it can usually perform better than a complex tree for data cases outside the training data set. The decision tree developed for the CSTR case study is shown in Figure 8. The decision tree can be converted to production rules as shown in Table 4. C5.0 has identified the reactor temperature as the root node. It states that if TR is in the region of A, B, or D of Figure 3, then the operation will be in the region ABN2 (abnormal mode 2), NOR2 (normal operation mode 2), or NOR1 (normal operation mode 1) of Figure 7, respectively. If TR is in the region C of Figure 3, then there are two situations depending on Fo. If Fo is in the region D of Figure 4, then the operation will be ABN1 (abnormal operation 1); if Fo is in A or B of Figure 4, then the operation will be NOR3 (normal operation 3). The result effectively states that it is possible to focus on monitoring TR in Figure 3. If TR is in region C, then Fo in Figure 4 should be monitored. This also shows what variables are responsible for the location if the operation is in a specific region of Figure 7.
Table 3. Data Structure Used by the Conceptual Clustering Tool
PC-L | PC-TR | PC-Fo | PC-Fw | PC-Twi | PC-Ci | PC-Ti | PC-Fi | states
C | D | A | D | B | A | D | B | NOR1
C | D | A | D | B | A | E | B | NOR1
··· | ··· | ··· | ··· | ··· | ··· | ··· | ··· | ···
A | C | D | A | B | A | C | B | ABN1
A | C | D | A | B | A | C | B | ABN1
··· | ··· | ··· | ··· | ··· | ··· | ··· | ··· | ···
variable name | value space | figures
PC-L | [A, B, C, D] | Figure 6e
PC-Fo | [A, B, C, D] | Figure 4
PC-Fw | [A, B, C, D] | Figure 5
PC-Twi | [A, B, C, D] | Figure 6d
PC-TR | [A, B, C, D] | Figure 3
PC-Ci | [A, B, C] | Figure 6c
PC-Ti | [A, B, C, D, E] | Figure 6b
PC-Fi | [A, B, C, D] | Figure 6a
states | [NOR1, NOR2, NOR3, ABN1, ABN2] | Figure 7
Figure 7. PCA two-dimensional plot of the CSTR operational states.
Figure 8. Decision tree developed for the CSTR after conceptualization of the variable trends.
Table 4. Production Rules Converted from the Decision Tree of Figure 8
rule 1: IF TR = A in Figure 3 THEN operational state = ABN2 in Figure 7
rule 2: IF TR = B in Figure 3 THEN operational state = NOR2 in Figure 7
rule 3: IF TR = C in Figure 3 AND Fo = A or B in Figure 4 THEN operational state = NOR3 in Figure 7
rule 4: IF TR = C in Figure 3 AND Fo = D in Figure 4 THEN operational state = ABN1 in Figure 7
rule 5: IF TR = D in Figure 3 THEN operational state = NOR1 in Figure 7
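Because each rule tests only TR and, in one branch, Fo, the rules of Table 4 can be written as a small lookup function. The sketch below is an illustrative encoding, not output of C5.0: the function name `operational_state` is hypothetical, and returning `None` for the combination TR = C with Fo = C, which Table 4 does not cover, is a choice made here.

```python
def operational_state(tr, fo):
    """Map the conceptual values of TR and Fo to an operational state."""
    # Rules 1, 2, and 5: TR alone decides the state.
    direct = {"A": "ABN2", "B": "NOR2", "D": "NOR1"}
    if tr in direct:
        return direct[tr]
    if tr == "C":
        if fo in ("A", "B"):   # rule 3
            return "NOR3"
        if fo == "D":          # rule 4
            return "ABN1"
    return None  # combination not covered by the rules

# First data case of Table 3: TR = D, Fo = A.
state = operational_state("D", "A")
```

For the first data case of Table 3 the function returns NOR1, in agreement with the state listed there.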
The decision tree shown in Figure 8 and the rules in Table 4 provide the knowledge for clustering the operation. It is worth mentioning that Bakshi and Stephanopoulos34,35 also investigated the application of inductive learning for automatic decision tree generation from data. However, the approach was only used for discrete-valued variables. For example, they presented a case study in which pressure takes values from N (normal), H (high), and L (low), temperature takes values from a discrete-valued space [92, 95, 99, 104, 105, 106, 108], colors from N (normal) and A (abnormal), and product quality from G (good) and B (bad). The approach addressed in this study is different because in on-line monitoring each variable takes values of dynamic trends.
Application to the Refinery MTBE Process
In the above sections the conceptual clustering approach has been illustrated using a CSTR case study. In this section we apply the approach to a more complicated case study, the refinery methyl tert-butyl ether (MTBE) process. MTBE is an important industrial chemical because large quantities of MTBE are now required as an octane booster in gasoline to replace tetraethyllead. MTBE ((CH3)3COCH3) is produced from the reaction between isobutene ((CH3)2C=CH2) and methanol (CH3OH). The principal flow sheet of the process is shown in Figure 9. It mainly consists of a catalytic reactor R201 in which the desired reactions take place, a reactive distillation column C201 which separates the MTBE at the bottom and allows the remaining reactants to complete the reactions, as well as a water extraction column C202 and a methanol distillation column C203 for methanol recovery. In this study only the feed section, reactor R201, and reactive distillation column C201 are considered. Columns C202 and C203 are beyond the scope of the discussion. The reactions mainly take place in the reactor R201. Unreacted reactants continue to react on certain trays of the distillation column C201. A small amount of extra methanol is fed to the column via pump P208, valves Sw1 and HC211D, and tank D211, to support the reactions. In practice, there is a small amount of recovered methanol recycled from column C203 to the methanol feed, but that is not considered in this study. Apart from the control loops related to RC, FC201D, and FC202D, all other controllers are conventional feedback and cascade arrangements. RC is a ratio control arrangement that controls the ratio of the two feedstream flow rates.
Figure 9. Simplified flow sheet of the MTBE process.
The process was built in 1992. Before start-up, a dynamic simulator was developed which was used initially for training operators in start-up of the process. Later the model was updated to study various operational strategies. The current version of the simulator is able to simulate various start-up, shutdown, and normal operations and the dynamic behavior under disturbances and faults. Random noise is introduced to emulate the real scenario. The high fidelity and flexibility of the simulator provide a useful test bed for process monitoring system development. In this case study, a database of 100 sets was obtained by carrying out various tests on the simulator. The data are summarized in Table 5. Thirty-six data sets correspond to various operations which are regarded as abnormal or under significant disturbances. The rest are considered as under normal operations. We restricted ourselves to studying 100 data sets since we have found that increasing the size of the data set by including more normal operational data does not make a difference in the result. A limited data size eases the analysis and presentation of the result. Each data set consists of twenty-one variables, which are listed in Table 6. Each variable represents a dynamic trend consisting of 256 sampling points. Therefore the size of the data to be analyzed is 100 × 21 × 256.
Table 5. Data Sets Analyzed for the MTBE Process
cases | case description | case detail
1 | control valve of LC201D; 50% → 0% | liquid level control of C4 hydrocarbons tank D201
2 | pump P201; failure | pump after the D201
3 | switch valve SW3; on → off | switch valve after P201, on the feed stream of C4 hydrocarbons
4 | pump P202; failure | pump after D202, on the feed stream of methanol
5 | switch valve SW2; on → off | switch valve after P202, on the feed stream of methanol
6 | manual valve HC202D; 20% → 0% | manual valve on the feed of methanol to tank D202
7 | control valve of FC202D; 33% → 13% | feed flow rate controller of methanol
8 | control valve of FC202D; 33% → 26% | ···
9 | control valve of FC202D; 33% → 46% | ···
10 | control valve of FC202D; 33% → 59% | ···
11 | control valve of FC202D; 33% → 3% | ···
12 | pump P208; failure |
13 | switch valve SW1; on → off | switch valve on the feed stream of methanol directly to column
14 | manual valve HC211D; 18% → 40% | manual valve on the feed stream of methanol directly to column
15 | manual valve HC211D; 18% → 60% | ···
16 | manual valve HC211D; 18% → 100% | ···
17 | manual valve HC211D; 18% → 0% | ···
18 | manual valve HC211D; 18% → 10% | ···
19 | control valve of TC201R; 39% → 30% | bottom T controller of the reactive distillation column C201
20 | control valve of TC201R; 39% → 20% | ···
21 | control valve of TC201R; 39% → 10% | ···
22 | control valve of TC201R; 39% → 0% | ···
23 | control valve of FC203E; 37% → 33% | steam flow rate controller at the bottom of the column C201
24 | control valve of FC203E; 37% → 29% | ···
25 | control valve of FC203E; 37% → 16% | ···
26 | control valve of FC203E; 37% → 6% | ···
27 | control valve of FC203E; 37% → 0% | ···
28 | control valve of FC203E; 37% → 57% | ···
29 | control valve of FC201C; 40% → 30% | reflux F controller of the reactive distillation column C201
30 | control valve of FC201C; 40% → 20% | ···
31 | control valve of FC201C; 40% → 10% | ···
32 | control valve of FC201C; 40% → 0% | ···
33 | control valve of FC202D; 33% → 37% | feed flow rate controller of methanol
34 | control valve of FC202D; 33% → 40% | feed flow rate controller of methanol
35 | control valve of FC202D; 33% → 50% | feed flow rate controller of methanol
36 | control valve of FC202D; 33% → 55% | feed flow rate controller of methanol
37-100 | normal | normal operation
Table 6. Variables Recorded as Dynamic Trends for the MTBE Process
no. | variable | description
1 | F_D201_in | inlet flow of D201
2 | F_D201_out | outlet flow of D201
3 | L_D201 | liquid level of D201
4 | F_D202_in1 | adding methanol flow of D202
5 | F_D202_out1 | methanol flow from D202 to R201
6 | F_D202_out2 | methanol flow from D202 to D211
7 | L_D202 | liquid level of D202
8 | T_E201_out | outlet temperature of E201
9 | F_E201_steam | steam flow through E201
10 | T_R201_top | top temperature of reactor R201
11 | T_R201_mid | middle temperature of reactor R201
12 | T_R201_bot | bottom temperature of reactor R201
13 | F_C201_ref | reflux flow of C201
14 | F_C201_out | product flow of MTBE
15 | T_C201_top | top temperature of C201
16 | T_C201_mid | middle temperature of C201
17 | T_C201_bot | bottom temperature of C201
18 | T_MTBE | MTBE temperature
19 | L_C201 | liquid level of C201
20 | F_D203_out | liquid flow from D203 to C202
21 | L_D203 | liquid level of D203
Concept Formation from Dynamic Trends Using PCA. The earlier discussion indicated that with two PCs it is possible to capture most of the features of a dynamic trend. To further demonstrate this, we examine the example shown in Figures 10 and 11. Figure 10 shows the dynamic trends of the temperature of the MTBE stream leaving the bottom of column C201 (T_MTBE), for data sets 1, 2, 14, 15, 16, 40, and 80. Figure 11 shows on the PCA two-dimensional plane the groupings, i.e., [cases 1 and 2], [cases 40 and 80], [case 16], and [cases 14 and 15]. Comparing Figures 10 and 11, it is clear that similar dynamic trends in Figure 10 are grouped closer in the PCA two-dimensional plane in Figure 11. We need to point out that the 7 data cases are processed using PCA together with the remaining 93 cases, but here only the 7 are shown for illustrative purposes.
The projections onto the PCA plane of the dynamic trends of some variables are shown in Figure 12. Only those variables that will later appear on the decision trees are shown in Figure 12. In Figure 12a, the regions A, C, and D are clearly recognized. However, the region B is fuzzy. This simply means that the cases in region B, including 4-13, 17, 18, and 23-100, cannot be distinguished in Figure 12a. Their differences can only be identified through other variables. A similar explanation can be applied to Figure 12j, which classifies the responses of the variable F_D202_in1 into two classes for all the data sets. This means that other variables are required to discriminate the data sets. When all the variables are considered together to predict the clusters, some variables might be more important than others to the classification.
As shown in Figure 10, there is noise in the original signals. PCA was proven in previous research to be able to remove the noise effects very effectively. It was also found that, for most of the 21 variables, the groupings based on visual examination of the two-dimensional PCA plane are straightforward. Even though for a few variables the groupings may not be very clear, this does not affect the final result significantly. When the grouping for one variable is not clear, other variables will play more important roles in the operational state clustering.
Identification of Operational States. In the CSTR case study, PCA was used to classify the operational states and was found to be able to give a satisfactory result. In this case study, both PCA and the adaptive resonance theory (ART2), an unsupervised classification method, were used. In early studies,10,11 we found that ART2 is very sensitive to noise contained in dynamic trend signals and to the vigilance value which affects
Figure 10. Dynamic trends of the temperature (T_MTBE) of the MTBE stream leaving the reactive distillation column C201.
Figure 11. Projection of the dynamic trends of Figure 10 on the two-dimensional PCA plane.
the clustering result and has to be given by users. We proposed using wavelet multiscale analysis to replace the data preprocessing component of ART2 and thus developed a new framework, ARTnet, which proves to be robust even at high noise-to-signal ratios.10,11,26 Here, the first two PCs of each variable are used as inputs, so ART2 can be used directly, because PCA has removed the noise component.11 The PCA result is shown in Figure 13, which gives 5 clusters, and the ART2 result is shown in Table 7, which gives 12 clusters. Both results are found to be reasonable, but ART2 gives a more detailed picture and is thought to be more accurate. Sammon36 indicated that, since PCA is a linear operation, the first few PCs alone may not give an adequate representation when the number of original variables is large, and visual examination may not be possible. He also gave an example where data generated to give five groups in four dimensions are projected into the space of the two principal eigenvectors. Visual examination of this projection shows only four groups, since two of the clusters overlap completely in the two-dimensional space. In process operational data analysis, similar comments were made by Kresta et al.,33 Kramer,6 Dong and McAvoy,5 Raich and Cinar,30 and Zhang et al.4 In the following discussion, we will only consider the ART2 clustering result.
It is apparent that clusters with only one data set are correct. These are clusters 3, 4, and 8. Data sets 1, 2, and 3 in cluster 1 all cause the flow rate of the C4 hydrocarbons feed to fall to zero and so should be in the same class. Data sets 4, 5, 7, 8, and 11 in cluster 2 cause the methanol flow to the mixer M201 to be either completely cut off or greatly reduced, and so belong to the same class. Data sets in cluster 5, which include cases 10, 35, and 36, correspond to changes of the output of the controller FC202D from 33% to 59%, 33% to 50%, and 33% to 55%, respectively. Cluster 6 has cases 12, 13, 17, and 18, representing a reduction or cut in methanol flow to the tank D211 and column C201. The two cases in cluster 7, 14 and 15, represent changes in the opening of the valve HC211D from 18% to 40% and 60%, respectively. Cases 19-22 in cluster 9 refer to changes of the output of the controller TC201R from 39% to 30%, 20%, 10%, and 0%. Cases 23-27 correspond to changes in the output of the controller FC203E from 37% to 33%, 29%, 16%, 6%, and 0% and are classified as cluster 10. Cluster 11 has four cases, 29-32, corresponding to changes of the output of the controller FC201D from 40% to 30%, 20%, 10%, and 0%. The last cluster, cluster 12, has the normal operational data sets 37-100. The assignment of cases 28, 33, and 34 to this cluster is not apparent but is nevertheless not unreasonable given that they represent insignificant changes.
Automatic Generation of the Decision Tree. Conceptual clustering not only predicts the operational states but also interprets the prediction using causal knowledge in terms of decision trees or production rules. Here, a variable takes discrete values, each corresponding to a region of the two-dimensional PCA plane of that variable. For example, the liquid level at the bottom of column C201, L_C201, takes values from its PCA plane in Figure 12b, namely, A, B, C, D, and E. For data case 24, L_C201 takes the value D. For each data set, the operational state takes values from Table 7. For example, data set 24 takes the value ABN10. The decision tree developed is shown in Figure 14.
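The projection step that precedes the conceptualization can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name pca_scores, the use of a singular value decomposition, and the synthetic trends are all hypothetical stand-ins, and the MTBE data are not reproduced. Each row is the dynamic trend of one variable for one data case, and the output gives the coordinates of each case in the PC-1/PC-2 plane, from which regions such as A-E would then be assigned by visual examination.

```python
import numpy as np

def pca_scores(trends, n_components=2):
    """Project each row of `trends` (one dynamic trend per data case)
    onto the leading principal components."""
    X = np.asarray(trends, dtype=float)
    Xc = X - X.mean(axis=0)            # center each time sample across cases
    # SVD of the centered data: rows of Vt are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # coordinates in the PC-1/PC-2 plane
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in data (NOT the MTBE data): 6 cases x 50 time samples.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
flat = np.zeros_like(t)                # "no change" response
step = 1.0 - np.exp(-5.0 * t)          # first-order rise after a disturbance
trends = np.vstack([flat, flat, flat, step, step, step])
trends = trends + 0.01 * rng.standard_normal(trends.shape)  # small noise

scores, explained = pca_scores(trends)
# Cases with similar trends fall close together in the plane; the sign of a
# principal component is arbitrary, so only relative positions are meaningful.
```

In this sketch the two kinds of response separate along PC-1, which is the kind of grouping that the regions of Figure 12 express for the real variables.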
The decision tree can be easily translated into rules. For example, the rules that lead to ABN4 and ABN11 are as follows:
Rule 7: IF T_MTBE = B in Figure 12(a) AND F_D202_out1 = C in Figure 12(e) THEN operational state = ABN4
Rule 11: IF T_MTBE = B in Figure 12(a) AND F_D202_out1 = A in Figure 12(e) AND T_C201_top = A or B in Figure 12(k) THEN operational state = ABN11
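As an illustration of how such rules could be evaluated, the two rules quoted above can be written as a small function. The function name classify_state and the use of single-letter strings for the region labels are assumptions made for this sketch; only Rule 7 and Rule 11 are encoded, and the remaining branches of Figure 14 are deliberately left out rather than guessed.

```python
def classify_state(t_mtbe, f_d202_out1, t_c201_top):
    """Apply Rule 7 and Rule 11 as quoted in the text.

    Arguments are region labels ("A", "B", ...) read from the PCA planes of
    Figure 12. Returns the operational state, or None if neither rule fires.
    """
    if t_mtbe == "B" and f_d202_out1 == "C":
        return "ABN4"    # Rule 7
    if t_mtbe == "B" and f_d202_out1 == "A" and t_c201_top in ("A", "B"):
        return "ABN11"   # Rule 11
    return None          # other branches of Figure 14 are not encoded here

# Usage: a case whose T_MTBE lies in region B, F_D202_out1 in region A,
# and T_C201_top in region B satisfies Rule 11.
state = classify_state("B", "A", "B")
```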
Figure 12. PCA two-dimensional plots of variables for MTBE.
The root node is T_MTBE, the bottom temperature of the reactive distillation column C201. This indicates that it is the most important variable for distinguishing the operational modes represented by the 100 data cases. Detailed examination of the decision tree and all the dynamic responses, in conjunction with the MTBE process flow sheet, revealed that the tree is reasonably good. The rules leading to ABN11 illustrate this. From Table 7, it is known that ABN11 covers data cases 29, 30, 31, and 32, corresponding to changes of the output of the controller FC201C (reflux flow rate control) at manual mode from 40% to 30%, 40% to 20%, 40% to 10%, and 40% to 0%. Apparently the most important variable for discriminating these data sets from the others is the column top temperature T_C201_top. This is confirmed by Figure 14, in which the nearest node to ABN11 is T_C201_top. In Figure 14, the numbers at the bottom nodes indicate data sets. For instance, the node ABN8 has only one data set, which is 16. Comparing Figure 14 and
Figure 13. PCA two-dimensional plot of operational states for the MTBE process.
Table 7. Clusters of Operational States Using ART2
cluster   cluster name   cases
1         ABN1           1, 2, 3
2         ABN2           4, 5, 7, 8, 11
3         ABN3           6
4         ABN4           9
5         ABN5           10, 35, 36
6         normal         12, 13, 17, 18
7         ABN7           14, 15
8         ABN8           16
9         ABN9           19, 20, 21, 22
10        ABN10          23, 24, 25, 26, 27
11        ABN11          29, 30, 31, 32
12        normal         28, 33, 34, 37-100
Table 7, it is found that the decision tree of Figure 14 gives correct predictions except for the nodes ABN10 and NORMAL. Data sets 23 and 24 were assigned to ABN10 by ART2 as shown in Table 7 but to the node NORMAL in Figure
Figure 14. Decision tree developed for the MTBE process.
14 by C5.0. Data sets 23 and 24 represent the cases in which the output of the steam flow rate controller FC203E at the bottom of the reactive distillation column C201 was changed from 37% to 33% and from 37% to 29%. In fact these are insignificant changes, so assigning them to the NORMAL operational state is acceptable. This slight inconsistency can be attributed to two factors. First, ART2 is based on numerical calculation, which is more accurate than conceptual clustering using C5.0. Second, the conceptualization of variables through visual examination of the two-dimensional PCA plane may introduce some inaccuracy.
Data sets 12, 13, 17, and 18 were clustered in a separate class by ART2, cluster 6, as shown in Table 7, but were grouped with other cases in the node labeled NORMAL in Figure 14. Referring to Table 5 and Figure 9, all four cases cause flow rate changes on the methanol stream to tank D211 and then to the column C201. In reality, this flow rate is very small compared to the total methanol and C4 hydrocarbon flows to the mixer M201 (about 1/12 or 1/100, respectively). As a result, changes of the methanol flow to D211 are insignificant. Consequently it is reasonable to regard cases 12, 13, 17, and 18 as being in the same class as NORMAL operation.
In the above discussion, the dynamic trend signals of a variable are converted to qualitative concepts in a two-dimensional PCA plane, and then the inductive learning approach is used. An alternative is to use the eigenvalues of the first two PCs of a variable directly. The decision tree derived in this way is shown in Figure 15. The tree is found to be completely unreasonable because most clusters overlap; therefore it cannot be used for predictions. Figure 16 shows the discretization of two variables, T_MTBE and F_D220_out2. The dashed lines are the boundary values for the discretization. Since for each variable the discretization is always binary, this is obviously not
Figure 15. Decision tree developed if variables are not conceptualized.
Figure 16. Binary discretization carried out by C5.0.
satisfactory. For example, Figure 12i shows that the variable L_D201 can clearly take three values. A two-valued discretization is not able to capture all the features, which causes the inaccuracy of the tree.
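The loss of information from a two-valued discretization can be illustrated with a small information-gain calculation. This is a hedged sketch on hypothetical numbers, not a reproduction of C5.0 or of the MTBE results: the scores, the state names S1-S3, and the helper names entropy, info_gain, and best_binary_gain are all assumptions. It compares the best single-threshold test on a continuous attribute with a test on the same attribute after it has been conceptualized into three regions, at one tree node only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    """Information gain of partitioning `labels` by the parallel `groups` keys."""
    n = len(labels)
    parts = {}
    for lab, g in zip(labels, groups):
        parts.setdefault(g, []).append(lab)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

def best_binary_gain(values, labels):
    """Best gain achievable with a single threshold on a continuous attribute."""
    xs = sorted(set(values))
    best = 0.0
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2            # candidate threshold between neighbors
        best = max(best, info_gain(labels, [v <= thr for v in values]))
    return best

# Hypothetical scores forming three groups, each tied to a different state.
values = [-2.1, -2.0, -1.9, 0.0, 0.1, -0.1, 2.0, 2.1, 1.9]
states = ["S1"] * 3 + ["S2"] * 3 + ["S3"] * 3
regions = ["A"] * 3 + ["B"] * 3 + ["C"] * 3  # three-valued conceptualization

gain_binary = best_binary_gain(values, states)   # one threshold, two bins
gain_regions = info_gain(states, regions)        # three regions, three bins
```

With these numbers the three-region attribute separates the states completely at a single node, whereas any single threshold must leave two of the three groups merged, so its gain is lower.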
Concluding Remarks
A conceptual clustering approach is developed for designing state space based process monitoring systems. Unlike clustering systems that depend on calculating a distance or similarity measure, conceptual clustering is able not only to cluster operational states and project the operational point to the state space but also to give causal explanations in the form of production rules and decision trees. A critical step in the approach is extracting concepts from dynamic trend signals. For this purpose, an approach using principal component analysis is proposed. The causal relationships between the conceptual features of dynamic trends of variables and the operational states can be established as a decision tree automatically through inductive machine learning. The approach is illustrated using a CSTR and applied to a refinery MTBE process. It is found that the approach is able to give an accurate projection of operating states using very simple decision trees. A comparison is also made in the MTBE case study between two ways of applying the inductive learning approach: (1) conceptualizing dynamic trends first in PC1-PC2 two-dimensional planes before the inductive learning method is used and (2) using the eigenvalues of the first two PCs directly in inductive learning. It is found that the first way gives very accurate results while the second gives completely incorrect predictions. This is attributed to the fact that the inductive learning method can only perform binary discretization.
Acknowledgment
The authors acknowledge the financial support of EPSRC (GR/L61774) and Aigis Systems Inc. for this work.
Nomenclature
A = representing a region in Figures 3-6 and 12
ART2 = adaptive resonance theory, an unsupervised neural network
AutoClass = a Bayesian automatic classification system developed by NASA
B = representing a region in Figures 3-6 and 12
C = covariance
C = representing a region in Figures 3-6 and 12
C = a collection of arbitrarily selected samples for developing a tree, also called the window
C4.5 = early version of the inductive learning system C5.0
C5.0 = the inductive machine learning system
Ci = a group of objects in the window C that have values of Ai of A, where A is an attribute
CSTR = continuous stirred tank reactor
G2 = a real-time expert system tool
d = number of selected eigenvalues to represent the original variables
D = representing a region in Figures 3-6 and 12
DCS = distributed control systems
E = representing a region in Figures 3-6 and 12
E = mathematical expectation
Fi = inlet feed flow rate to the CSTR reactor
Fo = outlet flow rate from the CSTR reactor
Fw = cooling water flow rate
ID3 = an early version of C5.0
L = liquid level
M = mean vector
m = the mean of individual attributes
n = number of attributes
P = eigenvectors
PC = principal components
PC-1 = the first principal component
PC-2 = the second principal component
PCA = principal component analysis
T = as superscript, transpose
Ti = inlet feed temperature
TR = reactor temperature
Twi = cooling water inlet temperature
X = the vector of attributes
x = an attribute
λ = eigenvalues
Literature Cited
(1) Lukas, M. P. Distributed control systems: their evaluation and design; Van Nostrand Reinhold Company: New York, 1986.
(2) Yamanaka, F.; Nishiya, T. Application of the intelligent alarm system for the plant operation. Comput. Chem. Eng. 1997, 21, s625.
(3) Howat, C. S. Analysis of plant performance. In Perry’s Chemical Engineers’ Handbook; Perry, R. H., Green, D. W., Eds.; McGraw-Hill: New York, 1997.
(4) Zhang, J.; Martin, E. B.; Morris, A. J. Fault detection and diagnosis using multivariate statistical techniques. Trans. Inst. Chem. Eng. 1996, 74A, 89.
(5) Dong, D.; McAvoy, T. J. Nonlinear principal component analysis-based on principal curves and neural networks. Comput. Chem. Eng. 1996, 20, 65.
(6) Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991, 37, 233.
(7) Negiz, A.; Cinar, A. Statistical monitoring of multivariable dynamic processes with state-space models. AIChE J. 1997, 43, 2002.
(8) Nomikos, P.; MacGregor, J. F. Monitoring batch process using multiway principal component analysis. AIChE J. 1994, 40, 1361.
(9) Whiteley, J. R.; Davis, J. F. Knowledge-based interpretation of sensor patterns. Comput. Chem. Eng. 1992, 16, 329.
(10) Wang, X. Z.; Chen, B. H.; Yang, S. H.; McGreavy, C. Application of wavelets and neural networks to diagnostic system development. 2. An integrated framework and its application. Comput. Chem. Eng. 1999, 23, 945.
(11) Wang, X. Z. Data mining and knowledge discovery for process monitoring and control; Springer: London, 1999.
(12) Stephanopoulos, G. Chemical process control: an introduction to theory and practice; Prentice Hall: New York, 1984.
(13) Marlin, T. E. Process control: designing processes and control systems for dynamic performance; McGraw-Hill: New York, 1995.
(14) Bekiaris, N.; Morari, M. Multiple steady states in distillation: infinity/infinity predictions, extensions, and implications for design, synthesis and simulation. Ind. Eng. Chem. Res. 1996, 35, 4264.
(15) Schrans, S.; DeWolf, S.; Baur, R. Dynamic simulation of reactive distillation: an MTBE case study. Comput. Chem. Eng. 1996, 20, s1619.
(16) Arandes, J. M.; DeLasa, H. I. Simulation and multiplicity of steady-states in fluidised FCCUs. Chem. Eng. Sci. 1992, 47, 2535.
(17) Quinlan, J. R. C4.5: programs for machine learning; Morgan Kaufmann Publishers: San Mateo, CA, 1993.
(18) Quinlan, J. R. Induction of decision trees. Machine Learning 1986, 1, 81.
(19) C5.0, http://www.rulequest.com, 1999.
(20) Quinlan, J. R. Improved use of continuous attributes in C4.5. J. Artif. Intell. Res. 1996, 4, 77.
(21) Moore, R. L.; Kramer, M. A. Expert systems in on-line process control. In Proceedings of the third international conference on chemical process control; Morari, M., McAvoy, T. J., Eds.; Elsevier: Asilomar, CA, 1986.
(22) Janusz, M. E.; Venkatasubramanian, V. Automatic generation of qualitative descriptions of process trends for fault detection and diagnosis. Eng. Appl. Artif. Intell. 1991, 4, 329.
(23) Cheung, J. T. Y.; Stephanopoulos, G. Representation of process trends. 1. A formal representation framework. Comput. Chem. Eng. 1990, 14, 495.
(24) Bakshi, B. R.; Stephanopoulos, G. Representation of process trends. 3. Multiscale extraction of trends from process data. Comput. Chem. Eng. 1994, 18, 267.
(25) Bakshi, B. R. Multiscale PCA with application to multivariate statistical process monitoring. AIChE J. 1998, 44, 1596.
(26) Chen, B. H.; Wang, X. Z.; Yang, S. H.; McGreavy, C. Application of wavelets and neural networks to diagnostic system development. 1. Feature extraction. Comput. Chem. Eng. 1999, 23, 899.
(27) Pearson, K. On lines and planes of closest fit to systems of points in space. Philos. Mag. 1901, 2 (6), 559.
(28) Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psych. 1933, 24, 417, 498.
(29) Jaeckle, C. M.; MacGregor, J. F. Product design through multivariate statistical analysis of process data. AIChE J. 1998, 44, 1105.
(30) Raich, A.; Cinar, A. Statistical process monitoring and disturbance diagnosis in multivariable continuous processes. AIChE J. 1996, 42, 995.
(31) Chen, J. G.; Bandoni, J. A.; Romagnoli, J. A. Robust PCA and normal region in multivariate statistical process monitoring. AIChE J. 1996, 42, 3563.
(32) Dunia, R.; Qin, S. J.; Edgar, T. F.; McAvoy, T. J. Identification of faulty sensors using principal component analysis. AIChE J. 1996, 42, 2, 2797.
(33) Kresta, J. V.; MacGregor, J. F.; Marlin, T. E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35.
(34) Bakshi, B. R.; Stephanopoulos, G. Reasoning in time: modelling, analysis and pattern recognition of temporal process trends. In Intelligent Systems in Process Engineering: Paradigms from Design and Operations; Stephanopoulos, G., Han, C., Eds.; Academic Press: New York, 1996.
(35) Saraiva, P. M. Inductive and analogical learning: data-driven improvement of process operations. In Intelligent Systems in Process Engineering: Paradigms from Design and Operations; Stephanopoulos, G., Han, C., Eds.; Academic Press: New York, 1996.
(36) Sammon, J. W. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 1969, C18, 401.
Received for review February 24, 1999
Revised manuscript received August 17, 1999
Accepted August 23, 1999
IE990144Q