Article pubs.acs.org/IECR
Distributed PCA Model for Plant-Wide Process Monitoring

Zhiqiang Ge* and Zhihuan Song

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China

ABSTRACT: For plant-wide process monitoring, most traditional multiblock methods assume that process knowledge is available for dividing the process into several sub-blocks. In practice, however, such knowledge is not always available, and the monitoring scheme must then be built automatically. This paper develops a new sub-block principal component analysis (PCA) method for plant-wide process monitoring, named the distributed PCA model. By constructing sub-blocks along the different directions of the PCA principal components, the original feature space is automatically divided into several subfeature spaces. The distributed PCA models built in these subspaces not only reflect the local behavior of process changes but also improve monitoring performance when their individual monitoring results are combined. Both monitoring and fault diagnosis schemes are developed on the basis of the distributed PCA model. Two simulation case studies confirm the effectiveness of the proposed method.
1. INTRODUCTION

As a key technique in process systems engineering, process monitoring has received considerable attention in recent years. Data-based process monitoring methods such as principal component analysis (PCA) have become particularly popular, because the widely used distributed control systems generate and collect large amounts of process data. Successful applications of data-based process monitoring have been reported in both academia and industry.1−12 However, it has also been reported that conventional PCA may not perform well when monitoring plant-wide processes, which are much more complex and typically comprise many different parts and operating units.13,14 Because of its importance in modern industry, plant-wide process monitoring has become an active research topic in recent years, and several methods, such as hierarchical and multiblock statistical monitoring approaches, have been developed.13−21 MacGregor et al.13 developed monitoring and diagnosis charts for each sub-block as well as a global monitoring chart for performance enhancement. Westerhuis et al.14 provided a comprehensive analysis of several multiblock and hierarchical PCA and partial least squares (PLS) algorithms and presented them in a unified notation. Qin et al.15 also analyzed several multiblock and hierarchical PCA and PLS algorithms and defined block and variable contributions to the T2 and SPE statistics for decentralized monitoring using multiblock PCA. A framework for sequential multiblock component methods has been proposed, and several sequential multiblock methods have been reviewed.16 Recently, Ge and Song21 proposed a two-step multiblock monitoring method that improves both monitoring performance and fault interpretation.
However, to the best of our knowledge, almost all of these multiblock methods require process knowledge for block division, which is the initial and also a key step in developing the monitoring approach. In practice, however, process knowledge is not always available, or only part of it is available, especially in complex plant-wide processes. In this case, the block division step should be carried out automatically, based purely on process data. Therefore, when process knowledge is insufficient to support the monitoring of plant-wide processes, an automatic, data-based monitoring method is of particular interest. Basically, a good block division requires the sub-blocks to be as diverse as possible, while their combination covers the whole plant-wide process.

As a representative data-based monitoring approach, PCA has been widely accepted in the process monitoring area and is also simple to implement in practice. This paper therefore builds a new automatic monitoring scheme on this method, termed the distributed PCA model. First, an initial PCA decomposition is carried out on all process variables. As a result, principal components are obtained along various principal directions, which are uncorrelated with each other. If the plant-wide process is divided along these uncorrelated directions, the diversity of the sub-blocks is enlarged, which is in accordance with the aim of block division. Next, a subset of process variables is selected along each principal direction to form the distributed sub-blocks. Precisely, the variables in each sub-block are selected by their contributions to the corresponding principal component. Unlike traditional multiblock methods, the new block division method may place the same variable in more than one sub-block. However, this does not reduce the efficiency of the method for monitoring purposes, since the combination of the distributed sub-blocks easily covers the whole plant-wide process. After the variables in each sub-block have been selected, distributed sub-PCA models are built in the different sub-blocks. For online process monitoring, sub-block monitoring results are first generated in the distributed sub-blocks and are then aggregated to form the final monitoring decision. In addition, a corresponding fault diagnosis method is proposed under the new distributed monitoring framework.

The rest of the paper is organized as follows. In section 2, the new distributed PCA model based monitoring scheme is described in detail, followed by two case studies in the next two sections. Finally, conclusions and some discussions are provided.

Received: March 27, 2012
Revised: December 5, 2012
Accepted: January 11, 2013
Published: January 12, 2013
© 2013 American Chemical Society
dx.doi.org/10.1021/ie301945s | Ind. Eng. Chem. Res. 2013, 52, 1947−1957

Industrial & Engineering Chemistry Research

2. PLANT-WIDE PROCESS MONITORING BASED ON THE DISTRIBUTED PCA MODEL (DPCA)

In this section, the distributed PCA model based monitoring method is presented. The presentation covers distributed sub-block division and model development, process monitoring, fault diagnosis, and algorithm complexity analysis.

2.1. Distributed Sub-block Division and Model Development. Suppose the originally collected process data set is represented as $\mathbf{X}_{\mathrm{orig}} \in \mathbb{R}^{n\times m}$, where m is the number of process variables and n is the number of samples for each variable. An initial PCA decomposition is carried out on $\mathbf{X}_{\mathrm{orig}}$:1

$$\mathbf{X}_{\mathrm{orig}} = \mathbf{T}_{\mathrm{orig}}\mathbf{P}_{\mathrm{orig}}^{T} + \tilde{\mathbf{T}}_{\mathrm{orig}}\tilde{\mathbf{P}}_{\mathrm{orig}}^{T} \quad (1)$$

where the first term represents the principal components extracted in the principal component subspace (PCS), and the second term corresponds to the residual subspace (RS). If k principal components are selected in the PCS, then by the principle of PCA these principal components are uncorrelated with each other. Because block division requires diversity, building the individual sub-block models along these uncorrelated directions satisfies the diversity between sub-blocks well. On the other hand, if the most relevant variables are selected for each of these uncorrelated directions, the most important information is retained, so the accuracy of each sub-block model is also guaranteed. Therefore, a total of k distributed sub-blocks can be constructed. Since the residual part of the PCA model is uncorrelated with every principal component, an additional sub-block can be constructed in the residual subspace. The variables retained in each distributed sub-block are selected according to their contributions, which are given as follows

$$CT(v,w) = p_{vw}^{2} \Big/ \sum_{l=1}^{m} p_{lw}^{2} \quad (2)$$

$$CT(v,k+1) = q_{v}^{2} \Big/ \sum_{l=1}^{m} q_{l}^{2}, \qquad q_{v}^{2} = \mathrm{mean}\left(\left[\tilde{p}_{v1}^{2}\ \ \tilde{p}_{v2}^{2}\ \cdots\ \tilde{p}_{v(m-k)}^{2}\right]\right) \quad (3)$$

where v = 1, 2, ..., m and w = 1, 2, ..., k. Here $p_{vw}$ denotes the (v, w) element of $\mathbf{P}_{\mathrm{orig}}$, and $\tilde{p}_{vj}$ denotes the (v, j) element of $\tilde{\mathbf{P}}_{\mathrm{orig}}$. Based on the contribution indices in eqs 2 and 3, the variables with high contribution values are retained in the corresponding sub-block. After the variables have been selected in each distributed sub-block by these contribution indices, the individual sub-block PCA models are developed as follows

$$\mathbf{X}_{i} = \mathbf{T}_{i}\mathbf{P}_{i}^{T} + \mathbf{E}_{i}, \qquad i = 1, 2, \ldots, k+1 \quad (4)$$

The number of principal components in each sub-block is determined by the cumulative percentage variance (CPV) rule.

2.2. Online Process Monitoring Based on Distributed PCA Model. To monitor a new data sample z, it is first projected by the distributed PCA model onto each sub-block, and the score vector is calculated as

$$\mathbf{t}_{i} = \mathbf{P}_{i}^{T}\mathbf{z}, \qquad i = 1, 2, \ldots, k+1 \quad (5)$$

Then, the two traditional monitoring statistics, T2 and SPE, are calculated and compared with their corresponding confidence limits4

$$T_{i}^{2}(\mathbf{z}) = \sum_{j=1}^{k_i} \frac{t_{i,j}^{2}}{\lambda_{i,j}} \le T_{i,\mathrm{lim}}^{2} \quad (6)$$

$$\mathrm{SPE}_{i}(\mathbf{z}) = \mathbf{z}^{T}\left(\mathbf{I} - \mathbf{P}_{i}\mathbf{P}_{i}^{T}\right)\mathbf{z} \le \mathrm{SPE}_{i,\mathrm{lim}} \quad (7)$$

where i = 1, 2, ..., k + 1, λ_{i,j} is the jth eigenvalue of the ith sub-block PCA model, and k_i is the number of principal components selected in the ith distributed sub-block. The confidence limits of T2 and SPE are determined as follows

$$T_{i,\mathrm{lim}}^{2} = \frac{k_i(n-1)}{n-k_i}\, F_{k_i,(n-k_i),\alpha} \quad (8)$$

$$\mathrm{SPE}_{i,\mathrm{lim}} = \theta_{1}\left[\frac{c_{\alpha}\sqrt{2\theta_{2}h_{0}^{2}}}{\theta_{1}} + 1 + \frac{\theta_{2}h_{0}(h_{0}-1)}{\theta_{1}^{2}}\right]^{1/h_{0}} \quad (9)$$

where $\theta_r = \sum_{j=k+1}^{m} \lambda_j^{r}$ for r = 1, 2, 3 and $h_0 = 1 - 2\theta_1\theta_3/(3\theta_2^{2})$. Here α is the significance level, and c_α is the normal deviate corresponding to the upper 1 − α percentile. To evaluate plant-wide performance, all sub-block monitoring results are aggregated as follows

$$T_{\mathrm{glob}}^{2} = \mathrm{Com}\{T^{2}(1), T^{2}(2), \ldots, T^{2}(k+1)\} \quad (10)$$

$$\mathrm{SPE}_{\mathrm{glob}} = \mathrm{Com}\{\mathrm{SPE}(1), \mathrm{SPE}(2), \ldots, \mathrm{SPE}(k+1)\} \quad (11)$$

where T2(i) and SPE(i), i = 1, 2, ..., k + 1, represent the T2 and SPE monitoring results of each sub-block. Note that the way the sub-block monitoring results are aggregated greatly influences global performance. Suppose the aggregation rule is defined aggressively, for example, by declaring a fault detected by the T2/SPE statistic as soon as any single individual T2/SPE control limit is violated. The type I error then also increases, because normal process disturbances may also exist in the process. The new method examines process changes in different monitoring directions, so small process changes can be detected more easily. At the same time, however, some small changes may result from normal process disturbances, which should not be considered faults. Therefore, to reduce the monitoring risk, the aggregation rule should not be too aggressive. On the other hand, if a fault is considered detected only when l ≫ 1 individual models are violated, where l is the number of violated models, the aggregation rule is conservative. However, due to the alleviation and
self-protection effects of process equipment, a fault cannot propagate to all parts of the process. Therefore, monitoring performance may deteriorate noticeably if l is set to a large value. Besides, if a fault influences only one or a few sub-blocks, a missed detection is inevitable when a large value of l is selected. In general, as l increases, the aggregation rule changes from aggressive to conservative. To monitor small faults, l should be chosen small, although the false alarm rate may then also increase. In contrast, if only obvious process faults need to be monitored and the false alarm rate is the main concern, l can be chosen large.

2.3. Fault Diagnosis. After a fault has been detected, the next important step is fault diagnosis, which determines the root causes of the detected fault. The well-known contribution plot method has been used for fault diagnosis with traditional PCA.22,23 However, plant-wide processes usually have a very large number of variables, and many of them may show comparable contributions to the detected fault. It is therefore very difficult to determine the real root causes among so many variables. Although this method is easy to use, practical experience has shown that it increases the chance of misidentifying detected faults. Another popular approach is the reconstruction-based method, which has been studied for many years.24−27 It is more reliable for fault diagnosis than the contribution plot method. However, it assumes that the fault subspace has been determined beforehand from process knowledge. In the present paper, a new fault diagnosis method is proposed under the distributed monitoring framework. Suppose a fault has influenced several sub-blocks. Without loss of generality, the sub-blocks are divided into the two clusters given in eqs 12 and 13. The first cluster includes the sub-blocks in which the fault is detected, and the second cluster consists of the remaining sub-blocks.

$$FS = \text{sub-block}\{A(i),\ i = 1, 2, \ldots, f\} \quad (12)$$

$$NS = \text{sub-block}\{B(j),\ j = 1, 2, \ldots, g\} \quad (13)$$

where f and g are the sizes of the two sub-block sets, with f + g = k + 1. A and B are vectors that contain the sub-block numbers in the corresponding cluster. First, we determine the variables that should be excluded from the fault responsible variable set. The excluded variable set (EXV) is determined as

$$EXV = J(B(1)) \cap J(B(2)) \cap \cdots \cap J(B(g)) \quad (14)$$

where J(B(i)) represents the variable sequence numbers in sub-block B(i). Then the variables common to all sub-blocks in the FS set are determined as

$$INV = J(A(1)) \cap J(A(2)) \cap \cdots \cap J(A(f)) \quad (15)$$

By integrating these two variable sets, the fault responsible variable set (RVS) is obtained as

$$RVS = INV - (INV \cap EXV) \quad (16)$$

This step further narrows down the possible responsible variables. Therefore, when a fault has been detected, this method first excludes the variables in the normal sub-blocks and then determines the common variables in the faulty sub-blocks. Finally, the responsible variables, or root causes, of the fault can be inferred. Note, however, that this method may not obtain a precise fault diagnosis result; instead, it may only provide a subset of the variables most responsible for the detected fault. It would therefore be very useful to combine this method with additional fault diagnosis methods (e.g., expert knowledge of specific processes).

2.4. Algorithm Complexity Analysis. The time complexity of calculating the m × m covariance matrix of the original data set $\mathbf{X}_{\mathrm{orig}} \in \mathbb{R}^{n\times m}$ by PCA is

$$T_{PCA} = O(nm^{2}) \quad (17)$$

where m is the number of process variables and n is the number of samples for each variable. Now consider the time complexity of the distributed PCA models. Suppose r variables have been selected in each sub-block, so that

$$\mathbf{X}_{i} = \mathbf{X}(:, I(i)) \in \mathbb{R}^{n\times r}, \qquad i = 1, 2, \ldots, k+1 \quad (18)$$

Then, the time complexity of distributed PCA is

$$T_{DPCA} = O[(k+1)nr^{2}] \quad (19)$$

where k + 1 is the number of constructed subspaces. Comparing eqs 17 and 19 shows the following. If m/r > (k + 1)^{1/2}, the time complexity of distributed PCA is lower than that of PCA, that is, T_PCA > T_DPCA. Otherwise, if m/r < (k + 1)^{1/2}, then T_PCA < T_DPCA. The number of selected variables r in each sub-block is a tuning parameter of the distributed PCA method. To reduce the time complexity, r can be chosen to satisfy m/r > (k + 1)^{1/2}. However, r should not be too small. Otherwise, the selected data set may lack the information needed to model the sub-block, which leads to poor monitoring performance.

3. A NUMERICAL EXAMPLE

The numerical system is given as follows

$$\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{bmatrix} = \begin{bmatrix} 0.5768 & 0.3766\\ 0.7382 & 0.0566\\ 0.8291 & 0.4009\\ 0.6519 & 0.2070\\ 0.3972 & 0.8045 \end{bmatrix} \begin{bmatrix} s_1\\ s_2 \end{bmatrix} + \begin{bmatrix} e_1\\ e_2\\ e_3\\ e_4\\ e_5 \end{bmatrix} \quad (20)$$

where [s1 s2]^T are Gaussian distributed data sources and [e1 e2 e3 e4 e5]^T are zero-mean white noises with standard deviation 0.01. To simulate the large operating scale of a plant-wide process, three operating conditions are constructed from different data sources. For each operating condition, 200 normal samples are generated by eq 20, giving a total of 600 training samples for developing the distributed PCA model. One principal component is selected for the PCA model, which explains 95.26% of the process information. Therefore, 2 sub-blocks are constructed for plant-wide process monitoring. Based on the variable contribution index, three variables are selected in each sub-block. Again, one principal component is selected in each of the two sub-block PCA models; these explain 99.44% and 92.81% of the sub-block information, respectively. For process monitoring, the aggregation rule must also be selected. Because only two sub-blocks have been constructed in this numerical example, the aggregation parameter l is set to 1. In order to evaluate the superiority of the developed
method, the conventional PCA model is also built for process monitoring. For testing, two data sets were generated as follows, each with 200 samples:
• Case 1: normal operating condition.
• Case 2: the system initially runs under normal conditions. A step change of 2 is then added to variable 3 from sample 101 to the end of the operating period.

First, the distributed PCA and conventional PCA methods are applied to the testing data set of case 1. The monitoring results are shown in Figure 1, in which both methods show good monitoring performance.

Figure 1. Monitoring results of normal process data: (a) distributed PCA sub-block 1; (b) distributed PCA sub-block 2; (c) PCA.

Figure 2 provides the contributions of process variables in each sub-block. As can be seen, variables 1, 3, and 4 are selected in the first sub-block, while variables 1, 2, and 5 are selected in the second sub-block. Therefore, a process change in variable 1, 3, or 4 can be detected in the first sub-block, whereas a change in variable 1, 2, or 5 is much easier to detect in the second sub-block.

Figure 2. Variable contributions in two sub-blocks: (a) the first sub-block; (b) the second sub-block.

Next, we consider the faulty data set generated in case 2. The monitoring results are given in Figure 3. The conventional PCA method cannot detect this fault. The distributed PCA method, in contrast, detects it successfully and immediately after it is introduced into the process. The reason the fault can be detected in the first sub-block is straightforward. Variable 3 is selected only in the first sub-block, so a fault driven by this variable is detected in that sub-block. According to the fault diagnosis method, the root causes of this fault are determined as variables 3 and 4.

Figure 3. Monitoring results of fault 1: (a) distributed PCA sub-block 1; (b) distributed PCA sub-block 2; (c) PCA.

Figure 4. Control system of the Tennessee Eastman process.

When the fault is introduced in variable 4 instead, similar monitoring results are obtained. The fault diagnosis method again identifies variables 3 and 4 as the root causes, so the most responsible variables for the fault can be located. A precise fault diagnosis result has not been obtained, but the developed method is still useful, because it successfully narrows down the variables responsible for the fault. Hence, when the numbers of sub-blocks and variables in a plant-wide process are both large, the new fault diagnosis method could save considerable effort in examining the root causes of the detected fault.
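For this example, eqs 14−16 reduce to a few set operations. The Python sketch below is illustrative only and rests on several assumptions. Each sub-block is represented by the set J(·) of its variable numbers. The function name `responsible_variables` is hypothetical. The sketch follows eq 14 as printed, taking an intersection over the normal sub-blocks. In the case considered here, sub-block 1 holds variables {1, 3, 4}, sub-block 2 holds {1, 2, 5}, and only sub-block 1 is flagged. The sketch then returns variables 3 and 4, the result stated in the text.

```python
def responsible_variables(subblock_vars, faulty):
    """Sketch of eqs 14-16: RVS = INV - (INV ∩ EXV).

    subblock_vars: dict mapping sub-block number -> set J(.) of its variable numbers
    faulty: sub-block numbers whose limits are violated (the FS cluster);
            every other sub-block belongs to the NS cluster
    """
    fs = [subblock_vars[b] for b in subblock_vars if b in faulty]
    ns = [subblock_vars[b] for b in subblock_vars if b not in faulty]
    if not fs:
        return set()                              # nothing detected, nothing to diagnose
    inv = set.intersection(*fs)                   # eq 15: common to all faulty sub-blocks
    exv = set.intersection(*ns) if ns else set()  # eq 14 as printed: intersection over NS
    return inv - (inv & exv)                      # eq 16


# Numerical example: sub-block 1 = {1, 3, 4}, sub-block 2 = {1, 2, 5};
# the step change in variable 3 is flagged by sub-block 1 only.
subblock_vars = {1: {1, 3, 4}, 2: {1, 2, 5}}
print(sorted(responsible_variables(subblock_vars, faulty={1})))  # [3, 4]
```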
4. CASE STUDY OF TE BENCHMARK PROCESS

The Tennessee Eastman (TE) process is a benchmark simulation that has been widely used to test the performance of various monitoring approaches.28 Here it serves as a representative plant-wide process. The process consists of five major unit operations: a reactor, a condenser, a compressor, a separator, and a stripper. The control structure is shown schematically in Figure 4.
There are 41 measured variables and 12 manipulated variables in the TE process. To simulate process changes, a set of 21 programmed faults can be introduced into the process. The process is described in detail by Downs and Vogel.28 For process monitoring purposes, 33 variables were selected, which are tabulated in Table 1. Of these 33 variables, 22 are continuous process variables and 11 are manipulated variables. A total of 960 data samples were generated for training the distributed PCA and PCA models. In addition, one normal and 21 fault data sets were generated for testing. Each fault testing data set consists of 960 samples, while the normal testing data set consists of 500 samples. All faults in the testing data sets were introduced at sample 161. The 21 faults are described in Table 2.

To build the distributed PCA monitoring model, an initial PCA decomposition is first carried out. According to the CPV rule, 14 principal components are selected for the traditional PCA model, explaining 85.15% of the entire process information. Hence, a total of 15 sub-blocks are constructed. Before the PCA model of each sub-block is developed, the parameter r must be determined. The algorithm complexity analysis in section 2.4 shows that choosing r to satisfy m > r(k + 1)^{1/2} reduces the computational burden. To control the algorithm complexity, we therefore choose r < 33/(15)^{1/2} = 8.52; in this case study, r = 8 was selected. Then, 15 sub-block models are developed. The number of principal components in each sub-block is also determined by the CPV rule, with the requirement CPV > 85%. The selected numbers are listed in Table 3. The aggregation step requires one more parameter, l. We first set l = 1, 2, 3 and examine the false alarm rates in monitoring the normal process. The false alarm rates of distributed PCA and PCA for the normal process data are tabulated in Table 4. As shown in this table, the false alarm rate decreases markedly when l = 2 is selected.

Table 1. Monitoring Variables in the TE Process
no. | meas. variables | no. | meas. variables | no. | manipulated variables
1 | A feed | 12 | product separator level | 23 | D feed flow valve
2 | D feed | 13 | product separator pressure | 24 | E feed flow valve
3 | E feed | 14 | product separator underflow | 25 | A feed flow valve
4 | total feed | 15 | stripper level | 26 | total feed flow valve
5 | recycle flow | 16 | stripper pressure | 27 | compressor recycle valve
6 | reactor feed rate | 17 | stripper underflow | 28 | purge valve
7 | reactor pressure | 18 | stripper temp. | 29 | separator pot liquid flow valve
8 | reactor level | 19 | stripper steam flow | 30 | stripper liquid product flow valve
9 | reactor temp. | 20 | compressor work | 31 | stripper steam valve
10 | purge rate | 21 | reactor cooling water outlet temp. | 32 | reactor cooling water flow
11 | product separator temp. | 22 | separator cooling water outlet temp. | 33 | condenser cooling water flow

Table 2. Faults Description of TE Process
fault no. | process variable | type
1 | A/C feed ratio, B composition constant (stream 4) | step
2 | B composition, A/C ratio constant (stream 4) | step
3 | D feed temp. (stream 2) | step
4 | reactor cooling water inlet temp. | step
5 | condenser cooling water inlet temp. | step
6 | A feed loss (stream 1) | step
7 | C header pressure loss-reduced availability (stream 4) | step
8 | A, B, C feed composition (stream 4) | random variation
9 | D feed temp. (stream 2) | random variation
10 | C feed temp. (stream 4) | random variation
11 | reactor cooling water inlet temp. | random variation
12 | condenser cooling water inlet temp. | random variation
13 | reaction kinetics | slow drift
14 | reactor cooling water valve | sticking
15 | condenser cooling water valve | sticking
16 | unknown | unknown
17 | unknown | unknown
18 | unknown | unknown
19 | unknown | unknown
20 | unknown | unknown
21 | valve position constant (stream 4) | constant position

Table 3. Selected PC Number in Each Sub-block
sub-block | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
PC no. | 2 | 4 | 5 | 4 | 4 | 4 | 4 | 3
sub-block | 9 | 10 | 11 | 12 | 13 | 14 | 15
PC no. | 6 | 6 | 6 | 7 | 6 | 6 | 3
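The model-building procedure described above (initial PCA, CPV rule, bound on r, contribution-based selection of eqs 2 and 3, sub-block models of eqs 4−7) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The TE data are not reproduced here, so a hypothetical three-factor data set stands in for them.
- The data are autoscaled.
- PCA is done by eigen-decomposition of the covariance matrix.
- The residual sub-block uses the same r as the others.
- In eqs 5−7, z is restricted to each sub-block's own variables, since P_i has only r rows.
- All helper names (`build_subblocks`, `fit_subblock`, `subblock_statistics`, `max_block_size`, `cpv_components`) are illustrative.

```python
import math
import numpy as np

def cpv_components(eigvals, threshold=0.85):
    """Smallest number of components whose cumulative percentage variance reaches threshold."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return min(int(np.searchsorted(ratios, threshold)) + 1, len(eigvals))

def max_block_size(m, n_blocks):
    """Largest integer r with m / r > sqrt(n_blocks), i.e. T_DPCA < T_PCA (eqs 17, 19)."""
    return math.ceil(m / math.sqrt(n_blocks)) - 1

def sorted_eig(Xs):
    """Eigenpairs of the sample covariance matrix, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def build_subblocks(Xs, threshold=0.85, r=None):
    """Eqs 1-3: split the variables of autoscaled data Xs into k + 1 sub-blocks."""
    m = Xs.shape[1]
    eigvals, eigvecs = sorted_eig(Xs)
    k = min(cpv_components(eigvals, threshold), m - 1)  # keep a non-empty residual subspace
    r = max_block_size(m, k + 1) if r is None else r
    P, P_res = eigvecs[:, :k], eigvecs[:, k:]
    ct = P**2 / np.sum(P**2, axis=0)          # eq 2, one column per principal direction
    q2 = np.mean(P_res**2, axis=1)            # q_v^2 in eq 3
    ct_res = q2 / np.sum(q2)                  # eq 3, residual sub-block
    contrib = np.column_stack([ct, ct_res])   # m x (k + 1)
    # Retain the r variables with the highest contribution in each sub-block.
    return [np.sort(np.argsort(contrib[:, w])[::-1][:r]) for w in range(k + 1)]

def fit_subblock(Xs, idx, threshold=0.85):
    """Eq 4: PCA model of one sub-block; returns its loadings P_i and retained eigenvalues."""
    eigvals, eigvecs = sorted_eig(Xs[:, idx])
    k_i = cpv_components(eigvals, threshold)
    return eigvecs[:, :k_i], eigvals[:k_i]

def subblock_statistics(z, idx, P_i, lam_i):
    """Eqs 5-7 for one sub-block, using only that sub-block's variables of z."""
    z_i = z[idx]
    t_i = P_i.T @ z_i                          # eq 5
    t2 = float(np.sum(t_i**2 / lam_i))         # eq 6
    resid = z_i - P_i @ t_i                    # (I - P_i P_i^T) z_i
    return t2, float(resid @ resid)            # eq 7

# Hypothetical latent-variable data standing in for the TE training set.
rng = np.random.default_rng(0)
S = rng.standard_normal((500, 3))
A = rng.uniform(0.0, 1.0, size=(12, 3))
X = S @ A.T + 0.1 * rng.standard_normal((500, 12))
mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
Xs = (X - mean) / std

blocks = build_subblocks(Xs)
models = [fit_subblock(Xs, idx) for idx in blocks]
z = (X[0] - mean) / std                        # a new sample, scaled with training statistics
stats = [subblock_statistics(z, idx, P, lam) for idx, (P, lam) in zip(blocks, models)]
print(len(blocks), "sub-blocks; r for the TE case:", max_block_size(33, 15))
```

For the TE settings (m = 33, k + 1 = 15), `max_block_size` returns 8, the value of r used in this case study. The resulting cost ratio (k + 1)r²/m² = 15 × 64/1089 ≈ 0.88 shows that this choice trims, rather than eliminates, the covariance cost.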
Table 4. False Alarm Rates of Distributed PCA and PCA
statistics | l = 1 | l = 2 | l = 3
DPCA_T2 | 1.00% | 0.6% | 0.0%
DPCA_SPE | 4.60% | 1.2% | 0.20%
PCA_T2 | 1.04%
PCA_SPE | 0.93%
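The SPE confidence limit of eq 9 and the aggregation step behind Table 4 can be sketched with the Python standard library alone. The paper writes the aggregation operator Com{·} without a formula. Following the text of section 2.2, this sketch assumes it means "raise a global alarm when at least l of the k + 1 sub-block limits are violated." The function names and violation flags are hypothetical. Only the SPE limit is coded; the T2 limit of eq 8 would additionally need an F-distribution quantile.

```python
from statistics import NormalDist

def spe_limit(residual_eigvals, alpha=0.01):
    """Eq 9: SPE control limit from the eigenvalues left out of a (sub-block) PCA model."""
    th1, th2, th3 = (sum(lam**p for lam in residual_eigvals) for p in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2**2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper 1 - alpha normal deviate
    bracket = c_alpha * (2.0 * th2 * h0**2) ** 0.5 / th1 + 1.0 + th2 * h0 * (h0 - 1.0) / th1**2
    return th1 * bracket ** (1.0 / h0)

def aggregate(violations, l):
    """Com{.} read as an l-out-of-(k + 1) vote: alarm when at least l sub-blocks violate."""
    return [sum(flags) >= l for flags in violations]

def false_alarm_rate(alarms):
    """Share of samples flagged in a normal (fault-free) test set."""
    return sum(alarms) / len(alarms)

# Hypothetical per-sample violation flags for three sub-blocks on normal data.
violations = [
    [True, False, False],
    [True, True, False],
    [False, False, False],
    [True, True, True],
]
for l in (1, 2, 3):
    print(l, false_alarm_rate(aggregate(violations, l)))
# prints 1 0.75, then 2 0.5, then 3 0.25
```

Raising l can only remove alarms, never add them, which is why the false alarm rates in Table 4 fall as l increases.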
When l > 3 was selected, the false alarm rate barely changed, as Figure 5 also shows clearly. Therefore, l = 2 or l = 3 can be chosen; to enhance monitoring performance, l = 2 was selected in the present paper. All 21 process faults were tested. The monitoring results (type II errors) are summarized in Table 5, together with those of PCA for comparison. The minimum value achieved for each process fault is highlighted in bold. As shown in Table 5, distributed PCA outperforms the conventional PCA method in most fault cases. In particular, the type II errors of distributed PCA for faults 5 and 10 are much lower than those of PCA. Fault 5 is a step change of the condenser cooling water inlet temperature. When it happens, the flow rate of the outlet stream from the condenser to the separator also increases. This raises the temperature in the separator and thus affects the separator cooling water outlet temperature.
Figure 5. False alarm rates of T2 and SPE with the change of l: (a) T2 statistic; (b) SPE statistic.
This fault can easily be compensated by the control system. To show this clearly, the monitoring results of PCA for this fault are first given in Figure 6. The fault can no longer be detected after about sample 350. The control system has compensated for the fault, and the temperature of the separator has returned to its set point. As shown in Figure 6, it takes about 10 h to reach the steady state again. However, the fault still remains in the process after sample 350, because the condenser cooling water inlet temperature is still higher than its normal value. Precisely, the condenser cooling water flow rate is still continuously manipulated, while the separator cooling water outlet temperature has returned to its set point, as can be seen in Figure 7. Judging from the PCA monitoring result alone, one would probably conclude that a fault entered the process and then corrected itself in about 10 h. That conclusion is wrong and may be very harmful to the process. In contrast, the distributed PCA method successfully detects the fault after sample 350, in two sub-blocks (4 and 6). Figure 8 gives the monitoring results of these two sub-blocks and shows that both still detect the fault after sample 350. To find the root cause of this fault, the variable contributions in each sub-block are examined. Figure 9 shows that four variables (15, 17, 30, and 33) have dominant contributions in both sub-blocks. Moreover, these four variables are not included in the excluded variable set of the other sub-blocks. According to the new fault diagnosis method, these four variables are determined as the root causes of fault 5.

Fault 10 is a random variation of the temperature in stream 4 (C feed). The monitoring results of this fault by PCA and distributed PCA are shown in Figures 10 and 11, respectively. A comparison of the two figures shows that the distributed PCA method greatly improves performance. In particular, good monitoring results are mainly obtained in three sub-blocks (1, 2, and 15), whereas the fault can hardly be detected in the other sub-blocks. For example, Figure 12 shows that the fault cannot be detected in four sub-blocks (4, 5, 9, and 13). To find the root cause of this fault, the contribution plots of sub-blocks 1, 2, and 15 are given together in Figure 13. The common responsible variables in these three sub-blocks are variables 18, 19, and 27: the stripper temperature, stripper steam flow, and compressor recycle valve position. Again, these three variables are not included in the excluded variable set of the other sub-blocks. An examination of the TE process shows that the stripper temperature and stripper steam flow are strongly related to the temperature change of stream 4. The position of the compressor recycle valve can also influence the temperature in the stripper. Therefore, these process variables are determined as the root causes of the fault.

Table 5. Monitoring Results of the TE Process
fault no. | PCA T2 | PCA SPE | DPCA T2 | DPCA SPE
1 | 0.009 | 0.001 | 0.006 | 0.001
2 | 0.016 | 0.039 | 0.014 | 0.023
3 | 0.989 | 0.959 | 0.929 | 0.926
4 | 0.771 | 0.000 | 0.001 | 0.000
5 | 0.761 | 0.743 | 0.700 | 0.000
6 | 0.009 | 0.000 | 0.005 | 0.000
7 | 0.000 | 0.000 | 0.000 | 0.118
8 | 0.031 | 0.116 | 0.020 | 0.041
9 | 0.989 | 0.976 | 0.930 | 0.913
10 | 0.720 | 0.648 | 0.540 | 0.466
11 | 0.583 | 0.214 | 0.280 | 0.168
12 | 0.016 | 0.089 | 0.013 | 0.018
13 | 0.064 | 0.048 | 0.056 | 0.043
14 | 0.008 | 0.000 | 0.000 | 0.001
15 | 0.988 | 0.958 | 0.949 | 0.923
16 | 0.875 | 0.636 | 0.698 | 0.519
17 | 0.238 | 0.036 | 0.146 | 0.026
18 | 0.108 | 0.096 | 0.100 | 0.090
19 | 0.894 | 0.798 | 0.749 | 0.548
20 | 0.678 | 0.463 | 0.479 | 0.348
21 | 0.585 | 0.511 | 0.546 | 0.500

Figure 6. Monitoring results of fault 5 by PCA.
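Table 5 reports type II errors, that is, missed-detection rates. The helper below assumes they are computed as follows: take each 960-sample test set, start at the fault onset (sample 161), and count the fraction of samples that raise no global alarm. Under that assumption, a small Python helper can compute such a figure from an alarm sequence. The exact counting convention, the function name, and the alarm sequence are hypothetical.

```python
def type_ii_error(alarms, onset=161):
    """Missed-detection rate: share of samples from the (1-indexed) onset on without an alarm."""
    after_fault = alarms[onset - 1:]
    return sum(1 for a in after_fault if not a) / len(after_fault)

# Hypothetical alarm sequence for one 960-sample TE fault test set:
# alarms from sample 161 to 760, then missed for the last 200 samples.
alarms = [False] * 160 + [True] * 600 + [False] * 200
print(type_ii_error(alarms))  # 0.25
```

A compensated fault such as fault 5 produces exactly this pattern under PCA. The alarms stop after the control system restores the measured temperatures, which inflates the type II error even though the fault persists.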
Figure 7. Data characteristics of two variables in fault 5: (a) separator cooling water outlet temperature; (b) condenser cooling water flow rate.

Figure 8. Monitoring results of fault 5 by DPCA: (a) sub-block 4; (b) sub-block 6.

Figure 9. Variable contributions in two sub-blocks: (a) sub-block 4; (b) sub-block 6.

5. CONCLUSIONS AND DISCUSSIONS

In the present paper, a new distributed PCA model has been developed for plant-wide process monitoring. With this method, the block division step is carried out automatically, so no process knowledge is needed. Simulation results on a numerical example and the TE benchmark process have demonstrated the efficiency of the developed method. However, several parameters of the distributed PCA method must be determined carefully. The number of principal components in each sub-block PCA model can easily be chosen by the CPV rule. Two other important parameters are the size of each sub-block, r, and the aggregation parameter, l. In the present paper, r was selected to guarantee a low computational complexity. However, this is not always the right choice, and how to determine r theoretically needs further investigation in future work. The choice of l greatly influences the monitoring performance of the proposed method: as l increases, the aggregation rule changes from aggressive to conservative. In general, the choice of l depends on the monitoring task and the performance requirements.
Figure 10. Monitoring results of fault 10 by PCA.
Figure 11. Monitoring results of fault 10 by DPCA: (a) sub-block 1; (b) sub-block 2; (c) sub-block 15.
Figure 12. Monitoring results of fault 10 by DPCA: (a) sub-block 4; (b) sub-block 5; (c) sub-block 9; (d) sub-block 13.
Figure 13. Variable contributions in three sub-blocks: (a) sub-block 1; (b) sub-block 2; (c) sub-block 15.
While the distributed PCA model divides the process variables of a plant-wide process into different sub-blocks, other techniques divide the operating region into different sub-blocks or modes. Among these techniques, the Gaussian mixture model (GMM) is an attractive one: it has been well studied and widely used for modeling and monitoring processes with multiple operation modes.29−31 Although GMM divides data samples rather than variables, with an appropriate conversion it may also be made useful for variable division in plant-wide processes. Furthermore, a two-dimensional mixture model may be developed for monitoring more complex industrial processes.
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected], [email protected].
Notes
The authors declare no competing financial interest.
ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (NSFC) (61273167), National Project 973 (2012CB720500), Zhejiang Provincial Natural Science Foundation of China (LY12F03008), and the Fundamental Research Funds for the Central Universities.
REFERENCES
(1) Jackson, J. E. A User’s Guide to Principal Components; Wiley: New York, 1991.
(2) Bakshi, B. R. Multiscale PCA with applications to multivariate statistical process monitoring. AIChE J. 1998, 44, 1596−1610.
(3) Kano, M.; Nagao, K.; Hasebe, H.; Hashimoto, I.; Ohno, H.; Strauss, R.; Bakshi, B. R. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Comput. Chem. Eng. 2002, 26, 161−174.
(4) Ge, Z. Q.; Song, Z. H. Online monitoring of nonlinear multiple mode processes based on adaptive local model approach. Control Eng. Pract. 2008, 16, 1427−1437.
(5) Zhang, Y.; Ma, C. Fault diagnosis of nonlinear processes using multiscale KPCA and multiscale KPLS. Chem. Eng. Sci. 2011, 66, 64−72.
(6) Wang, X. Z.; Medasani, S.; Marhoon, F.; Albazzaz, H. Multidimensional visualization of principal component scores for process historical data analysis. Ind. Eng. Chem. Res. 2004, 43, 7036−7048.
(7) Lee, Y. H.; Jin, H. D.; Han, C. H. On-line process state classification for adaptive monitoring. Ind. Eng. Chem. Res. 2006, 45, 3095−3107.
(8) Wang, X.; Kruger, U.; Irwin, G. W.; McCullough, G.; McDowell, N. Nonlinear PCA with the local approach for diesel engine fault detection and diagnosis. IEEE Trans. Control Syst. Technol. 2007, 16, 122−129.
(9) Yu, J. Nonlinear bioprocess monitoring using multiway kernel localized Fisher discriminant analysis. Ind. Eng. Chem. Res. 2011, 50, 3390−3402.
(10) Yao, Y.; Chen, T.; Gao, F. Multivariate statistical monitoring of two-dimensional dynamic batch processes utilizing non-Gaussian information. J. Process Control 2011, 20, 1188−1197.
(11) Ge, Z. Q.; Yang, C. J.; Song, Z. H. Improved kernel PCA-based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64, 2245−2255.
(12) Wang, J.; He, Q. P. Multivariate statistical process monitoring based on statistics pattern analysis. Ind. Eng. Chem. Res. 2010, 49, 7858−7869.
(13) MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Kourtoudi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, 826−838.
(14) Westerhuis, J. A.; Kourti, T.; MacGregor, J. F. Analysis of multiblock and hierarchical PCA and PLS models. J. Chemom. 1998, 12, 301−321.
(15) Qin, S. J.; Valle, S.; Piovoso, M. J. On unifying multiblock analysis with application to decentralized process monitoring. J. Chemom. 2001, 15, 715−742.
(16) Smilde, A. K.; Westerhuis, J. A.; de Jong, S. A framework for sequential multiblock component methods. J. Chemom. 2003, 17, 323−337.
(17) Choi, S. W.; Lee, I. B. Multiblock PLS-based localized process diagnosis. J. Process Control 2005, 15, 295−306.
(18) Cherry, G. A.; Qin, S. J. Multiblock principal component analysis based on a combined index for semiconductor fault detection and diagnosis. IEEE Trans. Semicond. Manuf. 2006, 19, 159−172.
(19) Hoskuldsson, A.; Svinning, K. Modeling of multi-block data. J. Chemom. 2006, 20, 376−385.
(20) Kohonen, J.; Reinikainen, S. P.; Aaljoki, K.; Perkio, A.; Vaanaen, T.; Hoskuldsson, A. Multi-block methods in multivariate process control. J. Chemom. 2008, 22, 281−287.
(21) Ge, Z. Q.; Song, Z. H. Two-level multiblock statistical monitoring for plant-wide processes. Korean J. Chem. Eng. 2009, 26, 1467−1475.
(22) Westerhuis, J.; Gurden, S.; Smilde, A. Generalized contribution plots in multivariate statistical process monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95−114.
(23) Alawi, A.; Choi, S. W.; Martin, E.; Morris, J. Sensor fault identification using weighted combined contribution plots. Fault Detection, Supervision, and Safety of Technical Processes 2007, 2006, 908−913.
(24) Dunia, R.; Qin, S. J. Subspace approach to multidimensional fault identification and reconstruction. AIChE J. 1998, 44, 1813−1831.
(25) Qin, S. J.; Li, W. H. Detection, identification, and reconstruction of faulty sensors with maximized sensitivity. AIChE J. 1999, 45, 1963−1976.
(26) Lieftucht, D.; Kruger, U.; Irwin, G. W. Improved reliability in diagnosing faults using multivariate statistics. Comput. Chem. Eng. 2006, 30, 901−912.
(27) Li, G.; Alcala, C. F.; Qin, S. J.; Zhou, D. H. Generalized reconstruction-based contributions for output-relevant fault diagnosis with application to the Tennessee Eastman process. IEEE Trans. Control Syst. Technol. 2011, 19, 1114−1127.
(28) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245−255.
(29) Ge, Z. Q.; Song, Z. H. Mixture Bayesian regularization method of PPCA for multimode process monitoring. AIChE J. 2010, 56, 2838−2849.
(30) Ge, Z.; Song, Z. Maximum-likelihood mixture factor analysis model and its application for process monitoring. Chemom. Intell. Lab. Syst. 2010, 102, 53−61.
(31) Yu, J. A nonlinear kernel Gaussian mixture model based inferential monitoring approach for fault detection and diagnosis of chemical processes. Chem. Eng. Sci. 2012, 68, 506−519.