The 9th International Conference on Modelling, Identification and Control (ICMIC 2017), Kunming, China, July 10-12, 2017
Multimode Process fault detection method based on variable local outlier factor
Lei Wang, Xiaogang Deng
Abstract—Recently, local outlier factor (LOF)-based process monitoring methods, which can handle multimodality and complex data distributions simultaneously, have been successfully applied to industrial process monitoring. However, the traditional LOF method cannot reflect local variable information, so this information may be submerged during process monitoring. To emphasize the information of local variables, this paper proposes a novel multimode process fault detection method, referred to as variable local outlier factor (VLOF). First, the neighbors of a sample are determined in the training dataset using the Euclidean distance. Then, the Euclidean distance between data pairs is transformed into a distance vector, from which the vector-form reachability distance and the LOF vector are calculated in turn. For monitoring the multimode process, a fused monitoring statistic is constructed by combining the LOF values of all variables. Finally, a simulation on the Tennessee Eastman (TE) process is used to demonstrate the superiority of the proposed method.

Index Terms—multimode process, fault detection, local outlier factor, local variable information.

Manuscript received March 24, 2017. This work was supported by the Natural Science Foundation of Shandong Province, China, under Grant ZR2014FL016; the National Natural Science Foundation of China under Grants 61403418 and 21606256; and the Postgraduate Innovation Funds of China University of Petroleum (East China), under Grant YCX2017058. L. Wang is with the College of Information and Control Engineering, China University of Petroleum, Qingdao 266580, China (e-mail: [email protected]). X. Deng is with the College of Information and Control Engineering, China University of Petroleum, Qingdao 266580, China (phone: 0532-86983472; fax: 0532-86981335; e-mail: [email protected]).

I. INTRODUCTION

With the increasing complexity and scale of modern industries, timely fault detection has played an important role in ensuring process safety [1]-[3]. Sensor networks and distributed control systems are now extensively applied in modern industrial processes, so a huge amount of process data is available. Therefore, data-driven fault detection methods, especially multivariate statistical process monitoring (MSPM) methods, have progressed rapidly [4]-[6]. Traditional MSPM methods have shown significant success in many cases, but some problems still deserve further investigation. These problems mainly stem from two assumptions: a unimodal data distribution and a Gaussian data distribution.

First, the data distribution can be multimodal because of seasonal alternation, demand fluctuation and changes in raw materials. This multimodality makes conventional MSPM methods inappropriate for multimode process monitoring. To monitor multimode processes effectively, Zhao et al. [7] studied a multiple-modeling strategy based on PCA. Their method divides the multimode training dataset into groups through a clustering technique and then builds an individual model for each group. Ge et al. [8] utilized Bayesian inference to combine the monitoring results of each mode into a global monitoring statistic. Unlike these multiple-modeling methods, local modeling strategies have also been introduced to monitor multimode processes. He and Wang [9] introduced the k-nearest neighbor rule to develop a monitoring model. Deng and Tian [10] proposed a local neighborhood similarity analysis method, which monitors the process status with the PCA similarity factor between the current data window and its local neighborhood data window. Ma et al. [11] put forward a novel local neighborhood standardization strategy that transforms multimode data into an approximately unimodal or Gaussian distribution. In addition, the Gaussian mixture model (GMM) has been utilized for multimode data analysis [12], [13].

Second, the uncertainty of the data distribution within a single operating mode influences monitoring performance. In real industrial processes, variable characteristics are complex and usually unknown. PCA is usually competent to handle Gaussian processes. However, its two monitoring statistics are constructed under a Gaussian assumption and may not perform well in non-Gaussian cases. ICA is more suitable than PCA for non-Gaussian processes, but its monitoring performance may not match that of PCA if the process variables follow a normal distribution. In fact, process variables rarely conform to a strict Gaussian distribution. Instead, they follow a non-Gaussian distribution or a mixture of Gaussian and non-Gaussian distributions. To exploit the advantages of both PCA and ICA, several works have combined them for better process monitoring. Ge and Song [14] performed ICA first and PCA second, while Liu et al. [15] used PCA first and ICA next to extract Gaussian and non-Gaussian information. To handle nonlinear processes, Zhao et al. [16] proposed a modeling strategy based on the kernel ICA-PCA method. Recently, a series of monitoring methods based on local outlier factor (LOF) statistics have been proposed for different cases, because LOF can cope with multimodality and distribution uncertainty simultaneously. Ma et al. [17] proposed a neighborhood standardized LOF method for better monitoring performance and further developed an adaptive LOF method [18]. Song et al. [19] combined a LOF-based clustering strategy and LOF-based statistics for
multimode process monitoring. These LOF-based extensions have shown their advantages, but some issues still deserve further investigation, in particular how to mine local variable information within the LOF calculation. Traditionally, the LOF value of a sample is computed from its neighbors in the normal training dataset. In that calculation, the Euclidean distance between a data pair reflects only the overall distance between two sample points. Therefore, local variable information may be submerged in the conventional LOF computation. This paper proposes a novel variable local outlier factor (VLOF) monitoring method for complex processes with multimodality and within-mode uncertainty of the data distribution. First, the neighbors of a test sample or a training sample are located in the training dataset using the traditional Euclidean distance. Then, to mine local variable information, the Euclidean distance between data pairs is transformed into vector form. From it, the vector-form reachability distance is obtained and the LOF value of each variable is calculated. A fused monitoring statistic, called variable LOF (VLOF), integrates the LOF values of all variables into a global monitoring index. Finally, the Tennessee Eastman (TE) process is used to evaluate the proposed method.

The remainder of this paper is organized as follows. Section II briefly reviews traditional LOF. Section III explains the VLOF method in detail. Section IV describes the VLOF-based fault detection procedure. Section V gives a simulation example to verify the proposed method. Finally, Section VI draws the conclusions.

II. LOCAL OUTLIER FACTOR

First, a brief introduction of LOF is presented.
For the given training dataset $X$ and the test sample $x$, the LOF computing procedure is as follows [17]:

1) For the sample $x$, find its $K$ nearest neighbors in $X$ using the Euclidean distance, denoted as $N_K(x) = [x^1, x^2, \cdots, x^K]$, where the prior parameter $K$ is the size of $N_K(x)$.
2) For each neighbor sample $x^f$ ($1 \le f \le K$), determine its K distance, denoted as $\mathrm{k\_distance}(x^f)$. This is the Euclidean distance between $x^f$ and its $K$th nearest neighbor.
3) Obtain the reachability distance of the sample $x$, defined as
$$\mathrm{reach\_d}(x, x^f) = \max\{\mathrm{k\_distance}(x^f),\ d(x, x^f)\} \tag{1}$$
where $d(x, x^f)$ is the Euclidean distance between $x$ and its $f$-th neighbor.
4) Compute the local reachability density (lrd) of the sample $x$, expressed as
$$\mathrm{lrd}(x) = \frac{K}{\sum_{f=1}^{K} \mathrm{reach\_d}(x, x^f)} \tag{2}$$
5) Compute the local outlier factor (LOF) of sample $x$, formulated by
$$\mathrm{LOF}(x) = \frac{1}{K}\sum_{f=1}^{K} \frac{\mathrm{lrd}(x^f)}{\mathrm{lrd}(x)} \tag{3}$$

This procedure shows that LOF measures how isolated a sample is with respect to its surrounding neighbors. If sample $x$ is not an outlier, its LOF value is approximately equal to 1. If $x$ is an outlier, its LOF value is larger than 1, because the local reachability density of $x$ is smaller than that of its neighbors. Therefore, LOF can identify whether a sample is an outlier with respect to the given training dataset without any assumption on the data distribution.

III. VARIABLE LOCAL OUTLIER FACTOR SCHEME

This section gives a motivating illustration and a detailed explanation of the VLOF method.

A. Motivation

LOF is calculated from a sample's neighbors in the normal dataset, and these neighbors usually lie within one operating mode. It can therefore be regarded as a local modeling strategy for multimode process monitoring. Besides, LOF is a density-type statistic with no special assumption on the data distribution. It can thus handle multimodality and distribution uncertainty simultaneously. However, the traditional LOF calculation is not the best choice for process monitoring. Some faults involve only specific variables, while traditional LOF treats all variables as a whole and may submerge the influence of the local fault variables. It is therefore necessary to emphasize the effects of different variables. To validate this analysis, a simple illustration is designed in Fig. 1. There, point $A$ is a testing sample and $B$ is its neighbor in normal condition, whereas $C$ and $D$ are neighbors of … and …, respectively. The dashed lines denote the normal boundaries of variables $x_1$ and $x_2$.

Fig. 1. Effect of local variable information in LOF calculation.

In the traditional calculation, the LOF value of sample $A$ is smaller than 1, which indicates that $A$ is in normal status. However, $A$ is abnormal in the $x_1$ dimension, and traditional LOF cannot reflect this.
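The sketch below makes the classical procedure of Section II (Eqs. (1)-(3)) concrete. It is a minimal NumPy version, not the authors' implementation, and it assumes the following: brute-force neighbor search, ties broken by sample order, and each training sample excluded from its own neighbor set. The names `knn_indices`, `fit_lof` and `lof` are hypothetical.

```python
import numpy as np


def knn_indices(X, x, K, skip=None):
    """Indices of the K rows of X closest to x (Euclidean distance), nearest first.

    `skip` leaves out one row of X, used when x is itself a training sample.
    """
    d = np.linalg.norm(X - x, axis=1)
    if skip is not None:
        d[skip] = np.inf
    return np.argsort(d, kind="stable")[:K]


def fit_lof(X, K):
    """Offline part: k_distance and lrd of every training sample."""
    n = X.shape[0]
    nbrs = np.array([knn_indices(X, X[i], K, skip=i) for i in range(n)])  # (n, K)
    # k_distance(x^f): distance from each sample to its K-th nearest neighbor
    k_dist = np.linalg.norm(X - X[nbrs[:, -1]], axis=1)                    # (n,)
    # Eq. (1) for every training sample and each of its neighbors
    d = np.linalg.norm(X[:, None, :] - X[nbrs], axis=2)                    # (n, K)
    reach = np.maximum(k_dist[nbrs], d)
    lrd = K / reach.sum(axis=1)                                             # Eq. (2)
    return k_dist, lrd


def lof(X, model, x, K):
    """LOF of a new sample x (not a row of X), Eqs. (1)-(3)."""
    k_dist, lrd_train = model
    idx = knn_indices(X, x, K)
    reach = np.maximum(k_dist[idx], np.linalg.norm(X[idx] - x, axis=1))    # Eq. (1)
    lrd_x = K / reach.sum()                                                 # Eq. (2)
    return float(np.mean(lrd_train[idx] / lrd_x))                          # Eq. (3)


if __name__ == "__main__":
    # Toy normal data: a 5 x 5 grid of points in two variables
    X = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
    model = fit_lof(X, K=4)
    print(lof(X, model, np.array([2.0, 2.5]), K=4))    # inside the grid: close to 1
    print(lof(X, model, np.array([20.0, 20.0]), K=4))  # far away: much larger than 1
```

On this toy grid, a point inside the data receives an LOF close to 1 and a distant point receives a much larger value, matching the interpretation of Eq. (3). Note that the whole sample is summarized by one number, which is exactly the limitation discussed in the motivation.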
Suppose the Euclidean distance between a data pair is transformed into vector form. For example, the distance vector between $A$ and $B$ becomes $d(A, B)_{vec} = [|A_1 - B_1|, |A_2 - B_2|]^T$, where the subscripts denote the $x_1$ and $x_2$ coordinates. The reachability distance can then be determined separately in the $x_1$ and $x_2$ dimensions, giving a two-element vector $\mathrm{reach\_d}(A, B)_{vec}$. In this case, the LOF of sample $A$ is finally obtained as the vector $\mathrm{LOF}(A)_{vec} = [\mathrm{LOF}(A_1), \mathrm{LOF}(A_2)]^T$. The first element of $\mathrm{LOF}(A)_{vec}$ is larger than 1 while the second element is equal to 1. This indicates that variable $x_1$ of $A$ is in abnormal condition and variable $x_2$ of $A$ is still in normal state, which is consistent with the actual situation. Therefore, local variable information should be emphasized and reflected within the LOF computation.

B. VLOF method

As discussed above, local variable information can be highlighted by replacing the traditional Euclidean distance with its vector form. For two samples $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^m$, their Euclidean distance is calculated as
$$d(x, y) = \|x - y\|_2 = \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2} \tag{4}$$
Removing the summation in (4) gives the distance vector between $x$ and $y$:
$$d(x, y)_{vec} = \left[\sqrt{(x_1 - y_1)^2}, \sqrt{(x_2 - y_2)^2}, \cdots, \sqrt{(x_m - y_m)^2}\right]^T = [d(x_1, y_1), d(x_2, y_2), \cdots, d(x_m, y_m)]^T \tag{5}$$
With this new expression of distance, (1) is transformed into
$$\mathrm{reach\_d}(x, x^f)_{vec} = \begin{bmatrix} \max\{\mathrm{k\_distance}(x_1^f),\ d(x_1, x_1^f)\} \\ \max\{\mathrm{k\_distance}(x_2^f),\ d(x_2, x_2^f)\} \\ \vdots \\ \max\{\mathrm{k\_distance}(x_m^f),\ d(x_m, x_m^f)\} \end{bmatrix} \tag{6}$$
Then, the local reachability density can be calculated element-wise as
$$\mathrm{lrd}(x)_{vec} = \frac{K}{\sum_{f=1}^{K} \mathrm{reach\_d}(x, x^f)_{vec}} \tag{7}$$
Finally, the vector-form LOF of a test sample $x$ is computed as
$$\mathrm{LOF}(x)_{vec} = \frac{1}{K}\sum_{f=1}^{K} \frac{\mathrm{lrd}(x^f)_{vec}}{\mathrm{lrd}(x)_{vec}} \tag{8}$$
$\mathrm{LOF}(x)_{vec}$ can be written as
$$\mathrm{LOF}(x)_{vec} = [\mathrm{LOF}(x_1), \mathrm{LOF}(x_2), \cdots, \mathrm{LOF}(x_m)]^T \tag{9}$$
This new LOF expression therefore reflects the information of the local variables. For monitoring the multimode process, a global statistic integrates the LOF values of all variables:
$$\mathrm{VLOF}(x) = \sum_{j=1}^{m} \frac{\mathrm{LOF}(x_j)}{\sum_{l=1}^{m} \mathrm{LOF}(x_l)} \left( \frac{\mathrm{LOF}(x_j)}{\mathrm{LOF}(x_j)_{\mathrm{lim}}} \right) \tag{10}$$
where $\mathrm{LOF}(x_j)_{\mathrm{lim}}$ is the confidence limit of variable $x_j$, obtained by the kernel density estimation (KDE) method [20]. The confidence limit of the VLOF monitoring statistic is also determined by the KDE method.

IV. FAULT DETECTION BASED ON VLOF METHOD

This section presents the process monitoring framework based on the proposed VLOF method, whose flowchart is shown in Fig. 2. The complete procedure has an offline modeling stage and an online detection stage, summarized as follows.

Offline modeling stage:
1) Collect the normal dataset $X$ and scale it with its mean and standard deviation.
2) Determine the neighbors of each sample in $X$ using the traditional Euclidean distance.
3) Use (5) to express the distance vectors between data pairs.
4) Calculate the LOF vector of each sample in $X$ using (6)-(8).
5) Determine the confidence limit of each variable's LOF using the KDE method.
6) Compute the VLOF statistics using (10) and determine their confidence limit by the KDE method.

Online monitoring stage:
1) Collect a new data vector $x_{new}$ and standardize it with the mean and standard deviation of the normal dataset $X$.
2) Locate its neighbors in the training dataset $X$.
3) Use (5) to express the distance vectors between $x_{new}$ and its neighbors.
4) Calculate the new LOF vector of $x_{new}$ using (6)-(8).
5) Compute the new VLOF statistic using (10).
6) Detect a fault if the statistic exceeds its confidence limit.
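The vector-form computations of Eqs. (5)-(10) can be sketched in the same style. This is a hypothetical NumPy sketch, not the authors' code. Two details are assumptions, because the text does not pin them down. First, the per-variable K distance $\mathrm{k\_distance}(x_j^f)$ is taken as the $j$-th element of the distance vector between $x^f$ and its $K$th nearest neighbor. Second, a tiny constant `EPS` guards against division by zero when a variable has no spread among the neighbors. Eq. (10) is implemented as the weighted sum of $\mathrm{LOF}(x_j)/\mathrm{LOF}(x_j)_{\mathrm{lim}}$ with weights $\mathrm{LOF}(x_j)/\sum_l \mathrm{LOF}(x_l)$. All function names are illustrative.

```python
import numpy as np

EPS = 1e-12  # hypothetical guard: avoids division by zero if a variable has no spread


def knn_indices(X, x, K, skip=None):
    """Neighbors are still located with the full Euclidean distance."""
    d = np.linalg.norm(X - x, axis=1)
    if skip is not None:
        d[skip] = np.inf
    return np.argsort(d, kind="stable")[:K]


def fit_lof_vec(X, K):
    """Offline part: per-variable k_distance and vector lrd of every training sample."""
    n = X.shape[0]
    nbrs = np.array([knn_indices(X, X[i], K, skip=i) for i in range(n)])  # (n, K)
    # Assumption: k_distance(x_j^f) is the j-th element of the distance vector
    # between x^f and its K-th nearest neighbor
    k_dist = np.abs(X - X[nbrs[:, -1]])                                    # (n, m)
    dist_vec = np.abs(X[:, None, :] - X[nbrs])                             # Eq. (5), (n, K, m)
    reach = np.maximum(k_dist[nbrs], dist_vec)                             # Eq. (6)
    lrd = K / (reach.sum(axis=1) + EPS)                                    # Eq. (7), (n, m)
    return k_dist, lrd


def lof_vec(X, model, x, K, skip=None):
    """LOF vector [LOF(x_1), ..., LOF(x_m)] of sample x, Eqs. (5)-(9)."""
    k_dist, lrd_train = model
    idx = knn_indices(X, x, K, skip)
    reach = np.maximum(k_dist[idx], np.abs(X[idx] - x))                    # Eq. (6), (K, m)
    lrd_x = K / (reach.sum(axis=0) + EPS)                                  # Eq. (7), (m,)
    return np.mean(lrd_train[idx] / lrd_x, axis=0)                         # Eq. (8)


def vlof(lof_v, lof_lim):
    """Eq. (10): LOF(x_j) / LOF(x_j)_lim weighted by LOF(x_j) / sum_l LOF(x_l)."""
    w = lof_v / lof_v.sum()
    return float(np.sum(w * lof_v / lof_lim))


if __name__ == "__main__":
    # Toy data: 11 samples whose two variables take distinct integer values
    X = np.array([[i, (3 * i) % 11] for i in range(11)], dtype=float)
    model = fit_lof_vec(X, K=2)
    v = lof_vec(X, model, np.array([1000.0, 5.5]), K=2)  # abnormal only in variable 1
    print(v)                                             # first element far larger than the second
    print(vlof(v, lof_lim=np.array([2.0, 2.0])))         # hypothetical per-variable limits
```

In the toy run, a sample that is far from the training data only in the first variable gets a large first LOF element, while the second element stays bounded. This mirrors the two-variable example of Fig. 1. For offline step 4, the training LOF vectors can be collected by calling `lof_vec(X, model, X[i], K, skip=i)` for each training sample.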
Fig. 2. Process monitoring based on VLOF method (flowchart with an offline modeling stage and an online monitoring stage).
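Steps 5 and 6 of the offline stage rely on a KDE-based confidence limit. The paper does not state the kernel, bandwidth or quantile search it uses, so the sketch below is one common choice under explicit assumptions. It uses a Gaussian kernel with a normal-reference rule-of-thumb bandwidth and reads the 99% limit off a numerically integrated CDF on a grid. `kde_limit` and `per_variable_limits` are hypothetical names. `scipy.stats.gaussian_kde` could be used for the density instead.

```python
import math
import numpy as np


def kde_limit(stats, alpha=0.99, grid_size=2000):
    """alpha-quantile of a Gaussian kernel density estimate fitted to `stats`."""
    s = np.asarray(stats, dtype=float)
    # Assumed bandwidth: normal-reference rule of thumb, 1.06 * std * n^(-1/5)
    h = 1.06 * s.std(ddof=1) * s.size ** (-0.2)
    grid = np.linspace(s.min() - 5 * h, s.max() + 5 * h, grid_size)
    # KDE density on the grid: average of Gaussian kernels centred on the samples
    u = (grid[:, None] - s[None, :]) / h
    pdf = np.exp(-0.5 * u ** 2).mean(axis=1) / (h * math.sqrt(2 * math.pi))
    # Numerical CDF, renormalised so that it ends exactly at 1
    cdf = np.cumsum(pdf) * (grid[1] - grid[0])
    cdf /= cdf[-1]
    # Smallest grid value whose CDF reaches alpha
    return float(grid[np.searchsorted(cdf, alpha)])


def per_variable_limits(lof_train, alpha=0.99):
    """Offline step 5: one limit per column of an (n, m) array of training LOF vectors."""
    return np.array([kde_limit(col, alpha) for col in np.asarray(lof_train).T])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stats = rng.standard_normal(1000)  # stand-in for normal-operation statistics
    lim = kde_limit(stats)
    print(lim)                         # near the 99% point of the sample
    print(np.mean(stats > lim))        # roughly 1% of training samples exceed it
```

The same function serves both offline steps. It is applied column-wise to the training LOF vectors for the per-variable limits in Eq. (10), and once more to the training VLOF values for the final alarm threshold.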
V. CASE STUDY

In this section, the proposed method is validated on the TE process [21], which has been widely used to evaluate different monitoring strategies. The process consists of five major operation units: a product condenser, a reactor, a recycle compressor, a vapor-liquid separator, and a product stripper. According to the G/H mass ratio, there are six operating modes. In this study, mode 1 and mode 3 are used to generate a multimode dataset with 500 normal samples from each mode. Nine manipulated variables and 22 continuous process variables are selected as monitored variables. A total of 20 faults are considered for monitoring evaluation. Each fault dataset consists of 1000 samples, and the fault is introduced at the 201st sample. More detailed descriptions of the monitored variables and the 20 faults can be found in [22]. In this simulation, the traditional LOF and the proposed VLOF methods are applied for process monitoring, and a 99% confidence level is adopted to calculate the confidence limits. To determine the local neighbors of a sample, the parameter K is set to 20 manually.

Fault 10 under mode 1 is illustrated first, and the monitoring charts of the two methods are shown in Fig. 3 and Fig. 4, respectively. The traditional LOF method can hardly detect this fault, and its fault detection rate (FDR) is only 4.25%. By contrast, the proposed VLOF method provides significantly better monitoring performance: it raises an alarm at the 240th sample with an 87.75% FDR. Fault 10 is caused by a random variation of the C feed temperature, and the most relevant variable (stripper temperature) is presented in Fig. 5. This variable fluctuates drastically. However, the traditional LOF calculation submerges this fault information among the other variables, as shown in Fig. 3. The VLOF scheme, by contrast, reflects and emphasizes the local variable information. Fig. 6 shows the LOF values of the stripper temperature, which demonstrate the superiority of the proposed VLOF method.
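The detection rates quoted in this section can be computed from a monitoring chart once the fault onset is known. The sketch below assumes the usual definition of FDR: the percentage of faulty samples whose statistic exceeds the confidence limit. The faulty samples here are the 201st to 1000th, i.e., 800 samples. Under that definition, the 4.25% FDR of LOF for fault 10 corresponds to 34 alarmed samples out of 800. `detection_metrics` is a hypothetical helper.

```python
import numpy as np


def detection_metrics(stats, limit, fault_start=200):
    """Return (FDR in %, first alarm as a 1-based sample number, or None).

    stats: monitoring statistic of one fault run (e.g., 1000 samples).
    fault_start: 0-based index of the first faulty sample (201st sample -> 200).
    """
    alarms = np.asarray(stats) > limit  # online step 6: alarm when the limit is exceeded
    faulty = alarms[fault_start:]
    fdr = float(100.0 * faulty.mean())
    hits = np.flatnonzero(faulty)
    first_alarm = int(hits[0]) + fault_start + 1 if hits.size else None
    return fdr, first_alarm


if __name__ == "__main__":
    stats = np.zeros(1000)
    stats[239:] = 2.0                   # exceeds the limit from the 240th sample onward
    print(detection_metrics(stats, limit=1.0))  # FDR 761/800 (about 95.1%), first alarm 240
```

A chart that first exceeds its limit at the 240th sample can therefore reach at most 761/800 = 95.125% FDR. Any lower FDR means some later faulty samples fell back below the limit.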
Fig. 3. Monitoring chart of fault 10 in mode 1 based on LOF method.
Fig. 4. Monitoring chart of fault 10 in mode 1 based on VLOF method.
Fig. 5. Stripper temperature of mode 1 under fault 10.
Fig. 6. LOF values of stripper temperature.
Fault 14 under mode 3 is also used to demonstrate the advantages of the proposed method. According to Fig. 7, the traditional LOF method detects this fault at the 202nd sample with an 82.5% FDR. However, the LOF statistics fluctuate around their confidence limit, which may make it difficult for operators to make accurate judgments. The VLOF method, in contrast, detects this fault at the 202nd sample with a high FDR of 99.88%, as shown in Fig. 8. This is because the VLOF method takes local variable information into account, so the VLOF statistic reflects the fault information of the variables related to this fault. The traditional LOF calculation instead measures the Euclidean distance between two sample points as a whole, so irrelevant variables can submerge useful variable information. This fault case further validates the superiority of the proposed method.
Fig. 7. Monitoring chart of fault 14 in mode 3 based on LOF method.
Fig. 8. Monitoring chart of fault 14 in mode 3 based on VLOF method.
Table 1 lists the monitoring results of the traditional LOF and VLOF methods for all 20 programmed fault datasets (the best monitoring result for each fault is shown in bold). According to Table 1, the traditional LOF method does not give satisfactory monitoring performance for faults 10, 18 and 19 in mode 1 or for faults 2 and 10 in mode 3. By highlighting local variable information, the proposed VLOF method achieves a significant improvement on these faults. These results verify the superiority and feasibility of the VLOF method.
Table 1. Fault detection rates (%) of 20 faults in TE mode 1 and mode 3

Fault     Mode 1 LOF   Mode 1 VLOF   Mode 3 LOF   Mode 3 VLOF
1         99.88        99.88         98.63        99.75
2         99.25        99.38         60           98.13
3         1.25         2.75          1.50         5
4         100          100           100          100
5         1.13         2.75          95.0         100
6         100          100           100          100
7         100          100           100          100
8         98.38        98.75         98.25        98.5
9         1.63         6.63          7.88         21.88
10        4.25         85.75         7.13         87.63
11        94.75        98.88         90           98.13
12        20.13        47.63         97.13        99
13        97.5         97.5          86.25        96.25
14        90.88        99.5          82.5         99.88
15        1.13         3.25          1.63         2.38
16        1.13         2.63          1.75         2.63
17        87.63        96.13         75.63        95.88
18        6.26         72.5          87.38        90.13
19        17.75        98.88         73.38        99.5
20        93.75        93.88         74.25        96.88
Average   55.83        70.33         66.91        79.58
VI. CONCLUSION

In this paper, a novel multimode process fault detection method based on the variable local outlier factor is proposed. The method handles the multimodality and the within-mode uncertainty of the data distribution simultaneously. Its major contribution is that local variable information is reflected and emphasized, unlike in the traditional LOF method. Because the traditional Euclidean distance is transformed into vector form, the information of a single variable is not submerged by the other variables. The process status can thereby be evaluated more accurately. Besides, the VLOF statistic may also be applied in other LOF-based monitoring methods.

REFERENCES

[1] Z. Ge, Z. Song, and F. Gao, "Review of recent research on data-based process monitoring," Ind. Eng. Chem. Res., vol. 52, no. 10, pp. 3543-3562, 2013.
[2] Z. Geng, K. Yang, Y. Han, and X. Gu, "Fault detection of large-scale process control system with higher-order statistical and interpretative structural model," Chin. J. Chem. Eng., vol. 23, no. 1, pp. 146-153, 2015.
[3] Y. Liu, Y. Pan, Q. Wang, and D. Huang, "Statistical process monitoring with integration of data projection and one-class classification," Chemom. Intell. Lab. Syst., vol. 149, pp. 1-11, 2015.
[4] S. Yin, S. Ding, X. Xie, and H. Luo, "A review on basic data-driven approaches for industrial process monitoring," IEEE Trans. Ind. Electron., vol. 61, no. 11, pp. 6418-6428, 2014.
[5] J. Dong, K. Zhang, Y. Huang, and K. Peng, "Adaptive total PLS based quality-relevant process monitoring with application to the Tennessee Eastman process," Neurocomputing, vol. 154, pp. 77-85, 2015.
[6] Y. Xu and X. Deng, "Fault detection of multimode non-Gaussian dynamic Bayesian independent component analysis," Neurocomputing, vol. 200, pp. 70-79, 2016.
[7] S. Zhao, J. Zhang, and Y. Xu, "Monitoring of processes with multiple operating modes through multiple principle component analysis models," Ind. Eng. Chem. Res., vol. 43, no. 22, pp. 7025-7035, 2004.
[8] Z. Ge and Z. Song, "Multimode process monitoring based on Bayesian method," J. Chemom., vol. 23, no. 12, pp. 636-650, 2009.
[9] Q. He and J. Wang, "Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes," IEEE Transactions on Semiconductor Manufacturing, vol. 20, no. 4, pp. 345-354, 2007.
[10] X. Deng and X. Tian, "Multimode process fault detection using local neighborhood similarity analysis," Chin. J. Chem. Eng., vol. 22, no. 11, pp. 1260-1267, 2014.
[11] H. Ma, Y. Hu, and H. Shi, "A novel local neighborhood standardization strategy and its application in fault detection of multimode processes," Chemom. Intell. Lab. Syst., vol. 118, no. 7, pp. 287-300, 2012.
[12] X. Xie and H. Shi, "Dynamic multimode process modeling and monitoring using adaptive Gaussian mixture models," Ind. Eng. Chem. Res., vol. 51, no. 15, pp. 5497-5505, 2012.
[13] J. Yu, "Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes," IEEE Transactions on Semiconductor Manufacturing, vol. 24, no. 3, pp. 432-444, 2011.
[14] Z. Ge and Z. Song, "Process monitoring based on independent component analysis-principal component analysis (ICA-PCA) and similarity factors," Ind. Eng. Chem. Res., vol. 46, no. 7, pp. 2054-2063, 2007.
[15] X. Liu, L. Xie, U. Kruger, and S. Wang, "Statistical-based monitoring of multivariate non-Gaussian systems," AIChE J., vol. 54, no. 9, pp. 2379-2391, 2008.
[16] C. Zhao, F. Gao, and F. Wang, "Nonlinear batch process monitoring using phase-based kernel-independent component analysis-principal component analysis (KICA-PCA)," Ind. Eng. Chem. Res., vol. 48, no. 20, pp. 9163-9174, 2009.
[17] H. Ma, Y. Hu, and H. Shi, "Fault detection and identification based on the neighborhood standardized local outlier factor method," Ind. Eng. Chem. Res., vol. 52, no. 6, pp. 2389-2402, 2013.
[18] Y. Ma, H. Shi, H. Ma, and M. Wang, "Dynamic process monitoring using adaptive local outlier factor," Chemom. Intell. Lab. Syst., vol. 127, no. 18, pp. 89-101, 2013.
[19] B. Song, H. Shi, Y. Ma, and J. Wang, "Multisubspace principal component analysis with local outlier factor for multimode process monitoring," Ind. Eng. Chem. Res., vol. 53, no. 42, pp. 16453-16464, 2014.
[20] X. Deng, X. Tian, and S. Chen, "Modified kernel principal component analysis based on local structure analysis and its application to nonlinear process fault diagnosis," Chemom. Intell. Lab. Syst., vol. 127, no. 16, pp. 195-209, 2013.
[21] J. J. Downs and E. F. Vogel, "A plant-wide industrial process control problem," Comput. Chem. Eng., vol. 17, no. 3, pp. 245-255, 1993.
[22] Q. Jiang and X. Yan, "Monitoring multi-mode plant-wide processes by using mutual information-based multi-block PCA, joint probability, and Bayesian inference," Chemom. Intell. Lab. Syst., vol. 136, no. 9, pp. 121-137, 2014.