
Article pubs.acs.org/IECR

Multimode Process Fault Detection Based on Local Density Ratio-Weighted Support Vector Data Description

Han Li, Huangang Wang,* and Wenhui Fan

Department of Automation, Tsinghua University, Beijing 100084, China

ABSTRACT: Industrial process fault detection plays an important role in process security, and data-based process monitoring is an effective way to detect faults. Industrial processes typically have multiple operating modes with different data distributions and outliers. This paper proposes a novel multimode process fault detection method based on local density ratio-weighted support vector data description (LDR-wSVDD) to address multimodal process monitoring with different densities and outliers in the training samples. By considering both the global density distribution and the local data structure, the method assigns each training sample a weight based on its density information, so that outliers receive lower weights and have less influence on the model of the normal samples. Meanwhile, it retains the efficiency of the single-hypersphere SVDD model for multimode processes. The feasibility and validity of the LDR-wSVDD approach for multimode, multidensity process monitoring are demonstrated on a numerical example and the Tennessee Eastman benchmark process.

1. INTRODUCTION

In modern industry, process safety and product quality are of great concern to both managers and customers. If processes are monitored online and faults are detected in time, diagnosed accurately, and isolated exactly, product quality increases while cost decreases. Process monitoring is an effective means of ensuring process security, and its methods can be divided into three categories: model-based, knowledge-based, and data-based.1 In contrast to the first two, data-based methods require neither an a priori process model nor associated expert knowledge. Moreover, with the rapid development of information technology, abundant data can be collected and recorded, and new data mining approaches can be applied to process monitoring, which has made data-based process monitoring increasingly popular in recent years.

Because of disturbances in the operating environment and changes in market demand and manufacturing strategy, process operating conditions often switch from one mode to another. Such processes have multiple modes, and the data of different modes have distinctly different means, variances, covariances, or densities. The multimodality of the process data distribution raises two main problems: how to model the multimode process itself, and how to model the transition periods among the modes. For the former problem, three solutions have been developed. The first is building a separate model for each mode. One issue to be addressed is how to divide the training data into different modes by appropriate means such as clustering.2−5 The other issue is how to decide the final result from the different models: if only one model is chosen, a proper selection measure must be constructed,6−10 whereas if all models are retained, their results must be integrated according to their reliabilities, for example by Bayesian inference.3,11−15 The second solution is adaptive methods; researchers have revised adaptive methods developed for time-varying characteristics to fit multimodal processes.16−19 The last solution is building a single global model, such as the improved super PCA model.20,21 Ma et al.22 present a local outlier factor method based on a neighborhood standardization strategy. Song et al.23 obtain a low-dimensional global characterization of a newly defined block-wise matrix with multimodality labels by preserving the neighborhood structure to deal with the multimode problem. For the latter problem, some probabilistic models have been built that consider the characteristics of the transition period instead of a hard partition of the process condition. For example, Yu et al.13 apply the finite

Received: August 30, 2016 Revised: January 18, 2017 Accepted: February 7, 2017


DOI: 10.1021/acs.iecr.6b03306 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX


Table 1. LDR-wSVDD Algorithm

Input: the training set
Method:
  Step 1: Calculate the Euclidean distance of every data point pair.
  Step 2: Choose a parameter p and calculate the density of each sample according to eq 12.
  Step 3: Let k = p × N and compute the local density ratio based on eq 14; then obtain the weight according to eq 15.
  Step 4: Substitute the weights into eq 6 and solve the weighted-SVDD problem to obtain the decision model as in eqs 3 and 4.
Output: the decision function f(z)
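As a concrete illustration, Steps 1-3 of Table 1 can be sketched in a few lines of NumPy. The function name `ldr_weights` and the exact indexing of the (p × Nd)th distance are our own assumptions; the paper fixes only the formulas of eqs 8, 11, 12, 14, and 15.

```python
import numpy as np

def ldr_weights(X, p=0.05):
    """Sketch of Steps 1-3 of Table 1 (eqs 8, 11, 12, 14, 15).

    X : (N, d) array of training samples; p : percentage parameter.
    The tie-breaking around the (p x Nd)th element is an assumption.
    """
    N = X.shape[0]
    # Step 1: pairwise Euclidean distances (eq 8).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Cutoff distance dc: the (p x Nd)th smallest pairwise distance (eq 11).
    pair = np.sort(D[np.triu_indices(N, k=1)])
    dc = pair[max(0, int(p * pair.size) - 1)]
    # Step 2: Gaussian local density (eq 12); subtract the j = i term exp(0) = 1.
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    # Step 3: local density ratio over the k = p x N nearest neighbors (eq 14).
    k = max(1, int(p * N))
    nn = np.argsort(D, axis=1)[:, 1:k + 1]     # skip self at column 0
    r = rho / rho[nn].mean(axis=1)
    # Normalize the weights into (0, 1] (eq 15).
    return r / r.max()
```

An outlier far from its neighbors then receives a weight close to 0, while points inside either mode receive weights near 1 regardless of the mode's density; Step 4 passes these weights to the weighted-SVDD problem of eq 6.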

Gaussian mixture model (GMM) and Bayesian inference strategy to multimodal process monitoring. Ge et al.3 extract features with a two-step independent component analysis-principal component analysis (ICA-PCA) strategy based on Bayesian inference. Ref 15 introduces factor analysis into the maximum-likelihood framework for multimode process monitoring, and ref 14 extends probabilistic PCA to a mixture form to address the multimode problem. Ma et al.24 present an aligned mixture factor monitoring method with dividing and integrating steps, preserving both the within-mode and cross-mode correlations.

Compared with conventional multivariate statistical process monitoring (MSPM) methods, the support vector data description (SVDD) method, based on the idea of one-class classification, is more suitable for multimodal process monitoring because it constructs a single global model, simplifying the modeling and monitoring procedure. Ge et al.25 successfully employ the SVDD model on a multimode batch process. However, SVDD is insensitive to outliers and density, because it describes the boundary or domain of the normal data rather than the class density. When the training data are contaminated by outliers, due to interference from the external environment or the limits of the internal sensors, the SVDD model can overfit and its detection performance can deteriorate. To handle this problem, weighted-SVDD methods have been developed: each training sample is given a weight that evaluates its degree of belonging to the normal samples. The smaller the weight, the more likely the sample is an outlier, and the less it influences the SVDD model. Earlier studies generally concentrated on two ways of calculating the weight. One is based on the distance between a data point and the center of all the samples, which reflects the global data distribution.
Class centers in kernel space have been used to compute confidence scores26 or position-regularized weights.27 Yin et al.28 employ the total squared loss distance to measure the distances between points and a constructed center. The other way is based on some form of density. The nearest-neighborhood and Parzen-window approaches have been introduced to measure density,29 and the distances between a point and its k nearest neighbors are used to calculate density in the form of an exponential function30 or of the opposite of a ratio.31 Besides, Liu et al.32 present a k-means clustering and kernel-LOF-based method to generate the likelihood value. These methods establish the weight using the k nearest neighborhood or the local outlier factor, which represent the local data structure. Work published in Science in 2014 applies a cutoff distance-based local density for clustering.33 Applying this kind of density to each sample, Chen et al.34 propose a robust SVDD for outlier detection with noisy or uncertain data. It is worthwhile to note that the above methods take either the global or the local data distribution characteristics into account

Figure 1. Decision boundary on the toy data set of (a) SVDD, (b) RSVDD, and (c) LDR-wSVDD.

individually to calculate the weight. In fact, for a multimodal process, the data of different modes have distinctly different means, variances, and covariances,24 and the data distribution and density of multimodal industrial processes vary markedly. If only the local density distribution is considered, the


Figure 2. Weight of each point based on (a) RSVDD and (b) LDR-wSVDD.

model may be typical of only a certain submode, whereas if only the global density information is considered, the differences among modes are ignored and the model cannot achieve satisfactory monitoring performance either. Combining the global density distribution with the local data structure based on the k nearest neighborhood, this paper proposes a novel robust local density ratio-weighted SVDD (LDR-wSVDD) process monitoring method. First, the weight of each training sample is calculated from the local density ratio between the point and the mean density of its k nearest neighbors. This step mitigates the effect of outliers in the training data set on SVDD, especially that of outliers around normal samples with higher density. Then the weighted-SVDD optimization problem is solved to obtain a single global boundary model of the normal samples. Finally, the model is used to monitor multimodal processes with different densities and outliers in the training data set.

The rest of this article is organized as follows: In section 2 we briefly describe the original SVDD and weighted SVDD. The proposed LDR-wSVDD multimode fault detection method and an illustrative example are introduced in detail in section 3. The results and discussion of two case studies are presented in section 4. Finally, we draw conclusions in section 5.

2. PRELIMINARIES

Support Vector Data Description (SVDD). The support vector data description (SVDD), proposed by Tax and Duin,35 aims to construct a minimum-volume hypersphere that encloses as many of the normal data samples as possible in a high-dimensional feature space. Points outside the hypersphere are deemed outliers. Given the training data set xi ∈ ℝd (i = 1, ..., N) and a nonlinear transformation Φ(·): xi → Φ(xi), we map the data from the original space to a higher-dimensional feature space. The primal optimization problem is


Figure 3. Data distribution of (a) case 1, (b) case 2, (c) case 3, and (d) case 4.

Table 2. Fault Detection Rate, False Alarm Rate, and g-Mean of Different Methods in Cases 1 and 2

                     case 1                     case 2
method         FDR     FAR    g-mean      FDR     FAR    g-mean
SVDD           0.965   0.010  0.977       0.733   0.015  0.849
RSVDD          1.000   0.035  0.982       0.758   0.045  0.851
LDR-wSVDD      1.000   0.015  0.992       0.753   0.020  0.859

  min_{a,R,ξ}  R² + C Σ_{i=1}^{N} ξ_i
  s.t.  ‖Φ(x_i) − a‖² ≤ R² + ξ_i,  ξ_i ≥ 0,  i = 1, ..., N    (1)

where a and R denote the center and radius of the hypersphere, respectively. The slack variable ξ_i is introduced to allow outliers outside the hypersphere. The penalty parameter C > 0 controls the trade-off between the volume and the misclassified samples; it can be replaced by 1/(νN), where ν indicates an upper bound on the fraction of misclassified samples and a lower bound on the fraction of support vectors.36 By introducing Lagrange multipliers α = [α_1, ..., α_N]^T, the corresponding dual problem of eq 1 is

  max_α  Σ_{i=1}^{N} α_i K(x_i, x_i) − Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j K(x_i, x_j)
  s.t.  Σ_{i=1}^{N} α_i = 1,  0 ≤ α_i ≤ C,  i = 1, ..., N    (2)

where K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩ is the kernel function. The Gaussian kernel K(x_i, x_j) = exp(−‖x_i − x_j‖²/σ²), which can approximate most kernel functions, is chosen in this paper. Solving the dual problem gives the optimal solution α*. The training data with α_i* > 0 are called support vectors (SVs), and their subscripts constitute the set SV = {i | α_i* > 0, i = 1, ..., N}. Then the center and radius of the hypersphere are


Table 3. Fault Detection Rate, False Alarm Rate, and g-Mean of Different Methods in Cases 3 and 4

                     case 3                     case 4
method         FDR     FAR    g-mean      FDR     FAR    g-mean
SVDD           1.000   0.055  0.972       0.815   0.048  0.881
RSVDD          1.000   0.163  0.915       0.815   0.173  0.821
LDR-wSVDD      1.000   0.045  0.977       0.818   0.040  0.886

  a = Σ_{i∈SV} α_i* Φ(x_i)
  R = √( K(x_s, x_s) − 2 Σ_{i∈SV} α_i* K(x_s, x_i) + Σ_{i∈SV} Σ_{j∈SV} α_i* α_j* K(x_i, x_j) )    (3)

where x_s is a support vector. For a new test sample z, the distance from z to the center in the feature space is

  Dist(z) = √( K(z, z) − 2 Σ_{i∈SV} α_i* K(z, x_i) + Σ_{i∈SV} Σ_{j∈SV} α_i* α_j* K(x_i, x_j) )    (4)

The decision function is

  f(z) = Dist(z) − R    (5)

If f(z) > 0, i.e., Dist(z) > R, then z is considered a fault sample; otherwise, it is a normal one.

Weighted-SVDD. Equation 1 shows that the slack variables ξ_i allow some data points to lie outside the hypersphere, and the penalty parameter C controls how many points do so: the smaller the penalty parameter, the higher the possibility that a point is assigned outside the hypersphere. Nevertheless, the penalties of all training samples in the original SVDD problem are equal, which lets the boundary of the normal samples shift toward outliers. To reduce the effect of outliers, the weighted-SVDD

Figure 4. Weight of each point based on (a) RSVDD and (b) LDR-wSVDD.


Figure 5. Monitoring results on case 1 of (a) SVDD, (b) RSVDD, and (c) LDR-wSVDD.

method introduces a weight w(x_i) for each training sample x_i. The more likely the sample is an outlier, the smaller its weight, and the higher the possibility that it is classified outside the hypersphere. The primal optimization problem of weighted-SVDD is

  min_{a,R,ξ}  R² + C Σ_{i=1}^{N} w(x_i) ξ_i
  s.t.  ‖Φ(x_i) − a‖² ≤ R² + ξ_i,  ξ_i ≥ 0,  i = 1, ..., N    (6)

Similarly, by introducing Lagrange multipliers α = [α_1, ..., α_N]^T, the corresponding dual optimization problem is

  max_α  Σ_{i=1}^{N} α_i K(x_i, x_i) − Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j K(x_i, x_j)
  s.t.  Σ_{i=1}^{N} α_i = 1,  0 ≤ α_i ≤ w(x_i) C,  i = 1, ..., N    (7)

In contrast to eq 2, the upper bound of α_i in the inequality constraint of eq 7 has changed from C to w(x_i)C. Equation 7 is still a quadratic optimization problem; thus, it can be solved by the same solver as eq 2. Denote the optimal solution by α*; the training data with α_i* > 0 are again called support vectors (SVs), with subscript set SV = {i | α_i* > 0, i = 1, ..., N}. The center and radius of the hypersphere are calculated by eq 3, the distance between a new test sample and the center by eq 4, and the decision strategy for classifying a sample as normal or faulty is unchanged; see eq 5.

3. PROPOSED METHOD

Density-Based Weighted-SVDD. Density is a valid indicator of the data distribution and is commonly used in weight calculation. Rodriguez and Laio33 propose a new way of calculating local density. d(x_i, x_j) is defined as the Euclidean distance between data points x_i and x_j:

  d(x_i, x_j) = ‖x_i − x_j‖₂    (8)

Let ρ_i denote the local density of data point x_i, 1 ≤ i ≤ N; it is calculated as

  ρ_i = Σ_j χ(d(x_i, x_j) − dc),  1 ≤ j ≤ N, j ≠ i    (9)


Figure 6. Monitoring results on case 2 of (a) SVDD, (b) RSVDD and (c) LDR-wSVDD.

  χ(x) = 1 if x ≤ 0;  χ(x) = 0 if x > 0    (10)

where dc is a cutoff distance adjustable by a percentage parameter p and calculated as in eq 11. Define D as a sorted set of size Nd in which the distances between every two data points are arranged in ascending order; dc is the (p × Nd)th element of the set D and thus reflects the average number of neighbors around a data sample.

  dc = d_{p×Nd},  D = {d_k | sort d(x_i, x_j) in ascending order, 1 ≤ i, j ≤ N, i ≠ j}    (11)

The code published by Rodriguez and Laio provides another local density computation, which uses a Gaussian function to handle nonlinear cases:

  ρ_i = Σ_j exp(−d(x_i, x_j)²/dc²),  1 ≤ j ≤ N, j ≠ i    (12)

Chen et al.34 use the density in eq 12, mapped into the range from 0 to 1, as the weight. This method will be referred to as RSVDD hereafter.

It should be noted that dc is selected from a set D that contains the distances between every pair of data points. Therefore, the differences between the densities of eq 9 or eq 12 at different points reflect density distinctions only from the viewpoint of the global data structure. In other words, the calculation of density does not take the local data structure into consideration. We improve the density calculation in the next section.

Local Density Ratio-Weighted SVDD (LDR-wSVDD). As mentioned above, the local density in eq 12 cannot characterize the local data structure, especially when there are multiple modes with different densities and outliers in the training data set: the local densities of normal samples in a low-density mode and of outliers around a high-density data area are similar. To widen the weight gap between normal points and outliers in the training data set, we propose a measurement named the local density ratio (LDR). With d(x_i, x_j) still the Euclidean distance and NN_k(x_i) the kth nearest neighbor of point x_i, the neighborhood set kNN(x_i) of x_i is

  kNN(x_i) = {j ∈ X | d(x_i, x_j) ≤ d(x_i, NN_k(x_i))}    (13)

Then, the LDR is constructed as follows:


Figure 7. Monitoring results on case 3 of (a) SVDD, (b) RSVDD, and (c) LDR-wSVDD.

  r_i = ρ_i / ( (1/N_{kNN(x_i)}) Σ_{x_j ∈ kNN(x_i)} ρ_j )    (14)

where ρ_i is calculated as in eq 12 and k is taken as the fraction p of the total number N of training data. The LDR is thus the ratio of the density of a data point to the mean density of its k nearest neighbors. When the point is a normal one, its density and those of its neighbors are similar, and the LDR is close to 1. When the point is an outlier, its density is smaller than that of its neighborhood, and the LDR is close to 0. Even when outliers in the training data set lie near normal samples of higher density, the LDR remains very small, smaller even than that of normal samples in a lower-density mode. By this means, outliers become less important in the training data, and the gap between outliers and normal samples is widened. To make the weight fall within the range from 0 to 1, the weight of each sample is normalized as

  w(x_i) = r_i / max_j(r_j)    (15)

We then substitute the LDR-based weight into the weighted-SVDD framework of eq 6 and solve the optimization problem with the original solver. In conclusion, the procedure of the LDR-wSVDD method is given in Table 1.

A multimode example is used here to illustrate the advantage of our method. First, 100 blue banana-shaped normal samples with lower density and 100 green ellipse-shaped normal samples with higher density are generated; then three red outliers are added to the training samples. The original SVDD, RSVDD,34 and our proposed LDR-wSVDD models are trained on these samples. Figure 1 shows the boundary of the normal samples as a black curve, with black circles denoting the support vectors. Visually, the boundary of the LDR-wSVDD method encloses the normal samples more properly than those of the other two methods, and all the outliers are excluded from our boundary. The boundary of the original SVDD encloses all the normal samples but expands toward the outliers, because all training samples share the same penalty and the outliers distort the decision boundary. Though the RSVDD method gives different weights to normal samples and outliers, and the


Figure 8. Monitoring results on case 4 of (a) SVDD, (b) RSVDD, and (c) LDR-wSVDD.

decision boundary no longer shifts toward the outliers, it excludes 14 normal samples in the lower-density mode, whereas our proposed method excludes only 4 normal samples in that area. Figure 2a shows the weight of each training point based on RSVDD in the corresponding color. The densities of the banana-shaped samples, especially the asterisk-shaped ones, are clearly lower than those of the ellipse-shaped samples and even similar to those of the outliers around the high-density area, because RSVDD calculates the density-based weight using only the global density structure. It tends to eliminate the influence of outliers but ignores the density differences between modes; thus, it misclassifies more normal samples in the lower-density mode. In contrast, as shown in Figure 2b, the weights of the normal samples are distributed similarly and are distinctly higher than those of the outliers regardless of the density of the normal samples, because our LDR-wSVDD method takes both the global and the local data structure into account. As a result, the proposed method not only suppresses the influence of outliers in the training data set but also ensures a suitable decision boundary for the normal samples regardless of the density distribution.

Complexity Analysis. Let N be the number of training samples. For the proposed algorithm in Table 1, the time complexity of calculating the distance of every data point pair in Step 1 is O(N²), and the computational complexity of sorting the Nd = N(N − 1)/2 distances is O(N² log N). In Step 3, the local density ratio calculation is also based on the distances between points, so no extra complexity of higher order is generated. In Step 4, the time complexity of the quadratic programming for the weighted-SVDD optimization problem is O(N³). The total complexity is therefore O(N²) + O(N² log N) + O(N³) ∼ O(N³), the same as that of the conventional SVDD method.
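To make the dominant O(N³) step concrete, the dual problems of eqs 2 and 7 can be handed to a generic solver; the sketch below uses SciPy's SLSQP on toy-sized data. The function name, the 1e-6 support-vector thresholds, and the fallback when no boundary support vector is found are our own assumptions; a dedicated QP solver would be used in practice.

```python
import numpy as np
from scipy.optimize import minimize

def svdd_train(X, C=0.2, sigma=2.0, weights=None):
    """Solve the SVDD dual (eq 2), or the weighted dual (eq 7) when
    `weights` is given, and return the decision function f(z) of eq 5."""
    N = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-D2 / sigma ** 2)                        # Gaussian kernel
    ub = C * np.ones(N) if weights is None else C * np.asarray(weights)
    # Maximize sum_i a_i K_ii - a^T K a  ->  minimize the negative.
    obj = lambda a: a @ K @ a - a @ np.diag(K)
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    res = minimize(obj, np.full(N, 1.0 / N),
                   bounds=[(0.0, u) for u in ub], constraints=cons)
    alpha = res.x
    const = alpha @ K @ alpha
    # Radius via a boundary support vector (0 < alpha_s < upper bound), eq 3.
    on_bound = (alpha > 1e-6) & (alpha < ub - 1e-6)
    s = int(np.argmax(on_bound)) if on_bound.any() else int(np.argmax(alpha))
    R = np.sqrt(max(K[s, s] - 2.0 * (alpha * K[:, s]).sum() + const, 0.0))

    def f(z):
        # Dist(z) of eq 4; K(z, z) = 1 for the Gaussian kernel.
        kz = np.exp(-((X - z) ** 2).sum(axis=1) / sigma ** 2)
        dist = np.sqrt(max(1.0 - 2.0 * (alpha * kz).sum() + const, 0.0))
        return dist - R                                  # eq 5: > 0 -> fault
    return f
```

Passing the eq 15 weights as `weights` turns the same routine into the weighted-SVDD of eq 7, since only the upper bounds of the box constraints change.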

4. CASE STUDIES

In this section, a numerical example and the Tennessee Eastman process are used to test the performance of the proposed method and to compare it with the original SVDD and RSVDD34 methods. To compare the methods under the same conditions, a Gaussian function is chosen as the kernel function. All training samples are standardized to zero mean and unit variance for each variable before the model is trained; the test samples are then normalized with the mean and variance of the training data set.

Numerical Example. A numerical example suggested by Ma et al.22 is redesigned to fit the specific test


Figure 9. Performance analysis of g-mean with different penalty parameter C with the RSVDD and LDR-wSVDD methods compared with the SVDD method in (a) case 1, (b) case 2, (c) case 3, and (d) case 4.

purpose in this article. The simulation data contain five variables and are generated by the following equations.

  x1 = 0.5768 s1 + 0.3766 s2 + e1
  x2 = 0.7382 s1² + 0.0566 s2 + e2
  x3 = 0.8291 s1 + 0.4009 s2² + e3
  x4 = 0.6519 s1 s2 + 0.2070 s2² + e4
  x5 = 0.3972 s1 + 0.8045 s2 + e5    (16)

where [s1 s2]^T are two data sources following a uniform distribution and a Gaussian distribution, respectively, and [e1 e2 e3 e4 e5]^T are white noise with zero mean and standard deviation 0.01. In this experiment, two different operation modes are simulated from the following data sources:

  mode 1: s1 ∼ U(−10, −7); s2 ∼ N(−15, 1)
  mode 2: s1 ∼ U(2, 5); s2 ∼ N(7, 1)

A total of 400 samples of each mode are generated for training, with 20 artificial outliers added to them, so the training data set consists of 820 samples in all. To compare the methods by the same standard, we set 0 as the threshold: if the decision value exceeds 0, the sample is considered a fault. To illustrate the feasibility and utility of the proposed LDR-wSVDD method, four fault test cases are designed by the following rules. Each test data set has 800 samples, where the first 400 samples are from the normal operating condition and the last 400 samples are from the fault operating condition.

Case 1: The system runs in mode 2; then a step change of 6.2 occurs in x5 from the 401st sample until the end.
Case 2: The system runs in mode 2; then a drifting error of 0.02 × (k − 400) is introduced into x1, where k indicates the sample number.
Case 3: The system runs in mode 1; then a step change of 4 occurs in x5 from the 401st sample until the end.
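The two operating modes of eq 16 can be simulated directly. The function name and array layout below are our own choices; since the paper's text lists five noise variables, the noise term on x5 is taken as e5 here.

```python
import numpy as np

def generate_mode(n, mode, rng):
    """Draw n samples of the 5-variable system of eq 16 in the given mode."""
    if mode == 1:
        s1 = rng.uniform(-10.0, -7.0, n)
        s2 = rng.normal(-15.0, 1.0, n)
    else:
        s1 = rng.uniform(2.0, 5.0, n)
        s2 = rng.normal(7.0, 1.0, n)
    e = rng.normal(0.0, 0.01, (5, n))       # white noise, std 0.01
    x = np.vstack([
        0.5768 * s1 + 0.3766 * s2 + e[0],
        0.7382 * s1 ** 2 + 0.0566 * s2 + e[1],
        0.8291 * s1 + 0.4009 * s2 ** 2 + e[2],
        0.6519 * s1 * s2 + 0.2070 * s2 ** 2 + e[3],
        0.3972 * s1 + 0.8045 * s2 + e[4],
    ])
    return x.T                               # (n, 5) samples
```

A training set in the paper's style would stack 400 samples of each mode plus artificial outliers; the fault cases then perturb x5 or x1 from sample 401 on.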


Figure 10. Flow sheet of TEP.

Case 4: The system runs in mode 1; then a drifting error of 0.02 × (k − 400) is introduced into x1, where k indicates the sample number.

The distributions of the four fault cases are shown in Figure 3. In the SVDD method, because the fraction of outliers in the training data set is about 2.5%, the penalty parameter is set to C = 1/(0.025 × N) ≈ 0.05. The Gaussian kernel parameter plays the same role in the three methods, controlling the mapping from the original space to the feature space; the method proposed by Xiao et al.37 is used in this paper to select the Gaussian kernel width. In the RSVDD and LDR-wSVDD methods, the constant C is 0.15, and the parameter p of both methods is 0.05. In addition to the fault detection rate (FDR) and false alarm rate (FAR), the g-mean is used as a statistic to evaluate the overall performance of a fault detection algorithm,38 constructed as g-mean = √(FDR·(1 − FAR)). The FDR, FAR, and g-mean of the three methods on the four test data sets are listed in Tables 2 and 3, with the best results marked in boldface.

On the whole, the LDR-wSVDD method performs best, achieving the highest g-mean, i.e., the best overall measurement of both FDR and FAR, in all four cases; the other two methods cannot achieve good results for both modes simultaneously. For case 1, the SVDD method misses about 3.5% of the fault samples, so it obtains the smallest g-mean of the three methods. The LDR-wSVDD method detects all the fault samples and almost 98.5% of the normal samples, which leads to the highest g-mean. In comparison, although the FDR of the RSVDD method also reaches 1, its FAR is more than 3%, which leads to a lower g-mean. For case 2, both RSVDD and LDR-wSVDD detect more than 75% of the faults in the test set, better than the FDR of the SVDD method; moreover, the FAR of LDR-wSVDD is 0.02, significantly lower than the 0.045 of the RSVDD method, resulting in the highest g-mean of all. For case 3, none of the three methods misses any fault sample; however, the FAR of the RSVDD method is more than 15%, worsening its overall g-mean, whereas the FAR of the other two methods is less than 10%, and less than 5% for the proposed method. As a whole, the LDR-wSVDD method performs better. For case 4, in the same way, the FDR of all three methods surpasses 81%; nevertheless, the FAR of the RSVDD method still exceeds 17%, while the other methods misclassify no more than 10% of the normal samples, only 4% for the proposed method. As a result, the LDR-wSVDD method shows the best performance.

The reason for these results is that, for the SVDD method, the penalty parameter is a constant that treats every training sample equally, regardless of its density. When outliers are mixed into the training set, this uniform penalty makes normal samples and outliers in the training data set indistinguishable; as a consequence, the SVDD method must encompass fault samples around the higher-density normal mode in order to enclose the lower-density normal mode properly. For the RSVDD method, there is

Table 4. Six Operation Modes of TEP

mode   G:H mass ratio   production (kg/h)
1      50:50            14076
2      10:90            14076
3      90:10            11111
4      50:50            maximum
5      10:90            maximum
6      90:10            maximum
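The g-mean score used in the tables is the geometric mean of the fault detection rate and the normal-sample accuracy; a one-line sketch:

```python
import numpy as np

def g_mean(fdr, far):
    """g-mean = sqrt(FDR * (1 - FAR)); higher is better."""
    return float(np.sqrt(fdr * (1.0 - far)))
```

For example, the case 1 SVDD entry (FDR 0.965, FAR 0.010) gives a g-mean of about 0.977, matching Table 2.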


Figure 4a presents the density-based weight of each training sample in the RSVDD method. It is obvious that the density of mode 1 is lower than that of mode 2 on the whole, and that the density of the outliers is similar to that of mode 1: there is no distinct difference between the lower-density normal samples and the outliers. Hence the decision boundary is tight around the lower-density normal mode, which explains why the RSVDD method misclassifies more normal samples in that mode. In contrast, as shown in Figure 4b, for the LDR-wSVDD method the weights of the two normal modes are distributed similarly, and the weights of most outliers are smaller than the smallest weights of the normal modes, because this method extracts the local data structure information. LDR-wSVDD treats normal samples and outliers differently and makes normal samples of different densities equally important in the training set. On balance, LDR-wSVDD achieves a better monitoring result for both the high-density and the low-density mode when outliers are present in the training set.

The monitoring results of the three methods for the four cases are shown in Figures 5−8. For case 1, the SVDD method misses a few fault samples, whereas the other methods do not mistake any; in particular, the LDR-wSVDD method encircles more normal samples, and the distance between the decision value and the threshold for the fault samples is larger than that of the RSVDD method. For case 2, the fault samples are no longer misidentified from the 521st sample on in the LDR-wSVDD method, earlier than in the other two methods. For case 3, SVDD and our proposed method distinguish more normal samples than the RSVDD method; moreover, the minimum distance between the decision value and the threshold for the fault samples reaches 0.017 in our method, higher than the 0.008 of the SVDD method, which leaves a larger margin for potential fault detection. For case 4, more normal samples are identified by SVDD and the proposed method than by RSVDD; in addition, our method detects the drifting error at the 501st sample, earlier than the SVDD and RSVDD methods. In summary, the proposed method provides a larger margin and earlier detection of fault samples while maintaining a good false alarm rate.

In the numerical experiment, the g-mean with different penalty parameters C in RSVDD and LDR-wSVDD has been analyzed and compared with that of the SVDD method. In Figure 9, the blue and red curves denote the g-mean with parameter C from 0.01 to 0.5 for RSVDD and LDR-wSVDD, respectively, and the green square dots denote the g-mean for SVDD with penalty parameter C = 0.05. For cases 1 and 2, in which the fault occurs in the higher-density normal mode, our proposed method and the RSVDD method gain an advantage in g-mean over SVDD, owing to the effect of the density weights. For cases 3 and 4, where the fault happens in the lower-density normal mode, the g-mean values of our proposed method and the SVDD method are comparable and superior to those of the RSVDD method at any value of C. It can be concluded that the proposed method is robust to the penalty parameter C regardless of the density of the normal mode and functions well for multimodal process monitoring with multiple densities and outliers in the training set. In summary, through the above analysis, it is reasonable to believe that the proposed method achieves better performance than the conventional SVDD and RSVDD methods for multimode process monitoring.

Tennessee Eastman (TE) Benchmark Process. The Tennessee Eastman process is a benchmark simulation problem

Table 5. Monitored Variables in TEP

no. | variable | unit
1 | A feed (stream 1) | km3/h
2 | D feed (stream 2) | kg/h
3 | E feed (stream 3) | kg/h
4 | total feed (stream 4) | km3/h
5 | recycle flow (stream 8) | km3/h
6 | reactor feed rate (stream 6) | km3/h
7 | reactor pressure | kPa
8 | reactor level | %
9 | reactor temperature | °C
10 | purge rate (stream 9) | km3/h
11 | product separator temperature | °C
12 | product separator level | %
13 | product separator pressure | kPa
14 | product separator underflow (stream 10) | m3/h
15 | stripper level | %
16 | stripper pressure | kPa
17 | stripper underflow (stream 11) | m3/h
18 | stripper temperature | °C
19 | stripper steam flow | kg/h
20 | compressor work | kW
21 | reactor cooling water outlet temperature | °C
22 | separator cooling water outlet temperature | °C
23 | D feed | %
24 | E feed | %
25 | A feed | %
26 | A + C feed | %
27 | purge valve | %
28 | separator valve | %
29 | stripper valve | %
30 | reactor coolant | %
31 | condenser coolant | %

Table 6. Fault Types in TEP

no. | description | type
1 | A/C feed ratio, B composition constant (stream 4) | step
2 | B composition, A/C ratio constant (stream 4) | step
4 | reactor cooling water inlet temperature | step
5 | condenser cooling water inlet temperature | step
6 | A feed loss (stream 1) | step
7 | C header pressure loss-reduced availability (stream 4) | step
8 | A, B, C feed composition (stream 4) | random variation
10 | C feed temperature (stream 4) | random variation
11 | reactor cooling water inlet temperature | random variation
12 | condenser cooling water inlet temperature | random variation
13 | reaction kinetics | slow drift
14 | reactor cooling water valve | sticking

no denying the fact that normal samples and outliers in the training data set are treated differently by inducing the weight for each sample. However, it assigns the global density to each sample, not considering different densities in different modes.

DOI: 10.1021/acs.iecr.6b03306 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Table 7. Fault Detection Rate, False Alarm Rate, and g-Mean in Mode 1 on TEP

fault | FDR (SVDD / RSVDD / LDR-wSVDD) | FAR (SVDD / RSVDD / LDR-wSVDD) | g-mean (SVDD / RSVDD / LDR-wSVDD)
1 | 0.998 / 0.998 / 0.998 | 0.115 / 0.110 / 0.105 | 0.940 / 0.942 / 0.945
2 | 0.938 / 0.999 / 0.998 | 0.190 / 0.195 / 0.190 | 0.871 / 0.897 / 0.899
4 | 0.624 / 0.999 / 0.985 | 0.135 / 0.150 / 0.150 | 0.735 / 0.921 / 0.915
5 | 0.093 / 0.100 / 0.101 | 0.155 / 0.180 / 0.160 | 0.280 / 0.286 / 0.292
6 | 1.000 / 1.000 / 1.000 | 0.110 / 0.125 / 0.120 | 0.943 / 0.935 / 0.938
7 | 0.938 / 1.000 / 1.000 | 0.085 / 0.080 / 0.085 | 0.926 / 0.959 / 0.957
8 | 0.984 / 0.984 / 0.984 | 0.055 / 0.065 / 0.055 | 0.964 / 0.959 / 0.964
10 | 0.211 / 0.221 / 0.224 | 0.050 / 0.050 / 0.050 | 0.448 / 0.458 / 0.461
11 | 0.921 / 0.930 / 0.926 | 0.095 / 0.095 / 0.105 | 0.913 / 0.917 / 0.910
12 | 0.323 / 0.344 / 0.328 | 0.130 / 0.150 / 0.145 | 0.530 / 0.541 / 0.529
13 | 0.943 / 0.944 / 0.943 | 0.075 / 0.075 / 0.070 | 0.934 / 0.934 / 0.936
14 | 0.976 / 0.978 / 0.976 | 0.070 / 0.085 / 0.070 | 0.953 / 0.946 / 0.953
average | 0.746 / 0.791 / 0.788 | 0.105 / 0.113 / 0.109 | 0.786 / 0.808 / 0.808

Table 8. Fault Detection Rate, False Alarm Rate, and g-Mean in Mode 3 on TEP

fault | FDR (SVDD / RSVDD / LDR-wSVDD) | FAR (SVDD / RSVDD / LDR-wSVDD) | g-mean (SVDD / RSVDD / LDR-wSVDD)
1 | 0.995 / 0.998 / 0.995 | 0.320 / 0.475 / 0.300 | 0.823 / 0.724 / 0.835
2 | 0.993 / 0.996 / 0.993 | 0.310 / 0.480 / 0.305 | 0.828 / 0.720 / 0.831
4 | 0.999 / 0.999 / 0.999 | 0.430 / 0.590 / 0.410 | 0.755 / 0.640 / 0.768
5 | 1.000 / 1.000 / 1.000 | 0.405 / 0.510 / 0.400 | 0.771 / 0.700 / 0.775
6 | 1.000 / 1.000 / 1.000 | 0.460 / 0.635 / 0.450 | 0.735 / 0.604 / 0.742
7 | 1.000 / 1.000 / 1.000 | 0.360 / 0.525 / 0.360 | 0.800 / 0.689 / 0.800
8 | 0.991 / 0.996 / 0.991 | 0.410 / 0.610 / 0.420 | 0.765 / 0.623 / 0.758
10 | 0.661 / 0.776 / 0.654 | 0.390 / 0.570 / 0.390 | 0.635 / 0.578 / 0.631
11 | 0.960 / 0.970 / 0.954 | 0.425 / 0.590 / 0.405 | 0.743 / 0.631 / 0.753
12 | 0.991 / 0.995 / 0.991 | 0.465 / 0.630 / 0.470 | 0.728 / 0.607 / 0.725
13 | 0.925 / 0.956 / 0.924 | 0.435 / 0.625 / 0.430 | 0.723 / 0.599 / 0.726
14 | 0.976 / 0.990 / 0.973 | 0.380 / 0.560 / 0.385 | 0.778 / 0.660 / 0.773
average | 0.958 / 0.973 / 0.956 | 0.399 / 0.567 / 0.394 | 0.757 / 0.648 / 0.760

in real chemical engineering first suggested by Downs and Vogel,39 which has been widely utilized to test the performance of process monitoring methods. The TE process consists of five major operation units: reactor, condenser, compressor, stripper, and separator. The flow sheet is shown in Figure 10. There are eight components, A through H. According to different G/H mass ratios, the process has six operating modes, as listed in Table 4. Among several available simulation schemes, the decentralized control strategy designed by Ricker40 is used here to simulate mode 1 and mode 3 data. The Simulink program can be downloaded from http://depts.washington.edu/control/LARRY/TE/download.html. The TE process contains 53 variables: 22 continuous process variables, 12 manipulated variables, and 19 composition measurements. Because the steady-state values of the recycle valve and steam valve in mode 1 do not change during the simulation, and the agitator rates in the two modes are both 100, these three variables are not included in the monitored variables. The remaining 9 manipulated variables and 22 easily measured continuous variables are chosen for process monitoring, as listed in Table 5. The sampling interval is set to 3 min.

Twenty faults exist in this simulation platform. Faults 3, 9, and 15 are difficult to detect and faults 16−20 are of unknown type; thus, they are not employed in this experiment. The detailed descriptions and types of the remaining 12 faults are given in Table 6. In data set preparation, 800 normal samples in mode 1 and 200 normal samples in mode 3 are collected, and 50 fault samples are mixed with them, so there are 1050 samples in all in the training set. For a certain fault of each mode, the test data set contains 1000 samples, where the first 200 samples are normal and the fault occurs at the 201st sample and lasts to the end. In the training stage, we utilize the approach proposed by Xiao et al.37 to select the Gaussian kernel width parameter in the three methods. In the SVDD method, the penalty parameter C = 1/(0.05 × N) = 0.02 corresponds to 5% outliers in the training set, and in the RSVDD and LDR-wSVDD methods the parameter settings are C = 0.08 and p = 0.05. The fault detection rate, false alarm rate, and g-mean for each fault of modes 1 and 3 are gathered into Tables 7 and 8, with the average FDR, FAR, and g-mean over the 12 faults also listed. The best results are marked in boldface. In general,
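The relation C = 1/(0.05 × N) above encodes the standard SVDD budget: at most a fraction 1/(C·N) of training samples can end up outside the hypersphere, so C is chosen from the expected share of outliers. A small helper illustrating this choice (the function name is our own):

```python
def svdd_penalty(n_samples, outlier_fraction):
    """SVDD penalty C = 1 / (outlier_fraction * n_samples).

    In SVDD, at most a fraction 1/(C*N) of the training samples can lie
    outside the hypersphere, so setting C this way budgets for the
    expected proportion of outliers in the training set.
    """
    return 1.0 / (outlier_fraction * n_samples)
```

With N = 1000 and a 5% outlier assumption, this gives C = 0.02, matching the setting quoted above.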


Figure 11. Weight of each sample in (a) RSVDD and (b) LDR-wSVDD on TEP.

the LDR-wSVDD method has a better monitoring result on both modes. For mode 1, it can be observed that the LDR-wSVDD method achieves the 7 highest g-mean values of the 12 faults, whereas RSVDD obtains 4 and SVDD obtains 3. The average g-mean value of our proposed method is 0.808, the same as for RSVDD and higher than that for SVDD. The density-based weight suppresses the influence of outliers in the training set, resulting in better monitoring results for faults around the higher density mode than with the conventional SVDD method. In terms of the FDR, LDR-wSVDD and RSVDD each gain the 6 highest values of the 12 faults, whereas SVDD achieves 3. Hence, for fault monitoring around the higher density area, the weighted-SVDD methods perform well and our proposed method shows larger g-mean values. For mode 3, although the average FDRs of the three methods are all higher than 0.95, RSVDD misidentifies about 56.7% of the normal samples, apparently higher than the other two methods, which means the decision boundary of RSVDD is tight for the lower density mode. According to the g-mean value, 8 of the 12 faults for the LDR-wSVDD method are higher than for the other two methods, and the average g-mean value is up to 0.760. It can be concluded that our proposed method is more suitable for monitoring faults around the lower density area.

From the above results, the SVDD method is able to enclose normal samples regardless of the density. However, the influence of outliers is not eliminated because of the constant penalty parameter C. The RSVDD method treats normal samples and outliers differently by introducing a weight based on global density, so its decision boundary excludes outliers better; but it ignores the density differences between modes and, as a result, misclassifies many normal samples in the low density area. In comparison, our proposed method takes both outlier effects and global−local density differences into consideration, thus encircling the samples more properly for both modes and eliminating the effect of outliers in the training set. When Tables 7 and 8 are compared, the three methods all have very low FARs in mode 1, but the FARs in mode 3 are high: the three methods encircle the normal samples appropriately for mode 1 but tightly for mode 3, resulting in more misclassified normal samples. The reason for the high FAR in mode 3 is that, on one hand, in this TE process case the density of mode 1 is high and the density of mode 3 is low. Generally, the density of points located on the boundary of a lower density normal mode is much lower still, and SVDD-based methods tend to classify these points outside the boundary. On the other hand, the FAR for mode 3 is calculated over the total number of normal samples for mode 3, which is smaller than that of mode 1. The larger number of misclassified normal samples and the smaller total number in the test set together lead to the higher FAR.

Figure 11 shows the weight of each training sample in the RSVDD and LDR-wSVDD methods. The normal samples in modes 1 and 3 are denoted as blue and green dots, respectively, and the outliers are denoted as red dots. The Y-axis indicates the weight of each sample. It is easy to see that, in the RSVDD method, the density of mode 3 is much lower than that of mode 1 and most weights of mode 3 are around 0.1, which reduces their significance; meanwhile, the density of the outliers is similar to that of the lower density normal samples, so the weights of outliers and of low-density normal samples are not well separated. In comparison, in the LDR-wSVDD method the weights of the outliers are all below the smallest value of the normal samples, and the weights of both modes 1 and 3 are concentrated around 0.5. Benefiting from this weight adjustment, the decision boundary is not easily affected by outliers, and our method demonstrates good monitoring performance for both the higher and lower density modes.

Figure 12. Monitoring results for fault 7 in mode 1 of (a) SVDD, (b) RSVDD, and (c) LDR-wSVDD.

Specifically, the monitoring charts of the three methods for the two modes are exhibited in Figures 12 and 13. Blue dots and green dots, respectively, indicate the decision values of normal samples for modes 1 and 3, and red dots indicate the fault samples. The black dashed line denotes the threshold. Take fault 7 in mode 1 as an example; the monitoring results are shown in Figure 12. The SVDD method mistakes a few fault samples as normal, whereas the RSVDD and LDR-wSVDD methods detect all of the faults. For the LDR-wSVDD method, the minimum distance between the decision value and the threshold for the fault samples is about 0.015, which is acceptable in practice. As another example, Figure 13 shows the monitoring results for fault 7 in mode 3. Compared with the RSVDD method, more normal samples are classified correctly by the SVDD and LDR-wSVDD methods. What is more, the LDR-wSVDD method provides a 0.037 margin (the minimum distance between the fault decision values and the threshold for potential fault detection), higher than the 0.029 of the SVDD method. To sum


Figure 13. Monitoring results for fault 7 in mode 3 of (a) SVDD, (b) RSVDD, and (c) LDR-wSVDD.

up, the LDR-wSVDD method balances modes with different density distributions even when outliers are mixed into the training set.

5. CONCLUSIONS
To address the problem of multimodal industrial process monitoring with outliers in the training data set, an LDR-wSVDD method is proposed in this article. First, a local density ratio-based weight for each sample is constructed by simultaneously considering the global density and the local data structure; in this way, outliers are made less important in the subsequent model training stage. Then the weight is introduced into the weighted-SVDD optimization framework to solve for the hypersphere boundary of the normal samples, so that outliers can be separated from normal samples. Finally, the decision boundary is applied to monitor the multimodal process. A numerical example and Tennessee Eastman process experiments are designed to test the availability and efficiency of the proposed method. The results show that the proposed method achieves better fault detection performance than other relevant methods regardless of the density of the different process modes. These experiments confirm that the proposed method is able to describe the boundary of the multimodal multidensity target class more appropriately and is no longer sensitive to outliers in the training data set. However, more effort is needed to extract the local data structure in multimodal process monitoring. Moreover, it would be meaningful to extend the proposed method to batch process monitoring or other novelty detection applications in future studies.
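For concreteness, the weighted-SVDD optimization referred to above is typically written as follows (this is the standard weighted formulation; the paper's exact variant may differ in details):

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{N} w_i \xi_i
\qquad \text{s.t.}\quad \|\phi(x_i) - a\|^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0,\; i = 1,\dots,N,
```

where $a$ and $R$ are the hypersphere center and radius in the kernel feature space induced by $\phi$, and $w_i$ is the local density ratio-based weight: a small $w_i$ makes the slack $\xi_i$ of a suspected outlier cheap, so the optimizer can leave that sample outside the hypersphere without inflating the radius.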



AUTHOR INFORMATION

Corresponding Author

*H. Wang. E-mail: [email protected]. Phone: 86-10-62781993. Fax: 86-10-62786911.

ORCID

Huangang Wang: 0000-0002-7322-3446

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (No. 51575469).




REFERENCES

(1) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543−3562.
(2) He, Q. P.; Wang, J. Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Transactions on Semiconductor Manufacturing 2007, 20, 345−354.
(3) Ge, Z.; Song, Z. Multimode process monitoring based on Bayesian method. J. Chemom. 2009, 23, 636−650.
(4) Srinivasan, R.; Wang, C.; Ho, W.; Lim, K. Dynamic principal component analysis based methodology for clustering process states in agile chemical plants. Ind. Eng. Chem. Res. 2004, 43, 2123−2139.
(5) Zhu, Z.; Song, Z.; Palazoglu, A. Process pattern construction and multi-mode monitoring. J. Process Control 2012, 22, 247−262.
(6) Zhao, S. J.; Zhang, J.; Xu, Y. M. Performance monitoring of processes with multiple operating modes through multiple PLS models. J. Process Control 2006, 16, 763−772.
(7) Zhao, S. J.; Zhang, J.; Xu, Y. M. Monitoring of processes with multiple operating modes through multiple principle component analysis models. Ind. Eng. Chem. Res. 2004, 43, 7025−7035.
(8) Natarajan, S.; Srinivasan, R. Multi-model based process condition monitoring of offshore oil and gas production process. Chem. Eng. Res. Des. 2010, 88, 572−591.
(9) Ng, Y. S.; Srinivasan, R. An adjoined multi-model approach for monitoring batch and transient operations. Comput. Chem. Eng. 2009, 33, 887−902.
(10) Feital, T.; Kruger, U.; Dutra, J.; Pinto, J. C.; Lima, E. L. Modeling and performance monitoring of multivariate multimodal processes. AIChE J. 2013, 59, 1557−1569.
(11) Ge, Z.; Gao, F.; Song, Z. Two-dimensional Bayesian monitoring method for nonlinear multimode processes. Chem. Eng. Sci. 2011, 66, 5173−5183.
(12) Yu, J.; Qin, S. J. Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE J. 2008, 54, 1811−1829.
(13) Yu, J. A nonlinear kernel Gaussian mixture model based inferential monitoring approach for fault detection and diagnosis of chemical processes. Chem. Eng. Sci. 2012, 68, 506−519.
(14) Ge, Z.; Song, Z. Mixture Bayesian regularization method of PPCA for multimode process monitoring. AIChE J. 2010, 56, 2838−2849.
(15) Ge, Z.; Song, Z. Maximum-likelihood mixture factor analysis model and its application for process monitoring. Chemom. Intell. Lab. Syst. 2010, 102, 53−61.
(16) Ge, Z.; Song, Z. Online monitoring of nonlinear multiple mode processes based on adaptive local model approach. Control Engineering Practice 2008, 16, 1427−1437.
(17) Xie, X.; Shi, H. Dynamic multimode process modeling and monitoring using adaptive Gaussian mixture models. Ind. Eng. Chem. Res. 2012, 51, 5497−5505.
(18) Yu, J. A particle filter driven dynamic Gaussian mixture model approach for complex process monitoring and fault diagnosis. J. Process Control 2012, 22, 778−788.
(19) Ma, Y.; Shi, H.; Ma, H.; Wang, M. Dynamic process monitoring using adaptive local outlier factor. Chemom. Intell. Lab. Syst. 2013, 127, 89−101.
(20) Hwang, D.-H.; Han, C. Real-time monitoring for a process with multiple operating modes. Control Engineering Practice 1999, 7, 891−902.
(21) Lane, S.; Martin, E.; Kooijmans, R.; Morris, A. Performance monitoring of a multi-product semi-batch process. J. Process Control 2001, 11, 1−11.
(22) Ma, H.; Hu, Y.; Shi, H. Fault detection and identification based on the neighborhood standardized local outlier factor method. Ind. Eng. Chem. Res. 2013, 52, 2389−2402.
(23) Song, B.; Tan, S.; Shi, H. Time−space locality preserving coordination for multimode process monitoring. Chemom. Intell. Lab. Syst. 2016, 151, 190−200.
(24) Ma, Y.; Shi, H. Multimode process monitoring based on aligned mixture factor analysis. Ind. Eng. Chem. Res. 2014, 53, 786−799.
(25) Ge, Z.; Gao, F.; Song, Z. Batch process monitoring based on support vector data description method. J. Process Control 2011, 21, 949−959.
(26) Liu, B.; Xiao, Y.; Cao, L.; Hao, Z.; Deng, F. SVDD-based outlier detection on uncertain data. Knowledge and Information Systems 2013, 34, 597−618.
(27) Wang, C.-D.; Lai, J. Position regularized support vector domain description. Pattern Recognition 2013, 46, 875−884.
(28) Yin, S.; Zhu, X.; Jing, C. Fault detection based on a robust one class support vector machine. Neurocomputing 2014, 145, 263−268.
(29) Lee, K.; Kim, D.-W.; Lee, K. H.; Lee, D. Density-induced support vector data description. IEEE Transactions on Neural Networks 2007, 18, 284−289.
(30) Tian, J.; Gu, H.; Gao, C.; Lian, J. Local density one-class support vector machines for anomaly detection. Nonlinear Dynamics 2011, 64, 127−130.
(31) Cha, M.; Kim, J. S.; Baek, J.-G. Density weighted support vector data description. Expert Systems with Applications 2014, 41, 3343−3350.
(32) Liu, B.; Xiao, Y.; Philip, S. Y.; Hao, Z.; Cao, L. An efficient approach for outlier detection with imperfect data labels. IEEE Transactions on Knowledge and Data Engineering 2014, 26, 1602−1616.
(33) Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492−1496.
(34) Chen, G.; Zhang, X.; Wang, Z. J.; Li, F. Robust support vector data description for outlier detection with noise or uncertain data. Knowledge-Based Systems 2015, 90, 129−137.
(35) Tax, D. M.; Duin, R. P. Support vector data description. Machine Learning 2004, 54, 45−66.
(36) Schölkopf, B.; Platt, J. C.; Shawe-Taylor, J.; Smola, A. J.; Williamson, R. C. Estimating the support of a high-dimensional distribution. Neural Computation 2001, 13, 1443−1471.
(37) Xiao, Y.; Wang, H.; Zhang, L.; Xu, W. Two methods of selecting Gaussian kernel parameters for one-class SVM and their application to fault detection. Knowledge-Based Systems 2014, 59, 75−84.
(38) Wu, M.; Ye, J. A small sphere and large margin approach for novelty detection using training data with outliers. IEEE Transactions on Pattern Analysis and Machine Intelligence 2009, 31, 2088−2092.
(39) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245−255.
(40) Ricker, N. L. Decentralized control of the Tennessee Eastman challenge process. J. Process Control 1996, 6, 205−221.
