Dissimilarity-Based Fault Diagnosis through Ensemble Filtering of Informative Variables

Chudong Tong*,† and Ahmet Palazoglu‡
†Faculty of Electrical Engineering and Computer Science, Ningbo University, Zhejiang 315211, P.R. China
‡Department of Chemical Engineering and Materials Science, University of California, Davis, One Shields Avenue, Davis, California 95616, United States
ABSTRACT: Despite the fact that fault diagnosis, similar to pattern recognition, has been widely studied in recent years, two key challenges remain: insufficient training samples and the overlapping characteristics of reference fault classes. Recognition of these challenges motivates this study. First, an ensemble filtering of informative variables, which also serves as a dimensionality reduction step, is proposed to address the challenge of insufficient training samples vs high dimensionality. Second, to characterize the differences among overlapped fault classes, a dissimilarity analysis, which detects changes in the distributions of two data sets, is employed. A moving window technique with incrementally increasing window sizes is used to gather data from online abnormal samples as well as from each reference fault class. The dissimilarity for a pairwise set of data windows is then computed using the informative variables. The fault class recognition depends on the minimum dissimilarity achieved by the reference fault classes at each moving window step. The comparisons demonstrate that the recognition performance of the proposed approach is considerably better than that of discriminant models as well as other pattern matching methods.

Received: March 7, 2016. Revised: July 21, 2016. Accepted: July 27, 2016.
1. INTRODUCTION
Modern industrial plants have been witnessing a rapid development of distributed computer-aided systems and sensor technologies, as well as operator support systems built on data-driven process monitoring, in particular, multivariate statistical process monitoring (MSPM) techniques, in recent years.1,2 Generally, two main tasks are involved in process monitoring: (1) fault detection, where an alarm is triggered to indicate that something abnormal or unexpected is occurring in the process, and (2) fault diagnosis (or identification), where the root cause or the type of the detected fault is identified.2 Not surprisingly, MSPM, due to its data-driven nature, has been receiving considerable attention, as fundamental mathematical models of modern complex process systems are often costly to develop or practically inaccessible.3 A number of fault detection methods have been developed for monitoring dynamical, nonlinear, non-Gaussian, and large-scale processes over the past few decades.4−7 However, an examination of the existing MSPM literature shows that the majority of existing approaches focus mainly on fault detection. Yet, fault diagnosis is a "downstream" step that is initiated once a fault has been declared.8 This diagnosis step is of critical importance because it provides support for operators to carry out process recovery actions. From this viewpoint, developing efficient fault diagnosis methods is regarded as an indispensable component of MSPM strategies.
As an early method, the use of contribution plots has been a popular "default" fault diagnosis method.9 This method, however, suffers from "fault smearing" that can lead to misdiagnosis. Alternative approaches, such as reconstruction-based contribution9 and missing variable contribution,10 have been proposed to address this issue. These methods are generic in that they can accommodate various statistical models.8 Although the aforementioned methods do not require prior fault information, they could provide misleading results in certain cases. Another category of fault diagnosis methods, which is similar to pattern recognition, takes advantage of historical fault data sets and trains a classifier to establish fault classes. This approach works well in identifying the detected fault if the measurements of numerous process variables faithfully characterize the different abnormal conditions present in the historical database. Several approaches, such as discriminant analysis,11,12 neural networks,13 and support vector machines,14 have been used for fault recognition. It is important to note here that classification-based fault diagnosis is distinct from common pattern recognition tasks because of the following challenges: (1) the dimensionality of the measured variables is high relative to the number of available reference fault samples, which is always the case since faults are expected to be eliminated as early as possible, and (2) the available training samples from different fault classes usually present highly overlapping characteristics.
With respect to the above challenges, some studies that have been carried out within the scope of classification-based fault diagnosis are briefly reviewed here. He et al.11 proposed a fault diagnosis method based on Fisher discriminant analysis (FDA) using a variable-weighted strategy. Although the variable-weighted FDA method can achieve better classification performance than FDA, the presented case studies only tested fault samples taken from stationary states. Thus, the timeliness requirement for diagnosing a fault is severely compromised, making the method impractical for engineering applications. Verron et al.15 presented a fault diagnosis technique that implements discriminant analysis on selected informative variables. In their work, the variables that help increase discriminant performance are selected as informative variables. With the elimination of the "noisy" effect from noninformative variables, the performance (in terms of misclassification rate) of the designed fault classifiers is improved. Although their work illustrated the potential benefit that can be achieved by variable selection, the corresponding strategy still suffers from the overlapping characteristics among different fault classes, especially in the case where a large number of reference fault classes is available. Rong et al.12 formulated a novel discriminant algorithm, named locality preserving discriminant analysis (LPDA), for fault diagnosis. Their comparison studies demonstrated the superiority of LPDA-based fault diagnosis over the FDA method. However, the aforementioned two challenges are not fully considered. More importantly, it should be stressed that the LPDA method, as well as the other two approaches, is implemented in a sample-wise manner rather than window-wise. Given that the nonstationary trajectory of a fault at its beginning stage is more accurately reflected by time-series data, window-wise data can capture more changeover variations to represent each fault class than a sample-wise approach, and a lower fault misdiagnosis rate is therefore expected. Even for some special faults, such as step-change sensor faults in an open loop, window-wise data are still effective in capturing the abnormal variations and can provide more information than a single sample.
Previously, fault detection and diagnosis methods that rely on the moving window technique have been developed. For example, Kano et al.16 proposed a statistical process monitoring method based on the dissimilarity of process data, in which a fixed-size moving window was adopted. The dissimilarity analysis, termed DISSIM, was based on the idea that a change in the distribution of process data can be utilized to indicate a change of operating conditions.16 Later, Zhao et al.17 applied the DISSIM technique to monitor batch processes through a moving window whose size grows gradually. Singhal and Seborg18 investigated several moving-window-based approaches, including principal component analysis (PCA) based similarity factors, a distance similarity factor, and DISSIM, for matching patterns in time-series databases. Although these pattern matching techniques can be used for fault class matching, they require sufficient training samples stored in the historical database; the challenging case where only limited training samples are available has not yet been considered.
Here we present a novel classification-based fault diagnosis method through the implementation of dissimilarity analysis on filtered informative variables. The two challenges mentioned above in the context of fault diagnosis tasks are given full consideration. The proposed fault diagnosis method is outlined schematically in Figure 1.
Figure 1. Schematic of the proposed fault diagnosis method.
The window of online detected fault samples, together with the window of the kth reference fault data, is filtered by the designed informative variables for the kth fault class (k = 1, 2, ..., K). The DISSIM index is then calculated for each pairwise set of window data using the informative variables, and the detected fault is assigned to the class having the minimum value of DISSIM. For the first challenge (i.e., insufficient training samples vs high dimensionality), dimensionality reduction should be considered first. As shown in Figure 1, a variable filter is designed for each reference fault class to select its own set of informative variables, which are defined here to be those displaying significant departures from their normal regions. The other variables, which show nonsignificant abnormal behavior, are considered to be "noisy", and the dimensionality of the window data sets can thus be reduced accordingly if only the informative variables are retained. Consequently, the problems resulting from insufficient training samples become much less troublesome. Although overlapping characteristics are often encountered when different fault classes are present, the distributions of data sampled from different fault conditions cannot be exactly the same, especially when the noisy variables have been eliminated. Therefore, calculating DISSIM for pairwise window data with only informative variables is a meaningful choice for identifying the right fault class from the reference fault data sets.
2. PREVIOUS WORK ON FAULT DIAGNOSIS/CLASSIFICATION
This section reviews relevant work on fault diagnosis, highlighting the motivation for the current work at the end of the section.
2.1. Discriminant Analysis. The general idea of discriminant analysis is to seek optimal directions that maximize the separability of different classes.14 Considering a total of n training samples {x_1, x_2, ..., x_n} from all K classes, the discriminant directions are the eigenvectors of the following generalized eigenvalue problem:

S_b α = γ S_w α    (1)

where γ is the generalized eigenvalue, which indicates the extent of mutual separation between classes in terms of the ratio of between-class to within-class distances, and S_b and S_w are the between-class and within-class scatter matrices, respectively. The FDA and LPDA algorithms provide two different ways to define S_b and S_w.11,12
Usually, there are K − 1 discriminant directions, A = [α_1, α_2, ..., α_{K−1}], that lead to the best separation. In fault diagnosis, a lower dimensional discriminant model is trained offline, and the following multivariate statistical discriminant function14

g_i(y) = −(1/2)(y − μ_i)^T Σ_i^{−1}(y − μ_i) − ln[det(Σ_i)]    (2)

is used as the classification method in the reduced space, where y = A^T x, and μ_i and Σ_i are the estimated mean and covariance of the ith class in the reduced space, i = 1, 2, ..., K.
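As a concrete illustration, the scoring of eq 2 takes only a few lines. The following is a minimal sketch (not the authors' code), assuming the projection matrix A and the per-class means and covariances in the reduced space have already been estimated from training data; all names are hypothetical:

```python
import numpy as np

def discriminant_scores(x, A, mus, Sigmas):
    """Evaluate the discriminant function of eq 2 for one sample x.

    A      : (m, K-1) matrix of discriminant directions, so y = A^T x
    mus    : list of K class mean vectors in the reduced space
    Sigmas : list of K class covariance matrices in the reduced space
    """
    y = A.T @ x
    scores = []
    for mu, Sigma in zip(mus, Sigmas):
        d = y - mu
        _, logdet = np.linalg.slogdet(Sigma)   # ln[det(Sigma_i)], stably
        scores.append(-0.5 * d @ np.linalg.solve(Sigma, d) - logdet)
    return np.array(scores)   # assign x to the class with the largest g_i
```

A test sample is then assigned to the class whose score g_i is largest.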
2.2. Fault Diagnosis Using Variable Selection. Verron et al.15 presented a fault diagnosis method integrating discriminant analysis with a mutual information (MI) based variable selection algorithm. The MI can be viewed as a quantity measuring the mutual dependence between two random variables. In a supervised setting, if one of the two variables denotes the classes, it can be viewed as a multinomial random variable with K possible values, and it becomes possible to compute the MI between a measured process variable and the class variable, based on which the informative variables can be identified. Equation 3 defines the MI between a Gaussian variable x and a multinomial variable z, assuming that the probability distribution of z is given by P(z ∈ C_k) = P(C_k), that x is a random variable with a normal density of parameters μ and σ^2, and that x conditioned on z ∈ C_k follows a normal density with parameters μ_{C_k} and σ_{C_k}^2:

I(x, z) = (1/2)[log(σ^2) − Σ_{k=1}^{K} P(C_k) log(σ_{C_k}^2)]    (3)

In this way, the MI defined in eq 3 can be computed for all variables. The informative variables are those having higher I values than the others. The supervised discriminant model is then constructed on the informative variables, where the accuracy of the corresponding classifier increases as the number of noninformative variables decreases.15
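A minimal numpy sketch of eq 3 follows (not from the paper); it assumes the class variable is encoded as an integer label array and uses sample variances as plug-in estimates of σ^2 and σ_{C_k}^2:

```python
import numpy as np

def mutual_information(x, labels):
    """MI of eq 3 between a Gaussian variable x and the class variable.

    x      : (n,) samples of one process variable
    labels : (n,) integer class labels in {0, ..., K-1}
    """
    mi = np.log(np.var(x))                          # log(sigma^2)
    for k in np.unique(labels):
        pk = np.mean(labels == k)                   # P(C_k)
        mi -= pk * np.log(np.var(x[labels == k]))   # log(sigma_{C_k}^2)
    return 0.5 * mi
```

Ranking all m variables by this value and keeping the top-scoring ones reproduces the selection idea of Verron et al.15 under these assumptions.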
2.3. Pattern Matching Methods. Pattern matching involves the comparison of two multivariate data sets with the aim of finding historical data sets that are similar to a data set of current interest.
2.3.1. PCA-Based Similarity Factors. Background material on PCA and its applications to fault detection and diagnosis is available in the literature.19,20 The standard PCA similarity factor employed by Singhal and Seborg18 for pattern matching, S_PCA, does not account for the variance explained by each principal component (PC) and weighs each PC equally in the similarity calculation. A preferable similarity measure, S′_PCA, was proposed by Johannesmeyer21 that takes into account the variance explained by each principal component direction. The S′_PCA for two data sets X_1 and X_2 is calculated as

S′_PCA = [Σ_{i=1}^{d} Σ_{j=1}^{d} λ_i^1 λ_j^2 cos^2 θ_ij] / [Σ_{i=1}^{d} λ_i^1 λ_i^2]    (4)
where d is the number of PCs such that d PCs explain at least 95% of the variance in each data set, θ_ij is the angle between the ith PC of X_1 and the jth PC of X_2, and λ^1 and λ^2 are the eigenvalues of X_1^T X_1 and X_2^T X_2, respectively.
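A short sketch of eq 4 is given below (an assumption-laden illustration, not the authors' code). The paper allows each data set its own d; here, as a simplifying assumption, the smaller of the two is used for both sets so that the sums in eq 4 line up:

```python
import numpy as np

def pca_eig(X, frac=0.95):
    """Unit PC loadings and eigenvalues of X^T X, keeping the PCs
    that explain at least `frac` of the variance (X centered here)."""
    X = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    lam = s ** 2                                     # eigenvalues of X^T X
    d = int(np.searchsorted(np.cumsum(lam) / lam.sum(), frac)) + 1
    return Vt[:d], lam[:d]

def s_pca_weighted(X1, X2):
    """Modified similarity factor S'_PCA of eq 4."""
    V1, lam1 = pca_eig(X1)
    V2, lam2 = pca_eig(X2)
    d = min(len(lam1), len(lam2))      # common d (assumption, see text)
    V1, lam1, V2, lam2 = V1[:d], lam1[:d], V2[:d], lam2[:d]
    cos2 = (V1 @ V2.T) ** 2            # cos^2(theta_ij) between unit PCs
    num = (lam1[:, None] * lam2[None, :] * cos2).sum()
    return num / (lam1 * lam2).sum()
```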
2.3.2. Distance Similarity Factor. Two data sets may have similar spatial orientation but be located far apart; the similarity factor defined in eq 4 is then not useful because it only considers the angles between PCs. Singhal and Seborg18 presented a distance similarity factor, S_dist, to handle this problem, which is given as

S_dist = (1/n) Σ_{i=1}^{n} exp{−[T_1^2(i) − T_2^2(i)]^2}    (5)
where T_1^2(i) and T_2^2(i) are the well-known Hotelling's T^2 statistics of the ith measurement of X_1 and X_2 after PCA transformation, respectively. For simplicity, the mean value of T^2 is used to calculate the distance similarity factor. Given that the similarity factors defined in eqs 4 and 5 are both PCA-based and quantify similarity from different aspects, the following weighted combination of S_PCA and S_dist is used:
S_mix = τ S_PCA + (1 − τ) S_dist    (6)
where 0 ≤ τ ≤ 1. The values of S_mix thus range between 0 and 1, corresponding to dissimilar and similar patterns, respectively. It should be emphasized that the performance of S_mix is influenced by the parameter τ, which cannot be optimally determined without enough prior knowledge.
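Equations 5 and 6 reduce to very little code once the per-sample T^2 sequences are in hand; computing those sequences (from a PCA model fitted to one of the windows) is deliberately left to the caller in this hedged sketch, since the paper does not spell that step out:

```python
import numpy as np

def s_dist(T2_1, T2_2):
    """Distance similarity factor of eq 5, given the per-sample
    Hotelling T^2 sequences (equal length n) of the two windows."""
    T2_1, T2_2 = np.asarray(T2_1), np.asarray(T2_2)
    return float(np.mean(np.exp(-(T2_1 - T2_2) ** 2)))

def s_mix(spca, sdist, tau):
    """Weighted combination of eq 6; 0 <= tau <= 1 is fixed a priori."""
    return tau * spca + (1.0 - tau) * sdist
```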
2.3.3. DISSIM. Kano et al.16 suggested a dissimilarity factor to compare the distributions of two data sets for the purpose of process monitoring. Consider two data sets, X_1 and X_2, with n_i (i = 1, 2) samples and m variables, where one data set is chosen as the reference and normalized to zero mean and unit variance, and the other data set is scaled using the same normalization information from the reference. The covariance matrix R of the mixture of the two data sets is calculated as

R = (1/(n_1 + n_2)) [X_1; X_2]^T [X_1; X_2] = (n_1/(n_1 + n_2)) R_1 + (n_2/(n_1 + n_2)) R_2    (7)
where R_i = X_i^T X_i / n_i. R can then be diagonalized using the following singular value decomposition:

R = V S V^T    (8)
where S ∈ R^{m×m} is a diagonal matrix and the column vectors of V are mutually orthogonal. The original data sets X_i can then be transformed into Y_i:

Y_i = (n_i/(n_1 + n_2))^{1/2} X_i V S^{−1/2} = (n_i/(n_1 + n_2))^{1/2} X_i U    (9)
where U = V S^{−1/2} is a transformation matrix. The covariance matrix of Y_i can then be computed, and its eigenvalues λ_j (j = 1, 2, ..., m) obtained. An index D is defined for evaluating the distribution dissimilarity between X_1 and X_2:

D = (4/m) Σ_{j=1}^{m} (λ_j − 0.5)^2    (10)
The values of D also range between 0 and 1 but, unlike the index S_mix, the lower and upper bounds correspond to similar and dissimilar distribution characteristics, respectively. Moreover, two test cases, shown in Figure 2, are considered here to illustrate the capability of the index D in identifying the dissimilarity of two highly overlapped data sets. The two data sets are generated according to the following multivariate normal distributions:

X_1 ~ N(μ_1, σ_1),  X_2 ~ N(μ_2, σ_2)    (11)

where the two-dimensional mean vectors are μ_1 = μ_2 = [0, 0] and the 2 × 2 covariance matrices are σ_1 = [2, −1; −1, 3] and σ_2 = [2, −0.5; −0.5, 3]. The two generated data sets are then similar to each other, as displayed in Figure 2a, and D = 0.0138. When the covariance matrix of the second data set X_2 is replaced by σ_2 = [2, 1.6; 1.6, 3], the value D = 0.3724 indicates that the two data sets shown in Figure 2b are different even though they are highly overlapped. The testing results provided in Figure 2 quantitatively demonstrate the validity of the D index in handling overlapped data sets.

Figure 2. Dissimilarity index D in two simulated cases.
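Equations 7-10 translate directly into a compact routine. The sketch below (ours, not the authors') also repeats the Figure 2 experiment; the exact D values depend on the random draw and on the normalization convention, but D stays near zero for the similar pair and is clearly larger for the dissimilar pair (the paper reports 0.0138 and 0.3724 for its particular draws):

```python
import numpy as np

def dissim(X1, X2, scale=True):
    """Dissimilarity index D of eqs 7-10 for two (n_i, m) data windows.
    If scale is True, X1 is used as the reference for normalization."""
    if scale:
        mu, sd = X1.mean(axis=0), X1.std(axis=0)
        X1, X2 = (X1 - mu) / sd, (X2 - mu) / sd
    n1, n2 = len(X1), len(X2)
    Xmix = np.vstack([X1, X2])
    R = Xmix.T @ Xmix / (n1 + n2)                      # eq 7
    S, V = np.linalg.eigh(R)                           # R = V S V^T, eq 8
    Y1 = np.sqrt(n1 / (n1 + n2)) * X1 @ (V / np.sqrt(S))   # eq 9
    lam = np.linalg.eigvalsh(Y1.T @ Y1 / n1)           # eigenvalues of cov(Y1)
    return 4.0 / X1.shape[1] * np.sum((lam - 0.5) ** 2)    # eq 10

# Repeat the Figure 2 test cases with fresh random draws.
rng = np.random.default_rng(0)
Xref = rng.multivariate_normal([0, 0], [[2, -1.0], [-1.0, 3]], 1000)
Xa = rng.multivariate_normal([0, 0], [[2, -0.5], [-0.5, 3]], 1000)   # similar
Xb = rng.multivariate_normal([0, 0], [[2, 1.6], [1.6, 3]], 1000)     # dissimilar
print(dissim(Xref, Xa), dissim(Xref, Xb))
```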
2.4. Remarks and Motivation. The application of discriminant analysis algorithms to fault diagnosis requires sufficient training samples from different fault classes so as to build a reliable classifier. In some cases, this may be infeasible if only a limited number of measurements is available. Although the MI-based variable selection method can be adopted to filter informative variables and reduce dimensionality before training the discriminant model, the informative variables are not filtered individually to represent the abnormal characteristics of each fault class. Without characterizing fault classes individually, the overlap issue becomes more troublesome as the number of reference fault classes increases. Moreover, a sample-wise classifier is less appealing than a window-wise classifier in handling time-varying measurements. The moving window strategy integrated with pattern matching methods is a good choice for fault diagnosis. Although the PCA-based similarity factor S_mix takes both the direction and the distance of PCs into account, a new issue associated with its utilization is that the parameter τ needs to be specified a priori. In contrast, DISSIM is a good alternative since it is parameterless. Therefore, the motivation for the proposed approach centers on the deficiencies of the former two categories of fault diagnosis methods. More precisely, the present work proposes an approach that simultaneously:
• deals with the modeling problem caused by insufficient training samples vs high dimensionality;
• characterizes the abnormal variations of each fault class by filtering informative variables individually for each fault class;
• relies on the DISSIM of two window-wise data sets for identifying the fault class of current interest.
3. DISSIMILARITY-BASED FAULT DIAGNOSIS
3.1. Informative Variables Ensemble Filtering. The informative variables of each fault class are defined to be those that represent the abnormal variations in contrast to the variations of the normal data set. To achieve a successful filtering of informative variables, a measure of the abnormal variation in each process variable needs to be established first. Motivated by the definition of negentropy for measuring non-Gaussianity,22 a new index is defined for measuring the degree of abnormal variation in each variable as follows:

J(v) = [E{G(v_normal)} − E{G(v_fault)}]^2    (12)

where v_normal and v_fault are the variable v measured under normal and faulty conditions, respectively. Hyvärinen and Oja22 have suggested a number of functions for G:

G_1(v) = log cosh(a_1 v)/a_1,  G_2(v) = exp(−a_2 v^2/2),  G_3(v) = v^4    (13)

with parameters 1 ≤ a_1 ≤ 2 and a_2 ≈ 1. Note that, if v_normal is a Gaussian variable and v_fault is any random variable, eq 12 is a simple approximation of negentropy, which was proposed by Hyvärinen and Oja22 and used to measure the departure of a random variable from Gaussianity. Therefore, the index J(v) defined in eq 12 can be used to measure the departure of the variable v_fault from its normal region v_normal. The values of J(v) are nonnegative, with 0 representing that variable v contains only normal variations.
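In code, eqs 12 and 13 amount to comparing sample means of G over the two conditions. This is a minimal sketch under the assumption a_1 = a_2 = 1 (the paper only bounds these parameters); the names are hypothetical:

```python
import numpy as np

# Contrast functions of eq 13, with a1 = a2 = 1 assumed for simplicity
G_FUNCS = (
    lambda v: np.log(np.cosh(v)),      # G1
    lambda v: np.exp(-v ** 2 / 2.0),   # G2
    lambda v: v ** 4,                  # G3
)

def abnormality_index(v_normal, v_fault, G):
    """J(v) of eq 12: squared gap between E{G(v)} under normal and
    faulty conditions; 0 means v shows only normal variation."""
    return (np.mean(G(v_normal)) - np.mean(G(v_fault))) ** 2
```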
During the offline modeling phase, filters of informative variables are designed for every fault class. By comparing the data set from the kth reference fault class with the data set from the normal operating condition, J(v) can be computed for all m process variables. Generally, the variables with higher J(v) values tend to be informative variables for the considered fault class. To determine the informative variables efficiently, the following steps are proposed, which also take dimensionality reduction into account:
First: Sort all variables in decreasing order of their J(v) values; the cutoff point p is determined as the first one satisfying J_p(v)/Σ_{i=1}^{p−1} J_i(v) ≤ 0.5%, and the first p − 1 variables with the largest J(v) values form an initial set of informative variables.
Second: If the number of selected variables satisfies M < 2, add the variable achieving the second largest value of J(v) to the set of informative variables; if M > m/3, remove the variables with the smallest J(v) values from the set so that the final M ≤ m/3.
Last: Repeat the above two steps for the other fault classes, and save the subscripts of the filtered variables for online filtering.
In the first step, the cutoff point is suggested to be the elbow point where a further increase in the number of variables does not lead to a significant increase in the sum of the previous p − 1 J(v) values. The second step restricts the reduced dimensionality to 2 ≤ M ≤ m/3. The dimensionality is thus significantly reduced, and the available samples are no longer insufficient relative to the reduced dimensionality. Nevertheless, it should be emphasized that the available training samples are considered insufficient if the number of samples is less than the number of measured variables. The extreme case where only a few samples are available is not considered, since the proposed method is still data-based and requires some samples.
Furthermore, it should be pointed out that the calculation of J(v) involves a selection problem, namely, the determination of the function G. Given that G has three available alternatives, which one is optimal? Generally, the nonquadratic function G must be chosen carefully so as to obtain a good approximation of negentropy. Although Mori et al.6 argued that G_1 is a good contrast function, the choice remains empirical. In the current work, we observe that different formulations of G measure the departure between two variables from different aspects. From this point of view, there is no optimal contrast function without enough prior knowledge. To avoid this issue, an ensemble learning technique is adopted. Recently, ensemble learning has become quite popular in the field of pattern recognition as well as MSPM.23−25 The main idea behind ensemble learning is to build multiple base classifiers and then combine them into an ensemble. The ensemble solution is usually shown to perform better than any individual solution.24 Inspired by this, the three alternative G expressions are used in parallel to design base filters for filtering informative variables, resulting in three different sets of informative variables for each fault class.
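The three selection steps can be sketched as follows for one fault class and one contrast function (a hedged reading of the steps above, not the authors' code; it assumes m ≥ 6 so that 2 ≤ m/3). The "Last" step then simply repeats this for every class and every G_i:

```python
import numpy as np

def select_informative_variables(J):
    """Offline filter design of section 3.1 (one fault class, one G).

    J : (m,) array of J(v) values for all m measured variables.
    Returns the sorted indices of the selected informative variables.
    """
    m = len(J)
    order = np.argsort(J)[::-1]          # variables by decreasing J(v)
    Js = J[order]
    # First: elbow cutoff, the first p with J_p / sum(J_1..J_{p-1}) <= 0.5%
    keep = m
    for p in range(2, m + 1):
        if Js[p - 1] / Js[: p - 1].sum() <= 0.005:
            keep = p - 1
            break
    # Second: restrict the reduced dimensionality to 2 <= M <= m/3
    keep = max(keep, 2)
    keep = min(keep, m // 3)
    return np.sort(order[:keep])
```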
3.2. Dissimilarity-Based Online Fault Matching. Given the three filters designed for each fault class, a moving window with an incrementally increasing size is used to gather data for calculating D_k^i (i = 1, 2, 3), as shown in Figure 3.

Figure 3. Moving window based dissimilarity calculation.

The initial window size is W, which means that online fault matching is not activated until the number of detected abnormal samples reaches W. As illustrated in Figure 3, at each moving window step, three DISSIM calculations are carried out in parallel because of the three filters designed for the kth fault class. Giving equal weight to each D_k^i, the ensemble solution representing the matching degree between the data windows from the detected abnormal samples and the kth fault class is given as follows:

D_k = D_k^1 + D_k^2 + D_k^3    (14)

where k = 1, 2, ..., K denotes the subscript of the fault classes, and D_k^1, D_k^2, and D_k^3 are the dissimilarity values using informative variables filtered by the base filters with G_1, G_2, and G_3, respectively. With this ensemble solution, the selection of G in computing J(v) is no longer an issue. Finally, the fault class is identified as the one achieving the minimum D_k value:

fault class = arg min_k {D_1, D_2, ..., D_K}    (15)

3.3. Fault Diagnosis Procedure.
Offline modeling:
(1) Collect historical normal process data and scale it to zero mean and unit variance.
(2) Collect historical data sets {F_1, F_2, ..., F_K} from the K different fault classes and scale them using the same mean and standard deviation vectors as in step 1.
(3) Calculate the value of J(v) for each process variable and apply the procedure described in section 3.1 to obtain the filters of informative variables for all available fault classes.
Online diagnosis:
(4) When the number of currently detected abnormal samples equals the initial window size W, scale them in the same way as in step 2 and form them into a window matrix X.
(5) Calculate D_k as described in Figure 3 to compare the dissimilarity between X and the window data of the same size gathered from the kth fault class.
(6) Repeat step 5 for all K fault classes and determine the fault class using eq 15.
(7) As new abnormal samples become available, implement steps 5 and 6 repeatedly.
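To make one matching step concrete, the following minimal sketch combines eqs 14 and 15, reusing the dissim function sketched in section 2.3.3. The filters structure (three index arrays per class) is a hypothetical container, and whether the windows are re-normalized inside DISSIM or only pre-scaled as in steps 1-4 is an implementation choice the paper leaves open:

```python
import numpy as np

def match_fault(window_online, fault_windows, filters):
    """One moving-window matching step (eqs 14 and 15).

    window_online : (W, m) scaled window of detected abnormal samples
    fault_windows : list of K (W, m) scaled windows, one per fault class
    filters       : filters[k][i] = indices of the informative variables
                    of class k selected with contrast function G_{i+1}
    """
    D = np.zeros(len(fault_windows))
    for k, Fk in enumerate(fault_windows):
        # three base filters -> three DISSIM values, equally weighted (eq 14)
        D[k] = sum(dissim(Fk[:, iv], window_online[:, iv])
                   for iv in filters[k])
    return int(np.argmin(D)), D      # matched class by eq 15
```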
4. ILLUSTRATION AND COMPARISON STUDY
This section studies the recognition performance of the proposed method, which is abbreviated as IV-DISSIM. Other methods, namely LPDA,12 MI-based variable selection integrated with FDA (MI-FDA),15 and the PCA-based similarity factor S_mix,18 are considered for comparison.
4.1. Tennessee Eastman (TE) Process. The physical model of the TE process was provided by Downs and Vogel26 and consists of five major unit operations: a reactor, a condenser, a vapor−liquid separator, a recycle compressor, and a stripper. The TE process is a widely used benchmark platform for evaluating process monitoring approaches because of its relatively realistic and complex dynamics.
A set of 52 process variables can be measured, which includes 22 continuous variables, 12 manipulated variables, and 19 composition measurements sampled less frequently. In the current work, a total of 33 continuous measured variables is used, and the simulated data, sampled every 3 min, can be downloaded from http://web.mit.edu/braatzgroup. Details can be found in the book by Chiang et al.14 There are 21 different fault modes that can be programmed in the simulator, of which 9 different faults are employed for designing the test scenarios; they are listed in Table 1 (the IDV notation is from Downs and Vogel26).

Table 1. Test Scenarios in the TE Process

fault   | description                                   | type             | available samples, case 1 | case 2
IDV(2)  | B composition, A/C ratio constant (stream 4)  | step             | 1−55  | 1−115
IDV(4)  | reactor cooling water inlet temperature       | step             | 1−33  | 1−115
IDV(5)  | condenser cooling water inlet temperature     | step             | 1−35  | 1−115
IDV(8)  | A, B, C feed composition (stream 4)           | random variation | 1−58  | 1−115
IDV(10) | C feed temperature (stream 4)                 | random variation | 1−45  | 1−115
IDV(11) | reactor cooling water inlet temperature       | random variation | 1−61  | 1−115
IDV(12) | condenser cooling water inlet temperature     | random variation | 1−38  | 1−115
IDV(14) | reactor cooling water valve                   | sticking         | 1−50  | 1−115
IDV(19) | unknown                                       | unknown          | 1−40  | 1−115
Both IDV(4) and IDV(11) involve a disturbance in the reactor cooling water (RCW) inlet temperature, and both IDV(5) and IDV(12) involve a disturbance in the condenser cooling water inlet temperature. This clearly indicates the overlapping nature of the reference fault classes. Furthermore, in case 1, the available training samples are insufficient compared to the dimensionality (i.e., the number of monitored variables), which makes for a highly challenging fault diagnosis task. Note that the differences in the numbers of available samples are not a result of different sampling rates. Instead, we try to simulate a "real" case, where the numbers of recorded samples for different fault classes are highly limited and cannot always be the same because of the times at which system recovery is carried out. In contrast, case 2 represents a common scenario tested by other classification-based fault diagnosis methods, since the number of available samples in each fault class is three times larger than the number of variables.
4.2. Study on Fault Diagnosis Performance. With the available training samples given in case 1, the implementation of the offline modeling procedure results in three filters of informative variables for each fault class. The initial window size is W = 25, and the online diagnosis results for IDV(4) and IDV(11) are given in Figure 4. The black dots in Figure 4 and in the following figures mark the target fault class, which should achieve the minimum DISSIM values. IDV(4) and IDV(11) are clearly isolated even though they both happen to be caused by abnormal temperature variations in the RCW. It should be pointed out that the moving window for a fault class becomes unavailable once the window size reaches the maximum number of training samples for that class. For example, because the number of historical samples for IDV(5) is 35 and the initial window size is W = 25, the number of available moving window steps is 11. For comparison, the diagnosis results based on DISSIM without filtering of informative variables are displayed in Figure 5, where IDV(4) and IDV(11) are not clearly isolated; the resulting high misclassification rate in identifying IDV(11) is shown in Figure 5b. A similar comparison can be made in Figure 6, in which IV-DISSIM again presents superior recognition performance. This is because the "noisy" influence from noninformative variables is greatly limited, and a relatively reliable statistical model can be obtained since the number of training samples is now at least two times larger than the number of informative variables; better recognition performance is therefore expected using the proposed approach. Furthermore, it can easily be observed from Figure 5 and Figure 6c,d that the value of the D index drops as the window gathers more data for comparison. Given that "noisy" information increases as the moving window captures more data for analysis, the DISSIM method without filtering of informative variables cannot eliminate the irrelevant information, has difficulty identifying fault classes, and thus gives wrong results.
Without the ensemble learning strategy, the diagnosis performance also deteriorates, as can be observed from Figure 7. Here the diagnosis of IDV(8) is considered, and only the function G_1 is used for filter design. As illustrated in Figure 7b, the D_k between the windows from online data and IDV(8) is not always the minimum one. In contrast, with the integration of the ensemble learning technique, IDV(8) is successfully identified, as shown in Figure 7a. Moreover, it should be noted that the selection of an appropriate initial window size is crucial for the effective functioning of the proposed method, as well as of other moving-window-based approaches.
Figure 4. Fault diagnosis results for (a) IDV(4) and (b) IDV(11) using the proposed IV-DISSIM approach.
Figure 5. Fault diagnosis results for (a) IDV(4) and (b) IDV(11) using the DISSIM method without filtering of informative variables.
Figure 6. Fault diagnosis results for (a) IDV(5), (b) IDV(11) using the IV-DISSIM method, (c) IDV(5), and (d) IDV(11) using the DISSIM method, respectively.
Figure 7. Fault diagnosis results for IDV(8) based on the IV-DISSIM method, (a) with and (b) without ensemble learning strategy.
To study the influence of different window sizes, the diagnosis step is repeated with W = 22, 25, and 30. As illustrated in Figure 8, the recognition performance improves with increasing values of W. In particular, no misclassification error occurs for IDV(10) if the largest window size (i.e., W = 30) is chosen. This indicates that, with more data gathered, more changeover variations can be used for fault recognition, which is the main reason that a moving window with increasing size is utilized in the proposed framework. Consequently, it is recommended to utilize a larger window size if conditions allow.
4.3. Comparisons with Discriminant Models. The issues that derive from insufficient samples vs high dimensionality are addressed by the filtering of informative variables in the IV-DISSIM method, and it becomes possible to further validate its performance when more training samples become available, i.e., case 2 in Table 1. In this case, discriminant models, such as LPDA and MI-FDA, can be built for comparison purposes
Figure 8. Fault diagnosis results for IDV(10) based on the IV-DISSIM approach with W = (a) 22, (b) 25, and (c) 30.

Figure 9. Fault diagnosis results for IDV(19) based on (a) the IV-DISSIM method, (b) the S_mix with τ = 0.9, and (c) the S_mix with τ = 0.1.
since sufficient training samples are provided. The discriminant function defined in eq 2 is used for classification. Furthermore, replacing DISSIM with the S_mix index in the proposed approach is also considered, which provides an additional comparison study. The initial moving window size is chosen as W = 30, as previously recommended, and τ = 0.9 and τ = 0.1 are empirically chosen for calculating the S_mix index. The online diagnosis results for IDV(19) using the two window-based methods are illustrated in Figure 9. Clearly, IV-DISSIM can distinguish IDV(19) successfully, since the minimum D_k values are achieved persistently. However, the S_mix index for IDV(19) does not reach the maximum values, as shown in Figure 9b and c; thus, a different recognition performance is obtained for different values of τ. This supports the argument that DISSIM is a good alternative since no parametrization is involved. Moreover, with the elimination of noninformative variables, the variance structure of the original data set is broken. A PCA transformation thus would not extract useful information to represent the original data set, and similarity factors defined using the PCs are not sufficient to characterize the specificity of each fault class. Besides, if all the measured variables are used in calculating the PCA-based similarity factors, the high dimensionality becomes problematic. On the contrary, the recognition performance of DISSIM, which is based on a comparison of the distributions of two data sets, is enhanced by using the informative variables. Given that the noninformative variables present insignificant abnormal variations, the specific distribution structure of each fault class is clearly delineated without them. Therefore, DISSIM with informative variables is a feasible way to define individuality for overlapping fault classes. This is the main observation to be made from the results presented in Figure 9: the online detected fault cannot be successfully classified by simply replacing DISSIM with S_mix in the proposed method.
Because of the use of the moving window, the proposed method and S_mix suffer from a delay in classification; therefore, the misclassifications within the first W − 1 samples are not counted for the sample-wise discriminant models, so as to make a fair comparison. The fault misclassification rates (MR) for the different methods are then computed as follows:
MR = n/(N − W + 1)    (16)
where n is the number of misclassified samples (or windows) and N is the total number of considered samples (or windows), which is 115, as listed in Table 1. The results are given in Table 2. Not surprisingly, the window-wise classifiers improve fault diagnosis performance in terms of MR.
Table 2. Misclassification Rates Based on Different Methods in Case 2 (%)

methods   | IDV(2) | IDV(4) | IDV(5) | IDV(8) | IDV(10) | IDV(11) | IDV(12) | IDV(14) | IDV(19)
IV-DISSIM | 0      | 0      | 0      | 0      | 0       | 0       | 0       | 0       | 0
Smix      | 0      | 0      | 0      | 48.84  | 100     | 100     | 100     | 0       | 100
LPDA      | 9.30   | 11.63  | 0      | 10.47  | 34.88   | 31.39   | 2.33    | 24.42   | 9.30
MI-FDA    | 0      | 62.79  | 6.98   | 44.19  | 56.98   | 55.81   | 25.58   | 31.40   | 29.07
Moreover, fault classification based on the discriminant algorithms shows relatively high misclassification rates and may yield even higher misclassification rates as the number of reference fault classes increases. The proposed IV-DISSIM method again proves its superiority in diagnosing faults.
5. CONCLUSIONS
This article proposed a novel fault diagnosis method to directly handle two methodological challenges (i.e., insufficient training samples and highly overlapping characteristics) that have rarely been studied so far. The presented ensemble filtering of informative variables has proven to be a practical approach for variable selection with the aim of dimensionality reduction. On the basis of the informative variables, the first challenge, associated with insufficient training samples vs high dimensionality, is no longer an issue. Motivated by the idea that a change of operating condition can be detected by monitoring the distribution of process data, the DISSIM index adopted for matching fault patterns has also been shown to have the desired functionality and effect. The comparison results demonstrated that the recognition performance of IV-DISSIM is considerably better than that of discriminant models as well as PCA-based similarity factors. However, the proposed method cannot be directly applied to real industrial processes, since it assumes that isolated signatures of individual faults are available in the historical database. In practice, the recorded data are usually much more complicated; for example, the normal operating data are often contaminated by abnormal samples. Therefore, a data preprocessing tool is required before the application of the proposed IV-DISSIM approach. Nevertheless, some interesting points require further investigation within the framework of the proposed method. For instance, nonlinear or non-Gaussian pattern matching methods could be employed to gain enhanced recognition performance. In addition, how to recognize a new fault class that is not modeled offline is worth investigating in the future.
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]. Tel.: +86 18815280932.
Notes
The authors declare no competing financial interest.
ACKNOWLEDGMENTS
This work was sponsored by the K. C. Wong Magna Fund in Ningbo University, the Natural Science Foundation of China (61503204), the Natural Science Foundation of Zhejiang Province (Y16F030001), and the Science & Technology Planning Project of Zhejiang (2015C31017).
REFERENCES
(1) Yin, S.; Ding, S. X.; Xie, X.; Luo, H. A review on basic data-driven approaches for industrial process monitoring. IEEE Trans. Ind. Electron. 2014, 61, 6418.
(2) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543.
(3) Tong, C.; El-Farra, N. H.; Palazoglu, A.; Yan, X. Fault detection and isolation in hybrid process systems using a combined data-driven and observer-design methodology. AIChE J. 2014, 60, 2805.
(4) Wen, Q.; Ge, Z.; Song, Z. Multimode dynamic process monitoring based on mixture canonical variate analysis model. Ind. Eng. Chem. Res. 2015, 54, 1605.
(5) Luo, L.; Bao, S.; Mao, J.; Tang, D. Nonlinear process monitoring using data-dependent kernel global-local preserving projections. Ind. Eng. Chem. Res. 2015, 54, 11126.
(6) Mori, J.; Yu, J. A quality relevant non-Gaussian latent subspace projection method for chemical process monitoring and fault detection. AIChE J. 2014, 60, 485.
(7) Liu, Q.; Qin, S. J.; Chai, T. Multiblock concurrent PLS for decentralized monitoring of continuous annealing processes. IEEE Trans. Ind. Electron. 2014, 61, 6429.
(8) He, B.; Zhang, J.; Chen, T.; Yang, X. Penalized reconstruction-based multivariate contribution analysis for fault isolation. Ind. Eng. Chem. Res. 2013, 52, 7784.
(9) Alcala, C. F.; Qin, S. J. Reconstruction-based contribution for process monitoring. Automatica 2009, 45, 1593.
(10) Chen, T.; Sun, Y. Probabilistic contribution analysis for statistical process monitoring: A missing variable approach. Control Eng. Pract. 2009, 17, 469.
(11) He, X.; Wang, W.; Yang, Y.; Yang, Y. Variable-weighted Fisher discriminant analysis for process fault diagnosis. J. Process Control 2009, 19, 923.
(12) Rong, G.; Liu, S.; Shao, J. Fault diagnosis by locality preserving discriminant analysis and its kernel variation. Comput. Chem. Eng. 2013, 49, 105.
(13) Chen, X.; Yan, X. Using improved self-organizing map for fault diagnosis in chemical industry process. Chem. Eng. Res. Des. 2012, 90, 2262.
(14) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer-Verlag: London, 2001.
(15) Verron, S.; Tiplica, T.; Kobi, A. Fault detection and identification with a new feature selection based on mutual information. J. Process Control 2008, 18, 479.
(16) Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Statistical process monitoring based on dissimilarity of process data. AIChE J. 2002, 48, 1231.
(17) Zhao, C.; Wang, F.; Jia, M. Dissimilarity analysis based batch process monitoring using moving windows. AIChE J. 2007, 53, 1267.
(18) Singhal, A.; Seborg, D. E. Evaluation of a pattern matching method for the Tennessee Eastman challenge process. J. Process Control 2006, 16, 601.
(19) Jiang, Q.; Yan, X. Just-in-time reorganized PCA integrated with SVDD for chemical process monitoring. AIChE J. 2014, 60, 949.
(20) Zhu, J.; Ge, Z.; Song, Z. HMM-driven robust probabilistic principal component analyzer for dynamic process fault classification. IEEE Trans. Ind. Electron. 2015, 62, 3814.
(21) Johannesmeyer, M. C.; Singhal, A.; Seborg, D. E. Pattern matching in historical data. AIChE J. 2002, 48, 2022.
(22) Hyvärinen, A.; Oja, E. Independent component analysis: algorithms and applications. Neural Networks 2000, 13, 411.
(23) Li, L.; Hu, Q.; Wu, X.; Yu, D. Exploration of classification confidence in ensemble learning. Pattern Recognit. 2014, 47, 3120.
(24) Tong, C.; Palazoglu, A.; Yan, X. Improved ICA for process monitoring based on ensemble learning and Bayesian inference. Chemom. Intell. Lab. Syst. 2014, 135, 141.
(25) Li, N.; Yang, Y. Ensemble kernel principal component analysis for improved nonlinear process monitoring. Ind. Eng. Chem. Res. 2015, 54, 318.
(26) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245.