Normal and Abnormal Data Segmentation Based on Variational

Jun 12, 2017 - a new method to automatically find normal and abnormal data segments from historical data sets based on variational directions of multi...
0 downloads 11 Views 4MB Size
Article pubs.acs.org/IECR

Normal and Abnormal Data Segmentation Based on Variational Directions and Clustering Algorithms Kuang Chen† and Jiandong Wang*,‡ †

College of Engineering, Peking University, Beijing, China College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China



ABSTRACT: Historical normal and abnormal data sets are prerequisites for process monitoring, alarm system rationalization, fault detection, and diagnosis. This paper proposes a new method to automatically find normal and abnormal data segments from historical data sets based on variational directions of multiple process variables. The minimum time duration and the minimum amplitude shift are introduced as empirical knowledge to define underlying stages in the data sets. Two major challenges in identifying these stages are addressed by using a density-based clustering algorithm. The effectiveness of the proposed method is illustrated using numerical and industrial examples.

1. INTRODUCTION Normal and abnormal data segmentation are often required as training data sets for process monitoring, alarm system rationalization, fault detection and diagnosis. Doing the data segmentation manually via consultation to industrial plant operators is possible for small-sized data sets, but it is timeconsuming and practically infeasible for large-sized data sets. More training data points are generally beneficial for improving performances of data-based methods. Therefore, it is indispensable to have a method automatically obtaining normal and abnormal data segments. The data segmentation problem is closely related to building nominal models for fault detection and diagnosis. Related methods may be classified into three main categories, namely, quantitative model-based methods, qualitative model-based methods, and historical data-based methods.1−3 Input−output or state-space models are perhaps the most common representatives of quantitative models.4,5 If assumptions of mathematical models are satisfied, then the evolution of processes and associated abnormalities can be well captured. Qualitative models aim to generate abstract algebraic models from priori knowledge and to derive solutions in a qualitative expression, such as high/ low/normal states stored in look-up tables.6 Digraph-based causal models are the representative qualitative models frequently applied for process fault diagnosis.7−9 Historical data-based models are built on a large amount of historical process data.10,11 Features of historical data, such as principal component components, are extracted from normal and/or abnormal training data sets, and some classifiers are used to detect abnormal conditions. Abnormalities can also be analyzed in the frequency space of signals instead of the time-amplitude space.12,13 Finding abnormal data segments is the offline counterpart of detecting the presence of abnormalities that is one major task © XXXX American Chemical Society

for industrial plant operators in the 24-h process monitoring operation. If industrial processes deviate from normal operating ranges due to device malfunctions, operational errors or raw material fluctuations, operators should detect the abnormalities and make corrective actions in a prompt manner; otherwise, the abnormalities will have negative effects on process safety and/or efficiency. We have observed that one common technique adopted by industrial plant operators in detecting the abnormalities is to compare variational directions of process variables with priori process knowledge. Let us take a variable speed pump as an example. It is known from related physical laws of the pump that if the pump speed increases or decreases, then the outlet flow rate changes in the same direction when the pump works normally. Thus, by looking at directions of variations in the outlet flow rate and the pump speed, industrial plant operators can readily tell whether the pump is in the normal or abnormal condition, that is, if the outlet flow rate and pump speed are changing in opposite directions, then there must be something wrong with the pump. The qualitative trend analysis (QTA) methods can be used to obtain variational directions. The QTA methods extract the qualitative increasing, decreasing or steady features of a single signal.3 The features can be obtained by representing signals using various basis functions, for example, polynomials,14−17 splines,18 wavelet,19−21 primitives,22,23 and neural networks,24 and segmenting signals using the top-down, bottom-up and sliding-window methods.27,28 However, two major challenges have not been Received: Revised: Accepted: Published: A

May 4, 2017 June 9, 2017 June 12, 2017 June 12, 2017 DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

3. MAIN IDEA This section presents an example to illustrate the main idea of the proposed method. Consider a bivariate system composed by two process variables, namely, X ≔ [X1, X2], and assume that X1 and X2 vary synchronously in the same direction when they are in the normal condition. Thus, the relationship in eq 1 becomes

well addressed in contemporary studies (to be clarified later in Section 5): • Similar data points adjacent in time are likely to be merged into the same data segment; however, it is difficult for similar data points far apart in time to be grouped together. • Variations with larger amplitude changes are likely to be detected; however, false detections of variations are often present due to large amplitude changes caused by random noises. This paper proposes a new method to automatically find normal and abnormal data segments from historical data sets based on variational directions of multiple process variables. Here a variation is defined as a change between two different underlying stages being adjacent in time. The underlying stages are defined based on the minimum time duration and the minimum amplitude shift. The above-mentioned two challenges are addressed by taking densities of data points shown in scatter plots into consideration; however, an associated side effect is the loss of time continuities of data points in scatter plots. The proposed method addresses the two challenges and avoids the associated side effect by exploiting a density-based clustering algorithm twice to find clusters in historical data sets. The first exploitation incorporates the time dimension in order to ensure time continuities of data points inside each cluster. The second exploitation is applied to the cluster centers obtained in the first exploitation to identify underlying stages. The variational directions of process variables in different stages are compared with rules from priori process knowledge to separate abnormal segments from normal ones. The rest of the paper is organized as follows. Section 2 describes the problem to be solved. Section 3 illustrates the main idea of the proposed method via a numerical example. Section 4 presents detailed steps of the proposed method. Numerical and industrial examples are provided in Sections 5 and 6, respectively. Section 7 concludes the paper.

[X1]·[X 2] = 1or[X1] = [X 2] = 0

The time sequence plots and the scatter plot of X are given in Figures 1 and 2, respectively. Assume that X is in the normal

Figure 1. Time sequence plots of X1 (a) and X2 (b).

condition at the initial time instant t = 1. It is obvious that X goes back and forth between two stages corresponding to the clusters A and B marked in Figure 2 during the time interval t ∈ [1, 400]. The variational directions of X1 and X2 meet the relationship in eq 2. By contrast, the variational directions of X1 and X2 from the cluster B to the cluster C do not agree with eq 2. Hence the bivariate system is in the normal condition before t = 400, and in the abnormal condition after t = 400. The scatter plot in Figure 2 implies that a clustering technique based on the densities of data points is promising to resolve the two challenges in Section 1. In particular, similar data points being apart in time, for example, x1(t) and x2(t) for t ∈ [1, 100] and t ∈ [201, 300], formulate the same cluster A in the scatter plot. In addition, the noise has little negative effects on identifying the clusters. The data points x1(t) and x2(t) for t ∈ [201, 350] are contaminated by higher level noises than other data points. Even so, the cluster B clearly encloses these data points. A side effect of working in the scatter plot is the loss of time information, which may result in unreasonable data segmentations. For instance, consider the data point marked by the red large dot in Figures 1 and 2. This data point is closer to the cluster A or C than the cluster B in the scatter plot. However, if the time information is considered, then this data point should belong to the cluster B in order to keep time continuities of data points. To avoid such a side effect, two clustering steps are introduced. The first step finds clusters of data points after incorporating the time dimension in order to preserve time continuities; thus, each resulting cluster center represents a cluster of data points being consecutive in time. In the second step, the previous cluster centers are grouped again without a concern in losing time continuities.

2. PROBLEM DESCRIPTION Consider a multivariate system involving multiple process variables, X ≔ [X1,X2,···, Xn]. The data sequence of Xi with i = 1,2,···, n is represented by xi(1: t) ≔ [xi(1), xi(2),···, xi(t)]T. Here t ∈  (the set of nonnegative integers) is the index associated with the sampling time instance th, where h is a realvalued sampling period; without loss of generality, set h = 1 s and use t to represent th in the sequel. The variational direction of Xi at the time index t is defined as [Xi]t → t +Δt = sign(xi(t + Δt ) − xi(t ))

where Δt is a positive integer, and sign (x) takes the value of 1, 0, or −1 for x > 0, x = 0 or x < 0, respectively. Thus, if [Xi] equals +1, 0 and −1, then Xi is increasing, unchanged and decreasing, respectively. If X is in the normal condition, variational directions of X1, X2, ···, Xn are assumed to have a fixed qualitative relationship. Mathematically, the qualitative relationship is represented by f ([X1], [X 2], ···, [X n]) = 0

(2)

(1)

By contrast, when X runs into the abnormal condition, the relationship in eq 1 is assumed to be no longer valid, that is, f ([X1],[X2],···,][Xn]) ≠ 0. Based on the qualitative relationship in eq 1, our objective is to automatically separate a given historical data set X(1: N) of X into normal and abnormal data segments. B

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 2. Scatter plot of X1 and X2.

the relationship in eq 1. If the variational directions are consistent (inconsistent) with the relationship, then the corresponding data segment is in the normal (abnormal) condition.

Generally, the switching process among stages, that is, A → B → A → B → C shown in Figures 1 and 2, can be abstracted as a partical motion shifting between cluster centers in the same space as illustrated by Figure 3. Denote the current

4. THE PROPOSED METHOD This section defines the underlying stages and presents the detailed steps of the proposed method. 4.1. Definition of Stages. Two parameters are introduced in order to define the underlying stages, namely, the minimum time duration τ and the minimum amplitude shift Δ. The minimum time duration τ comes from common sense that if an abnormality arises, it should last at least for a while before being regarded as an actual abnormality instead of short-time random variations due to noises. In other words, any variations lasting less than τ are ignored, and only those having time durations larger than τ should be considered as stages. Thus, τ is a boundary parameter in time to distinguish noises and stages in historical data sets. The rationale of τ is similar to the alarm delay timer,25 whose role is to raise an alarm when several consecutive data points surpass an alarm limit. A default value of τ here is taken as τ = 20 s, similar to the recommended value for the alarm delay timer.25 Besides the time dimension, the amplitude dimension is equally important in defining stages. Only variations having sufficient amplitude changes are regarded as being notable. The minimum amplitude shift Δ is defined as the lower limit of the notable variations, that is, if an inequality holds,

Figure 3. A partical motion shifting between stage centers in the same space with the process shown in Figures 1 and 2.

position of the partical as Pt at the time instant t, and the latest ⎯⎯⎯⎯→ normal position is Pk for k < t. The displacement PkPt is defined as the vector from Pk to Pt. Based on the qualitative relationship ⎯⎯⎯⎯→ in eq 1, the normality of Pt depends on the vector PkPt , that is, ⎯⎯⎯⎯→ ⎧ ⎪ normal, iff ([ PkPt ]) = 0 Pt is⎨ ⎪ abnormal, iff ([⎯P⎯⎯⎯→ ⎩ kPt ]) ≠ 0

n

X(t 2) − X(t1)

(3)

≔ 2

The rationale of (3) is transparent. In Figure 3, B is the latest normal position, and the current position C is abnormal due ⎯→ ⎯ to f ([BC]) ≠ 0. Hence, the next position D would be in the normal condition such as D1 ∼ D4 in the dark domain shown in Figure 3, or in the abnormal condition such as D5 ∼ D8. In brief, the main idea of the proposed method is to deploy a density-based clustering algorithm twice in order to identify underlying stages, and to compare variational directions of process variables between the current stage and the latest normal stage with

∑ [xi(t2) − xi(t1)]2 i=1

⩾Δ (4)

then X(t2) is in a different stage with X(t1). Here t1 and t2 are the beginning and ending time indices of this variation, respectively. Obviously, Δ is a boundary parameter in amplitude to distinguish noises and stages. A default value of Δ can be determined by the standard deviation (std) of X when X stays at a steady state, that is, n

Δ=

∑ (std(Xi ,ss))2 i=1

C

(5) DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research where Xi,ss represents some data points of Xi in a steady-state condition. It is now ready to summarize the definition of stages. Since some clustering algorithm will be used to find clusters of data points, each stage is represented by some particular data points of X as cluster centers satisfying two requirements: (i) the data segment associated with a cluster center contains no less than τ data points being consecutive in time, and (ii) two cluster centers adjacent in time have an amplitude difference no less than Δ, that is, the inequality (4) is satisfied. 4.2. Steps of the Proposed Method. The proposed method requires the following assumptions to be satisfied: A1. Variational directions of X ≔ [X1, X2,···,Xn] have a fixed qualitative relationship in eq 1 when X is in the normal condition, and the relationship is no longer valid when X runs into the abnormal condition. A2. X is in the normal condition initially. A3. The minimum time duration τ and the minimum amplitude shift Δ are known a priori to define the underlying stages. The assumptions require some process knowledge. More precisely, variational directions of related process variables, as well as the two parameters τ and Δ, are required. They are the very basic process knowledge to be learned for anyone being qualified for working on industrial processes. Hence, it should be feasible to verify whether the assumptions are satisfied before applying the proposed method. In some cases, process data may not be in line with these assumptions. For instance, process variables in the normal condition may experience some transient states, where the variational directions of process variables are inconsistent with the relationship in eq 1. If the time durations and amplitude shifts of the transient states are respectively larger than τ and Δ, then the assumption A1 is invalid and the proposed method cannot be applied. The proposed method consists of the following steps: Step 1: Remove the time delays, normalize the data points and quantify the priori process knowledge. First, all time delays are estimated for each pair of X1, X2, ···, Xn, and are removed by shifting the time sequences to remove the time delays. The estimation of time delays has well been studied in literature.29 Here a commonly used cross correlation method is used to estimate the time delay d between two time sequences Xi(t) and Xj(t) with i ≠ j as

Each row of R stands for one solution of (1) under a specific circumstance. For instance, eq 2 says that X1 and X2 vary synchronously, that is,

⎡1 1⎤ ⎥ ⎢ R=⎢0 0⎥ ⎣−1 −1⎦

(7)

Step 2: Cluster data points by incorporating the time dimension. A time dimension vector T0 ≔ t0·[1, 2,···,N]T is incorporated into X, that is, S1 ≔ [T0 , X ] = [T0 , X1 , X 2 , ···, X n]

Here t0 is a time weighting factor, with a default value t0 = 1. The value of t0 is subject to adjustment to ensure time continuities of data points in a cluster (to be clarified in Example 1 later). A so-called density-peak clustering (DPC) algorithm26 is used to find clusters of data points in S1. A brief introduction of the DPC algorithm is given in Appendix. To ensure enough details of X to be captured by clusters, the number of the cluster centers, denoted by the symbol m, should be large enough. However, owing to the requirement on the minimum time duration τ, it is meaningless to choose a too large value of m. A default value of m is chosen as m = ⌊N/τ ⌋, where N is the total number of data points and ⌊·⌋ takes the largest integer less than or equal to the operand. Thus, the DPC algorithm is applied to obtain m clusters from S1, denoted as [K , C ] = DPC(S1 , m)

Here the vector C ≔ [c1,c2,···,cm] contains the time indices of the m cluster centers, and K ≔ [k1,k2,···,kN] with kt ∈ [1, m] for t = 1,2,···,N is a lookup table to indicate the cluster that each data point of S1 belongs to. The time index of each data point in one cluster has to be in the strict time order. This is achieved by Algorithm 1. An indicator

diĵ = arg max ∑ [xi(t )xj(t − d)] d

t

The time sequence Xi(t) is shifted backward (forward if d̂ij < 0) in the time index t for |d̂ij| data points to remove the time delay. Second, all (shifted) process variables need to be normalized with zero mean and unit variance, in order to prevent that certain process variables with large amplitudes play a dominate role. Third, users select the minimum time duration τ and minimum amplitude shift Δ based on some priori process knowledge, or adopt the default values τ = 20 s and Δ in eq 5. Next, solutions of (1) formulate a matrix R with size r × n as ⎡[X1]1 [X 2]1 ⋯ [X n]1 ⎤ ⎢ ⎥ ⎢[X1]2 [X 2]2 ⋯ [X n]2 ⎥ R≔⎢ ⎥ ⋮ ⋱ ⋮ ⎥ ⎢ ⋮ ⎢ ⎥ ⎣[X1]r [X 2]r ⋯ [X n]r ⎦

Algorithm 1. Add the time dimension and reduce the noise effect η is used to denote whether the strict time order is satisfied or not. If η is equal to 1, that is, the strict time order is not satisfied, then the time weighting factor t0 is increased and the clustering step is repeated. Example 1. This is an example to illustrate the role of t0 and the necessity of preserving time continuities of data points inside one cluster. Consider a time sequence of X1 shown in Figure 4. After adding the time dimension with t0 = 1, the DPC algorithm is applied to S1 = [T0, X1] to yield the decision graph in Figure 5(a) and the two clusters in Figure 5(b). Note that the data point A at t = 11 in Figure 4 is closer in amplitude to the data points in the cluster 2 than the counterparts in the cluster 1 in

(6) D

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 4. Time sequence plot of X1.

Figure 5. Decision graph (a) and the obtained clusters (b) when t0 = 1.

Step 3: Merge the clusters having short time durations. According to the definition of the minimum time duration τ, variations lasting less than τ data points are ignored as random noise or disturbances. To avoid generating data segments with time durations shorter than τ, each cluster obtained in Step 2 containing less than τ data points is merged into another cluster adjacent in time. This leads to Algorithm 2 that yields p clusters

Figure 5(b). As a result, when t0 = 1, the data point A is assigned to the cluster 2 in Figure 5(b). Similarly, the data point B at t = 12 is assigned to the cluster 1 in Figure 5(b). Thus, not all of the data points in the cluster 1 strictly occur prior to the data points in the cluster 2. Such a time discontinuity is avoided by increasing t0. When t0 is increased to 10, both the data points A and B are correctly assigned to the clusters 1 and 2 in Figure 6(b), respectively. E

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 6. Decision graph (a) and the obtained clusters (b) when t0 = 10.

Step 5: Identif y the normal and abnormal data segments. Each element of D belongs to one stage represented by the associated cluster center in E. The p cluster centers in D are in the time order so that the variational directions between stages can be calculated. For centers dj and dk in D, or equivalently, the data points x(dj) and x(dk) in S2, the variational direction is [X ]dk → dj = sign(x(euj) − x(euk))

(8)

Algorithm 3 compares the variational direction with each row of the matrix R in eq 6 to mark the normal and abnormal data

Algorithm 2. Merge the clusters having short time durations as the output, where the vector D contains the time indices of the p cluster centers, that is, D ≔ [d1,d2,···,dp]. Step 4: Determine stages af ter screening in the amplitude dimension. Define a data space containing the cluster centers in D obtained in Step 3 as S2 ≔ [X1(t ), X 2(t ), ···, X n(t )], fort ∈ D

According to the definition of the minimum amplitude shift Δ, any changes in the amplitude dimension smaller than Δ are going to be ignored. This is achieved by applying the DPC algorithm to S2. The vertical coordinate δ in the decision graph provided by the DPC algorithm is the Euclidean distance of clustering centers; thus, only the data point with δ higher than Δ is chosen as a cluster center. This step can be denoted as [U , E] = DPC(S2 , δ > Δ)

Algorithm 3. Identify the normal and abnormal data segments

Here the vector E ≔ [e1,e2,···,eq] keeps the time indices of the obtained cluster centers; U ≔ [u1,u2,···,up] with uj ∈ [1, q] for j = 1, 2, ···, p is a lookup table to allocate the p cluster centers obtained in Step 3 into the q clusters in this step.

segments by the symbols 0 and 1, respectively. eq 8 implies that x(t) is approximated by a series of horizontal segments, where each segment takes the value x(euj) of the cluster center F

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research for all the points sharing the same cluster center x(dj). Such an approximation of x(t) is denoted as x̂(t). The computational cost of the proposed method is quite minor, since the main time consumption is associated with using twice the DPC algorithm that is efficient in computation, as shown by the time complexity analysis of the DPC algorithm in Appendix. The step 3 of the proposed method ensures that τ is the minimum number of data points in a stage. Because there should be at least two stages in order to determine variational directions of process variables, the minimum number of data points required by the proposed method is equal to 2τ.

5. NUMERICAL EXAMPLES In this section, the proposed method is applied to two numerical examples. The first example illustrates the proposed method via the bivariate system in Section 3. The second example serves as a benchmark to illustrate the two challenges discussed in Section 1 and to compare three representative QTA methods with the proposed method. Example 2: Let us revisit the bivariate system X = [X1,X2] in Section 3 with the time sequence plots in Figure 1 and the scatter plot in Figure 2. Figure 7 shows the decision graph of the DPC

Figure 8. 3D scatter plot of the cluster centers in Step 2 for Example 2.

Figure 9. 3D scatter plot of the cluster centers in Step 3 for Example 2.

Figure 7. Decision graph (a) and the obtained clusters (b) in Step 2 for Example 2.

algorithm and the obtained clusters in Step 2, where the DPC algorithm is applied to S1 = [T0, X1, X2] with t0 = 1, τ = 20, and m = 150. The obtained m clusters are distinguished by different colors and the cluster centers are the data points marked by blue circles in Figure 7(b). The cluster centers of S1 are visualized in Figure 8. Figure 9 is the counterpart of Figure 8 after merging the clusters having short time durations in Step 3. The DPC algorithm is applied to S2 in Step 4 to identify the stages, and Figure 10 shows the decision graph and the obtained clusters in Step 4. In particular, the horizontal red line in Figure 10(a) represents the lower limit δ = Δ and indicates the presence of three clusters. Thus, the data points in S2 are classified into the three clusters labeled by 1, 2, and 3 in Figure 10(b). Note that there is an ample space in the decision graph in Figure 10(a) for choosing Δ from 0.5 to 2 to yield the same clustering results, which makes us very confident on the obtained clustering results. Finally, as described in Step 5, variational directions of the identified stages are calculated and compared with the matrix R in eq 7. The final data segmentation result is given in Figure 11,

Figure 10. Decision graph (a) and the obtained clusters (b) in Step 4 for Example 2.

where the abnormal data segmentation starts from t = 398. Subject to the time ambiguity in τ samples, the normal and G

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research In the first case, x(t) is generated as ⎧ ⎪ 5(μ1 , σ1), forr(t ) > r0 x1(t ) ∼ ⎨ ⎪ ⎩ 5(μ2 , σ2), forr(t ) ≤ r0

with r(t ) ∼ 5(0, 1), where r0 is a constant and 5(μ , σ ) stands for the Gaussian distribution with mean μ and standard deviation σ. Under the conditions μ2 − μ1 > max(σ1, σ2) > 0, the above abnormality definition in eq 9 says that the corresponding indicating variable for normal and abnormal segments is ⎧ ⎪ 0, for r (t ) > r0 a1(t ) = ⎨ ⎪ ⎩1, for r(t ) ≤ r0

The time sequence plot of x1(t) in one simulation is presented in Figure 12 (the left top subplot). Clearly, x1(t) is in a state with a lower level for most of the time instants, and changes to another state occasionally with a higher level. In the second case, x(t) is generated as

Figure 11. Normal and abnormal data segmentation result for Example 2.

abnormal data segmentation results are consistent with the actual situation in Figure 1. Example 3: For a purpose of comparison, a single process variable x(t) is considered, to which the proposed method is equally applicable. For simplicity, the abnormal condition is defined only for data points that x(t) is increasing, and the remaining points are regarded as being normal, that is, ⎧1, if[x(t )]:= x(t + 1) − x(t ) > 0 a(t ) = ⎨ otherwise ⎩ 0,

⎧ 5(μ , σ1), for t ∈ [1, t1) 1 ⎪ ⎪ ( , σ1), for t ∈ [t1 , t 2) 5 μ ⎨ x 2(t ) ∼ 2 ⎪ ⎪ 5(μ , σ2), for t ∈ [t 2 , t3] ⎩ 3

Under the conditions that μ2 > μ1 > μ3, (9) leads to ⎧ 0, for t ≠ t1 a 2 (t ) = ⎨ ⎩1, for t = t1 ⎪

⎪ ⎪

(10)



(9)

(11)

With σ2 ≥ |μ2 − μ1|, the time sequence plot of x2(t) in one simulation is presented in Figure 12 (the right top subplot). Clearly, the noise level for x2(t) in the interval t ∈ [t2, t3] is much higher than other data points.

Here a(t) is an indicating variable, taking 0 for normal segments and 1 for abnormal ones. In order to illustrate the two challenges discussed in Section 1, two types of x(t) are generated as shown in the top two subplots of Figure 12.

Figure 12. Time sequence plots of x(t) and its estimates from the four methods for two cases in one simulation for Example 3. H

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

as shown by x̂TD(t), x̂BU(t) and x̂SW(t) in Figure 12 (the left middle three subplots). This is exactly due to the first challenge discussed in Section 1. By contrast, the proposed method addresses this challenge successfully using the DPC clustering algorithm based on the densities of data points. In particular, Step 4 of the proposed method determines stages by finding clusters in the data space S2 that does not include the time dimension. Thus, similar data points far apart in time can be grouped together to formulate clusters. For the second case, the third segment of x2(t) for t ∈ [t2, t3] is associated with large variabilities, which are often larger than the average amplitude change between the segments of x2(t) for t ∈ [1, t1) and that of x2(t) for t ∈ [t1, t2). The top-down, bottom-up and sliding-window methods only consider the amplitude change; as a result, in order to have no missed alarms for the first two segments for t ∈ [1, t2), many variations caused by noises in the third segment of x2(t) are erroneously detected as variations, which lead to false alarms. This is due to the second challenge discussed in Section 1. On the contrary, both the densities and distances of cluster centers are considered by the DPC algorithm in Steps 2 and 4 of the proposed method. The amplitude change between the segment of x2(t) for t ∈ [1, t1) and that of x2(t) for t ∈ [t1, t2) is relatively small (larger than the minimum amplitude shift Δ). However, each segment contains a large number of data points. Thus, the densities of the two segments are high enough for being identified as clusters. The third segment of x2(t) for t ∈ [t2, t3] has a large distance away from the first two segments; being contaminated by noises, the data points in the third segment are scattered with lower densities. As a result, the data points in the third segment are classified into the same cluster. Thus, the noises in the third segment have little negative effects. As a proof, Figure 12 shows that x̂TD(t), x̂BU(t) and x̂SW(t) (the left middle three subplots) contain significance noises, whereas x̂ from the proposed method removes the noise components greatly. The two cases clearly support the satisfactory performance of the proposed method. As stated in the Section 4.1, τ and Δ serve as the boundary parameters in time and amplitude to distinguish noises and stages, respectively. For the proposed method, Steps 2 and 3 find p clusters where each cluster contains as least τ data points. Afterward, the time dimension is removed in Step 4, so that the cluster centers can be picked out globally rather than locally. Step 4 obtains q clusters among the p cluster centers by using the distance δ > Δ as the threshold; moreover, densities are considered by the DPC algorithm in Step 4 to obtain the clusters. Therefore, the proposed method guarantees the validness of the obtained stages in terms of τ and Δ, and resolves the two major challenges discussed in Section 1.

Table 1. Parameters for Simulations in Two Cases and Those for the Four Methods: δx is the Maximum of Cumulative Squared Fitting Errors for the Top-Down, Bottom-up and Sliding-Window Methods, and q is the Number of Clusters Obtained by the Proposed Method

Monte Carlo simulations are performed for x(t) in the two cases, under different mean values and noise levels listed in Table 1. The top-down, bottom-up and sliding-window methods27 are taken as the representative QTA methods to be compared with the proposed method. The three methods share the same tuning parameter δx as the maximum of cumulative squared fitting errors. The number of clusters, denoted by q, is taken as the key tuning parameter of the proposed method. Table 1 lists the different choices of these tuning parameters. In order to compare the estimated indicating variables with the actual ones a1(t) in eq 10 and a2(t) in eq 11, the tuning parameters of all methods are chosen to have no missing alarms, that is, all abnormal data points are correctly detected. Thus, only false alarms are considered. Figure 13 compares the performances of the four methods in terms of the number of false alarms. In each subplot, each method is associated with a group of five boxplots from 100 Monte Carlo simulations under each set of parameters in Table 1. Figure 13 clearly shows that the proposed method results in much less false alarms in all the simulations, while the performances of the other three methods are highly dependent on the simulation parameters, and the number of false alarms are large in many simulations. For the top-down, bottom-up and sliding-window methods, the corresponding estimates of x(t) are denoted as x̂TD(t), x̂BU(t) and x̂SW(t), respectively, while x̂(t) is the estimate of x(t) obtained in Step 5 of the proposed method. Figure 12 shows the time sequence plots of x(t) and its estimates x̂(t), x̂TD(t), x̂BU(t) and x̂SW(t) in one Monte Carlo simulation. For the first case, the large numbers of false alarms for the topdown, bottom-up and sliding-window methods are due to the reason that the three methods are limited by fitting x(t) by a basis function within a local time horizon. Similar data segments being isolated far apart in time cannot be grouped together, and are erroneously mixed to dissimilar data segments adjacent in time,

6. INDUSTRIAL EXAMPLES This section provides two industrial examples to illustrate the effectiveness of the proposed method. The case studies are from a large-scale thermal power plant located at Shandong Province in China. The sampling period of all data sets is 1 s. Example 4: Consider a bivariate system from a variable speed pump, where X1 and X2 are the outlet flow rate and the pump speed, respectively. Based on some physical laws of the pump, X1 and X2 are expected to increase or decrease at the same time under the normal condition, so that the relationship matrix R is the same as that in eq 7. Figure 14 presents the time sequence plots of X1 and X2 before normalization. Figure 15 is the scatter plot of X1 and X2 after normalization. I

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 13. Box plots of the numbers of false alarms for the four methods in Example 3: (A) sliding-window, (B) top-down, (C) bottom-up, (D) the proposed one.

Figure 14. Time sequence plots of X1 (top) and X2 (bottom) before normalization for Example 4.

The proposed method is applied to yield 13 clusters in Step 4, as shown in Figure 16. The data segmentation result is shown in Figure 17. There are two abnormal data segments. The first one starts from 5010 to 5163 s, associated with the data segments from the cluster centers 3 to 8 in Figure 16. There is a normal data segment, corresponding to the cluster centers 8 to 10, between the two abnormal segments. The second abnormal segment is

associated with the cluster centers 10 to 13. The segmentation results are consistent with Figure 14. In particular, it is observed from Figure 14 that the first abnormal data segment approximately starts at 5000 s, when the pump speed X2 continues increasing quickly while the outlet flow rate X1 keeps going down until 5163 s. Example 5: The operating status of a coal-fired power generation unit in the power plant can mainly be described by six J

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 15. Scatter plot of X1 and X2 after normalization for Example 4.

Figure 16. Decision graph (a) and the obtained clusters (b) in Step 4 for Example 4.

⎡1 1 1 1 1 1⎤ ⎢ ⎥ R=⎢0 0 0 0 0 0⎥ ⎣−1 −1 −1 −1 −1 −1⎦

process variables, namely, the generated power X1, feed-coal flow X2, main steam pressure X3, main steam flow X4, feedwater flow X5, and air flow X6. Thus, a multivariate system is formulated as X ≔ [X1, X2,···, X6]. Based on the physical laws among these six variables, when X is in the normal condition, the six variables increase or decrease in a synchronized manner so that the relationship matrix R is

The proposed method is applied to the time sequence plots of X shown in Figure 18. Figure 19 presents the paired scatter plots K

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 17. Normal and abnormal data segmentation result for Example 4.

Figure 18. Time sequence plots of X for Example 5.

and estimated PDFs of X. The data segments for t ∈ [1, 6293] and t ∈ [6294, 6800], being separated by the vertical dash line in Figure 18, are detected as the normal and abnormal segments, respectively. Since the 6-dimensional space is difficult to visualize, the subset [X3, X6] is investigated here for illustration. Figure 20 presents the decision graph and the obtained clusters in Step 4 of the proposed method. Figure 21 shows the scatter plot of X3 and X6 with the time index t. The zigzag parts in the scatter plots are the abnormal segments that are well captured by the clusters 6, 7, and 8 in Figures 20 and 21. The detected abnormal data segment can be revealed from Figures 18 and 19. In the right of the vertical line in Figure 18, the main steam pressure X3 increases, while other five process

variables decrease. The paired scatter plots between X3 and other five process variables in Figure 19 have zigzag parts being inconsistent with the relationship matrix R. The detection result is also confirmed by industrial plant operators. It is related to an accident of emergency shutdown of the power generation unit occurred on January 24, 2014. In the middle of an operation to reduce the generated power X1, plant operators mistakenly increased the feed-coal flow X2 for a while, so that the main steam pressure X3 increased, in a direction inconsistent with other variables. The increased thermal energy could not be released, because the consumption of main steam (the main steam flow X4) kept reducing. As a result, the increment of the main steam pressure X3 was further enlarged, until that the L

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 19. Scatter plots and estimated PDFs (diagonal subplots) of X for Example 5.

mistake was noticed by operators and the feed coal flow X2 was reduced dramatically by shutting down coal grinding mills.

ith point and the jth data point in a high-dimensional space. A local density ρi for the ith point is defined as ρi =

7. CONCLUSION This paper proposes a new method to automatically detect normal and abnormal data segments from historical data sets for multiple process variables. The underlying stages are defined by the minimum time duration τ and the minimum amplitude shift Δ. The variational directions of process variables are assumed to have a fixed qualitative relationship in eq 1. Under the assumption that the initial stage is in the normal condition, the variational directions of process variables between the current stage and the latest normal stage are compared with the relationship in eq 1. If the variational directions are consistent (inconsistent) with eq 1, then the current stage is in the normal (abnormal) condition. The stages are obtained by exploiting a density-based clustering algorithm twice. In particular, Steps 2 and 3 of the proposed method (Section 4.2) find p clusters, each of which contains as least τ consecutive data points. Step 4 obtains q clusters among the p cluster centers by removing the time dimension, considering densities in identifying the clusters and using the distance δ > Δ as the threshold. The proposed method is able to well address the two major challenges in identifying stages (Section 1), as illustrated by a comparison with three representative QTA methods in Example 3. Several numerical and industrial examples are also presented to support the validness of the proposed method.

∑ exp(−(dij/dc)2 ) j

The cutoff distance dc is defined as the second percentile of the data set {dij}. The DPC algorithm is sensitive only to the relative magnitude of ρi so that the choice of dc is not critical. The distance δi for the ith point is defined as the minimum distance from any higher density points to the ith point, that is, δi = min (dij) j : ρj > ρi

The distance for the point having the highest density is defined as δk = arg max(ρi ) = max (dkj) i

j

The scatter plot of [ρi,δi] is referred to as the decision graph. There are two ways to determine the clusters. One way selects two thresholds ρ and δ so that the data points on the decision graph with ρi higher than ρ and δi higher than δ are determined as the cluster centers. Another way is to preselect the number m of cluster centers, calculate the Euclidean distance,

Di =



APPENDIX The appendix presents a brief introduction of the density-peak clustering (DPC) algorithm.26 The DPC algorithm looks for the cluster centers characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. Denote dij as the Euclidean distance between the

⎛ ⎞2 ⎛ ⎞2 ρi δi ⎜ ⎟ ⎜ ⎟ ⎜⎜ max (ρ ) ⎟⎟ + ⎜⎜ max (δ ) ⎟⎟ j j ⎝ j ⎠ ⎝ j ⎠

(12)

and select the data points associated with the m largest values of Di as the cluster centers. Both ways are used by the proposed method in this paper. Once the cluster centers are determined, each remaining point is assigned to the higher density cluster whose center is the nearest. M

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research



ACKNOWLEDGMENTS



REFERENCES

Figure 20. Decision graph (left) and the obtained clusters (right) for X3 and X6 in Step 4 for Example 5.

This work was supported by the National Natural Science Foundation of China under Grant No. 61433001 and the Research Fund for the Taishan Scholar Project of Shandong Province of China.

(1) Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S. N. A review of process fault detection and diagnosis: Part I:Quantitative model-based methods. Comput. Chem. Eng. 2002, 27 (3), 293−311. (2) Venkatasubramanian, V.; Raghunathan, R.; Kavuri, S. N. A review of process fault detection and diagnosis: Part II: Qualitativemodels and search strategies. Comput. Chem. Eng. 2002, 27 (3), 313−326. (3) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K. A review of process fault detection and diagnosis: Part III:Process history based methods. Comput. Chem. Eng. 2002, 27 (3), 327−346. (4) Willsky, A. S. A survey of design methods for failuredetection in dynamic systems. Automatica 1976, 12 (6), 601−611. (5) Basseville, M. Detecting changes in signals andsystems: a survey. Automatica 1988, 24 (3), 309−326. (6) Maestri, M.; Ziella, D.; Cassanello, M.; Horowitz, G. Automatic qualitative trend simulation method for diagnosing faults inindustrial processes. Comput. Chem. Eng. 2014, 64, 55−62. (7) Palmer, C.; Chung, P. W. Creating signed directedgraph models for process plants. Ind. Eng. Chem. Res. 2000, 39 (7), 2548−2558. (8) Lee, G.; Lee, B.; Yoon, E. S.; Han, C. Multiple-faultdiagnosis under uncertain conditions by the quantification of qualitativerelations. Ind. Eng. Chem. Res. 1999, 38 (3), 988−998. (9) Peng, D.; Geng, Z.; Zhu, Q. A multilogic probabilisticsigned directed graph fault diagnosis approach based on Bayesian inference. Ind. Eng. Chem. Res. 2014, 53 (23), 9792−9804. (10) Basseville, M., Nikiforov, I. V. Detectionof Abrupt Changes: Theory and Application; Prentice Hall: Englewood Cliffs, 1993. (11) Jiang, Q.; Yan, X.; Li, J. PCA-ICA integrated with Bayesian method for non-Gaussian fault diagnosis. Ind. Eng. Chem. Res. 2016, 55 (17), 4979−4986.

Figure 21. Scatter plot of t, X3 and X6 with the obtained clusters for Example 5.

Denote the number of data points as N. The computational cost to obtain the Euclidean distance dij is O(N2). The time consumption to calculate ρi is O(N). In the worst case, the computational cost for the calculation of δi is O( log ). The final cluster assignment takes about O(). Hence, the time complexity of the DPC algorithm is O(N2), which is efficient in practical application as an offline method.



AUTHOR INFORMATION

Corresponding Author

*Phone: +86 (532) 8605-8103; e-mail: [email protected]. ORCID

Jiandong Wang: 0000-0003-2635-8724 Notes

The authors declare no competing financial interest. N

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research (12) Bardou, O.; Sidahmed, M. Early detection of leakagesin the exhaust and discharge systems of reciprocating machines by vibrationanalysis. Mech. Syst. Signal Proc. 1994, 8 (5), 551−570. (13) Pichler, K.; Lughofer, E.; Pichler, M.; Buchegger, T.; Klement, E. P.; Huschenbett, M. Fault detection in reciprocatingcompressor valves under varying load conditions. Mech. Syst. Signal Proc. 2016, 70, 104− 119. (14) Savitzky, A.; Golay, M. J. Smoothing anddifferentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36 (8), 1627− 1639. (15) Maurya, M. R.; Paritosh, P. K.; Rengaswamy, R.; Venkatasubramanian, V. A framework for on-line trend extraction and faultdiagnosis. Eng. Appl. Artif. Intel. 2010, 23 (6), 950−960. (16) Charbonnier, S.; Portet, F. A self-tuningadaptive trend extraction method for process monitoring and diagnosis. J. Process Control 2012, 22 (6), 1127−1138. (17) Zhou, B.; Ye, H.; Zhang, H.; Li, M. A newqualitative trend analysis algorithm based on global polynomial fit. AIChE J. 2017, DOI: 10.1002/ aic.15706. (18) Vedam, H.; Venkatasubramanian, V.; Bhalodia, M. Ab-spline based method for data compression, process monitoring anddiagnosis. Comput. Chem. Eng. 1998, 22, S827−S830. (19) Flehmig, F.; Watzdorf, R. V.; Marquardt, W. Identification of trends in process measurements using the wavelettransform. Comput. Chem. Eng. 1998, 22, S491−S496. (20) Flehmig, F. U.; Marquardt, W. Detection ofmultivariable trends in measured process quantities. J. Process Control 2006, 16 (9), 947−957. (21) Villez, K.; Rosén, C.; Anctil, F.; Duchesne, C.; Vanrolleghem, P. A. Qualitative representation of trends(QRT): extended method for identification of consecutive inflection points. Comput. Chem. Eng. 2013, 48, 187−199. (22) Janusz, M. E.; Venkatasubramanian, V. Automaticgeneration of qualitative descriptions of process trends for fault detectionand diagnosis. Eng. Appl. Artif. Intel. 1991, 4 (5), 329−339. (23) Villez, K. Qualitative path estimation: afast and reliable algorithm for qualitative trend analysis. AIChE J. 2015, 61 (5), 1535−1546. (24) Rengaswamy, R.; Venkatasubramanian, V. A syntactic patternrecognition approach for process monitoring and faultdiagnosis. Eng. Appl. Artif. Intel. 1995, 8 (1), 35−51. (25) ISA-18.2. Management of Alarm Systems for the Process Industries; ISA (Instrumentation, Systems & Automation Society): NC, 2009. (26) Rodriguez, A.; Laio, A. Clustering by fast searchand find of density peaks. Science 2014, 344 (6191), 1492−1496. (27) Keogh, E., Chu, S., Hart, D., Pazzani, M. An onlinealgorithm for segmenting time series. In Proceedings of IEEE International Conference on Data Mining, California, USA, Nov. 29 to Dec. 2, 2001, 289−296. (28) Zhou, B.; Ye, H. A study of polynomial fit-based methods for qualitative trend analysis. J. Process Control 2016, 37, 21−33. (29) Björklund, S. A survey and comparison oftime-delay estimation methods in linear systems. PhD Thesis, Linköping University, Sweden, 2013.

O

DOI: 10.1021/acs.iecr.7b01868 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX