Article Cite This: Ind. Eng. Chem. Res. 2019, 58, 4518−4533
pubs.acs.org/IECR
Finding Significant Qualitative Trend Combinations for Multivariate Systems from Historical Data Kuang Chen† and Jiandong Wang*,‡ †
College of Engineering, Peking University, Beijing, China College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China
Downloaded via UNIV OF FLORIDA on March 22, 2019 at 13:53:35 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.
‡
ABSTRACT: Data segments in similar patterns are often prerequisites for process monitoring, data mining, and decision making tasks. This paper proposes a probabilistic clustering method to automatically find significant qualitative trend combinations from historical time sequences of multiple process variables. The main idea is to obtain the histograms of decreasing, steady, and increasing qualitative trends in a data segment, and to find the qualitative trend combinations using a clustering algorithm referred to as the multifeature topic model. Numerical and industrial examples are provided to illustrate the effectiveness of the proposed method.
1. INTRODUCTION Due to great advances and widespread application of sensors and automatic data collection systems, modern industrial plants are with rapidly growing historical databases. An important research topic is to extract useful information from the vast amounts of historical data samples.1 In particular, in order to prevent abnormalities, many process variables should be monitored for scenarios such as industrial alarm systems2,3 and intensive care units.4,5 Hence, there are strong demands for high-level abstract expression of time series in multivariate systems. Qualitative trends like decreasing, steady and increasing trends are typical high-level information extracted from time series.6 Qualitative trend analysis (QTA) methods have been developed to extract the qualitative trends of signals.7−12 The qualitative trend combination is defined as a vector consisting of the qualitative trends of process variables. For example, if decreasing, steady and increasing trends are denoted as −1, 0, + 1 respectively, a qualitative trend combination for a bivariate system X = [X1,X2] could be [+1,−1]. Such a qualitative trend combination helps engineers to quickly understand the dynamic behaviors of many related process variables in a multivariate system. Manually extracting the possible qualitative trend combinations would be sometimes acceptable for single process variable. However, it will cost too much time for complicated systems with multiple process variables being involved. For instance, when three types of qualitative trends are involved in a multivariate system with R process variables, the total number of possible qualitative trend combinations is 3R. Clearly, such a number grows exponentially, for example, 27, 81, 243,··· for R = 3, 4, 5,···. Thus, it is too heavy for manual analysis because industrial processes usually involve more than three process variables. In addition, only small portions of these qualitative trend combinations are significant for distinguishing different operating conditions, whereas the others may be rarely seen in © 2019 American Chemical Society
historical data samples. Hence, it would be very helpful in practice to automatically find out the meaningful qualitative trend combinations for multivariate systems. The major challenge of doing so is that the time lengths of data segments may not be the same, and qualitative trends of process variables may not be in a strict synchronous manner. This asynchronism may be a result of noises, time delays among process variables, or different dynamics in systems. As a result, qualitative trend combinations cannot be simply obtained by slicing the qualitative trends of process variables at the same time stamp. This paper proposes a probabilistic clustering method to automatically find qualitative trend combinations from historical time sequences of multiple process variables. The main idea of the proposed method is to obtain the histograms of decreasing, steady, and increasing qualitative trends of a data segment extracted from historical time sequences, and to find the qualitative trend combinations by clustering the histograms. A multifeature topic model (MFTM) is developed for clustering the histograms. The MFTM is a novel extension of a popular probabilistic latent semantic analysis (PLSA) topic model. Compared with the PLSA, the MFTM is designed in the higher dimension feature space in order to deal with multivariate systems. The qualitative trend combination corresponding to each cluster is obtained by deriving the type of trends with the highest probability on each process variable, and the MFTM assigns cluster labels to data segments with probabilities, instead of hard partitioning. The computational cost of building the MFTM is minor for multivariate systems. The asynchronism in multiple process variables is assumed to be limited within a minimum data length, and the data length of each data segment Received: Revised: Accepted: Published: 4518
November 25, 2018 February 18, 2019 February 25, 2019 February 25, 2019 DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research
Figure 1. (a) Time sequence plots of 10 data segments in a bivariate system X = [X1,X2]. (b) Time sequence plots of the qualitative trends X⃗ 1 and X⃗ 2. (c) Qualitative trend histograms of X1 and X2.
means algorithm29 and self-organizing map algorithm30,31 are the commonly used methods. For the objective of extracting qualitative trend combinations, the performance of these similarity measures and clustering algorithms are compared with the proposed method in Example 2 of Section 3, and the superiority of the proposed method is observed. The rest of the paper is organized as follows. Section 2 presents the proposed method. Section 3 provides examples to illustrate the proposed method and compares it with other timeseries clustering methods. Section 4 discusses the advantages and shortcomings of the proposed method. Section 5 concludes the paper.
is required to be larger than the minimum data length. With the utilization of qualitative trend histograms, the proposed method could handle the challenge of the asynchronism in multiple process variables. Finding significant qualitative trend combinations for multivariate systems is highly related to the clustering problem for data segments in multivariate time series. A number of methods have been developed to cluster time series. According to the reviews by Liao13 and Aghabozorgi et al.,14 the following similarity measures in shape are suitable for finding data segments in the same qualitative trend combinations, namely, Euclidean distance,15,16 dynamic time warping based distance,17−20 cross-correlation based distance,21 the longest common subsequence based distance,22,23 J divergence,24 real sequence edit distance.25,26 For the clustering algorithms, kmeans algorithm,27 hierarchical clustering algorithm,28 fuzzy c-
2. THE PROPOSED METHOD 2.1. Problem Description. A multivariate system X with R process variables is formulated as X = [X1,X2,···, XR]. The time 4519
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research
where 1 ≤ r1,r2 ≤ R. Then the time delay is removed by shifting the time sequences xr1(1:L) backward τr1,r2 data points if τr1,r2 > 0 (forward if τr1,r2 < 0). For multivariate systems, a pivot process variable is specified first, for example, the one being the most important for the efficient and safe operations of X. Then, the time sequences of other process variables are shifted to remove time delays to the pivot process variable. In addition, with the utilization of qualitative trend histograms, the proposed method can partially tolerate the asynchronism caused by time delays. For instance, even though qualitative trends of data segments in Section 2.2 are not strictly synchronous, such time delays would not affect the major qualitative trend combinations corresponding to qualitative trend histograms. Each historical time sequence xr(1:L) on Xr is converted into Nr linear segments by some piecewise linear representation (PLR) methods,8,10,34,35
sequence of the rth process variable Xr is represented by xr(1:L) = [xr (1),xr (2), ···,xr(L)]T with r = 1,2,···,R. The qualitative trend of Xr is denoted as X⃗ r, taking the value −1, 0, + 1 for decreasing, steady and increasing trends, respectively.32 The corresponding time sequence of X⃗ r is x⃗r(1:L) = [x⃗r(1), x⃗r (2),···,x⃗r(L)]T. The qualitative trend combination z of these process variables is defined as z = [X1⃗ , X⃗ 2 , ···, XR⃗ ]
(1)
All the qualitative trend combinations in historical data segments of X form a set Z = {z1, z2,···, zK}, where the kth combination zk with 1 ≤ k ≤ K is a realization of z in (1). The qualitative trend combinations supported by enough many data segments with high probabilities are defined as the significant qualitative trend combinations in a set Z0, which is a subset of Z, that is, Z0⊂Z. G i v e n t h e h i s t o r i c a l t i m e s e q u e n c e x( 1: L) = [x1(1:L),x2(1:L),···,xR(1:L)], the objective is to find Z and Z0. Further, similar data segments in each qualitative trend combination zk with 1 ≤ k ≤ K should be also found. 2.2. Main Idea. This subsection presents an example to illustrate the main idea of the proposed method. Consider a bivariate system consisting of two process variables, X = [X1,X2]. Ten data segments of the historical time sequences of X with unequal time lengths are shown in Figure 1(a). The algorithm to determine each data segment will be given later in Section 2.3. The time sequences of qualitative trends X⃗ 1 and X⃗ 2 are generated as shown in Figure 1(b). The time lengths of data segments may not be the same, and the qualitative trends of X1 and X2 may not be in a strictly synchronous manner. For instance, X1 and X2 in the fifth segment are increasing before t = 20, but X2 is decreasing while X1 keeps increasing after t = 20, so that two qualitative trend combinations [+1, + 1] and [+1, − 1] are presented in this segment. Such an asynchronism in multiple process variables is the major challenge in finding a solution to the problem in Section 2.1. The multifeature topic model (MFTM) will be introduced to find qualitative trend combinations by clustering the qualitative trend histograms of data segments. For each data segment, histograms of qualitative trends are composed by ratios of three elements f r,1 = −1, f r,2 = 0, f r,3 = +1 for decreasing, steady and increasing trends of Xr, r = 1,2,···,R, as shown in Figure 1(c). In spite of the various time lengths and the asynchronism of qualitative trends, the qualitative trend histograms show that the five segments in the top half are all major in the qualitative trend combinations of both increasing [+1, +1] and the five segments in the bottom half are all major in [−1, −1]. The total number of all possible qualitative trend combinations is 3 × 3 = 9. Here, only two of them are significant, namely, both increasing [+1, +1] and both decreasing [−1, −1]. Details of the MFTM will be given later in Section 2.4. 2.3. Detailed Steps. This subsection presents the detailed steps of the proposed method. For the multivariate system X = [X1,X2,···,XR] in this paper, time delays should be removed first. The time delays are estimated by a commonly used cross correlation method in literature.33 The time delay τr1,r2 between time sequences xr1(1:L) and xr2(1:L) is
{xr̂ (ti , r : ti + 1, r − 1)}iN=r 1 = PLR(xr(1: L))
where ti,r is the starting time index of the ith data segment, i = 1,2,···,Nr, t1,r = 1, tNr+1,r = L + 1. In PLR, linear regressions are applied to generate linear segments, xr̂ (t ) = ai , r + bi , r ·t
τ
t=1
(4)
where ti,r ≤ t ≤ ti+1,r−1, i = 1,2,···,Nr. Using the least-square method, the intercept ai,r and the slope bi,r are calculated as l o ∑ t 2 ∑ xr̂ (t ) − ∑ t ∑ txr̂ (t ) o o = a o o i,r o (ti + 1, r − ti , r ) ∑ t 2 − ( ∑ t )2 o o o m o o o ∑ txr̂ (t ) − ∑ t ∑ xr̂ (t ) o o bi , r = o o o (ti + 1, r − ti , r ) ∑ t 2 − ( ∑ t )2 o n
(5) ti+1,r−1
where the summation symbol is the abbreviation of ∑t=ti,r . 34 Most PLR methods can be grouped into three categories, namely, the sliding window algorithm, the top-down algorithm, and the bottom-up algorithm. Among them, the bottom-up algorithm usually performs the best.12,36 Thus, the proposed method directly uses the bottom-up algorithm to obtain the segmentation result in eq 3. Next, qualitative trends are obtained based on the segmentation result. The hypothesis test on the significance of bi,r is used to yield the qualitative trends,37 H0: bi , r = 0, H1: bi , r ≠ 0.
(6)
The null hypothesis H0 is rejected when |bi , r | σ̂
Sxx ≥ tα /2(n − 2)
(7)
where tα /2(n − 2) is the quantile of t-distribution with n − 2 degrees of freedom on the α significance level, for example, α = 0.05, and l ti + 1, r − 1 o o o 1 o o σ̂ = ∑ [xr(t ) − ai ,r − bi ,r ·t ]2 o o o n − 2 t=t o o i ,r o o m o 2 o o ti + 1, r − 1 ij ti+1,r − 1 yz o o j z 1 o 2 j z o Sxx = ∑ t − jj ∑ t zz o o o ti + 1, r − ti , r jjj t = t zzz o o t = ti , r i ,r k { n
L
τr1, r2 = arg max ∑ [xr1(t )xr2(t − τ )]
(3)
(2) 4520
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research Thus, the qualitative trend x⃗r(t) for ti,r ≤ t ≤ ti+1,r − 1 is obtained as l o o sign(bi , r ) , if (7) is true xr⃗ (t ) = o m o o o , else n0
temperature variables, for raising (clearing) an alarm when at least T consecutive data points overpass an alarm threshold. Besides, an assumption should be made that the asynchronism among X1,X2,···,XR are limited within T. The parameter sensitivity of T will be investigated in Example 1. Data segments D = {d1,d2,···,dN} are determined using S = {ti,i = 1,2,···,N}, that is,
(8)
where sign (·) is the signum function, i = 1,2,···,Nr, t1,r = 1, tNr+1,r= L+1. The PLR produces a split time index set Sr = {ti,r,i = 1,2,···, Nr} for each process variable Xr. An integrated split time index set S = {ti,i = 1,2,···,N} is created by integrating Sr with a constraint of the minimum data length T to avoid too short data segments. The pseudocode to obtain S is given in the Algorithm described in Chart 1. Here the length T is specified as the minimum data
di = [x1⃗ (ti: ti + 1 − 1), x 2⃗ (ti: ti + 1 − 1), ···, xR⃗ (ti: ti + 1 − 1)]
(9)
where ti+1 − ti ≥ T. For instance, two time sequences shown in Figure 2(a) and (c) can be converted into a series of 2dimensional data segments marked 1,2,···,9 with T = 100, shown in Figures 2(g) and (i). The split time index sets S1, S2 and S are plotted as the vertical solid lines in Figure 2(b), (d), and (e), respectively. Dashed lines in Figure 2(e) are the split time indexes failing to meet the minimum data length constraint. It is worthy to point out that after deploying Algorithm 1, a segment may contain more than one type of trends, such as the segment 2 in Figure 2 enclosing the values of “1” and “−1”. Thus, qualitative trend histograms are exploited. For each data segment, qualitative trend histograms are composed by ratios of three elements f r,1 = −1, f r,2 = 0, f r,3 = +1 for decreasing, steady and increasing trends of Xr, r = 1,2,···,R. From the qualitative trend histograms of data segments D = {d1,d2,···,dN}, the MFTM is built as a probabilistic clustering model to produce K clusters C = {ck}k K= 1 with conditional probabilities P(ck|di) and P(f r,jr|ck), k = 1,2,···,K, i = 1,2,···,N, r = 1,2,···,R, j r = 1,2,3. By choosing the element f r,jr with the highest probability P( f r,jr|ck) on each process variable Xr, each cluster ck can be explicitly expressed as the corresponding qualitative trend combination zk, k = 1,2,···,K, r = 1,2,···,R, jr = 1,2,3, namely,
Chart 1. Algorithm to Generate the Integrated Split Time Index Set S
length to formulate a qualitative trend. The concept of T often appears in practice. The industrial standard38,39 recommends 15 s for flow rate and pressure variables, and 60 s for level and
Figure 2. Time sequence plots for two time sequences in (a) and (c), the corresponding qualitative trends in (b) and (d), the integrated split time index set in (e), the split data segments in (f) and (h), the corresponding segments in (g) and (i) separated by vertical lines. 4521
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research z k = [arg max P(f1, j |ck), arg max P(f2, j |ck), ···, arg max P(fR , j |ck)] 1
f1, j
1
f2, j
2
2
fR , j
R
R
(10)
Each data segment di in data segments D is labeled with the most likely qualitative trend combination ẑi, that is, zî = zarg maxP(ck | di) k
(11)
and the corresponding probability pi is pi = P(zî |di)
(12)
Details of the MFTM are going to be presented in Section 2.4. As defined in Section 2.1, significant qualitative trend combinations Z0 would be a subset of the set Z = {z1,z2,···,zK}. Z0 should be highly supported by enough data segments. As the summation of P(ck|di) for k ∈ [1,K] equals 1, this means that ck (zk) is more significant when more data segments with high values of P(ck|di) are seen in data segments D. Thus, a procedure to find Z0 is proposed as follows: 1. Calculate the mean value of [P(ck|di)]q for ck, i.e., μk =
1 N
N
∑ [P(ck|di)]q i=1
(13) Figure 3. A schematic diagram for the steps of the proposed method.
Here the exponent order q is an integer no less than 2, in order to increase the weight of higher values. A default value q = 3 is used in the sequel. 2. Compute the global mean value for C, i.e., μ0 =
1 NK
N
model and the MFTM are described by Bayesian nets as shown in Figure 4. The MFTM is an extended case of the PLSA model in the higher dimension feature space, or in other words, the PLSA model is a special case of the MFTM.
K
∑ ∑ [P(ck|di)]q i=1 k=1
(14)
3. Choose the corresponding qualitative trend combination zk of ck satisfying the constrain μk > μ0 as one of Z0, Z0 = {zk|μk > μ0 , zk ∈ Z , 1 ≤ k ≤ K}
(15)
The steps of the proposed method are shown in the schematic diagram in Figure 3. Details of the MFTM are presented in the next section. 2.4. Multifeature Topic Model. This subsection presents the details of the MFTM. Since the MFTM is an extension of the probabilistic latent semantic analysis (PLSA) model, the PLSA model is revisited first. The PLSA finds latent topics for documents in a corpus.40 Suppose that documents D = {d1,d2,···,dN} are composed of words W = {w1,w2,···,wM}, and topics are represented by latent variables C = {c1,c2,···,cK}. The PLSA holds an unigram assumption: each word in a document is independently and identically generated from a topic with a certain probability. Therefore, the joint probability of an observation pair (di,wj) with i = 1,2,···,N and j = 1,2,···,M is
Figure 4. A schematic comparison of the proposed MFTM model with the PLSA model.
P(di , wj) = P(di)P(wj|di)
Suppose that R types of features are included, and denote the set of features as F = {F1,F2,···,FR}. Each feature Fr is composed by Mr elements, that is, Fr = {f r,1,f r,2,···,f r,Mr}, r = 1,2,···,R. In this manuscript, Fr is defined as the set of decreasing, steady and increasing trends of Xr, that is, f r,1 = −1, f r,2 = 0, f r,3 = +1 and M r= 3 for r = 1,2,···,R. Similar to the unigram assumption for the PLSA model, each feature is assumed to be conditionally dependent on the cluster ck, but the features are independent to each other. Thus, an observation (di,f1,j1,f 2,j2,···,f R,jR) can be split
K
= P(di) ∑ P(wj|ck)P(ck|di). k=1
(16)
Symbols P(di), P(wj|ck) and P(ck|di) are respectively the probability of the ith document di, the conditional probability of the jth word wj for a given topic ck, and the conditional probability of ck with respect to di. The PLSA model in eq 16 is extended to the MFTM by replacing the words with multiple features. Both the PLSA 4522
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research into several independent observation pairs (di,f1,j1), (di,f 2,j2), ···, (di,f R,jR), that is, P(di , f1, j , f2, j , ···, fR , j ) = P(di , f1, j )P(di , f2, j )··· P(di , fR , j ) 1
R
2
1
2
R
N
=
Mr
R
K
ÑÑÖ ÅÅÇ k=1 ÉÑ ÄÅ Mr Å K ÅÅ P(ck , di , fr , j ) ÑÑÑ ÅÅ r Ñ ÑÑ. ∑ ÅÅÅÅn(di , fr ,jr )·log ∑ P(ck|di , fr ,jr ) Ñ P(ck|di , fr , j ) ÑÑÑÑ ÅÅÇ jr = 1 Å k=1 r Ñ Ö
i = 1 r = 1 jr = 1 N
=
R
∑∑ i=1 r=1
= P(di) ∑ P(fr , j |ck)P(ck|di),
The Jensen inequality, that is, log ∑k λk yk ≥ ∑k λk log yk where ∑k λk = 1, λk ≥ 0 and yk ≥ 0, is used to yield the lower bound B(θ) of L1(θ) in (22) as
r
(18)
N
for r = 1,2,···,R. Define the set of unknown parameters as
L1(θ) ≥
∑∑
The M-step maximizes the lower bound B(θ) of L1(θ) and so that L1(θ) could be increased at the same time. As a result, with the iterations of E-step and M-step, the log-likelihood L1(θ) will be maximized and the solution is obtained, that is,
for k = 1,2,···,K, i = 1,2,···,N, r = 1,2,···,R and jr = 1,2,···,Mr. In R total, there are NK + ∑r = 1 Mr K unknowns in θ. The maximum likelihood estimation (MLE) method is commonly used to estimate unknowns in probability models. However, it is rather difficult to apply the MLE method to the log-likelihood function L1(θ) in eq 19 for the nonlinearity caused by the sum of products of unknown parameters in the logarithmic term,
θ ̂ = arg max L1(θ ) = arg max B(θ ) θ
Mr
∑ ∑ ∑ [n(di , fr ,jr )log P(di , fr ,jr )]
ÉÑ| ÄÅ K l ÑÑo ÅÅ o o ÑÑo ÅÅ o o. Å n ( d , f ) log P ( f c ) P ( c d ) · | | m ∑ oo i r ,jr ÑÑ} ÅÅ ∑ r , j k k i Ñ o o Ñ Å r o Ñ Å jr = 1 n ÑÖo ÅÇ k = 1 ~
=
∑∑ i=1 r=1
r
r
r
=
The solution is θ ̂ = arg max B(θ ) = arg max L 2(θ ) θ
(20)
K
∑
r
K
∑l = 1 P(s)(fr , j |cl)P(s)(cl|di) r
(26)
θ
With the constraints that ∑
Mr jr = 1
P(fr , j |ck) = 1 and r
P(ck|di) = 1, the Lagrangian function is k=1 ij j
yz z
ji
zy
∑ ∑ τr ,kjjjjj1 − ∑ P(fr ,jr |ck)zzzzz + ∑ ρi jjjjj1 − ∑ P(ck|di)zzzzz R
H = L 2(θ) +
K
r=1 k=1
j k
Mr
N
z {
jr = 1
i=1
k
K
k=1
{
where τr,k and ρi are the Lagrange multipliers. By setting the partial derivatives of H with respect to P(f r,jr|ck), P(ck|di), τr,k and ρi as zeros, the solution θ ̂ = arg max L (θ ) in the (s + 1)th 2
θ
R M l o ∑r = 1 ∑ j =r 1 n(di , fr , j )P(s + 1)(ck|di , fr , j ) o o o r r r o o P(s + 1)(ck|di) = o R o o ∑r = 1 nr (di) o o m o N o o ∑i = 1 n(di , fr , j )P(s + 1)(ck|di , fr , j ) o o r r o o P (f | c ) = M o N o r o (s + 1) r , jr k ∑ ∑ | n d f P c d ( , ) ( , o i r , mr (s + 1) k i fr , mr ) o mr = 1 i=1 n
iteration is
r
P(s)(fr , j |ck)P(s)(ck|di)
k=1
(25)
P(s)(ck , di , fr , j ) K ∑l = 1 P(s)(cl ,
K
i=1 r=1 r
Instead, the expectation maximization (EM) algorithm41 is adopted to estimate θ based on observations (di,f1,j1),(di,f 2,j2),···, (di,f R,jR). The EM algorithm is composed by iterations of E-step and M-step as follows. Denote the current calculation as the (s + 1)th iteration, s = 1,2,···. For the E-step, P(ck|di) and P(f r,jr|ck) in θ take the values estimated from the M-step in the previous (s)th iteration. Bayes’ rule is applied to reach the posterior probability of the latent variable zk, P(s + 1)(ck|di , fr , j ) =
Mr
R
(19)
Here the second equality is from eq 18 with P(di) = 1 for a specific data segment di, and n(di,f r,jr) is the number of the occurrences of the element f r,jr in di, n(di , fr , j ) = |{t |xr⃗ (t ) = fr , j , ti ≤ t ≤ ti + 1 − 1}|
l o o j =1 o n
∑ ∑ ∑ omoon(di , fr ,jr )· ∑ {P(ck|di , fr ,jr )log[P(fr ,jr |ck)P(ck|di)]}} N
L 2(θ) =
Mr
R
(24)
θ
Since P(ck|di,f r,jr) is known at the M-step of the current iteration, it could be easily proven that B(θ) in (23) and the loglikelihood function L2(θ) in eq 25are maximized at the same time.
i = 1 r = 1 jr = 1
N
(23)
= B(θ)
r
R
ÉÑ ÅÄÅ K ÅÅ P(ck , di , fr , j ) ÑÑÑ ÅÅ r Ñ ÑÑ Å ∑ ÅÅÅn(di , fr ,j )· ∑ P(ck|di , fr ,j )log Ñ r r P(ck|di , fr , j ) ÑÑÑÑ Å 1 = jr = 1 Å k ÅÇ r Ñ Ö Mr
R
i=1 r=1
θ = {P(ck|di), P(fr , j |ck)}
N
(22)
42
r
K
L1(θ) =
ÑÉÑ ÑÑ
∑ ∑ ∑ ÅÅÅÅÅn(di , fr ,jr )log ∑ P(ck , di , fr ,jr )ÑÑÑÑÑ N
(17)
P(di , fr , j ) = P(di)P(fr , j |di)
k=1
ÅÄÅ ÅÅ
i = 1 r = 1 jr = 1
for 1 ≤ jr ≤ Mr and r = 1,2,···,R. Similarly to (16), the joint probability of (di,f r,jr) is r
Mr
R
∑ ∑ ∑ [n(di , fr ,jr )log P(di , fr ,jr )]
L1(θ) =
di , fr , j ) r
.
(27)
(21)
where n(di,f r,jr) is given in (20), and nr (di) = ∑
Here the subscripts (s) and (s + 1) denote the (s)th and (s + 1)th iterations, respectively. The M-step of the EM algorithm transforms L1(θ) in eq 19 into another form,
Mr jr = 1
n(di , fr , j ) r
is the amount of the elements in the feature Fr to be presented in di. 4523
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research In summary, the MFTM is built by the following steps: 1. Randomly initialize P(ck|di) and P(f r,jr|ck). 2. Calculate eqs 21 and 27 in an iteration manner. 3. Stop the iterations of Step 2 if the convergence criterion such as ||θ(s+1) − θ(s)|| 0.8 for c6 (z6), whereas only four data segments are with P(c3|di) > 0.8 for c3 (z3), so that z6 is more significant than z3. By selecting the qualitative trend combinations with the highest probabilities from P(ck|di) using eq 11, the clustering labels {ẑi}i N= 1 are obtained. Figure 9 shows the time sequence plots of {ẑi}iN= 1 with 1 ≤ t ≤ 2000. It is easy to find the similar data segments in the same qualitative trend combination. The ground truth of the qualitative trend combinations {zi}i N= 1 for
r=1
N
R
Mr
(29)
Along with the increment of K, the model may fit the training sample set better so that η(K) decreases. As an example, a perplexity curve will be shown later in Figure 7(a). However, the generalization capability of the model may decay at the same time, so that the training sample set is overfitted. Hence, a proper value of K should be chosen such that both K and η(K) are not large. An empirical approach is to choose K as K0 for the first time that η(K 0) − η(K max ) 24.5 (33) n Based on the review in Section 1, several clustering methods are applied to assign labels ẑi for each data segment di in D. These methods are listed below according to the type of input features: • Raw data segments x0(ti:ti + T − 1): (a) k-means,27 (b) hierarchical clustering with Euclidean distance,28 (c) hierarchical clustering with dynamic time warping based distance,17 (d) hierarchical clustering with the longest common subsequence based distance,22 (e) hierarchical clustering with real sequence edit distance,26 (f) hierarchical clustering with cross-correlation based distance;21 • Trend segments x⃗0(ti:ti + T − 1): (g) k-means,27 (h) hierarchical clustering with Euclidean distance,28 (i) hierarchical clustering with dynamic time warping based distance,17 (j) hierarchical clustering with the longest common subsequence based distance,22 (k) hierarchical clustering with real sequence edit distance,26 (l) hierarchical clustering with cross-correlation based distance;21 • Trend histograms [n(di, − 1), n(di,0), n(di,1)] in (20): (m) k-means,27 (n) hierarchical clustering with Euclidean distance,28 (o) fuzzy c-means,29 (p) self-organizing map,30 (q) hierarchical clustering with J divergence,24 (r) the MFTM in Section 2.4. 4528
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research
Figure 15. Time sequence plots of the prepump electrical current X1, inlet flow rate X2, inlet flow pressure X3, pump speed X4 and outlet flow pressure X5 in Example 3.
Figure 16. Upper subplot: the bar plots of μk and the global mean value μ0 as the horizontal line. The middle subplot: the histograms of the most likely trend combination ẑi in eq 11. The bottom subplot: the boxplots of corresponding pi in eq 12.
combinations are chosen. Based on the procedure in eqs 13 ∼ 15 as illustrated by the upper subplot in Figure 16, significant qualitative trend combinations Z0 of Z are obtained as Z0 = {z2, z3, z5}, marked in Table 3. The significant qualitative trend combinations can also be validated by the following information. The histograms of labels {ẑi}i N= 1 in eq 11 and the boxplots of corresponding probabilities pi in eq 12 are shown in the middle and bottom subplots in Figure 16. By checking the bar heights in the middle subplot in Figure 16, z2, z3 and z5 occur most frequently compared with the others. Moreover, according to the bottom subplot in Figure 16, data segments with labels ẑi ∈ {z2, z3, z5} correspond to higher probabilities pi. As defined in Section 2.1, the significant qualitative trend combinations should be supported by enough many data segments with higher probabilities. This fact means that z2, z3, and z5 are
Table 3. Qualitative Trend Combinations Z in Example 3. Z0 is Marked Z
X⃗ 1
X⃗ 2
X⃗ 3
X⃗ 4
X⃗ 5
z1 z2 z3 z4 z5 z6 z7
−1 0 +1 +1 −1 −1 −1
0 0 +1 0 −1 0 −1
−1 0 −1 0 +1 +1 −1
−1 0 +1 0 −1 0 0
−1 0 +1 0 −1 0 0
significant qualitative trend combinations in Z, which is consistent with the result of Z0. Several representative data segments for z2, z3, and z5 are given in Figures 17, 18, and 19. z2 = [0, 0, 0, 0, 0] can be described as 4529
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research
Figure 17. Time sequence plots of three representative data segments in z2 = [0, 0, 0, 0, 0] with additional data in the previous and subsequent 30 min. Amplitude changes of process variables are given at the bottom of subplots.
Figure 18. Time sequence plots of three representative data segments in z3 = [+1, +1, −1, +1, +1] with additional data in the previous and subsequent 30 min. Amplitude changes of process variables are given at the bottom of subplots.
‘all the process variables are in steady trends simultaneously’. It shows that the system is in steady states. z3 = [+1, + 1, − 1, + 1, + 1] and z5 = [−1, −1, +1, −1, −1] mean that X3 is in the opposite direction with the others. The significant qualitative trend combinations and the corresponding similar data segments can be used for alarm generation and root-cause identification. For the alarm generation, the significant qualitative trend combinations z3 = [+1, +1, −1, +1, +1] and z5 = [−1, −1, +1, −1, −1] are found to occur when the pump is in abnormal conditions. Thus, the qualitative trends of X1, X2, ···, X5 can be calculated in an online manner,44 and an alarm will be generated when the qualitative trends are the same as z3 or z5. Note that if an alarm is generated solely based on the nonsynchronized qualitative trends of the five process variables, then there are too many combinations of qualitative trend combinations to be monitored without historical supports. For the root-cause identification, z3 and z5
say that the root-cause process variable could be the inlet flow pressure X3 behaving oppositely to the others. If the system is currently in an abnormal condition, industrial plant operators can compare the current time sequences of X with the similar data segments of z3 and z5 in Figures 18 and 19. If a high similarity is found, then the root cause X3 could be quickly identified, and the operators could recall their experiences in handling similar abnormal conditions in the past and benefit in dealing with the current abnormal condition.
4. DISCUSSION This section analyzes the strengths and weakness of the proposed method, together with our experience with the method. The complexity of the proposed method is mainly from two parts: the PLR method in (3) to obtain qualitative trends and the EM method in eqs 21 and 27 to build the MFTM. The run time 4530
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research
Figure 19. Time sequence plots of three representative data segments in z5 = [−1, −1, +1, −1, −1] with additional data in the previous and subsequent 30 min. Amplitude changes of process variables are given at the bottom of subplots.
utilization of qualitative trend histograms, the proposed method could handle the asynchronism of multiple process variables. The qualitative trend combinations were determined using the clusters from the MFTM, and similar data segments in same qualitative trend combinations were produced. Numerical and industrial examples illustrated the effectiveness of the proposed method.
of PLR methods is dependent on the specific methods, which has been studied by Keogh et al.34 For the bottom-up algorithm used in this manuscript, the computational complexity is O(LT), so that the computational complexity for the PLR method in eq 3 is O(RLT). The complexity analysis for the EM method applied in building the PLSA model has been done by Hofmann.40 Since R independent features are used in the MFTM, the complexity is R times to building the PLSA model. For each iteration in the EM procedures, the computational complexity would be O(RNmax(Mr )K ). Usually, 100 iterations
■
AUTHOR INFORMATION
Corresponding Author
r
*Phone: +86 (532) 8605-8103; e-mail:
[email protected].
are enough to converge. For example, the MFTM in Example 3 costs less than 0.2 s at a personal computer with a Intel Core i7− 4770 CPU @3.40 GHz and 8 GB memory. Thus, the computational cost of the MFTM is minor. The proposed method is designed for finding significant qualitative trend combinations in multivariate systems. The novelties are 3-fold: (1) The qualitative trend histograms are utilized in order to deal with the asynchronism in multiple process variables. (2) The MFTM is developed as a novel extension of the PLSA model in the higher dimension feature space in order to deal with multivariate systems. (3) The MFTM assigns labels for data segments with probabilities, instead of hard partitioning. The proposed method has the following limitations. The PLR method yields only the first-order qualitative trends, that is, decreasing, steady, and increasing trends. Other features like the second-order qualitative trends could be exploited in the future work. Moreover, since the optimization method in building the MFTM is the EM method, the results are not guaranteed to converge to a global optimal solution.45 To deal with the adverse impact of local optimality, a commonly used practical solution40,45 is to run the optimization methods repeatedly with different initializations and to choose the solution with the highest L1(θ) in eq 19.
ORCID
Jiandong Wang: 0000-0003-2635-8724 Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China under Grant No. 61433001 and the Research Fund for the Taishan Scholar Project of Shandong Province of China.
■
VARIABLES Xr = a process variable X = a multivariate system, X = [X1, X2,···, XR] R = the number of process variables in X xr(t) = the value of Xr at time instant t X⃗ r = the qualitative trend of Xr x⃗r(t) = the value of X⃗ r at time instant t L = the length of historical time sequence ck = a cluster C = the set of ck K = the number of clusters in C z = a qualitative trend combination zk = the corresponding qualitative trend combination of ck Z = the set of zk Z0 = the set of significant zk di = a data segment D = the set of di
5. CONCLUSION This paper proposed a probabilistic clustering method to find significant qualitative trend combinations of multivariate systems. To deal with multiple process variables, the existing PLSA model was extended to the MFTM. Owing to the 4531
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research
(11) Villez, K. Qualitative path estimation: a fast and reliable algorithm for qualitative trend analysis. AIChE J. 2015, 61, 1535−1546. (12) Zhou, B.; Ye, H. A study of polynomial fit-based methods for qualitative trend analysis. J. Process Control 2016, 37, 21−33. (13) Liao, T. W. Clustering of time series data - a survey. Pattern Recognit 2005, 38, 1857−1874. (14) Aghabozorgi, S.; Shirkhorshidi, A. S.; Wah, T. Y. Time-series clustering−a decade review. Inf. Syst. 2015, 53, 16−38. (15) Hills, J.; Lines, J.; Baranauskas, E.; Mapp, J.; Bagnall, A. Classification of time series by shapelet transformation. Data Min. Knowl. Discovery 2014, 28, 851−881. (16) Yeh, C. C. M.; Zhu, Y.; Ulanova, L.; Begum, N.; Ding, Y.; Dau, H. A.; Silva, D. F.; Mueen, A.; Keogh, E. Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In IEEE 16th International Conference on Data Mining (ICDM), 2016; pp 1317−1322. (17) Keogh, E. J.; Pazzani, M. J. Scaling up dynamic time warping to massive datasets. In European Conference on Principles of Data Mining and Knowledge Discovery, 1999; pp 1−11. (18) Srinivasan, R.; Qian, M. S. Off-line temporal signal comparison using singular points augmented time warping. Ind. Eng. Chem. Res. 2005, 44, 63−5. (19) Banko, Z.; Abonyi, J. Correlation based dynamic time warping of multivariate time series. Expert Syst. Appl. 2012, 39, 12814−12823. (20) Mei, J.; Liu, M.; Wang, Y.; Gao, H. Learning a mahalanobis distance-based dynamic time warping measure for multivariate time series classification. IEEE Trans. Cybern. 2016, 46, 1363−1374. (21) Paparrizos, J.; Gravano, L. Fast and accurate time-series clustering. ACM Trans. Database Syst. 2017, 42, 8. (22) Banerjee, A.; Ghosh, J. Clickstream clustering using weighted longest common subsequences. In Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining, 2001; p 144. (23) Vlachos, M.; Kollios, G.; Gunopulos, D. Discovering similar multidimensional trajectories. In 18th International Conference on Data Engineering, 2002; pp 673−684. (24) Shumway, R. H. Time-frequency clustering and discriminant analysis. Stat. Probab. Lett. 2003, 63, 307−314. (25) Chen, L.; Ng, R. On the marriage of lp-norms and edit distance. Proceedings of the Thirtieth International Conference on Very Large Data Bases. 2004; pp 792−803. (26) Chen, L.; Ö zsu, M. T.; Oria, V. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data,2005; pp 491−502. (27) Hastie, T.; Tibshirani, R.; Friedman, J. H.; Franklin, J. The Elements of Statistical Learning, Second Ed.: Data Mining, Inference, And Prediction; Springer, 2009. (28) Kaufman, L.; Rousseeuw, P. J. Finding Groups In Data: An Introduction To Cluster Analysis; John Wiley & Sons, 2009. (29) Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms; Springer Science & Business Media, 2013. (30) Wang, X.; Smith, K. A.; Hyndman, R.; Alahakoon, D. A scalable Method for Time Series Clustering Technical Report; Monash University 2004,. (31) Ng, Y. S.; Srinivasan, R. Multivariate temporal data analysis using self-organizing maps. 2. Monitoring and diagnosis of multistate operations. Ind. Eng. Chem. Res. 2008, 47, 7758−7771. (32) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N. A review of process fault detection and diagnosis. Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, 313−326. (33) Björklund, S. A Survey and Comparison of Time-Delay Estimation Methods in Linear Systems; Linkoping University, 2003. (34) Keogh, E.; Chu, S.; Hart, D.; Pazzani, M. Segmenting time series: a survey and novel approach. Data Mining in Time Series Databases 2004, 57, 1−22. (35) Zhou, B.; Ye, H. A study of polynomial fit-based methods for qualitative trend analysis. J. Process Control 2016, 37, 21−33. (36) Maurya, M. R.; Rengaswamy, R.; Venkatasubramanian, V. Fault diagnosis using dynamic trend analysis: A review and recent developments. Eng. Appl. Artif. Intell. 2007, 20, 133−146.
N = the number of data segments in D f r,jr = the type of qualitative trends of Xr in di Fr = the set of f r,jr Mr = the size of Fr x̂r(t) = the linear fitting value at time t in PLR for Xr ai,r = the intercept of linear segments in PLR for Xr bi,r = the slope of linear segments in PLR for Xr ti,r = the starting time index of linear segments in PLR for Xr ti = the integrated split time index Sr = the set of ti,r S = the set of ti T = the minimum data length to generate S from Sr zi = the ground truth of the qualitative trend combination for di ẑi = the most likely qualitative trend combination for di pi = the probability P(ẑi|di) μk = the mean value of [P(ck|di)]q for ck μ0 = the global mean value of [P(ck|di)]q n(·) = a counting function θ = unknown parameters, θ = {P(ck|di), P(f r,jr|ck)} L1(θ) = the log-likelihood function η(K) = the perplexity with K γ = a ratio to determine K in η(K) v = the signal-to-noise ratio α = the accuracy of {ẑi}iN= 1 to the ground truth {zi}iN= 1
■
INDICES i = the index of data segments j = the type of qualitative trends k = the index of qualitative trend combinations r = the index of process variables s = the index of iterations in the EM algorithms
■
REFERENCES
(1) Apté, C. Data mining: an industrial research perspective. IEEE Comput. Sci. Eng. 1997, 4, 6−9. (2) Gupta, A.; Giridhar, A.; Venkatasubramanian, V.; Reklaitis, G. V. Intelligent alarm management applied to continuous pharmaceutical tablet manufacturing: an integrated approach. Ind. Eng. Chem. Res. 2013, 52, 12357−12368. (3) Wang, J.; Yang, F.; Chen, T.; Shah, S. L. An overview of industrial alarm systems: main causes for alarm overloading, research status, and open problems. IEEE Trans. Autom. Sci. Eng. 2016, 13, 1045−1061. (4) Charbonnier, S.; Gentil, S. A trend-based alarm system to improve patient monitoring in intensive care units. Control Eng. Practice 2007, 15, 1039−1050. (5) Charbonnier, S.; Gentil, S. On-line adaptive trend extraction of multiple physiological signals for alarm filtering in intensive care units. Int. J. Adapt. Control Signal Process. 2008, 24, 382−408. (6) Pillai, V. K.; Beris, A. N.; Dhurjati, P. S. Implementation of modelbased optimal temperature profiles for autoclave curing of composites using a knowledge-based system. Ind. Eng. Chem. Res. 1994, 33, 2443− 2452. (7) Flehmig, F. U.; Marquardt, W. Detection of multivariable trends in measured process quantities. J. Process Control 2006, 16, 947−957. (8) Fuchs, E.; Gruber, T.; Nitschke, J.; Sick, B. Online segmentation of time series based on polynomial least-squares approximations. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2232−2245. (9) Maurya, M. R.; Paritosh, P. K.; Rengaswamy, R.; Venkatasubramanian, V. A framework for on-line trend extraction and fault diagnosis. Eng. Appl. Artif. Intell. 2010, 23, 950−960. (10) Charbonnier, S.; Portet, F. A self-tuning adaptive trend extraction method for process monitoring and diagnosis. J. Process Control 2012, 22, 1127−1138. 4532
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533
Article
Industrial & Engineering Chemistry Research (37) Hamilton, J. D. Time Series Analysis; Princeton University Press, 1994. (38) ISA, ANSI/ISA-18.2. Management of Alarm Systems for the Process Industries; International Society of Automation, 2009. (39) EEMUA, EEMUA-191. Alarm Systems − a Guide to Design, Management and Procurement; Engineering Equipment and Materials Users Association, 2013. (40) Hofmann, T. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn 2001, 42, 177−196. (41) Dempster, A. P.; Laird, N. M.; Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1977, 39, 1−38. (42) Bahrani, M.; Sameti, H. A new bigram-PLSA language model for speech recognition. EURASIP J. Adv. Signal Process. 2010, 2010, 308437. (43) Blei, D. M.; Ng, A. Y.; Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993−1022. (44) Chen, K.; Wang, J. Normal and abnormal data segmentation based on variational directions and clustering algorithms. Ind. Eng. Chem. Res. 2017, 56, 7799−7813. (45) Wu, C. J. On the convergence properties of the EM algorithm. Annals of statistics 1983, 11, 95−103.
4533
DOI: 10.1021/acs.iecr.8b05864 Ind. Eng. Chem. Res. 2019, 58, 4518−4533