
Tensor Global-Local Preserving Projections for Batch Process Monitoring

Lijia Luo,*,† Shiyi Bao,† Zengliang Gao,† and Jingqi Yuan‡

†College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
‡Department of Automation, Shanghai Jiao Tong University, Shanghai, China



ABSTRACT: A novel data projection algorithm named tensor global-local preserving projections (TGLPP) is proposed for three-way data arrays. TGLPP preserves the global and local structures of data simultaneously and builds a unified framework for global structure preserving and local structure preserving. Tensor principal component analysis (TPCA) and tensor locality preserving projections (TLPP) are unified in the TGLPP framework and are proved to be two limiting cases of TGLPP. A batch process monitoring method is then developed on the basis of TGLPP. The SPD and R² statistics are used for fault detection, and the contribution plot is applied for fault identification. Two case studies validate the efficiency of the TGLPP-based monitoring method.

1. INTRODUCTION

Batch processes are widely used in the chemical and other industries because of their high flexibility in the production of low-volume, high-value-added products.1−4 One of the most important issues in batch processing is to ensure process safety and consistent product quality. Online performance monitoring is an efficient means to guarantee the safe operation of batch processes, because it provides a chance to take corrective actions before faults disrupt production or damage equipment. However, monitoring a batch process is difficult, because it involves intractable process features such as multiple operating modes, multiple phases, and high uncertainty. Generally, batch process monitoring is carried out on the basis of process models, knowledge, or data. Because of the high complexity and the quick product-to-market setting of batch processes, it is difficult to build precise process models or obtain enough process knowledge within a limited time.5 As a result, model- and knowledge-based monitoring methods are not applicable in many cases. On the other hand, with the rapid adoption of advanced measurement technologies in batch processes, large amounts of process data are recorded. These historical data contain rich information about process characteristics and thus can be applied for process monitoring. Therefore, data-driven multivariate statistical process monitoring (MSPM) methods have attracted increasing attention in recent years. So far, many successful applications of MSPM methods to batch processes have been reported. Lee et al. proposed a multiway kernel PCA method for batch process monitoring to deal with process nonlinearity.6 Camacho and Picó applied multiphase PCA to monitor multiphase batch processes.7,8 Chen and Liu presented an online batch process monitoring method based on dynamic PCA to capture the time-variant behaviors of the process.9 Rannar et al. proposed a hierarchical PCA for adaptive monitoring of batch processes.10 Lee and Vanrolleghem proposed a multiblock PCA for monitoring complex batch processes.11 Yoon and MacGregor12 developed a multiscale PCA method for batch process monitoring. Several survey papers summarize more batch process monitoring methods.13−15

Multiway PCA and its various extensions are among the most widely used MSPM approaches for batch processes. However, most of these methods have two major drawbacks. First, PCA focuses only on preserving the global Euclidean structure of data and ignores the local neighborhood structure. The loss of the local neighborhood structure may decrease the discriminating power of PCA-based methods. Second, because the historical data of a batch process are often arranged in a three-way data array along the batch, variable, and sample-time directions, multiway PCA-based methods need to unfold this three-way array into a two-dimensional matrix to build the monitoring model. This unfolding may destroy the intrinsic structure of the original batch data, leading to information loss, and the performance of multiway PCA-based monitoring methods degrades accordingly. Recently, several new process monitoring methods have been proposed on the basis of locality preserving projections (LPP),16 such as multiway LPP (MLPP),17 tensor LPP (TLPP),3 and generalized orthogonal LPP (GOLPP).18 It has been shown that these LPP-based monitoring methods outperform conventional PCA-based methods.3,17,18 As opposed to PCA, LPP mainly preserves the local neighborhood structure of data, as do MLPP, TLPP, and GOLPP. GOLPP, in particular, also has the ability to preserve the global structure of data, because it produces orthogonal basis functions.18 Theoretically, if all of the orthogonal basis functions of GOLPP are used, the projective map is simply a rotation, which does not distort the global structure of the data set. However, GOLPP provides no way to control this rotation to obtain the best exhibition of the global structure after projection. In other words, the global structure preserving of GOLPP is implicit and uncontrollable.


Because most LPP-based methods do not explicitly take global structure preserving into account, they may result in the loss of variance information. This means that LPP-based methods may perform poorly when the data set carries significant variance information. In particular, Luo et al. recently proposed a tensor global-local structure analysis (TGLSA) method by integrating PCA with LPP.19 Compared to PCA and LPP, TGLSA is able to preserve both the global and local structures of data.19 However, TGLSA simply bonds PCA and LPP together and fails to uncover its intrinsic relationships with PCA and LPP.

In this Article, the tensor global-local preserving projections (TGLPP) algorithm is proposed. Similar to TGLSA, TGLPP aims to preserve both the global and local structures of data, and it performs a tensor projection on three-way data arrays to avoid data unfolding. An outstanding advantage of TGLPP is that it builds a unified framework for preserving the global and local structures of data. Tensor PCA (TPCA) and TLPP can be easily interpreted under the TGLPP framework, and they are proved to be two limiting cases of TGLPP. The objective function of TGLPP consists of two subobjective functions, each corresponding to one kind of structure preserving. The trade-off between the two subobjectives can be adjusted by a weight coefficient; the global structure preserving of TGLPP is thus explicit and controllable, in contrast to GOLPP. A TGLPP-based monitoring method is proposed for batch processes, and its performance is tested in two case studies. The results indicate that it outperforms the MPCA, TPCA, and TLPP methods.

2. BACKGROUND TECHNIQUES

2.1. LPP and TLPP. LPP is a manifold learning algorithm that aims at extracting the local neighborhood structure of data. Tensor subspace analysis (TSA)20 is an extended version of LPP for three-way data arrays, which is also termed tensor locality preserving projections (TLPP) in ref 3. For a three-way array X(N × K × J) = {X_i} with X_i ∈ ℝ^K ⊗ ℝ^J, i = 1, 2, ..., N, TLPP seeks two transformation matrices U(K × K1) (K1 ≤ K) and V(J × J1) (J1 ≤ J) mapping X(N × K × J) to Y(N × K1 × J1) = {Y_i} with Y_i = U^T X_i V ∈ ℝ^{K1} ⊗ ℝ^{J1}, such that Y optimally preserves the local structure of X. The objective function of TLPP is20

$$J_{\mathrm{TLPP}}(U,V) = \min_{U,V} \sum_{ij} \| U^T X_i V - U^T X_j V \|^2 W_{ij} \qquad (1)$$

where W_ij is a weight coefficient representing the adjacent relationship between X_i and X_j.

2.2. PCA, MPCA, and TPCA. PCA is a classical dimensionality reduction algorithm that seeks a subspace capturing the maximum variance of data. Multiway PCA (MPCA)1,2 is an extended version of PCA for three-way data arrays. MPCA first unfolds a three-way data array X(N × K × J) into a matrix X(N × KJ) and then decomposes X(N × KJ) as

$$X = \sum_{i=1}^{l} t_i p_i^T + E \qquad (2)$$

where t denotes the principal component (PC), p is the loading vector, l is the number of PCs, and E is a residual matrix.

Similar to the extension of LPP to TLPP, PCA can also be easily extended to TPCA to deal with three-way data arrays. TPCA seeks a tensor projection that optimally preserves the global Euclidean structure of data. For a three-way array X(N × K × J) = {X_i} with X_i ∈ ℝ^K ⊗ ℝ^J, i = 1, 2, ..., N, TPCA solves the following optimization problem:

$$J_{\mathrm{TPCA}}(U,V) = \max_{U,V} \frac{1}{N}\sum_i \| Y_i - \bar{Y} \|^2, \quad \text{s.t. } U^TU = I,\ V^TV = I \qquad (3)$$

where Y_i = U^T X_i V, Ȳ = Σ_i Y_i / N, I denotes the identity matrix, and U(K × K1) and V(J × J1) are two transformation matrices.

2.3. STDD. Luo et al. extended support vector domain description (SVDD)21 to support tensor domain description (STDD) for dealing with three-way data arrays.19 For a data set X = {X_i | X_i ∈ ℝ^K ⊗ ℝ^J}, i = 1, 2, ..., N, STDD seeks a minimum feature hypersphere that describes the boundary of the data set. This can be formulated as an optimization problem:19

$$\min_{R,A,\xi} R^2 + C \sum_i \xi_i, \quad \text{s.t. } \| X_i - A \|^2 \le R^2 + \xi_i,\ \xi_i \ge 0 \qquad (4)$$

where A and R denote the center and the radius of the feature hypersphere, C is a penalty factor, and ξ_i are slack variables. Applying the Lagrange multiplier method to eq 4 leads to its dual problem:19

$$\max_{\alpha} \sum_i \alpha_i\,\mathrm{tr}(X_iX_i^T) - \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(X_iX_j^T), \quad \text{s.t. } 0 \le \alpha_i \le C,\ \sum_i \alpha_i = 1 \qquad (5)$$

where α_i ≥ 0 are Lagrange multipliers. Maximizing eq 5 gives a set of α_i. Data points with α_i = 0 are inside the feature hypersphere. Data points corresponding to 0 < α_i < C are at the edge of the feature hypersphere and are termed support tensors. Data points with α_i = C are outliers that lie outside the feature hypersphere. After obtaining the α_i, A and R can be calculated by19

$$A = \sum_i \alpha_i X_i \qquad (6)$$

$$R^2 = \mathrm{tr}(X_sX_s^T) - 2\sum_i \alpha_i\,\mathrm{tr}(X_sX_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(X_iX_j^T) \qquad (7)$$

where X_s can be any support tensor. A new sample X_0 is acceptable if it lies inside the feature hypersphere:

$$\| X_0 - A \|^2 = \mathrm{tr}(X_0X_0^T) - 2\sum_i \alpha_i\,\mathrm{tr}(X_0X_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(X_iX_j^T) \le R^2 \qquad (8)$$
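The acceptance test in eq 8 needs only the Lagrange multipliers and the pairwise tensor inner products tr(X_iX_j^T). A minimal numpy sketch, assuming the dual problem (eq 5) has already been solved for α by a generic QP solver; the function names and the tolerance are illustrative, not from the paper:

```python
import numpy as np

def gram(X):
    """Pairwise inner products tr(Xi Xj^T) for a list of K x J tensors."""
    return np.array([[np.trace(Xi @ Xj.T) for Xj in X] for Xi in X])

def stdd_radius(G, alpha, C, tol=1e-8):
    """Squared radius R^2 of the feature hypersphere (eq 7); any index
    with 0 < alpha_i < C is a support tensor and may serve as Xs."""
    s = next(i for i, a in enumerate(alpha) if tol < a < C - tol)
    return G[s, s] - 2.0 * alpha @ G[s] + alpha @ G @ alpha

def stdd_accepts(X0, X, alpha, R2):
    """Eq 8: X0 is accepted if its squared distance to the center
    A = sum_i alpha_i Xi (eq 6) does not exceed R^2."""
    k0 = np.array([np.trace(X0 @ Xi.T) for Xi in X])
    d2 = np.trace(X0 @ X0.T) - 2.0 * alpha @ k0 + alpha @ gram(X) @ alpha
    return d2 <= R2
```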

3. TENSOR GLOBAL-LOCAL PRESERVING PROJECTIONS (TGLPP)

As mentioned above, PCA and LPP each concern only one aspect of the data structure. PCA mainly preserves the global structure (i.e., variance information) of data, while LPP preserves the local structure of data. Therefore, PCA may break the topological relationships of data points, and LPP may lead to the loss of variance information. To overcome these shortcomings, tensor global-local preserving projections (TGLPP) is proposed for three-way data arrays. TGLPP aims to preserve the global and local structures of data simultaneously. Moreover, TGLPP adopts the tensor projection to avoid the data unfolding used in multiway-based methods.


3.1. Algorithm Description. Given a three-way data array X(N × K × J) = {X_i | X_i ∈ ℝ^K ⊗ ℝ^J}, i = 1, 2, ..., N, TGLPP seeks two transformation matrices U ∈ ℝ^K ⊗ ℝ^{K1} and V ∈ ℝ^J ⊗ ℝ^{J1} that project X to a lower dimensional data set Y(N × K1 × J1) = {Y_i | Y_i ∈ ℝ^{K1} ⊗ ℝ^{J1}} (K1 ≤ K, J1 ≤ J) with Y_i = U^T X_i V, and ensure that Y well preserves the global and local structures of X. The objective function of TGLPP is constructed as follows:

$$J_{\mathrm{TGLPP}}(U,V) = \min\{J_{\mathrm{Global}}(U,V),\ J_{\mathrm{Local}}(U,V)\} = \min_{U,V}\left\{-\frac{1}{2}\sum_{ij} \| Y_i - Y_j \|^2 \bar{W}_{ij},\ \frac{1}{2}\sum_{ij} \| Y_i - Y_j \|^2 W_{ij}\right\} \qquad (9)$$

where the subobjective function J_Global(U,V) = −(1/2)Σ_ij ∥Y_i − Y_j∥² W̄_ij corresponds to the global structure preserving, and the subobjective function J_Local(U,V) = (1/2)Σ_ij ∥Y_i − Y_j∥² W_ij is related to the local structure preserving. W_ij and W̄_ij are weighting coefficients that represent the adjacent relationship and the nonadjacent relationship between X_i and X_j. W is an adjacency weighting matrix formed by the W_ij, and all W̄_ij constitute a nonadjacency weighting matrix W̄. The weighting coefficients W_ij and W̄_ij are calculated by

$$W_{ij} = \begin{cases} e^{-\| X_i - X_j \|^2/\sigma_1} & \text{if } X_j \in \Omega(X_i) \text{ or } X_i \in \Omega(X_j) \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

$$\bar{W}_{ij} = \begin{cases} e^{-\| X_i - X_j \|^2/\sigma_2} & \text{if } X_j \notin \Omega(X_i) \text{ and } X_i \notin \Omega(X_j) \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

where σ1 > 0 and σ2 > 0 are two empirical parameters for adjusting W_ij and W̄_ij, and their values can be selected by experience or trials. ∥·∥ is the Frobenius norm of a matrix, that is, ∥A∥ = (Σ_ij a_ij²)^{1/2}. Ω(X_i) denotes the neighborhood of X_i, which is often defined in one of two ways: k nearest neighbors or ε-neighborhoods.16 The k (k ∈ ℕ) nearest neighborhood of X_i is Ω_k(X_i) = {X | X is among the k nearest neighbors of X_i}. The ε-neighborhood (ε ∈ ℝ) of X_i is Ω_ε(X_i) = {X | ∥X − X_i∥² < ε}. In this Article, the first method is adopted because it is simpler to choose k.
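As an illustration of eqs 10 and 11, the following numpy sketch builds W and W̄ with k nearest neighbors. The default σ1 and σ2 mimic the case-study heuristic of section 5.1 (twice the largest adjacent/nonadjacent distance); whether the paper means plain or squared Frobenius distance there is ambiguous, so treat these defaults as assumptions:

```python
import numpy as np

def tglpp_weights(X, k=5, sigma1=None, sigma2=None):
    """Adjacency weights W (eq 10) and nonadjacency weights Wbar (eq 11)
    for a list X of K x J sample matrices, using k nearest neighbors."""
    N = len(X)
    # squared Frobenius distances ||Xi - Xj||^2 between all sample pairs
    D2 = np.array([[np.sum((Xi - Xj) ** 2) for Xj in X] for Xi in X])
    # k nearest neighbors of each sample (column 0 is the sample itself)
    nbr = np.argsort(D2, axis=1)[:, 1:k + 1]
    adj = np.zeros((N, N), dtype=bool)
    for i in range(N):
        adj[i, nbr[i]] = True
    adj |= adj.T                       # Xj in Omega(Xi) or Xi in Omega(Xj)
    # assumed defaults in the spirit of the case study (section 5.1)
    if sigma1 is None:
        sigma1 = 2.0 * D2[adj].max()
    if sigma2 is None:
        sigma2 = 2.0 * D2[~adj].max()
    W = np.where(adj, np.exp(-D2 / sigma1), 0.0)
    Wbar = np.where(~adj, np.exp(-D2 / sigma2), 0.0)
    np.fill_diagonal(Wbar, 0.0)        # the i = j term vanishes in eq 9 anyway
    return W, Wbar
```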

Equation 9 is a typical dual-objective optimization problem. The weighted sum method is thus applied to solve it. By introducing a weight coefficient η ∈ [0,1], eq 9 is reduced to

$$J_{\mathrm{TGLPP}}(U,V) = \min_{U,V}\{\eta J_{\mathrm{Local}}(U,V) + (1-\eta) J_{\mathrm{Global}}(U,V)\} = \min_{U,V} \frac{1}{2}\left\{\eta \sum_{ij} \| Y_i - Y_j \|^2 W_{ij} - (1-\eta)\sum_{ij} \| Y_i - Y_j \|^2 \bar{W}_{ij}\right\} \qquad (12)$$

The performance of TGLPP is closely related to η. To give the same attention to the global and local structures, η is calculated from

$$\eta S_{\mathrm{Local}} = (1-\eta) S_{\mathrm{Global}} \qquad (13)$$

where S_Local and S_Global are the scales of J_Local(U,V) and J_Global(U,V).

3.2. Formulation of TGLPP. Because ∥A∥² = tr(AA^T), eq 12 is reduced to

$$\begin{aligned} J_{\mathrm{TGLPP}}(U,V) &= \min_{U,V}\{\eta_1 J_{\mathrm{Local}}(U,V) + (1-\eta_1) J_{\mathrm{Global}}(U,V)\} \\ &= \min_{U,V} \frac{1}{2}\left\{\eta_1 \sum_{ij} \| Y_i - Y_j \|^2 W_{ij} - (1-\eta_1)\sum_{ij} \| Y_i - Y_j \|^2 \bar{W}_{ij}\right\} \\ &= \min_{U,V}\{\eta_1\,\mathrm{tr}(U^T(D_V - W_V)U) - (1-\eta_1)\,\mathrm{tr}(U^T(\bar{D}_V - \bar{W}_V)U)\} \\ &= \min_{U,V}\,\mathrm{tr}(U^T(\eta_1(D_V - W_V) - (1-\eta_1)(\bar{D}_V - \bar{W}_V))U) \\ &= \min_{U,V}\,\mathrm{tr}(U^T(\eta_1 L_V - (1-\eta_1)\bar{L}_V)U) \\ &= \min_{U,V}\,\mathrm{tr}(U^T M_V U) \end{aligned} \qquad (14)$$

where D_V = Σ_i D_ii X_i V V^T X_i^T with D_ii = Σ_j W_ij, W_V = Σ_ij W_ij X_i V V^T X_j^T, D̄_V = Σ_i D̄_ii X_i V V^T X_i^T with D̄_ii = Σ_j W̄_ij, and W̄_V = Σ_ij W̄_ij X_i V V^T X_j^T. S_Local and S_Global are calculated by

$$S_{\mathrm{Local}} = \rho(L_V), \quad S_{\mathrm{Global}} = \rho(\bar{L}_V) \qquad (15)$$

where ρ(·) denotes the spectral radius of a matrix. Substituting eq 15 into eq 13 yields

$$\eta_1 = \frac{\rho(\bar{L}_V)}{\rho(L_V) + \rho(\bar{L}_V)} \qquad (16)$$

To take the importance of each data point X_i into account, quantified by D̂_ii = η1 D_ii − (1 − η1) D̄_ii, and to avoid the singularity problem of TGLPP, we also maximize the following objective function:

$$J_1(U,V) = \mathrm{tr}(U^T(\eta_1 \hat{D}_V + (1-\eta_1) I_K)U) = \mathrm{tr}(U^T N_V U) \qquad (17)$$

where D̂_V = Σ_i D̂_ii X_i V V^T X_i^T and I_K denotes an identity matrix. Similarly, because ∥A∥² = tr(A^T A), eq 12 is also reduced to

$$\begin{aligned} J_{\mathrm{TGLPP}}(U,V) &= \min_{U,V}\{\eta_2 J_{\mathrm{Local}}(U,V) + (1-\eta_2) J_{\mathrm{Global}}(U,V)\} \\ &= \min_{U,V} \frac{1}{2}\left\{\eta_2 \sum_{ij} \| Y_i - Y_j \|^2 W_{ij} - (1-\eta_2)\sum_{ij} \| Y_i - Y_j \|^2 \bar{W}_{ij}\right\} \\ &= \min_{U,V}\{\eta_2\,\mathrm{tr}(V^T(D_U - W_U)V) - (1-\eta_2)\,\mathrm{tr}(V^T(\bar{D}_U - \bar{W}_U)V)\} \\ &= \min_{U,V}\,\mathrm{tr}(V^T(\eta_2(D_U - W_U) - (1-\eta_2)(\bar{D}_U - \bar{W}_U))V) \\ &= \min_{U,V}\,\mathrm{tr}(V^T(\eta_2 L_U - (1-\eta_2)\bar{L}_U)V) \\ &= \min_{U,V}\,\mathrm{tr}(V^T M_U V) \end{aligned} \qquad (18)$$




where D_U = Σ_i D_ii X_i^T U U^T X_i, W_U = Σ_ij W_ij X_j^T U U^T X_i, D̄_U = Σ_i D̄_ii X_i^T U U^T X_i, and W̄_U = Σ_ij W̄_ij X_j^T U U^T X_i. The weight coefficient η2 is

$$\eta_2 = \frac{\rho(\bar{L}_U)}{\rho(L_U) + \rho(\bar{L}_U)} \qquad (19)$$

We also maximize the following objective function:

$$J_2(U,V) = \mathrm{tr}(V^T(\eta_2 \tilde{D}_U + (1-\eta_2) I_J)V) = \mathrm{tr}(V^T N_U V) \qquad (20)$$

where D̃_U = Σ_i D̃_ii X_i^T U U^T X_i with D̃_ii = η2 D_ii − (1 − η2) D̄_ii, and I_J is an identity matrix.
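To make eqs 14−17 concrete, here is a minimal numpy sketch that assembles M_V and N_V for a fixed V; the function name is ours, and the companion M_U, N_U of eqs 18−20 follow by exchanging the roles of the two modes:

```python
import numpy as np

def tglpp_mode1_matrices(X, V, W, Wbar):
    """Assemble M_V (eq 14) and N_V (eq 17) for the current V; the
    companion M_U, N_U of eqs 18-20 follow by exchanging modes."""
    N = len(X)
    P = [Xi @ V for Xi in X]                     # Xi V, shape K x J1
    Dii, Dbii = W.sum(axis=1), Wbar.sum(axis=1)
    D_V  = sum(Dii[i]  * P[i] @ P[i].T for i in range(N))
    Db_V = sum(Dbii[i] * P[i] @ P[i].T for i in range(N))
    W_V  = sum(W[i, j]    * P[i] @ P[j].T for i in range(N) for j in range(N))
    Wb_V = sum(Wbar[i, j] * P[i] @ P[j].T for i in range(N) for j in range(N))
    L_V, Lb_V = D_V - W_V, Db_V - Wb_V
    rho = lambda A: np.abs(np.linalg.eigvals(A)).max()   # spectral radius
    eta1 = rho(Lb_V) / (rho(L_V) + rho(Lb_V))            # eq 16
    M_V = eta1 * L_V - (1.0 - eta1) * Lb_V               # eq 14
    Dhat = eta1 * Dii - (1.0 - eta1) * Dbii              # importance weights
    Dh_V = sum(Dhat[i] * P[i] @ P[i].T for i in range(N))
    N_V = eta1 * Dh_V + (1.0 - eta1) * np.eye(len(M_V))  # eq 17
    return M_V, N_V, eta1
```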

In summary, TGLPP solves two optimization problems:

$$\min_{U,V} \frac{\mathrm{tr}(U^T M_V U)}{\mathrm{tr}(U^T N_V U)} \qquad (21)$$

$$\min_{U,V} \frac{\mathrm{tr}(V^T M_U V)}{\mathrm{tr}(V^T N_U V)} \qquad (22)$$


Equations 21 and 22 are equivalent to two generalized eigenvector problems:

$$M_U v = \lambda N_U v \qquad (23)$$

$$M_V u = \lambda N_V u \qquad (24)$$

where u and v are column vectors of U and V, respectively. Equations 23 and 24 must be solved together because of their interdependence, so an iterative calculation method is applied. First, U is initialized as the identity matrix, η2 is calculated by eq 19, and V is computed by solving eq 23. Once V is obtained, η1 is calculated by eq 16, and U is updated by solving eq 24. Repeating this process until the termination criterion is satisfied solves eqs 23 and 24. The termination criterion can be set as a maximum iteration number or a convergence accuracy. The eigenvectors corresponding to the J1 and K1 smallest eigenvalues of eqs 23 and 24 then constitute the matrices V and U, that is, V = [v1, v2, ..., vJ1] and U = [u1, u2, ..., uK1]. In this Article, J1 and K1 are simply chosen to be J1 = J and K1 = K for defining the monitoring statistics.
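A sketch of this alternating scheme, under the assumptions that the mode-matrix helper above (tglpp_mode1_matrices) and a mirrored, hypothetical tglpp_mode2_matrices for M_U, N_U are available, and that N_V and N_U stay positive definite (which the (1 − η)I terms in eqs 17 and 20 are intended to ensure), so SciPy's symmetric generalized eigensolver applies:

```python
import numpy as np
from scipy.linalg import eigh

def tglpp_fit(X, W, Wbar, K1, J1, n_iter=10):
    """Alternating solution of eqs 23-24 (a sketch of the scheme
    described above, not the authors' code)."""
    K, J = X[0].shape
    U = np.eye(K)                                  # initialize U as identity
    for _ in range(n_iter):                        # or until convergence
        M_U, N_U, eta2 = tglpp_mode2_matrices(X, U, W, Wbar)  # eqs 18-20
        lam, Vf = eigh(M_U, N_U)                   # eq 23, eigenvalues ascending
        V = Vf[:, :J1]                             # J1 smallest eigenvalues
        M_V, N_V, eta1 = tglpp_mode1_matrices(X, V, W, Wbar)  # eqs 14-17
        lam, Uf = eigh(M_V, N_V)                   # eq 24
        U = Uf[:, :K1]                             # K1 smallest eigenvalues
    return U, V
```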

3.3. Algorithm Analysis. The relationships of TGLPP with TPCA, TLPP, and TGLSA are revealed to highlight several of its characteristics.

1. Relationship with TPCA. The relationship of TGLPP with TPCA is expressed as lemma 1.

Lemma 1: If we ignore the neighborhood relationships between data points (i.e., k = 0) and let η = 0, TGLPP converts to weighted TPCA. If we further set the nonadjacency weighting matrix as W̄ = 1_N 1_N^T, TGLPP reduces to normal TPCA.

Proof: The objective function of TPCA in eq 3 is reformulated as

$$\begin{aligned} J_{\mathrm{TPCA}}(U,V) &= \max_{U,V} \frac{1}{N}\sum_i \| Y_i - \bar{Y} \|^2 \\ &= \max_{U,V} \frac{1}{N}\,\mathrm{tr}\Big(\sum_i (Y_i - \bar{Y})(Y_i - \bar{Y})^T\Big) \\ &= \max_{U,V}\,\mathrm{tr}\Big(\frac{1}{N}\sum_i Y_iY_i^T - \bar{Y}\bar{Y}^T\Big) \\ &= \max_{U,V}\,\mathrm{tr}\Big(\frac{1}{N^2}\Big[N\sum_i Y_iY_i^T - \Big(\sum_i Y_i\Big)\Big(\sum_j Y_j^T\Big)\Big]\Big) \\ &= \max_{U,V}\,\mathrm{tr}\Big(\frac{1}{2N^2}\Big[2N\sum_i Y_iY_i^T - 2\sum_{ij} Y_iY_j^T\Big]\Big) \\ &= \max_{U,V}\,\mathrm{tr}\Big(\frac{1}{2N^2}\sum_{ij}\big(Y_iY_i^T - 2Y_iY_j^T + Y_jY_j^T\big)\Big) \\ &= \max_{U,V} \frac{1}{2N^2}\sum_{ij} \| Y_i - Y_j \|^2 \end{aligned} \qquad (25)$$

If we let k = 0 and η = 0, the optimization problem of TGLPP reduces to

$$J_{\mathrm{TGLPP}}(U,V) = \max_{U,V} \frac{1}{2}\sum_{ij} \| Y_i - Y_j \|^2 \bar{W}_{ij}, \quad \text{s.t. } U^TU = I,\ V^TV = I \qquad (26)$$

which is weighted TPCA. If we further let W̄ = 1_N 1_N^T, J_TGLPP(U,V) will be proportional to J_TPCA(U,V), that is, J_TGLPP(U,V) ∝ J_TPCA(U,V). TGLPP is thus consistent with normal TPCA. Lemma 1 reveals that TPCA only preserves the global structure of data and totally ignores the local structure. Therefore, TPCA can be viewed as a limiting case of TGLPP that disregards the neighborhood relationships of data points.

2. Relationship with TLPP. The relationship of TGLPP with TLPP is expressed as lemma 2.

Lemma 2: If we let η = 1, TGLPP reduces to TLPP.

Proof: Let η = 1; the optimization problems of TGLPP become

$$\min_{U,V} \frac{\mathrm{tr}(U^T L_V U)}{\mathrm{tr}(U^T D_V U)} \qquad (27)$$

$$\min_{U,V} \frac{\mathrm{tr}(V^T L_U V)}{\mathrm{tr}(V^T D_U V)} \qquad (28)$$

Equations 27 and 28 are in accord with the optimization problems of TLPP.3,20 Lemma 2 reveals that TLPP is the other limiting case of TGLPP, which only preserves the local structure of data. Besides, TLPP solves two generalized eigenvector problems, L_U v = λ D_U v and L_V u = λ D_V u, which may suffer from the singularity of D_V and D_U in undersampled situations. TGLPP


overcomes this drawback by adding the term (1 − η)I in eqs 17 and 20.

3. Relationship with TGLSA. Both TGLPP and TGLSA19 have the ability to preserve the global and local structures of data. However, they accomplish the global structure preserving in totally different ways. TGLSA directly adopts the idea of PCA to preserve the global structure. TGLPP builds a novel subobjective function to preserve the global structure by taking the nonadjacent relationships of data points into account and introducing a nonadjacency weighting matrix to represent these relationships. This subobjective function has not only the same ability to preserve the global structure as PCA, but also a form similar to that of the subobjective function corresponding to the local structure preserving. TGLPP thus unifies global structure preserving and local structure preserving under the same framework. TLPP and TPCA can also be easily interpreted under the TGLPP framework, and they are proved to be two special cases of TGLPP. TGLSA, in contrast, fails to reveal its intrinsic relationships with TLPP and TPCA.

4. BATCH PROCESS MONITORING WITH TGLPP

4.1. Development of Monitoring Model. Let X(I × K × J) denote a training batch data set collected under normal operating conditions, where I, J, and K are the numbers of batches, variables, and samples, respectively. Normalizing X(I × K × J) with the data preprocessing method in ref 19 and performing TGLPP on the normalized data set X(I × K × J) yields the monitoring model:

$$Y_i = U^T X_i V \qquad (29)$$

where U(K × K) and V(J × J) are transformation matrices. The projection of a new batch X_new(K × J) on the monitoring model is Y_new = U^T X_new V.

4.2. Fault Detection and Diagnosis. The SPD (squared projection difference) statistic and the R² statistic proposed by Luo et al.19 are used for fault detection. The SPD statistic for a batch data X(K × J) = {x_kj} is defined as19

$$\mathrm{SPD} = \sum_{kj=1}^{KJ} e_{kj}^2 = \sum_{kj=1}^{KJ} (x_{kj} - y_{kj})^2 \qquad (30)$$

where y_kj is the projection of x_kj. The control limit of the SPD statistic is needed to identify process faults. Assuming that the projection differences e_kj are independent and normally distributed, the SPD statistic follows a weighted χ²-distribution, and its control limit at significance level α is2,19

$$\mathrm{SPD}_\alpha = g\chi^2_{h,\alpha}, \quad g = \frac{v}{2m}, \quad h = \frac{2m^2}{v} \qquad (31)$$

where m and v denote the mean and the variance of the SPD statistics of all reference batches in the training data set.
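A hedged numpy/scipy sketch of eqs 30 and 31, assuming K1 = K and J1 = J as in this Article so that Y and X have the same shape (the function names are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def spd_statistic(Xb, U, V):
    """SPD of one batch (eq 30), with y_kj the (k, j) element of
    Y = U^T X V (same shape as X since K1 = K and J1 = J here)."""
    Y = U.T @ Xb @ V
    return np.sum((Xb - Y) ** 2)

def spd_control_limit(spd_ref, alpha=0.01):
    """Weighted chi-squared control limit (eq 31) at significance level
    alpha, from the SPD values of all reference batches."""
    m, v = np.mean(spd_ref), np.var(spd_ref)
    g, h = v / (2.0 * m), 2.0 * m ** 2 / v
    return g * chi2.ppf(1.0 - alpha, h)
```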

Let the data set Y(I × K × J) be the projection of the training data set X(I × K × J); the feature hypersphere of Y(I × K × J) can be obtained by STDD. The distance between the projection of each batch and the center of the feature hypersphere can be used to quantify the deviation of that batch from normal operations. The square of this distance is defined as the R² statistic. Thus, the R² statistic of a reference batch X_k is19

$$R_k^2 = \mathrm{tr}(Y_kY_k^T) - 2\sum_i \alpha_i\,\mathrm{tr}(Y_kY_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(Y_iY_j^T) \qquad (32)$$

where α_i are the Lagrange multipliers obtained by performing STDD on the data set Y(I × K × J). The R² statistic of a new batch X_new(K × J) is

$$R_{\mathrm{new}}^2 = \mathrm{tr}(Y_{\mathrm{new}}Y_{\mathrm{new}}^T) - 2\sum_i \alpha_i\,\mathrm{tr}(Y_{\mathrm{new}}Y_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(Y_iY_j^T) \qquad (33)$$

where Y_new is the projection of X_new, and the α_i are the same as those in eq 32. The control limit of the R² statistic is also estimated by a weighted χ²-distribution:

$$R_\alpha^2 = g\chi^2_{h,\alpha}, \quad g = \frac{v}{2m}, \quad h = \frac{2m^2}{v} \qquad (34)$$

where m and v denote the mean and the variance of the R² statistics of all support tensors and outliers in the training data set.

The contribution plot22 is used for fault diagnosis. It describes the contributions of process variables to the monitoring statistics. The contribution of variable j to the SPD statistic at sample time k is defined as

$$c_{kj}^{\mathrm{SPD}} = e_{kj}^2 = (x_{kj} - y_{kj})^2 \qquad (35)$$

Equation 33 can be reformulated as

$$\begin{aligned} R_{\mathrm{new}}^2 &= \mathrm{tr}(Y_{\mathrm{new}}Y_{\mathrm{new}}^T) - 2\sum_i \alpha_i\,\mathrm{tr}(Y_{\mathrm{new}}Y_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(Y_iY_j^T) \\ &= \mathrm{tr}(U^TX_{\mathrm{new}}VY_{\mathrm{new}}^T) - 2\sum_i \alpha_i\,\mathrm{tr}(U^TX_{\mathrm{new}}VY_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(Y_iY_j^T) \\ &= \sum_{kj} x_{kj}\,\mathrm{tr}(u_k^Tb_j^TVY_{\mathrm{new}}^T) - 2\sum_{ikj} x_{kj}\alpha_i\,\mathrm{tr}(u_k^Tb_j^TVY_i^T) + \sum_{ij} \alpha_i\alpha_j\,\mathrm{tr}(Y_iY_j^T) \end{aligned} \qquad (36)$$

where u_k denotes the kth row of U, and b_j is a column vector in which the jth element is 1 and all others are 0. The first two terms on the right side of eq 36 are the summation of the effects of all process variables on the R² statistic at all sample times. The last term on the right side of eq 36 is a constant for a given training data set. Thus, after neglecting the constant term, the contribution of variable j to the R² statistic at sample time k is defined as

$$c_{kj}^{R} = x_{kj}\,\mathrm{tr}(u_k^Tb_j^TVY_{\mathrm{new}}^T) - 2\sum_i x_{kj}\alpha_i\,\mathrm{tr}(u_k^Tb_j^TVY_i^T) \qquad (37)$$
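The traces in eqs 35−37 collapse to simple products once one notes that b_j^T V is the jth row of V and u_k is the kth row of U; a compact sketch (our function name, again with K1 = K and J1 = J):

```python
import numpy as np

def tglpp_contributions(Xnew, U, V, Y_ref, alpha):
    """Contribution plots: eq 35 for the SPD statistic and eq 37 for the
    R^2 statistic, evaluated for every variable j and sample time k."""
    Ynew = U.T @ Xnew @ V
    c_spd = (Xnew - Ynew) ** 2                      # eq 35, entry (k, j)
    # T = V (Ynew^T - 2 sum_i alpha_i Yi^T); then tr(u_k^T b_j^T V Y^T)
    # reduces to the dot product of row k of U with row j of T
    T = V @ (Ynew.T - 2.0 * sum(a * Yi.T for a, Yi in zip(alpha, Y_ref)))
    c_r2 = Xnew * (U @ T.T)                         # eq 37, entry (k, j)
    return c_spd, c_r2
```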

4.3. Online Monitoring. A moving data window technique is combined with TGLPP for online monitoring. This technique has two advantages: one is tracking the time-variant dynamics of the process, and the other is avoiding the prediction of future unmeasured data. If the window size is d, the window data of the ith batch up to sample time k are

$$X^i(d \times J\,|\,k) = \begin{bmatrix} x^i_{k-d+1,1} & x^i_{k-d+1,2} & \cdots & x^i_{k-d+1,J} \\ x^i_{k-d+2,1} & x^i_{k-d+2,2} & \cdots & x^i_{k-d+2,J} \\ \vdots & \vdots & \ddots & \vdots \\ x^i_{k,1} & x^i_{k,2} & \cdots & x^i_{k,J} \end{bmatrix}$$
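In 0-based array terms the window is a plain slice; a one-line sketch (the name is ours):

```python
def window_data(Xi, k, d):
    """X^i(d x J | k) of section 4.3: the d most recent samples of batch i
    up to time k (k is 1-based as in the text; rows k-d+1, ..., k)."""
    return Xi[k - d:k, :]
```

For example, window_data(X[0], 5, 5) returns the first five samples of the first batch.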


A window mode monitoring model (WMMM) thus can be built from the window data. The WMMM is periodically updated as the window moves forward. Note that the window size d has a great effect on the fault sensitivity and the accuracy of the WMMM. Generally, a larger window size helps enhance the modeling accuracy, but it may bring a heavy computation burden, decrease the fault sensitivity, and lead to time delays in fault detection. A smaller window size responds to faults quickly, but it may reduce the modeling accuracy and is more likely to amplify noise disturbances. Therefore, a proper window size is critical for the moving window technique. It is usually selected by experience or trials due to the lack of efficient quantitative calculation methods.23,24 In this work, the window size is selected according to the rapidity and accuracy of the WMMM in detecting known faults.25 Similar to ref 19, the fault detection rate (FDR) and false alarm rate (FAR) are adopted to quantify the monitoring performance. A larger FDR and a smaller FAR indicate a better monitoring performance.

4.4. Complete Monitoring Procedure. The whole monitoring procedure includes two phases (a code sketch of phase II follows below):

Phase I: Build the offline WMMM.
Step 1: Select reference batches to constitute the training data set.
Step 2: Choose a proper window size (d).
Step 3: Normalize the window data.
Step 4: Perform TGLPP and build the WMMM.
Step 5: Obtain control limits for the SPD and R² statistics.

Phase II: Online monitoring.
Step 1: Normalize the window data of a new batch at sample time k.
Step 2: Map the window data onto the WMMM.
Step 3: Calculate the SPD and R² statistics of the new batch.
Step 4: If the SPD or R² statistic exceeds its control limit, report a fault and diagnose the fault variable using the contribution plot. Otherwise, return to step 1 and monitor the next sample time k + 1.
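A sketch of the phase II loop, in which `models`, `normalize`, and `r2_statistic` are assumed placeholders for the per-time WMMM records and helpers (spd_statistic is the sketch from section 4.2):

```python
def monitor_phase2(Xnew, models, d):
    """Phase II sketch. `models[k]` is a hypothetical per-time-k WMMM
    record holding U, V, the reference projections, the STDD multipliers,
    and the two control limits."""
    K = Xnew.shape[0]
    for k in range(d, K + 1):                       # steps 1-2: window + normalize
        m = models[k]
        Xw = normalize(Xnew[k - d:k, :], m)         # normalization as in ref 19
        spd = spd_statistic(Xw, m.U, m.V)           # step 3 (eq 30)
        r2 = r2_statistic(Xw, m)                    # step 3 (eq 33)
        if spd > m.spd_limit or r2 > m.r2_limit:    # step 4: alarm
            return k                                # diagnose via contribution plot
    return None                                     # no fault detected
```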

Table 1. Monitoring Variables of the Fed-Batch Penicillin Fermentation Process

no.  process variable                  no.  process variable
1    aeration rate                     10   CO2 concentration
2    agitator power input              11   pH
3    substrate feeding rate            12   bioreactor temperature
4    substrate feeding temperature     13   generated heat
5    substrate concentration           14   acid feeding rate
6    DO saturation                     15   base feeding rate
7    biomass concentration             16   cold water flow rate
8    penicillin concentration          17   hot water flow rate
9    culture volume

Table 2. Fault Batches in the Fed-Batch Penicillin Fermentation Process

fault no.  fault description                              occurrence time (h)  end time (h)
1          10% step increase in aeration rate             100                  200
2          10% step increase in agitator power input      100                  200
3          10% step increase in substrate feeding rate    100                  200
4          5 W ramp decrease in agitator power input      100                  200
5          2 L/h ramp decrease in aeration rate           100                  200
6          10% step decrease in substrate feeding rate    100                  200

5. CASE STUDY

5.1. Case I: A Fed-Batch Penicillin Fermentation Process. Birol et al.26 developed a simulator (PenSim v.2.0) for a benchmark fed-batch penicillin fermentation process. This simulator has been widely applied to test process monitoring methods.3,27,28 In this case study, a training data set consisting of 80 reference batches was generated from the simulator. Each batch has a duration of 400 h. The 17 process variables listed in Table 1 were measured with a sampling interval of 1 h. Similar to ref 19, six fault batches were generated considering different fault variables, magnitudes, and directions, as listed in Table 2. Fault nos. 1−4 were used to select a proper window size. Fault nos. 5 and 6 were applied to evaluate and compare the performance of different monitoring methods. A neighbor number of k = 5 was used for the k nearest neighbors method. σ1 in eq 10 was set to twice the maximal distance between adjacent data points, and σ2 in eq 11 to twice the maximal distance between nonadjacent data points.

Table 3. FDR (%), FAR (%), and ACT (s) of the SPD Statistic for Different Window Sizes

             fault no. 1   fault no. 2   fault no. 3   fault no. 4
window size  FDR    FAR    FDR    FAR    FDR    FAR    FDR    FAR    ACT
2            100    0      100    0      100    0      99     0      575
3            100    0      100    0      100    0      100    0      610
5            100    0      100    0      100    0      100    0      740
10           100    0      100    0      100    0      100    0      807
20           100    0      100    0      100    0      100    0      970

Table 4. FDR (%), FAR (%), and ACT (s) of the R² Statistic for Different Window Sizes

             fault no. 1   fault no. 2   fault no. 3   fault no. 4
window size  FDR    FAR    FDR    FAR    FDR    FAR    FDR    FAR    ACT
2            100    0      100    0      100    0      99     0      575
3            100    0      100    0      100    0      100    0      610
5            100    0      100    0      100    0      100    0      740
10           100    0      100    0      100    0      100    0      807
20           100    0      100    0      100    0      100    0      970


Figure 1. Monitoring charts of TGLPP, MPCA, TLPP, and TPCA for fault batch no. 5.

Five window sizes were tested to select a proper window size. Tables 3 and 4 show the monitoring results for the different window sizes. The average computation time (ACT) required by the various window sizes to build the offline WMMM and perform online monitoring is also compared in Tables 3 and 4. Obviously, the monitoring performance is perfect when the window size exceeds 2. However, a larger window size takes more computation time. The window size is thus chosen to be 5, considering modeling accuracy and computation efficiency.

TGLPP, MPCA, TLPP, and TPCA are tested with fault batch no. 5 to make a comparison. The same window size is used for all of them. The number of principal components of MPCA was selected by the cumulative percent variance method.29,30 As in refs 1 and 2, the T² and Q statistics were used with MPCA.


Table 5. FDR (%) and FAR (%) of the SPD or Q Statistic of Five Methods for Different Training Subdata Sets

data set  no. of reference   TGLPP       MPCA        TLPP        TPCA        TGLSA
no.       batches            FDR   FAR   FDR   FAR   FDR   FAR   FDR   FAR   FDR   FAR
1         5                  99    0     99    22    99    0     99    0     99    0
2         10                 100   0     100   0     100   0     100   0     100   0
3         20                 100   0     100   0     100   0     100   0     100   0
4         40                 100   0     100   0     100   0     100   0     100   0
5         60                 100   0     100   0     100   0     100   0     100   0
6         80                 100   0     100   0     100   0     100   0     100   0

Table 6. FDR (%) and FAR (%) of the R² or T² Statistic of Five Methods for Different Training Subdata Sets

data set  no. of reference   TGLPP       MPCA        TLPP        TPCA        TGLSA
no.       batches            FDR   FAR   FDR   FAR   FDR   FAR   FDR   FAR   FDR   FAR
1         5                  99    0     52    0     98    5     11    0     99    1
2         10                 100   0     75    0     100   3     75    0     98    0
3         20                 100   0     100   0     100   0     100   0     100   0
4         40                 100   0     100   0     100   0     100   0     100   0
5         60                 100   0     100   0     100   30    100   0     100   0
6         80                 100   0     100   0     100   2     100   0     100   0

Figure 2. Scatter plots of the first two projection variables of all reference batches at 20 h generated by TLPP and TGLPP.

Figure 3. TGLPP-based contribution plot at 100 h for fault no. 6.

For TLPP, the number of nearest neighbors was set the same as for TGLPP. Monitoring charts of the four methods for fault no. 5 are shown in Figure 1. Obviously, except for the TLPP-based R² statistic, the other five statistics perfectly detected the fault with no false alarms. As shown in Figure 1f, the TLPP-based R² statistic detects the fault, but it has a FAR of 2%.

To further illustrate the superiority of TGLPP, the performances of MPCA, TLPP, TPCA, TGLPP, and TGLSA are tested with six training subdata sets containing different numbers of reference batches and fault batch no. 6. Tables 5 and 6 list the monitoring results of the five methods for the different training subdata sets. Clearly, when the number of reference batches decreases to 10 and 5, the monitoring performance of all methods becomes worse, especially that of MPCA and TPCA.


Table 7. Mean Values of η1 and η2 for Different Training Subdata Sets

data set no.               1       2       3       4       5       6
no. of reference batches   5       10      20      40      60      80
η1                         0.4126  0.5900  0.8110  0.9133  0.9541  0.9560
η2                         0.4625  0.7077  0.9182  0.9691  0.9690  0.9924

For the training subdata set with five reference batches, the T² statistic of MPCA has a small FDR of 52%, and the Q statistic has a large FAR of 22%. The SPD statistics of TPCA and TLPP show the same performance as that of TGLPP. However, the FAR of the TLPP-based R² statistic is larger than that of TGLPP, and the FDR of the TPCA-based R² statistic is smaller than that of TGLPP. TGLSA shows almost the same performance as TGLPP, although its R² statistic gives a lower FDR or a higher FAR in some cases. In summary, TGLPP has a better monitoring performance than the other four methods, because both its SPD and R² statistics have a higher FDR and a lower FAR when the number of reference batches is small.

Note that the FAR of the TLPP-based R² statistic is often larger than that of TGLPP. For the training subdata set with 60 reference batches, scatter plots of the first two projection variables of all reference batches at 20 h generated by TLPP and TGLPP are shown in Figure 2. Clearly, TGLPP scatters the projected data over a relatively large region, while TLPP maps the data points into a narrow area. Besides, the mean of the control limits of the TLPP-based R² statistic at different sample times is 0.8489, while it is 3.7769 for the TGLPP-based R² statistic. These results indicate that TLPP loses the variance information of the data, because it mainly preserves the local structure. This makes TLPP generate a small control limit for the R² statistic, leading to a high FAR. This drawback is overcome in TGLPP by taking both global and local structure preserving into account. As a result, the control limit of the TGLPP-based R² statistic is much larger.

For the different training subdata sets, mean values of η1 and η2 over all sample times are listed in Table 7. Obviously, both η1 and η2 decrease with a decreasing number of reference batches. In particular, for the training subdata set with only five reference batches, η1 and η2 drop below 0.5. This indicates that SLocal is bigger than SGlobal, and thus the local structure preserving deserves more consideration. Because MPCA and TPCA only seek to preserve the global structure of data, it is no surprise that they have poor monitoring performance. In contrast, TGLPP can make a good trade-off between local and global structure preserving. That is the reason TGLPP has an excellent monitoring performance in most cases.

The contribution plot was used to diagnose the fault variable in fault batch no. 6. Figure 3 shows the TGLPP-based contribution plot at 100 h for fault no. 6. Clearly, the largest contributions to the SPD and R² statistics are provided by variable 3, the substrate feeding rate. Hence, the substrate feeding rate is identified as the fault variable for fault batch no. 6, which is exactly the actual fault variable (as shown in Table 2). This result validates the efficiency of the TGLPP-based contribution plot for fault diagnosis.

5.2. Case II: DuPont Benchmark Data. The benchmark data set from an industrial batch polymerization reactor of DuPont is tested in this case study. This data set includes 55 successful and unsuccessful batches. Each batch has a duration of 100 time intervals. Ten variables, including five temperatures (variables 1, 2, 3, 6, and 7), three pressures (variables 4, 8, and 9), and two flow rates (variables 5 and 10), are measured during the batch run. More detailed information on the DuPont benchmark data can be found in ref 2.

A training data set is constructed with 36 successful batches from the DuPont benchmark data, and it is used to build the WMMM. Using the same method as in case I, the window size is chosen to be 4 for this case. An abnormal batch (batch no. 49) is used to test the TGLPP-based monitoring method. Monitoring charts of this abnormal batch are shown in Figure 4. The SPD and R² statistics successfully detect a fault for the abnormal batch: both statistics violate their control limits from the 57th time point onward. Therefore, the fault occurred at around the 57th time point. This result agrees well with those reported in refs 2 and 9.

Figure 4. Monitoring charts of TGLPP for an abnormal batch in the DuPont data.

For the abnormal batch, the TGLPP-based contribution plot at the 57th time point is shown in Figure 5. Variable 8 is identified as the fault variable, because it provides the largest contributions to the SPD and R² statistics. The trajectory of variable 8 from the abnormal batch is compared to that from a normal batch in Figure 6. Variable 8 obviously deviates from the normal measurements between the 57th and 65th time points. This validates the fault diagnosis result of the TGLPP-based contribution plot.

Figure 5. TGLPP-based contribution plot at the 57th time point for the abnormal batch.

Figure 6. Trajectories of variable 8 from a normal batch and the abnormal one.


6. CONCLUSION

In this Article, a new dimensionality reduction algorithm termed tensor global-local preserving projections (TGLPP) was proposed for three-way data arrays. TGLPP can preserve both the global and local structures of data, and it builds a unified framework for global structure preserving and local structure preserving. TPCA and TLPP are also unified in the TGLPP framework; they can be viewed as two special cases of TGLPP that consider preserving only the global structure or only the local structure of data. Moreover, by adopting the tensor projection, TGLPP avoids the information loss caused by data unfolding in conventional multiway-based methods, for example, MPCA. A batch process monitoring method was then developed on the basis of TGLPP. The SPD and R² statistics were used for fault detection, and the contribution plot was applied for fault diagnosis. The performance of the TGLPP-based monitoring method was investigated in two case studies: a benchmark fed-batch penicillin fermentation process and the DuPont benchmark data. The results indicate that the TGLPP-based monitoring method outperforms MPCA, TLPP, TPCA, and TGLSA.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (no. 61304116) and the Zhejiang Provincial Natural Science Foundation of China (no. LQ13B060004). We especially thank Professor J. F. MacGregor for providing the DuPont data.



REFERENCES

(1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361−1375.
(2) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37, 41−59.
(3) Hu, K. L.; Yuan, J. Q. Batch process monitoring with tensor factorization. J. Process Control 2009, 19, 288−296.
(4) Yao, Y.; Gao, F. Batch process monitoring in score space of two-dimensional dynamic principal component analysis (PCA). Ind. Eng. Chem. Res. 2007, 46, 8033−8043.
(5) Yao, Y.; Gao, F. A survey on multistage/multiphase statistical modeling methods for batch processes. Annu. Rev. Control 2009, 33, 172−183.
(6) Lee, J. M.; Yoo, C.; Lee, I. B. Fault detection of batch processes using multiway kernel principal component analysis. Comput. Chem. Eng. 2004, 28, 1837−1847.
(7) Camacho, J.; Picó, J. Multi-phase principal component analysis for batch processes modeling. Chemom. Intell. Lab. Syst. 2006, 81, 127−136.
(8) Camacho, J.; Picó, J. Online monitoring of batch processes using multi-phase principal component analysis. J. Process Control 2006, 16, 1021−1035.
(9) Chen, J.; Liu, K. C. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 57, 63−75.
(10) Rannar, S.; MacGregor, J. F.; Wold, S. Adaptive batch monitoring using hierarchical PCA. Chemom. Intell. Lab. Syst. 1998, 41, 73−81.
(11) Lee, D. S.; Vanrolleghem, P. A. Monitoring of a sequencing batch reactor using adaptive multiblock principal component analysis. Biotechnol. Bioeng. 2003, 82, 489−497.
(12) Yoon, S.; MacGregor, J. F. Principal-component analysis of multiscale data for process monitoring and fault diagnosis. AIChE J. 2004, 50, 2891−2903.
(13) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220−234.




(14) MacGregor, J. F.; Cinar, A. Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Comput. Chem. Eng. 2012, 47, 111−120.
(15) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543−3562.
(16) He, X. F.; Niyogi, P. Locality preserving projections. Proceedings of the Conference on Advances in Neural Information Processing Systems, Vancouver, Canada, December 8−13, 2003; MIT Press: Cambridge, MA, 2004.
(17) Hu, K. L.; Yuan, J. Q. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18, 797−807.
(18) Shao, J. D.; Rong, G.; Lee, J. M. Generalized orthogonal locality preserving projections for nonlinear fault detection and diagnosis. Chemom. Intell. Lab. Syst. 2009, 96, 75−83.
(19) Luo, L. J.; Bao, S. Y.; Gao, Z. L.; Yuan, J. Q. Batch process monitoring with tensor global−local structure analysis. Ind. Eng. Chem. Res. 2013, 52, 18031−18042.
(20) He, X. F.; Cai, D.; Niyogi, P. Tensor subspace analysis. Proceedings of the Conference on Advances in Neural Information Processing Systems, Vancouver, Canada, December 5−8, 2005; MIT Press: Cambridge, MA, 2006.
(21) Tax, D.; Duin, R. Support vector domain description. Pattern Recogn. Lett. 1999, 20, 1191−1199.
(22) Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Generalized contribution plots in multivariate statistical process monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95−114.
(23) Singhal, A.; Seborg, D. E. Pattern matching in multivariate time series databases using a moving-window approach. Ind. Eng. Chem. Res. 2002, 41, 3822−3838.
(24) Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Statistical process monitoring based on dissimilarity of process data. AIChE J. 2002, 48, 1231−1239.
(25) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691−5702.
(26) Birol, G.; Undey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26, 1553−1565.
(27) Lee, J. M.; Yoo, C.; Lee, I. B. On-line batch process monitoring using a consecutively updated multiway principal component analysis model. Comput. Chem. Eng. 2003, 27, 1903−1912.
(28) Alvarez, C. R.; Brandolin, A.; Sánchez, M. C. Batch process monitoring in the original measurement's space. J. Process Control 2010, 20, 716−725.
(29) Jackson, J. E. A User's Guide to Principal Components; Wiley: New York, 1991.
(30) Jolliffe, I. T. Principal Component Analysis; Springer: New York, 2002.
