Batch Process Monitoring with Tensor Global–Local Structure Analysis

Nov 20, 2013 - Different from principal component analysis (PCA) and locality preserving projections (LPP), TGLSA aims at preserving both global and l...
2 downloads 0 Views 2MB Size
Article pubs.acs.org/IECR

Batch Process Monitoring with Tensor Global−Local Structure Analysis Lijia Luo,*,† Shiyi Bao,† Zengliang Gao,† and Jingqi Yuan‡ †

College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China



ABSTRACT: A novel method named tensor global−local structure analysis (TGLSA) is proposed for batch process monitoring. Different from principal component analysis (PCA) and locality preserving projections (LPP), TGLSA aims at preserving both global and local structures of data. Consequently, TGLSA has the ability to extract more meaningful information from data than PCA and LPP. Moreover, the tensor-based projection strategy makes TGLSA more applicable for the three-dimensional data than multiway-based methods, such as MPCA and MLPP. A TGLSA-based online monitoring approach is developed by combining TGLSA with a moving window technique. Two new statistics, i.e., SPD and R2 statistics, are constructed for fault detection and diagnosis. In particular, the R2 statistic is a novel monitoring statistic, which is proposed based on a support tensor domain description method. The effectiveness and advantages of the TGLSA-based monitoring approach are illustrated by a benchmark fed-batch penicillin fermentation process.

1. INTRODUCTION

the batch process monitoring can be found in recent survey papers.17−20 The PCA- and multiway-based approaches are the most widely used and investigated batch process monitoring techniques so far. A major limitation of such approaches is that PCA can only capture the global Euclidean structure (variance information) of data, while ignores the local neighborhood structure (intrinsic geometrical structure) of data.21 However, the local structure is also an important aspect of data structure, since it represents the neighborhood relationship of data points. The loss of such crucial information may make PCAbased approaches be easily affected by outliers and have less discriminating power.22 As a result, the process monitoring efficiency is reduced. Moreover, different from continuous processes, data collected from a batch process exhibit a threedimensional structure. Therefore, mulitway-based monitoring methods need to unfold the three-dimensional data set into two-dimensional data before developing the monitoring model. In other words, they convert each batch data matrix to a vector. However, the batch data is intrinsically a matrix. The relationship between row vectors of the matrix and that between column vectors may be important for finding a projection, especially when the number of training samples is small. This information may be lost after vectoring. Therefore, the data unfolding procedure may lead to information loss as the intrinsic data structure would be lost,3 and thus the process monitoring performance will also be degraded. On the other hand, in the area of pattern recognition, a wellknown linear dimension reduction technique, called locality preserving projections (LPP), has been developed by He and Niyogi.22 Different from PCA, LPP aims at preserving the local

As an important industrial production operation, the batch process has been widely applied in pharmaceutical, biochemistry, food, and semiconductor industries etc.1−4 Owing to its operational flexibility, the batch process is preferred in the production of low-volume and high-value added products.5 Process monitoring and fault diagnosis of batch processes are extremely important to ensure process safety and high quality products. However, monitoring of batch processes is a challenge problem due to their complicated characteristics, such as nonlinear and time-variant behaviors, finite duration, and nonsteady state, as well as operation uncertainty.6,7 Since Nomikos and MacGregor first proposed batch process monitoring approaches based on multiway principal component analysis (MPCA)1,2 and multiway partial least-squares (MPLS),6 lots of multivariate statistical monitoring methods have been developed for batch processes. Yoo et al.8 extended the independent component analysis (ICA) to multiway ICA (MICA) for batch process monitoring. Different from MPCA, MICA utilizes higher order statistics and is capable of extracting non-Gaussian components with statistical independency from data.8 Lee et al.,9 Tian et al.,10 and Yu et al.11 proposed kernelbased MPCA, MICA, and MLDA (multiway localized fisher discriminant analysis) methods, respectively, to deal with the monitoring of nonlinear batch processes. Rannar et al.12 and Li et al.13 developed two types of PCA-based adaptive monitoring techniques to deal with process dynamics. Lu et al.14 presented a two-dimensional dynamic PCA (2D-DPCA) to model and monitor time- and batch-wise dynamics simultaneously. Camacho and Picó15 developed the multiphase principal component analysis (MPPCA) for monitoring of multiphase batch processes. Yoon and MacGregor16 proposed a monitoring method based on multiscale PCA (MSPCA) by combining PCA with wavelet transformation, which utilizes the additional scale information for fault diagnosis. More investigations of © 2013 American Chemical Society

Received: Revised: Accepted: Published: 18031

July 23, 2013 October 17, 2013 November 20, 2013 November 20, 2013 dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

Let X = [x1, x2, ···, xn]T ∈ 9 n × m denote a data matrix with zeromean and unit-variation, where n is the sample number and m is the number of variables. The PCA model of X is

neighborhood structure of data. Recently, successful applications of LPP for process monitoring and fault detection have been presented.3,23,24 Hu et al. developed the multiway locality preserving projections (MLPP)23 and tensor locality preserving projections (TLPP)3 for batch process monitoring. Shao et al.24 proposed the generalized orthogonal locality preserving projections (GOLPP) for nonlinear fault detection and diagnosis. However, a main drawback of LPP is that it does not explicitly take the variance information of data into account, which may destroy the global structure of data after projection.21 Therefore, LPP-based approaches may fail to provide a faithful representation for the data that has significant directions of variance information. The global Euclidean structure and local geometrical structure are two important aspects of the data structure. The global Euclidean structure defines the outer shape of the data set, and the local geometrical structure represents its inner organization.21 Both aspects should be well considered. However, either PCA or LPP only considers one of them. By combining PCA and LPP, Zhang et al.21 have proposed the global−local structure analysis (GLSA) for process monitoring. The performance of GLSA has been proved to be better than PCA and LPP. However, GLSA is only able to deal with twodimensional data sets from continuous processes but cannot be directly applied to three-dimensional data sets from batch processes. The multiway GLSA (MGLSA) is a feasible way to solve this problem, while it may have the same drawback as other multiway-based analysis methods. For three-dimensional data sets, tensor-based analysis methods (e.g., TLPP) are significantly superior to multiway-based analysis methods (e.g., MPCA and MLPP), because they can deal with threedimensional data sets directly and avoid the loss of data structure induced by data unfolding. In this paper, a new data projection algorithm named tensor global−local structure analysis (TGLSA) was proposed by integrating tensor analysis, PCA and LPP. Different from PCA and LPP, TGLSA aims at preserving global and local structures of the data set simultaneously. This aim is realized by constructing two subobjective functions, where each subobjective corresponds to one type of structure preserving. A weighted coefficient is introduced to make a trade-off between two subobjectives. TGLSA could well combine advantages of PCA and LPP by selecting an appropriate weighted coefficient. Moreover, TGLSA is a tensor-based method, which makes it more applicable for three-dimensional data sets. Based on TGLSA, a batch process monitoring method is then developed in this paper. Two new statistics are built for fault detection and diagnosis. One is the SPD (squared projection difference) statistic and the other is the R2 statistic. The R2 statistic is a novel statistic that is proposed by extending the support vector domain description (SVDD)25 to the support tensor domain description (STDD). The performance of TGLSA-based monitoring method was compared with those of MPCA, TLPP, and MGLSA in a fed-batch penicillin fermentation process. The results indicated that the TGLSA-based monitoring method has better performance.

d

X=

∑ tipiT

+ E = TPT + E (1)

i=1

where t is the principal component (PC) or score vector and p is the loading vector. Commonly, the first several PCs contain most variance information of X, and the last ones may contain only noise. Thus, by retaining first d PCs, the original data space is divided into a score space, X = TPT, plus a residual space, E. Thus, the most important variance information of X is extracted and the dimension of X is reduced. The most appropriate number of retained PCs is commonly selected by cumulative percent variance (CPV) or cross-validation.27−29 PCA is only used for modeling two-dimensional data, while the multiway PCA (MPCA) is widely used as an extension of PCA to deal with three-dimensional data. For a three-dimensional data set X(I × K × J), MPCA first unfolds it into a twodimensional data matrix X(I × KJ) and then performs a general PCA on X(I × KJ). A detailed introduction of PCA and MPCA can be found in refs 1, 2, and 27. 2.2. LPP, MLPP, and TLPP. Locality preserving projections (LPP) is a linear dimensionality reduction algorithm.22 Different from PCA, LPP aims to preserve the intrinsic geometrical structure, namely, the local neighborhood structure, of data.22 It is found that LPP is less sensitive to outliers and has more discriminating power than PCA.22 Similar to MPCA, multiway LPP (MLPP)23 is an extension of LPP for analyzing threedimensional data. More details about LPP and MLPP are traced back to refs 22, and 23. The other extension of LPP for three-dimensional data is the tensor locality preserving projections (TLPP) or tensor subspace analysis (TSA).3,30 TLPP seeks a linear projection in the tensor space that respects the local structure in the original data space.30 Specifically, for a given data set {Xi}, i = 1, 2, ..., m with Xi ∈ 9 n1 ⊗ 9 n2 , TLPP intends to find two transformation matrices U of size n1 × l1 (l1 < n1) and V of size n2 × l2 (l2 < n2) mapping Xi to Yi, Yi = UTXiV, such that Yi optimally represents Xi. Such matrices U and V can be obtained by solving an optimization problem.30 2.3. SVDD. Support vector domain description (SVDD) proposed by Tax and Duin25 is a powerful data domain description method. It can be used for novelty or outlier detection.25 SVDD constructs a spherically shaped decision boundary with minimal volume around data points by a set of support vectors describing the sphere boundary. For a given data set containing n data points, {xi, i = 1, 2, ..., n} ∈ 9 m, SVDD computes the sphere by solving the following minimization problem25 n

min R2 + C ∑ ξi

R , a , ξi

i=1

s.t . || xi − a ||2 ≤ R + ξi ,

2. BACKGROUND TECHNIQUES 2.1. PCA and MPCA. PCA is a widely used statistical analysis method in engineering and science applications. PCA maps data into a lower dimensional subspace which contains most variance of data to reach the minimum information loss.26

ξi ≥ 0

(2)

where a and R are center and radius of the sphere, the parameter C controls the trade-off between the volume of the sphere and the number of rejected data points, and ξi is a slack variable. The Lagrange dual problem of eq 2 is 18032

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research n

n

min ∑ αi(xi·xi) − αi

i=1

Article

n

following optimization problem is constructed to obtain U and V respecting both local and global structures of the data set

∑ ∑ αiαj(xi·xj) i=1 j=1

max{JT Global (U , V ), JT Local (U , V )}

n

(3)

i=1

where the subobjective JTGlobal(U,V) = ∑i∥Yi − Y̅∥2 corresponds to the preservation of global structure, and the local structure is preserved by the subobjective JTLocal(U,V) = −∑i,j∥Yi − Yj∥2Wij. Y̅ = ∑iYi/I is the mean of Yi. W is a symmetric weighting matrix with each element Wij representing the neighborhood relationship between Xi and Xj. A possible way of defining Wij is

where αi are Lagrange multipliers. Data points xi with αi = 0 are inside the sphere. Only data points xi corresponding to 0 < αi < C are needed in the description of sphere boundary, and these points are called support vectors of the description (SVs).25 Data points xi with αi = C are outside the sphere and considered as outliers. The center and radius of the sphere are n

a=

⎧ − || X i − Xj ||2 /σ ⎪e if Xi , Xj ∈ L(Xi , Xj) Wij = ⎨ ⎪ ⎩ 0 otherwise

∑ αixi i=1 n

n

R2 = (xk ·xk) − 2 ∑ αi(x i ·x k) + i=1

n

∑ ∑ αiαj(xi·xj) i=1 j=1

n

|| x0 − a ||2 = (x0·x0) − 2 ∑ αi(xi·x0) i=1

+

n

∑ ∑ αiαj(xi·xj) i=1 j=1

2

≤R

(7)

where σ is a suitable constant to be determined empirically.30,32 The function exp(−∥Xi − Xj∥2/σ) is the heat kernel.33 ∥·∥ is the Frobenius norm of matrix. L(Xi,Xj) denotes the neighborhood relationship between Xi and Xj, which can be expressed in two ways: k nearest neighbors and ε neighbors.22,34 In this paper, the way of k nearest neighbors is used to define L(Xi,Xj). Equation 6 is a typical multiobjective optimization problem. Generally, it is difficult to get an absolutely optimal solution for this problem due to the conflict between subobjectives. Many studies have addressed the multiobjective optimization problem and lots of solving methods were proposed.35,36 The weighted sum method was used in this study. By introducing a weighted coefficient η, the multiobjective function eq 6 is transformed into a single objective function

(4)

where xk is any support vector. A test data point x0 will be accepted if its distance to the center of the sphere a is smaller than or equal to the radius R

n

(6)

U ,V

∑ αi = 1

s.t . 0 ≤ αi ≤ C ,

(5)

If replacing x and (xi·xj) in eqs 3−5 with an implicit mapping Φ(x) and a kernel function K(xi, xj) = (Φ(xi)·Φ(xj)), a more flexible and accurate data description can be obtained.31 A detail introduction of SVDD can be found in refs 25, 31.

JTGLSA (U , V ) = max{ηJT Global (U , V ) + (1 − η)JT Local (U , V )} U ,V

I

3. TENSOR GLOBAL−LOCAL STRUCTURE ANALYSIS 3.1. Algorithm Description. Both PCA and LPP have the ability to project the high-dimensional data into a lowerdimensional feature space. PCA focuses on preserving the global Euclidean structure of data without considering local neighborhood relationships among data points, which may result in the loss of the intrinsic geometrical structure of data. On the contrary, LPP takes the preserving of local neighborhood structure into account while ignoring the global Euclidean structure, which may map distant points into a small region in the feature space, leading to the loss of variance information and the destruction of the global structure of data.21 In many real-world applications, both global and local structures of data are very important. To overcoming the shortcoming of PCA and LPP and extend them to deal with three-dimensional data, tensor global−local structure analysis (TGLSA) is proposed by integrating tensor analysis, PCA and LPP. TGLSA aims to find a projection from the original data space to the feature space that optimally preserves both local and global information of data. Given a data set X = {Xi}, i = 1, 2, ..., I in the tensor space 9 K ⊗ 9 J , TGLSA seeks two transformation matrices U ∈ 9 K ⊗ 9 K1 and V ∈ 9 J ⊗ 9 J1 that project Xi ∈ 9 K ⊗ 9 J into Yi ∈ 9 K1 ⊗ 9 J1 (K1 ≤ K, J1 ≤ J), such that Yi represents Xi, Yi = UTXiV. Inspirited by LPP and PCA, the

= max{η ∑ || Yi − Y ̅ ||2 − (1 − η) U ,V

i=1

I

∑ || Yi − Yj ||2 Wij} i ,j=1

(8)

The coefficient η is constrained as 0 ≤ η ≤ 1. The value of η has a great effect on the performance of TGLSA, since it defines the importance of two subobjectives. In particular, TGLSA will be reduced to TLPP with η = 0 and to TPCA (tensor PCA) with η = 1. 3.2. Selection of Parameter η. Inspired by refs 21 and 37 to balance JTGlobal(U,V) and JTLocal(U,V), the selection of η is based on the following principle ηST Global = (1 − η)ST Local

(9)

where STGlobal and STLocal denotes scales of JTGlobal(U,V) and JTLocal(U,V), respectively. Their definition will be given in Subsection 3.3. Actually, except for the principle eq 9, η can also be chosen adaptively for different data sets according to their own features, which makes TGLSA more flexible for practical applications. 3.3. Algorithm Formulation. After the weighted coefficient η was specified, the optimization problem eq 8 can be solved easily. Since ∥A∥2 = tr(AAT), eq 8 is formulated as 18033

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

Similar to ref 33, an iterative calculation method is proposed. Initially, U is set to be the identity matrix, and then V is computed by solving the generalized eigenvector problem

JTGLSA (U , V ) = max{η1JT Global (U , V ) + (1 − η1)JT Local (U , V )} U ,V

I

I

= max{η1 ∑ || Yi − Y ̅ ||2 − (1 − η1) U ,V

i=1

∑ || Yi − Yj ||2 Wij} i ,j=1

T

MU v = λNU v

T

= max{η1tr(U SV U ) − (1 − η1)tr(U (D V − WV )U )} U ,V

Once V is obtained, U is updated by solving the generalized eigenvector problem

= max tr(UT(η1SV − (1 − η1)(D V − WV ))U ) U ,V

= max tr(UTMV U )

MV u = λNV u

U ,V

(10)

ST Local = ρ(WV − DV )

where ρ(·) is the spectral radius of the matrix. According to eq 9, η1 is calculated as η1 =

ρ(WV − DV ) ρ(SV ) + ρ(WV − DV )

(11)

Similarly, because of ∥A∥ = tr(A A), eq 8 is also formulated as T

2

JTGLSA (U , V ) = max{η2JT Global (U , V ) + (1 − η2)JTLocal (U , V )} U ,V

I

I

= max{η2 ∑ || Yi − Y ̅ ||2 − (1 − η2) U ,V

i=1

∑ || Yi − Yj ||2 Wij} i ,j=1

T

T

= max{η2tr(V SU V ) − (1 − η2)tr(V (D U − WU )V )} U ,V

= max tr(VTMU V ) U ,V

(12)

where SU = ∑i(Xi − X̅ ) UU (Xi − X̅ ), DU = WU = ∑i,jWijXTj UUTXi. The weighted coefficient η2 is T

η2 =

∑iDiiXTi UUXi,

T

ρ(WU − DU ) ρ(SU ) + ρ(WU − DU )

and

(13)

Since a smaller Dii denotes a more “important” Yi and to avoid the singularity of DV and DU in the undersampled situation, the following constraints are imposed on objective functions eqs 10 and 12, respectively UT(η1I + (1 − η1)DV )U = UTNV U = I K1

(14)

VT(η2I + (1 − η2)DU )V = VTNU V = I J

(15)

1

where I is the identity matrix. Finally, the optimization problems of TGLSA are max tr(UTMV U ) U ,V

s.t . UTNV U = I K1

(16)

max tr(VTMU V ) U ,V

s.t . VTNU V = I J

1

(19)

Thus, generalized eigenvectors of eqs 18 and 19 can be finally obtained after iteration calculation, and the desired V and U are constructed from the J1 and K1 largest eigenvalues of eqs 18 and 19 as V = [v1, v2, ..., vJ1] and U = [u1, u2, ..., uK1]. In present paper, to facilitate the construction of SPD statistic for fault detection, J1 and K1 are set to J1 = J and K1 = K, respectively. 3.4. Algorithm Analysis. To give a deep perspective of the proposed algorithm, it is worthwhile to highlight several characteristics of TGLSA: (1) Similar to LPP and PCA, TGLSA is a linear projection algorithm that provides a linear mapping from the original space to the feature space. However, TGLSA takes both global and local structures of the data set into account by integrating LPP and PCA, and thus both variance information and local neighborhood structure of the data set are preserved after projection. TGLSA has the advantages of PCA and LPP, and thus it is capable of revealing the intrinsic structure of the data set and gives a more faithful representation of the data set than either PCA or LPP does. In fact, TLPP can be considered as a particular form of TGLSA with η = 0. The other particular form of TGLSA, namely, TPCA (tensor PCA), can be obtained with η = 1. (2) TGLSA extends the GLSA algorithm proposed by Zhang et al.21 from the two-dimensional data space to the threedimensional data space by integrating tensor analysis with GLSA. However, TGLSA is not simply a tensor version of GLSA. An important distinction between TGLSA and GLSA is that they impose different constraints on their objective functions. The orthogonal constraint is adopted in GLSA,21 which is same as PCA. The constraint of TGLSA is actually a combination of the orthogonal constraint and the constraint of LPP (see eqs 14 and 15). There are two advantages of this constraint. One is that, similar to LPP, items (1 − η1)UTDVU and (1 − η2) VTDUV in constraints take the importance of each projection point Yi into account. These items make TGLSA possess similar excellent properties as LPP. The other is that the item ηI in constraints makes TGLSA avoid the singularity problem of LPP in the undersampled situation. (3) Most previous multiway-based analysis approaches for three-dimensional data sets, such as MPCA and MLPP, are based on a vector representation of each data point. Therefore, an unfolding procedure is carried out to convert the three-dimensional data set to a two-dimensional data matrix before performing PCA or LPP. Such unfolding procedure may lead to information loss. However, TGLSA performs a tensor projection on the threedimensional data set and maintains the matrix representation of each data point. This makes TGLSA more applicable for three-dimensional data sets.

where SV = ∑i(Xi − X̅ )VVT(Xi − X̅ )T, DV = ∑iDiiXiVVTXTi , WV = ∑i,jWijVVTXTj , X̅ = ∑iXi/I. D is a diagonal matrix with elements Dii = ∑jWij. Thus, scales of JTGlobal(U,V) and JTLocal(U,V) can be defined as ST Global = ρ(SV ),

(18)

(17)

Thus, the optimal U and V can be computed by solving generalized eigenvector problems (MV, NV) and (MU, NU). However, these two generalized eigenvector problems cannot be solved independently, because they depend on each other. 18034

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

Figure 1. Unfolding−normalizing−folding procedure for training data set.

Yi = UTXiV

(4) The computation of TGLSA is mainly divided into three steps: search of k nearest neighbors, calculation of the parameter η, and solving eigenvector problems iteratively. The computational complexity of each step is O[(J × K + k)I2], O(J2 + K2), and O(J3 + K3), respectively. As a result, the complexity of TGLSA is about O[(J × K + k) I2 + (J3 + K3)]. In contrast, TLPP solves a k nearest neighbors search and two eigenvector problems, and thus it has a computational complexity of O[(J × K + k)I2 + (J3 + K3)]. The computational complexities of MPCA and MLPP are O(dK2J2) (where d is the dimension of the projection space) and O[(J × K + k)I2 + J3K3], respectively. Therefore, the computational complexity of TGLSA is at the same level of TLPP and larger than MPCA but smaller than MLPP.

(20)

where U and V are two transformation matrices with size of K × K and J × J, respectively, and Yi with size of K × J is a representation of Xi. For a new batch data, Xnew(K × J), it can be projected to the TGLSA-based NOC model as Ynew = UTXnewV. 4.2. Fault Detection and Diagnosis. The SPE (squared prediction error) statistic and T2 statistic are widely used in multiway-based monitoring methods,2 such as MPCA and MLPP. However, it is difficult to define these two statistics for TGLSA, because TGLSA deals with a three-dimensional data set directly and does not explicitly divide the data set into a score space plus a residual space. In this paper, two new statistics are developed for fault detection: SPD statistic and R2 statistic. The SPD statistic is a measure of the difference between sample Xi and its projection value Yi, which is computed as

4. BATCH PROCESS MONITORING WITH TGLSA 4.1. Offline Analysis of Batch Data. Let X̲ (I × K × J) denote a training data set of a batch process, where I is the number of batches, J is the number of variables, and K is the number of samples over the duration of the batch. Representing each batch data Xi(K × J), i = 1, 2, ..., I, as a second order tensor in the tensor space 9 K ⊗ 9 J , a normal operating condition (NOC) model can be built by TGLSA. Before constructing the NOC model, the data set X̲ (I × K × J) should be preprocessed to remove the nonstationary trajectories.1−3 As shown in Figure 1, the whole preprocessing procedure consists of three steps: the first step is unfolding X̲ (I × K × J) in the batch direction to a matrix X̲ (I × KJ) as the method used by Nomikos and MacGregor,1 then the batch-wise normalization is performed on X̲ (I × KJ) to highlight the variations between batches,17 and the final step is refolding the normalized matrix X(I × KJ) back to X(I × K × J). Although the unfolding procedure performed in this paper is similar to that of MPCA, their objectives are totally different. The unfolding procedure of MPCA is mainly used to obtain a two-dimensional data matrix for modeling by PCA. However, the unfolding procedure used here is just for normalization purposes. After normalization, the unfolded data set X(I × KJ) is refolded back to a threedimensional data set X(I × K × J). Therefore, this unfolding procedure will not bring about the loss of data structure. TGLSA is performed on the normalized training data set X(I × K × J), and the TGLSA-based NOC model is developed as

KJ

SPD =

KJ

∑ ekj 2 = ∑ (xkj − ykj )2 kj = 1

(21)

kj = 1

where xkj is the value of variable j at sample time k, and ykj is the projection value of xkj that is calculated from the NOC model. To determine whether the process is operated under normal conditions or not, a proper control limit must be defined. In this paper, the control limit of SPD statistic is obtained using the weighted χ2-distribution2 SPDα = gχh, α 2 ,

g=

v , m

h=

2m2 v

(22)

where m and v are the mean and variance of SPD statistics calculated from all the training batches and χh,α2 is the critical value of the chi-squared variable with h degrees of freedom at significance level α. Inspired by the SVDD method, a STDD (support tensor domain description) method is developed, and a new statistic named R2 statistic is proposed for fault detection. STDD constructs a hypersphere shaped decision boundary with minimal volume around a set of data points by support tensors describing the hypersphere boundary. A detail description of STDD is provided in Appendix A. Let Yk is the projection of Xk, i.e., Yk = UTXkV, and the R2 statistic of Xk is defined as the distance of Yk to the center of hypersphere 18035

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research R2 = tr(YkYkT ) − 2 ∑ αitr(YkYiT ) +

Article

T ∑ αiαjtr(YY i j )

i

Table 1. Variables Used in the Monitoring of the Benchmark Fed-Batch Penicillin Fermentation Process

(23)

i,j

where αi are Lagrange multipliers. For a new batch data Xnew, its R2 statistic is R new 2 = tr(YnewYnew T ) − 2 ∑ αitr(YnewYiT ) i

+



T αiαjtr(YY i j )

(24)

i,j

where Ynew = UTXnewV. To reduce the conservatism of monitoring results, the radius of the hypersphere is not directly considered as the control limit of the R2 statistic. Taking the outliers in the training data set into account, the control limit of the R2 statistic is calculated using KDE (kernel density estimation).38,39 Given a univariate sample with n sample points, {x1, x2, ..., xn} ∈ R, from a population distribution density f(x), a general estimator for f(x) is defined as38,39 1 nh

f ̂ (x , h) =

n

⎛ x − xi ⎞ ⎟ h ⎠

∑ K ⎜⎝ i=1

ekj 2 = (xkj − ykj )2

fault no. 1 2 3 4 5 6

T R new 2 = tr(YnewYnew ) − 2 ∑ αitr(YnewYiT ) + i

T ∑ αiαjtr(YY i j ) i ,j

T ∑ xkjtr(ukTbTj VYnew ) − 2 ∑ xkjαitr(ukT bTj VYiT ) i ,j ,k T ∑ αiαjtr(YY i j ) i ,j

(27)

where uk is the kth row of U, bj is a column vector with the jth element of 1 and others of 0, and the contribution of variable j at sample time k to the R2 statistic is defined as T ckj = xkjtr(ukT bTj VYnew ) − 2 ∑ xkjαitr(ukT bTj VYiT ) i

fault type 10% step increase in aeration rate 10% step increase in agitator power input 10% step increase in substrate feeding rate 5 W ramp decrease in agitator power input 2 L/h ramp decrease in aeration rate 10% step decrease in substrate feeding rate

occurrence time (h)

termination time (h)

100 100

200 200

100

200

100

200

100

200

100

200

WMNOC model is also updated periodically using the most recent samples. The key of the moving window technique is the selection of window length. However, it is difficult to choose an appropriate window length due to the lack of efficient quantitative calculation methods.40 Generally, the optimal window length depends on the sampling interval and the number of variables used in the monitoring as well as characteristics of the fault.3,40 A larger window length provides sufficient data for modeling but may lead to time delays for detecting process changes and the loss of fault sensitivity. On the contrary, a smaller window length responds to process changes quickly but may not provide sufficient data for modeling the real-time process operation conditions and makes it easier to enlarge the effect of measurement noises. The optimal window length was commonly selected based on experience or trials in most of the previous studies.3,40 In practice, either a fixed or adaptive window length can be used.41 In this study, the selection criterion of window length is how fast and correctly the WMNOC model detects known faults in the validation data set. Three indexes, i.e., fault detection delay (FDD), fault detection rate (FDR), and false alarm rate (FAR), are used to quantify the performance of different window lengths. FDD is the number of samples before the monitoring statistic exceeds the control limit after a fault occurs. FDR is the percentage of samples outside the control limit in fault

(26)

k ,j

aeration rate (L/h) agitator power input (W) substrate feeding rate (L/h) substrate feeding temperature (K) substrate concentration (g/L) DO saturation (%) biomass concentration (g/L) penicillin concentration (g/L) culture volume (L) CO2 concentration (mmol/L) pH bioreactor temperature (K) generated heat (kcal/h) acid feeding rate (mL/h) base feeding rate (mL/h) cold water flow rate (L/h) hot water flow rate (L/h)

Table 2. Six Fault Batches in the Benchmark Fed-Batch Penicillin Fermentation Process

Rewriting eq 24 as

+

process variables

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

(25)

where x is a data point under consideration, h is the smoothing parameter or bandwidth, and K(·) is a kernel function. In this paper, the Gaussian kernel function is used, and the optimal bandwidth is selected by the MISE (mean integrated squared error) cross validation. After obtaining the density function with KDE, the estimated control limit Rα2 for the R2 statistic at significance level α can be calculated. Thus, faults will be detected if Rnew2 > Rα2. After a fault has been successfully detected, it is important to locate the fault variable. In this paper, the contribution plot is used for fault diagnosis. The process variable with the highest contribution is possibly the cause of the fault. According to the definition of the SPD statistic, the contribution of variable j at sample time k to the SPD statistic is

=

number

(28)

4.3. Online Monitoring Strategy. To follow up time variant dynamics of the batch process and to avoid the prediction of unmeasured data, a moving window technique is applied for online monitoring. On the basis of this technique, a window mode NOC model (WMNOC) is constructed from data within the window. As the window slides along the time by adding the newest data and excluding the oldest ones, the 18036

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

Table 3. Fault Detection Delay (FDD, h), Fault Detection Rate (FDR, %), and False Alarm Rate (FAR, %) of SPD Statistic Corresponding to Different Window Lengths for Four Faults fault no. 1

fault no. 2

fault no. 3

fault no. 4

window length

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

2 3 5 10 20

0 0 0 0 0

99 100 100 100 100

0 0 0 0 0

0 0 0 0 0

99 100 100 100 100

0 0 0 0 0

0 0 0 0 0

100 100 100 100 100

0 0 0 0 0

0 0 0 0 0

99 100 100 100 100

0 0 0 0 0

Table 4. Fault Detection Delay (FDD, h), Fault Detection Rate (FDR, %), and False Alarm Rate (FAR, %) of R2 Statistic Corresponding to Different Window Lengths for Four Faults fault no. 1

fault no. 2

fault no. 3

fault no. 4

window length

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

2 3 5 10 20

0 0 0 0 0

100 100 100 100 100

0 0 0 0 0

0 0 0 0 0

99 100 100 100 100

0 0 0 0 0

0 0 0 0 0

99 100 100 100 100

0 0 0 0 0

0 0 0 0 0

99 100 100 100 100

0 0 0 0 0

(4) Check whether SPD or R2 is outside the control limit or not. (5) If SPD or R2 exceeds the control limit, diagnose which process variable causes the fault using the contribution plot; otherwise, keep on monitoring the next sample time k + 1 and return back to step (1).

conditions. FAR is the percentage of samples outside the control limit before a fault occurs. Consequently, smaller FDD and FAR as well as a larger FDR are regarded to perform better in detecting faults. 4.4. Complete Monitoring Procedure. Normal batch data are used to develop the WMNOC model. The SPD statistic and R2 statistic are used for fault detection and diagnosis. The moving window technique is applied for online monitoring. A complete monitoring procedure for a batch process is as follows. WMNOC model developing procedure: (1) Construct the training data set from normal batches. (2) Select the length l of moving window. (3) Present data of batch i until sample time k in the moving window (from sample time k − l + 1 to sample time k) as ⎡x i xi ⎢ k − l + 1,1 k − l + 1,2 ⎢ i xk − l + 2,1 xki − l + 2,2 Xi(k − l + 1: k , 1: J ) = ⎢ ⎢ ⋮ ⋮ ⎢ i ⎢⎣ xk ,1 xki ,2

5. CASE STUDY OF A BENCHMARK FED-BATCH PENICILLIN FERMENTATION PROCESS The simulator Pensim 2.0 developed by Birol et al.42 is a wellknown tool for simulating the fed-batch penicillin fermentation process. Its simulation data are frequently used to test the performance of various monitoring methods for batch processes.3,4,10,11,43,44 In this paper, a training data set containing 80 batches was produced by running the simulator repeatedly under normal operation conditions with small variations. The duration of each batch is 400 h. The sampling interval is 1 h. Each batch contains 17 variables, as listed in Table 1. The training data set was applied to develop the TGLSA-based WMNOC model. Furthermore, additional 6 fault batches were generated for testing, as shown in Table 2. In order to determine the best length of moving data window, different window lengths (2, 3, 5, 10, and 20) were tested for fault detection. FDD, FDR, and FAR of SPD and R2 statistics are used to evaluate the monitoring performance. Their values corresponding to different window lengths for four faults (fault nos. 1−4 in Table 2) are shown in Table 3 and Table 4. It can be seen that both statistics show perfect monitoring performance if the window length is larger than 2. However, a larger window length leads to a heavier computation burden. As a compromise between monitoring performance and computation efficiency, the window length is selected as 5. TGLSA was compared with MPCA and TLPP to evaluate its monitoring performance. The same window length was applied to these three methods. For the MPCA model, the number of retained PCs was chosen using the cumulative percent variance (CPV) method,27,28 where the desired CPV was set as 85%. The fault batch no. 5 in Table 2 was selected for testing. Monitoring charts are shown in Figure 2. It is evident that the other five statistics perfectly detected the fault with no FDD and FAR

··· xki − l + 1, J ⎤ ⎥ ⎥ i ··· xk − l + 2, J ⎥ ⋱ ⋮ ⎥ ⎥ ··· xki , J ⎥⎦

(4) Normalize training data in the window to zero-mean and unit-variance. (5) Perform TGLSA to develop the WMNOC model. (6) Calculate control limits of SPD and R2 statistics. Online monitoring procedure: (1) For a new batch, present its data until sample time k in the moving window as ⎡ xknew x new ⎢ − l + 1,1 k − l + 1,2 new ⎢ x new k − l + 2,1 xk − l + 2,2 X new (k − l + 1: k , 1: J ) = ⎢ ⎢ ⋮ ⋮ ⎢ new new xk ,2 ⎢⎣ xk ,1

⎤ ··· xknew − l + 1, J ⎥ ⎥ ··· xknew − l + 2, J ⎥ ⋱ ⋮ ⎥ ⎥ ··· xknew ⎥⎦ ,J

Normalize it with mean and variance obtained from the WMNOC modeling procedure. (2) Project the new batch data onto the WMNOC model. (3) Compute SPD and R2 statistics for the new batch. 18037

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

Figure 2. Monitoring charts of TGLSA, MPCA, and TLPP for fault batch no. 5.

Table 5. FDD (h), FDR (%), and FAR (%) of SPD or SPE Statistic for Fault Batch No. 6 with Different Numbers of Training Batches TGLSA

MPCA

TLPP

MGLSAb

MGLSA

batch number

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

5 10 20 50 80

1 0 0 0 0

99 100 100 100 100

0 0 0 0 0

0 0 0 0

99 100 100 100 100

22 0 0 0 0

1 0 0 0 0

99 100 100 100 100

0 0 0 0 0

0 0 0

99 100 100 100 100

19 51 0 0 0

0 0 0 0

99 100 100 100 100

18 4 0 0 0

except for the TLPP-based R2 statistic. The TLPP-based R2 statistic successfully detects the fault that occurred at 100 h (see Figure 2f), while it has a false alarm rate (FAR) of 4%.

On the other hand, the effect of the number of training batches on monitoring performance was investigated for four methods: TGLSA, MPCA, TLPP, and MGLSA. Five different 18038

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

Table 6. FDD (h), FDR (%), and FAR (%) of R2 or T2 Statistic for Fault Batch No. 6 with Different Numbers of Training Batches TGLSA

MPCA

TLPP

MGLSAb

MGLSA

batch number

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

FDD

FDR

FAR

5 10 20 50 80

1 0 0 0 0

99 98 100 100 100

1 0 0 0 0

4 0 0 0 0

52 75 100 100 100

0 0 0 0 0

2 0 0 0

98 100 100 100 100

16 16 8 48 12

3 3 1 1 1

81 58 99 99 99

0 0 0 0 0

3 0 0 0 0

85 95 100 100 100

0 0 0 0 0

Table 7. Mean Values of η1 and η2 in All Moving Windows for Fault Batch No. 6 with Different Numbers of Training Batches batch number 5

10

20

50

80

η1

0.7837

0.7726

0.7538

0.6803

0.6662

η2

0.7814

0.7410

0.5932

0.4805

0.4514

numbers (80, 50, 20, 10, and 5) of training batches were tested for fault batch no. 6 in Table 2. FDD, FDR, and FAR corresponding to different numbers of training batches are compared in Table 5 and Table 6. It is indicated that, as the training batch number decreases, the monitoring performance becomes worse. The decrease of monitoring performance is very small for TGLSA. However, for MPCA and MGLSA, their monitoring performance degrades obviously. When the number of training batches decreases to 5, the FAR of MPCA-based SPE statistic increases to 22% and the FDR of MPCA-based T2 statistic decreases to 52%. MGLSA shows the worst performance with 5 training batches, where the FAR of SPE statistic reaches to 51% and the FDR of T2 statistic is only 58%. For small numbers of training batches, the SPD statistic of TLPP exhibits similar monitoring performance as that of TGLSA, while the R2 statistic of TLPP presents a larger FAR than that of TGLSA. These results show that, compared to TLPP, MPCA, and MGLSA, TGLSA has better monitoring performance when the number of training batches is small. As shown in Table 6, the TLPP-based R2 statistic always gives a larger FAR than the TGLSA-based R2 statistic, especially for the training batch number of 50. The reason may be that TLPP projects training data into a small region since it only focuses on preserving the local neighborhood structure of the data set, which leads to a very small control limit for the R2 statistic (see Figure 2f). As a result, the false alarm easily occurs. However, TGLSA overcomes this problem by preserving both global and local structures of the data set. It is evident that MGLSA has bad performance when the training batch numbers are 5 and 10. There may be two main reasons for this. One is that the data unfolding from the threedimensional space to the two-dimensional space induces the loss of data structure. The other is that the orthogonal constraint in GLSA may be too strict. By replacing the orthogonal constraint aTa = 1 of GLSA in the paper of Zhang et al.21 with the combined constraint aT(ηI + (1 − η)XDXT)a = 1, another type of GLSA named GLSAb can be obtained. The monitoring results of multiway GLSAb (MGLSAb) corresponding to different training batch numbers are listed in the column of MGLSAb in Tables 5 and 6. Obviously, the performance of MGLSAb is better than MGLSA, especially for the training batch number of 10.

Figure 3. Contribution plots of TGLSA-based SPD and R2 statistics at 100 h for fault batch no. 5.

For different training batch numbers, the mean values of η1 and η2 in all moving windows are listed in Table 7. It can be found that the mean values of both η1 and η2 increase with the number of training batches decreasing. For training batch numbers of 5 and 10, mean values of η1 and η2 are both larger than 0.74. This means that the local structure of the training data set becomes more important than the global structure. In this case, a more faithful representation of the training data set in the low-dimensional data space should well preserve the local structure of the data set. Since MPCA only seeks to preserve the global structure of the data set, it is reasonable that it has a worse monitoring performance. On the contrary, by adjusting the weighted coefficient η according to intrinsic features of the data set, TGLSA well balances the preserving of local and global structures. This property makes TGLSA more adaptable and flexible in practice, and it also guarantees that TGLSA has a better monitoring performance in most cases. After the fault was detected, the contribution plot is used to diagnose the fault variable. For fault batch no. 5 in Table 2, 18039

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

contribution plots of TGLSA-based SPD and R2 statistics at 100 h are shown in Figure 3. Figure 3 indicates that the variable 1 (i.e., aeration rate) provides the largest contribution to the deviation of the process from the normal operation. Therefore, the aeration rate is identified to be the root cause of the fault. This result is consistent with the preset fault variable (see Table 2).

n

min R2 + C ∑ ξi

R , A , ξi

s.t . || Xi − A ||2 ≤ R + ξi ,

ξi ≥ 0

(A1)

where A and R are the center and the radius of the hypersphere, the parameter C is a penalty factor that controls the trade-off between the volume of hypersphere and the number of target data points rejected, ξi is a slack variable, and ∥·∥ is the Frobenius norm of the matrix. Incorporating the constraints into the objective function by using Lagrange multipliers

6. CONCLUSION A new data projection method named tensor global−local structure analysis (TGLSA) was proposed in this paper. Different from principal component analysis (PCA) and locality preserving projections (LPP), TGLSA takes both global and local structures of data into account, and it introduces a weighted coefficient η to make a trade-off between them. Actually, tensor principal component analysis (TPCA) and tensor locality preserving projections (TLPP) can be considered as two particular forms of TGLSA with η = 1 and η = 0. Therefore, TGLSA preserves more meaningful information after projection and gives a more faithful representation of data than MPCA and TLPP. Because η can be adaptively determined for different data according to their own features, TGLSA is more flexible for practical applications. Moreover, in comparison with multiway-based methods (e.g., MPCA, MLPP, and MGLSA), TGLSA avoids the loss of data structure caused by the data unfolding procedure, because it performs a tensor projection on the three-dimensional data set and maintains the matrix representation of each data point. This makes TGLSA more applicable for three-dimensional data sets. A TGLSA-based online monitoring method was then developed for batch processes. The moving window technique was applied to capture process dynamics. Two new statistics were built for fault detection and diagnosis. One is the SPD statistic, which is defined as the difference between the sample and its projection value. The other one is the R2 statistic, which is proposed by extending the support vector domain description (SVDD) to the support tensor domain description (STDD). The TGLSA-based monitoring method was tested in a benchmark fed-batch penicillin fermentation process. It exhibited a good performance in detecting different faults. Both SPD and R2 statistics successfully detected faults and identified fault variables. The TGLSA-based monitoring method was also compared with those based on MPCA, TLPP, and MGLSA. It is indicated that the TGLSA-based monitoring method outperforms other methods in terms of smaller fault detection delays (FDD), higher fault detection rate (FDR), and lower false alarm rate (FAR), especially in the case of a small training data set. The results showed that the TGLSA-based monitoring method has good potential for practical applications. For further study, the kernel TGLSA for nonlinear batch process monitoring would be investigated.



i=1

L(R , A , αi , γ1 , ξi) = R2 + C ∑ ξi i



∑ αi{R2 + ξi − || Xi − A ||2 } i



∑ γξi i

(A2)

i

with Lagrange multipliers αi ≥ 0 and γi ≥ 0. Setting the partial derivatives of the function L(·) to zero gives some new constraints: ∂L =0: ∂R

∑ αi = 1

∂L =0: ∂A

A=

∂L =0: ∂ξi

C − αi − γi = 0

(A3)

i

∑ αiXi

(A4)

i

(A5)

Because αi ≥ 0 and γi ≥ 0, the constraint eq A5 can be replaced by 0 ≤ αi ≤ C

(A6)

Substituting eqs A3−A5 into eq A2 results in L=

∑ αitr(XiXiT ) − ∑ αiαjtr(XiXTj ) i

(A7)

i,j

which subjects to constraints eqs A3 and A6. A set of αi can be computed by maximizing eq A7. Data points with αi = 0 are inside the hypersphere. Data points corresponding to 0 < αi < C are called support tensors. Data points with αi = C are outside the hypersphere and considered as outliers. The center and the radius of the hypersphere are

A=

∑ αiXi

(A8)

i

R2 = tr(XkXkT ) − 2 ∑ αitr(XkXiT ) + i

∑ αiαjtr(XiXTj ) i,j

(A9)

APPENDIX A

where Xk is any support tensor. A new object X0 is accepted when its distance to the center A of the hypersphere is smaller than or equal to the radius R2:

Assuming {Xi}, i = 1, 2, ..., n is a data set in the tensor space containing n matrix data points, the support tensor domain description (STDD) tries to find a hypersphere described by center A and radius R with minimum volume that contains all (or most of) the data points. Similar to the SVDD method, we construct the following minimization problem to obtain the hypersphere

|| X 0 − A ||2 = tr(X 0X 0T ) − 2 ∑ αitr(X 0XiT ) i

+

∑ i,j

18040

αiαjtr(XiXTj )

2

≤R

(A10) dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

If replacing X and tr(XiXTj )in eqs A1−A10 with an implicit mapping Φ(X) and a kernel function K(Xi, Xj) = tr(Φ(Xi)Φ(Xj)T), a kernel STDD (KSTDD) can be obtained.



(18) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220−234. (19) MacGregor, J. F.; Cinar, A. Monitoring, fault diagnosis, faulttolerant control and optimization: Data driven methods. Comput. Chem. Eng. 2012, 47, 111−120. (20) Ge, Z.; Song, Z.; Gao, F. Review of recent research on databased process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543−3562. (21) Zhang, M.; Ge, Z.; Song, Z.; Fu, R. Global−local structure analysis model and its application for fault detection and identification. Ind. Eng. Chem. Res. 2011, 50, 6387−6848. (22) He, X. F.; Niyogi, P. Locality preserving projections. In Proceedings of the Conference on Advances in Neural Information Processing Systems, December 8−13, 2003, Vancouver, Canada; MIT Press: Cambridge, MA, 2004. (23) Hu, K. L.; Yuan, J. Q. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18, 797−807. (24) Shao, J. D.; Rong, G.; Lee, J. M. Generalized orthogonal locality preserving projections for nonlinear fault detection and diagnosis. Chem. Intell. Lab. Syst. 2009, 96, 75−83. (25) Tax, D.; Duin, R. Support vector domain description. Pattern Recogn. Lett. 1999, 20, 1191−1199. (26) Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chem. Intell. Lab. Syst. 1987, 2, 37−52. (27) Jackson, J. E. A user’s guide to principal components; Wiley: New York, 1991. (28) Jolliffe, I. T. Principal component analysis; Springer: New York, 2002. (29) Wold, S. Cross validatory estimation of the number of components in factor and principal component analysis. Technometrics 1978, 20, 397−405. (30) He, X. F.; Cai, D.; Niyogi, P. Tensor subspace analysis. In Proceedings of the Conference on Advances in Neural Information Processing Systems; December 5−8, 2005, Vancouver, Canada; MIT Press: Cambridge, MA, 2006. (31) Tax, D. M. J.; Duin, R. P. W. Support vector data description. Mach. Learn. 2004, 54, 45−66. (32) Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373− 1396. (33) Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of the Conference on Advances in Neural Information Processing Systems; December 3−8 2001, Vancouver, Canada; MIT Press: Cambridge, MA, 2002. (34) Roweis, S. T.; Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323−2326. (35) Ehrgott, M. Multicriteria optimization; Springer: Berlin, 2005. (36) Miettinen, K. M. Nonlinear multiobjective optimization; Kluwer Academic Publishers: Boston, 1999. (37) Liu, Q.; Tang, X.; Lu, H.; Ma, S. Face Recognition Using Kernel Scatter-Difference-Based Discriminant Analysis. IEEE Trans. Neural Network 2006, 17, 1081−1085. (38) Martin, E. B.; Morris, A. J. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control 1996, 6, 349−358. (39) Chen, Q.; Wynne, R. J.; Goulding, P.; Sandoz, D. The application of principal component analysis and kernel density estimation to enhance process monitoring. Control Eng. Pract. 2000, 8, 531−543. (40) Zhao, C. H.; Wang, F. L.; Jia, M. X. Dissimilarity analysis based batch process monitoring using moving windows. AIChE J. 2007, 53, 1267−1277. (41) He, X. B.; Yang, Y. P. Variable MWPCA for adaptive process monitoring. Ind. Eng. Chem. Res. 2008, 47, 419−427. (42) Birol, G.; Undey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26, 1553−1565.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China (No. 61304116), the Zhejiang Provincial Natural Science Foundation of China (No. LQ13B060004), and the “Five-twelfth” National Science and Technology Support Program (No. 2011BAK06B02-03).



REFERENCES

(1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361− 1375. (2) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37, 41−59. (3) Hu, K. L.; Yuan, J. Q. Batch process monitoring with tensor factorization. J. Process Control 2009, 19, 288−296. (4) Alvarez, C. R.; Brandolin, A.; Sánchez, M. C. Batch process monitoring in the original measurement’s space. J. Process Control 2010, 20, 716−725. (5) Sun, W.; Meng, Y.; Palazoglu, A.; Zhao, J.; Zhang, H.; Zhang, J. A method for multiphase batch process monitoring based on auto phase identification. J. Process Control 2011, 21, 627−638. (6) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97− 108. (7) Chen, J.; Liu, K. C. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 57, 63− 75. (8) Yoo, C. K.; Lee, J. M.; Vanrolleghem, P. A.; Lee, I. B. On-line monitoring of batch processes using multiway independent component analysis. Chemom. Intell. Lab. Syst. 2004, 71, 151−163. (9) Lee, J. M.; Yoo, C.; Lee, I. B. Fault detection of batch processes using multiway kernel principal component analysis. Comput. Chem. Eng. 2004, 28, 1837−1847. (10) Tian, X. M.; Zhang, X. L.; Deng, X. G.; Chen, S. Multiway kernel independent component analysis based on feature samples for batch process monitoring. Neurocomputing 2009, 72, 1584−1596. (11) Yu, J. Nonlinear bioprocess monitoring using multiway kernel localized fisher discriminant analysis. Ind. Eng. Chem. Res. 2011, 50, 3390−3402. (12) Rannar, S.; MacGregor, J. F.; Wold, S. Adaptive batch monitoring using hierarchical PCA. Chemom. Intell. Lab. Syst. 1998, 41, 73−81. (13) Li, W. H.; Yue, H. H.; Valle-Cervantes, S.; Qin, S. J. Recursive PCA for adaptive process monitoring. J. Process Control 2000, 10, 471−486. (14) Lu, N. Y.; Yao, Y.; Gao, F. R. Two-dimensional dynamic PCA for batch process monitoring. AIChE J. 2005, 51, 3300−3304. (15) Camacho, J.; Picó, J. Online monitoring of batch processes using multi-phase principal component analysis. J. Process Control 2006, 16, 1021−1035. (16) Yoon, S.; MacGregor, J. F. Principle-component analysis of multiscale data for process monitoring and fault diagnosis. AIChE J. 2004, 50, 2891−2903. (17) Yao, Y.; Gao, F. A survey on multistage/multiphase statistical modeling methods for batch processes. Annu. Rev. Control 2009, 33, 172−183. 18041

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042

Industrial & Engineering Chemistry Research

Article

(43) Chang, H.; Chen, J.; Ho, Y. Batch process monitoring by wavelet transform based fractal encoding. Ind. Eng. Chem. Res. 2006, 45, 3864−3879. (44) Yu, J.; Qin, S. J. Multiway gaussian mixture model based multiphase batch process monitoring. Ind. Eng. Chem. Res. 2009, 48, 8585−8594.

18042

dx.doi.org/10.1021/ie402355f | Ind. Eng. Chem. Res. 2013, 52, 18031−18042