Article pubs.acs.org/IECR

Batch Process Monitoring with GTucker2 Model

Lijia Luo,*,† Shiyi Bao,† Zengliang Gao,† and Jingqi Yuan‡

†College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou, China
‡Department of Automation, Shanghai Jiao Tong University, Shanghai, China



ABSTRACT: In this paper, the GTucker2 model is proposed for monitoring both even-length and uneven-length batch processes. The GTucker2 model has two prominent advantages. First, it performs tensor decomposition on the three-way data array directly and thus avoids the potential information loss and "curse of dimensionality" induced by data unfolding. Second, it solves the uneven-length problem in a "natural" way, without batch trajectory synchronization, which prevents the distortion of data and fault patterns and guarantees higher modeling and monitoring precision. An online batch process monitoring method is then developed by integrating GTucker2 with the moving data window technique. Three monitoring statistics, the Q, R², and T² statistics, are constructed for fault detection and diagnosis. The effectiveness and advantages of the GTucker2-based monitoring method are illustrated by two case studies in a benchmark fed-batch penicillin fermentation process.

1. INTRODUCTION

Batch processes have been widely used to manufacture low-volume, high-value-added products in many industrial sectors, including the pharmaceutical, biochemical, food, and semiconductor industries.1−3 The batch process has some advantages over the continuous process, such as high flexibility and quick response to changing market needs. With increasing global competition, the batch process is playing an ever more important role in industrial production. Process safety and consistent product quality are two important issues of batch processes that have attracted much attention, and they make process monitoring a highly necessary part of process operation. The aim of process monitoring is to give an early warning of abnormal operating conditions that may result in production interruptions, low-quality products, or even equipment damage. Such an early warning provides a chance to take corrective actions to recover normal production, or at least to avoid wasting more raw materials. However, achieving reliable and robust monitoring of batch processes is a challenging problem because of intractable process features such as multiple phases, heavy coupling, nonlinearity, time variance, and so on.4 As contemporary batch processes become more flexible and complicated, monitoring methods based on mathematical models or process knowledge are not applicable in many cases. On the other hand, with the fast development of process automation techniques, a large amount of process data is recorded. These data contain much useful information about the process operating status and product quality for modeling, monitoring, and control, which motivates the development of data-driven statistical process control concepts and methods. Since Nomikos and MacGregor1 first proposed the MPCA-based monitoring method, multivariate statistical process control (MSPC) has become a hot topic in recent years.
Different from continuous processes, historical data collected from a batch process usually take the form of a three-way array. This three-way data array is mainly analyzed with bilinear and trilinear methods; up to now, bilinear methods have been adopted more often than trilinear methods. Bilinear methods need to unfold the three-way data array into a matrix before building a monitoring model. Typical bilinear methods include multiway principal component analysis (MPCA),1 multiway partial least-squares (MPLS),2 multiway independent component analysis (MICA),5 multiway locality preserving projections (MLPP),6 and so on. Trilinear methods model the three-way data array directly with tensor decomposition, such as trilinear decomposition (TLD),7 parallel factor analysis (PARAFAC),8,9 and Tucker3 decomposition.9 Since batch data sets are inherently three-dimensional, data unfolding may destroy the intrinsic data structure, potentially leading to information loss. Moreover, data unfolding easily induces the "curse of dimensionality" for larger-scale data sets, meaning that many model parameters need to be identified,3,9 which hinders the establishment of a monitoring model. These drawbacks reduce the monitoring performance of bilinear methods. Compared with bilinear methods, trilinear methods are expected to be more reliable and stable,9 because they avoid data unfolding and at the same time compress the data in two or three directions, which benefits the building of a more concise and precise monitoring model. Most batch process monitoring methods are based on the assumption that all batches have the same duration. Apparently, this assumption is too idealistic, since the duration of a practical batch process is difficult to fix due to unavoidable disturbances and changes in operating conditions.

© 2014 American Chemical Society
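To make the two unfolding conventions discussed above concrete, the following NumPy sketch unfolds a small three-way array batch-wise and variable-wise, then normalizes and refolds it. All sizes and names here are illustrative only, and the exact column ordering of batch-wise unfolding varies between references.

```python
import numpy as np

I, J, K = 4, 3, 5                       # batches x variables x time samples
X = np.arange(I * J * K, dtype=float).reshape(I, J, K)

# Batch-wise unfolding: one row per batch, X(I x JK)
Xb = X.reshape(I, J * K)

# Variable-wise unfolding: one column per variable, X(IK x J)
Xv = X.transpose(0, 2, 1).reshape(I * K, J)

# Normalize each column to zero mean and unit variance,
# then refold back to the original three-way shape
Xb_norm = (Xb - Xb.mean(axis=0)) / (Xb.std(axis=0) + 1e-12)
X_refolded = Xb_norm.reshape(I, J, K)
```

With batch-wise unfolding, each column holds one (variable, time) pair across all batches, which is why every batch must reach the same length before this normalization can be applied.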
Received: April 11, 2014. Revised: September 1, 2014. Accepted: September 4, 2014. Published: September 4, 2014.
dx.doi.org/10.1021/ie5015102 | Ind. Eng. Chem. Res. 2014, 53, 15101−15110

In such a situation, batch trajectory synchronization is often adopted.10 The simplest trajectory synchronization method is to cut all batches to the minimum length.11 Another method is to treat the absent parts of the shorter batch trajectories as missing data and then derive a model from the data of the long batches.12 A more reasonable method is to find a surrogate variable that has the same starting and ending values for all batches and use it to replace the time dimension.2 In addition, dynamic time warping (DTW) and correlation optimized warping (COW) are also effective methods for trajectory synchronization.13,14 However, most trajectory synchronization methods alter the original data record to make it fit the model, which may distort data and fault patterns and thus reduce the fault detection ability.10 To overcome the drawbacks of trajectory synchronization, other methods have been developed to handle the uneven-length problem without aligning batch trajectories. Lu et al. proposed a stage-based sub-PCA method for modeling uneven-length batches.15 Zhao et al. first classified irregular batches into different uneven-length groups and then modeled the group-common and group-specific information separately to cope with the varying-length problem.16 Zhao et al. also presented a two-level phase division method to trace inner-phase evolutions in multiphase processes and handled the uneven-length problem using inner-phase analysis.17 In these references, however, variable-wise data unfolding is required for building the monitoring models, which alters the original three-way structure of the batch data and may lower the monitoring efficiency. In particular, Wise et al. applied the PARAFAC2 model to monitor an uneven-length batch process.18 PARAFAC2 shows a distinct advantage in that it can directly model the original uneven-length batch data without data unfolding or trajectory synchronization.18 However, some constraints of the PARAFAC2 model are too strict, which reduces its modeling efficiency and precision and can even cause the PARAFAC2 model not to exist in some special cases.

In this work, a new trilinear modeling method, termed GTucker2, is proposed for batch process monitoring, together with a model fitting algorithm. GTucker2 can be used to model both even-length and uneven-length three-way data arrays. Similar to other trilinear methods, GTucker2 directly models the three-way data array without data unfolding. GTucker2 compresses the three-way data array in two directions using two factor matrices. One factor matrix is fixed, while the other is allowed to vary with the data lengths, which solves the uneven-length problem in a "natural" way with no need for batch trajectory synchronization. GTucker2 is less restrictive than PARAFAC2, and thus it more easily achieves high modeling efficiency and precision. An online batch process monitoring method is then developed based on GTucker2, and Q, R², and T² statistics are constructed for fault detection and diagnosis. The performance of the GTucker2-based monitoring method is tested in a fed-batch penicillin fermentation process. The results indicate that it outperforms monitoring methods based on MPCA, Tucker3, PARAFAC, and PARAFAC2.
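As a preview of the model structure developed in section 2, the following NumPy sketch builds a synthetic, noise-free GTucker2 decomposition X_i = A G_i B_i^T with uneven slice lengths K_i and checks the constant cross-product constraint on the varying factor matrices. All dimensions and names here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
J, M, N = 6, 2, 3                    # variables, columns of A, columns of B_i
K = [8, 11, 9]                       # uneven slice lengths K_i

# Column-wise orthonormal factor matrix A (J x M): A^T A = I_M
A = np.linalg.qr(rng.standard_normal((J, M)))[0]

# B_i = P_i F with P_i (K_i x N) column-wise orthonormal and F (N x N)
# nonsingular, so the cross product B_i^T B_i = F^T F is the same for every i
F = rng.standard_normal((N, N)) + N * np.eye(N)
P = [np.linalg.qr(rng.standard_normal((Ki, N)))[0] for Ki in K]
G = [rng.standard_normal((M, N)) for _ in K]     # core slices G_i (M x N)
B = [Pi @ F for Pi in P]

# Noise-free slices X_i = A G_i B_i^T (residuals E_i = 0)
X = [A @ Gi @ Bi.T for Gi, Bi in zip(G, B)]
W = F.T @ F                                      # common cross product
```

Note how only the row count of each B_i follows the batch length K_i, while A, the G_i sizes, and the cross product W stay fixed; this is the "natural" treatment of uneven lengths described above.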

2. TUCKER2 AND GTUCKER2 MODELS

2.1. Tucker2 Model. The Tucker2 model was originally proposed by Tucker.19 Given a three-way array {x_ijk | i = 1, ..., I; j = 1, ..., J; k = 1, ..., K}, denoted by X ∈ ℝ^(I×J×K) or X(I × J × K), let the matrix X_i ∈ ℝ^(J×K) be the ith slice of X, consisting of the elements x_i::.20 Then, the Tucker2 model of X is21

X_i = A G_i B^T + E_i    (1)

where A ∈ ℝ^(J×M) and B ∈ ℝ^(K×N) are factor matrices, E_i ∈ ℝ^(J×K) is the residual matrix, and G_i ∈ ℝ^(M×N) is a slice of the extended core tensor G ∈ ℝ^(I×M×N) or, in brief, a core slice. The elements of the core slices express the relative strength of the links between the columns of A and B. M and N are the numbers of components (i.e., columns) in the factor matrices A and B, respectively. If M or N is smaller than J or K, the core slice G_i can be thought of as a compressed version of X_i. A schematic diagram of the Tucker2 model is illustrated in Figure 1.

Figure 1. Schematic diagram of the Tucker2 model.

The Tucker2 model is not unique.20 To improve its uniqueness properties, additional constraints should be imposed such that the factor matrices A and B are column-wise orthonormal, i.e., A^T A = I_M and B^T B = I_N. Furthermore, PARAFAC2 can be viewed as a constrained version of Tucker2 in which all core slices G_i are diagonal matrices.21,22 The Tucker2 model can be identified by fitting it to the three-way data array X in the least-squares sense, which amounts to minimizing

σ1(A, G1, ···, GI, B) = Σ_{i=1}^{I} ||X_i − A G_i B^T||²    (2)

over all of its arguments, subject to the constraints A^T A = I_M and B^T B = I_N. Various algorithms for computing a Tucker2 model have been proposed, such as higher-order SVD (HOSVD),23 the alternating least-squares algorithm TUCKALS2,24 higher-order orthogonal iteration (HOOI),25 and the Newton−Grassmann approach.26 More detailed introductions to Tucker2 can be found in the reference papers.19−21

2.2. Generalized Tucker2 (GTucker2) Model. In the Tucker2 model, all slices X_i of the three-way data array must have the same size J × K. In practical applications such as batch processes, however, the slices X_i may have varying sizes, and the Tucker2 model is not applicable. Therefore, in this paper, a generalized Tucker2 (GTucker2) model is developed to handle the case of varying slice sizes.

2.2.1. Model Formulation. For a set of matrices {X_i | X_i ∈ ℝ^(J×K_i), i = 1, ..., I}, where K_i is allowed to vary with the index i, the GTucker2 model is expressed as

X_i = A G_i B_i^T + E_i    (3)

which is almost identical to the Tucker2 model, except that B_i ∈ ℝ^(K_i×N) is not fixed for all slices X_i but varies with the index i. A schematic diagram of the GTucker2 model for a set of matrices with varying sizes is illustrated in Figure 2. Similar to Tucker2, GTucker2 is not unique without additional constraints. For example, if U ∈ ℝ^(M×M) is a nonsingular matrix, then

A G_i B_i^T = A U^(−1) V_i V_i^+ U G_i B_i^T = T V_i S_i^T    (4)

Figure 2. Schematic diagram of the GTucker2 model.


is an equally valid model, where T = A U^(−1), V_i ∈ ℝ^(M×N), S_i = B_i G_i^T U^T V_i^(+T), and V_i^+ ∈ ℝ^(N×M) is the pseudoinverse of V_i. Hence, to improve the uniqueness property, two constraints are imposed on GTucker2: (1) the factor matrix A is column-wise orthonormal, i.e., A^T A = I_M, and (2) the cross product B_i^T B_i is constant over the index i, i.e., B_i^T B_i = B_j^T B_j = W for all i, j = 1, ..., I.

2.2.2. Model Fitting Algorithm. The GTucker2 model can be identified by minimizing

σ2(A, G1, ···, GI, B1, ···, BI) = Σ_{i=1}^{I} ||X_i − A G_i B_i^T||²    (5)

over all of its arguments, subject to A^T A = I_M and B_i^T B_i = W for i = 1, ..., I. It is easy to prove that the cross-product constraint (i.e., B_i^T B_i = B_j^T B_j for all i, j = 1, ..., I) can be replaced by the constraint that B_i = P_i F for some column-wise orthonormal matrices P_i ∈ ℝ^(K_i×N) (i.e., P_i^T P_i = I_N) and a fixed nonsingular matrix F ∈ ℝ^(N×N).27 Substituting P_i F for B_i in eq 5 yields

σ3(A, G1, ..., GI, P1, ..., PI, F) = Σ_{i=1}^{I} ||X_i − A G_i F^T P_i^T||²    (6)

According to tr(AB) = tr(BA), ||A||² = tr(A^T A) = tr(A A^T), and P_i^T P_i = I_N, we have

||X_i − A G_i F^T P_i^T||²
= tr[(X_i − A G_i F^T P_i^T)^T (X_i − A G_i F^T P_i^T)]
= tr(X_i^T X_i) − 2 tr(X_i^T A G_i F^T P_i^T) + tr[(A G_i F^T P_i^T)^T (A G_i F^T P_i^T)]
= ||X_i||² − 2 tr(P_i^T X_i^T A G_i F^T) + tr(A G_i F^T P_i^T P_i F G_i^T A^T)
= ||X_i||² − 2 tr(P_i^T X_i^T A G_i F^T) + ||A G_i F^T||²    (7)

where tr(·) denotes the trace of a matrix. Thus, minimizing σ3 over P_i is equivalent to maximizing

σ4(P1, ..., PI) = Σ_{i=1}^{I} tr(P_i^T X_i^T A G_i F^T)    (8)

Performing singular value decomposition (SVD) on X_i^T A G_i F^T leads to X_i^T A G_i F^T = U_i Σ_i V_i^T, and the column-wise orthonormal matrix P_i that maximizes σ4 is then given by

P_i = U_i V_i^T    (9)

After all the P_i are obtained, the problem of minimizing σ3 over A, G1...GI, and F reduces to minimizing

σ̂3(A, G1, ..., GI, F | P1, ···, PI) = Σ_{i=1}^{I} ||X_i P_i − A G_i F^T||² + c    (10)

where c is a constant with respect to A, G1...GI, and F. Obviously, minimizing σ̂3 is equivalent to minimizing σ1 in eq 2 with X_i replaced by X_i P_i. Therefore, any fitting algorithm for Tucker2 can be applied to minimize σ̂3 over A, G1...GI, and F. Finally, a complete alternating least-squares algorithm for GTucker2 is established by alternately maximizing σ4 over the P_i and minimizing σ̂3 to update A, G1...GI, and F. Running this algorithm requires initial values for the matrices A, G1...GI, and F. These matrices can be initialized randomly or with particular values that have a relatively high chance of leading to the global minimum of the function. In this paper, A is initialized with the left singular vectors of Σ_{i=1}^{I} X_i X_i^T computed by SVD, G1...GI are initialized randomly, and F is initialized as the identity matrix. The complete fitting algorithm for GTucker2 is summarized as follows:

Step 1: Initialize A, G1...GI, and F.
Step 2a: Compute the SVD X_i^T A G_i F^T = U_i Σ_i V_i^T and update P_i as U_i V_i^T for i = 1, ..., I.
Step 2b: Update A, G1...GI, and F by applying a fitting algorithm of Tucker2 to the three-way array X̅ ∈ ℝ^(I×J×N) with slices X̅_i = X_i P_i, i = 1...I.
Step 2c: Evaluate σ3 using eq 6. If σ3^old − σ3^new ≤ ε σ3^old for a small positive ε, or the maximum number of iterations is exhausted, go to step 3; else repeat step 2.
Step 3: Output the latest matrices A, F and P_i, G_i for i = 1...I.

2.2.3. Analysis of GTucker2. To give a better understanding of GTucker2, its relationship with classical trilinear models, such as Tucker2 and PARAFAC2, is revealed, and several characteristics of GTucker2 are highlighted.

(1) Relationship with Tucker2: GTucker2 can be treated as an extended version of Tucker2 and is less restrictive. Both factor matrices (A and B) are fixed in Tucker2, while GTucker2 restricts only one factor matrix (A) to be fixed and allows the other factor matrix (B_i) to vary with the data sizes. A main advantage of GTucker2 is that it can model an even-length three-way data array with fewer constraints than Tucker2 and can also model an uneven-length three-way data array whose size varies in one direction. GTucker2 reduces to Tucker2 if all slices X_i are restricted to the same size and B_i is fixed over the index i.

(2) Relationship with PARAFAC2: PARAFAC2 was developed to model an uneven-length three-way data array.28 In PARAFAC2, the core slice G_i must be a diagonal matrix. GTucker2 removes this constraint, and the core slice G_i can be any ordinary matrix of size M × N. Therefore, PARAFAC2 can be viewed as a constrained version of GTucker2. In general, the GTucker2 model fits data better than the PARAFAC2 model, except when a perfect PARAFAC2 fit exists. In fact, the PARAFAC2 model is a low-rank approximation like PARAFAC, while the GTucker2 model is a subspace approximation like Tucker3 that retains the maximum amount of variation.29 GTucker2 is thus more practical than PARAFAC2, because the low-rank approximation might not even exist in some special cases. However, a very important property of PARAFAC2 is its uniqueness under certain mild assumptions,27 which is a powerful advantage for model interpretation.

3. BATCH PROCESS MONITORING WITH GTUCKER2

3.1. Development of GTucker2 Model with Batch Data. Let X(I × J × K_i) denote the historical data set (i.e., training data set or reference batches) of a batch process collected under normal operating conditions, where I is the number of batches, J is the number of variables, and K_i is the number of samples in the ith batch. The raw historical data unavoidably contain noise, missing data, outliers, and so on, and thus data pretreatment is necessary for better modeling and


monitoring. Data normalization is a critical step in data pretreatment. Its purpose is to eliminate the influence of the scales of different process variables and to highlight the differences between variables or batches. Because X(I × J × K_i) is a three-way data array, it has to be unfolded into a matrix for normalization. Batch-wise unfolding and variable-wise unfolding are the two most widely used unfolding methods; they unfold X(I × J × K_i) to X(I × JK_i) and X(J × IK_i), respectively.10,30 The corresponding normalization procedures are termed batch-wise normalization and variable-wise normalization. Batch-wise unfolding helps to highlight variations between batches, but for online applications it needs estimates of future measurements of the variables, and it cannot capture the different characteristics of the operating phases.10 Moreover, batch-wise normalization cannot be used for uneven-length batch data sets. Variable-wise unfolding avoids the estimation of future measurements in online applications30 and has the advantage of spontaneously solving the uneven batch duration problem without trajectory synchronization.10 However, variable-wise unfolding fails to emphasize systematic variations around the normal trajectories and thus reduces the monitoring efficiency. After normalization, the unfolded data matrix has to be refolded back into a three-way data array for building the GTucker2 model. Thus, the complete data normalization procedure consists of three steps: (1) unfolding X(I × J × K_i) to a matrix X(I × JK_i) or X(J × IK_i) using batch-wise or variable-wise unfolding, (2) normalizing X(I × JK_i) or X(J × IK_i) to zero mean and unit variance, and (3) refolding the normalized matrix back into a three-way data array X(I × J × K_i). Schematic diagrams of batch-wise and variable-wise normalization are illustrated in Figure 3.

Figure 3. Normalization procedure for the batch data set: (a) batch-wise normalization, (b) variable-wise normalization.

Performing the GTucker2 algorithm on the normalized three-way array X(I × J × K_i) yields the GTucker2 model

X_i = A G_i B_i^T + E_i = A G_i F^T P_i^T + E_i    (11)

where X_i(J × K_i) is the ith slice of X(I × J × K_i), representing the data of the ith batch, and the matrices A, G_i, B_i, E_i, F, and P_i have the same definitions as in eqs 1, 3, and 6. A new batch X_0(J × K_0) is first normalized using the normalization information from the training data set. Then, an existing GTucker2 model can be fit to X_0(J × K_0) by minimizing

σ0(G0, P0 | A, F) = ||X_0 − A G_0 F^T P_0^T||²    (12)


subject to P_0^T P_0 = I_N and the additional constraint that F and A, determined from the training data set, are fixed. The fitting algorithm described in section 2.2.2 can be used to solve this problem, with the step that updates F and A omitted. The resulting GTucker2 model for X_0(J × K_0) is

X_0 = A G_0 F^T P_0^T + E_0    (13)

3.2. Fault Detection and Diagnosis. Equation 11 indicates that the GTucker2 model explicitly divides the original data space into a core subspace (spanned by the core slices G_i) plus a residual subspace (spanned by the residual matrices E_i). Thus, based on the GTucker2 model, the Q statistic and R² statistic are developed for fault detection. The Q statistic measures the deviation in the residual subspace, and the R² statistic measures the deviation in the core subspace. The Q statistic for a batch X_s(J × K_s) is defined as the normalized sum of squared residuals

Q_s = ||E_s||²/K_s = Σ_{j,k} e²_{s,jk}/K_s = β_s Σ_{j,k} e²_{s,jk}    (14)

where E_s is the residual matrix, β_s = 1/K_s is the normalization coefficient, and K_s is the number of time samples. According to the definition of the R² statistic in ref 31, the R² statistic for the batch X_s(J × K_s) is

R_s² = tr(G̅_s G̅_s^T) − 2 Σ_{i=1}^{I} α_i tr(G̅_s G̅_i^T) + Σ_{i,j=1}^{I} α_i α_j tr(G̅_i G̅_j^T)    (15)

where G̅_s = λ_s G_s is the normalized core slice of X_s(J × K_s), G̅_i = λ_i G_i denotes the normalized core slices of the reference batches X_i(J × K_i), λ_s and λ_i are normalization coefficients, and the α_i are Lagrange multipliers. For an uneven-length batch process, λ_i and λ_s are set to K_i/K_t and K_s/K_t, respectively, where K_t = Σ_{i=1}^{I} K_i is the total number of time samples of all reference batches. For an even-length batch process, λ_i and λ_s can also simply be set to 1 instead of K_i/K_t and K_s/K_t. All the Lagrange multipliers α_i are obtained with the support tensor domain description (STDD) method by solving the following optimization problem:31

min Σ_{i=1}^{I} α_i tr(G̅_i G̅_i^T) − Σ_{i=1}^{I} Σ_{j=1}^{I} α_i α_j tr(G̅_i G̅_j^T)
s.t. 0 ≤ α_i ≤ C, Σ_{i=1}^{I} α_i = 1    (16)

where C is an empirical parameter. To determine whether the process is operating under normal conditions, proper control limits must be defined for the Q and R² statistics: for a new batch, a fault is detected if the Q or R² statistic exceeds its control limit. Assuming that all residuals E_i of the reference batches are independent and normally distributed, the Q statistic follows a χ² distribution, and its control limit is

Q_α = g χ²_{h,α}, g = v/(2m), h = 2m²/v    (17)

where m and v are the mean and variance of the Q statistics of all reference batches, and χ²_{h,α} is the critical value of the χ²-distributed variable with h degrees of freedom at significance level α. Similar to ref 31, the control limit of the R² statistic is calculated using kernel density estimation (KDE). Regarding the R² statistics of the reference batches corresponding to 0 < α_i ≤ C in eq 16 as a univariate sample {R_1², R_2², ..., R_n²} ∈ ℝ^n, a general estimator of its distribution density is31,32

f̂(R², θ) = (1/(nθ)) Σ_{i=1}^{n} K((R² − R_i²)/θ)    (18)

where R² denotes the data point under consideration, θ is the smoothing parameter, and K(·) is a kernel function. After obtaining f̂(R², θ) with KDE, the control limit R_α² at significance level α for the R² statistic is calculated from the inverse cumulative distribution function of f̂(R², θ).

In particular, as discussed in section 2.2.3, GTucker2 reduces to PARAFAC2 if all core slices G_i are restricted to be diagonal matrices of size N × N. In this case, the T² statistic is adopted in place of the R² statistic, because it is simpler to compute. Once a PARAFAC2 model is identified from the training data set, all core slices G_i(N × N) are collected in a single matrix C(I × N), such that the ith row (C_i) of C equals the diagonal of G_i, i.e., C_i = diag(G_i). Then, the T² statistic for the ith reference batch is defined as

T² = C_i S^(−1) C_i^T    (19)

where S is the covariance matrix of C, i.e., S = C^T C/(I − 1). For a new batch X_0(J × K_0), the T² statistic is computed as

T_0² = C_0 S^(−1) C_0^T    (20)

with C_0 = diag(G_0). The control limit of the T² statistic is

T_α² ∼ (N(I² − 1)/(I(I − N))) F_α(N, I − N)    (21)

where N is the size of the diagonal matrix G_i, I is the number of reference batches, and F_α(N, I − N) denotes an F distribution with N and I − N degrees of freedom at significance level α.

If a fault has been detected, the contribution plot33 is used for fault diagnosis. It shows the contribution of each variable to the monitoring statistics, and the largest contributor is treated as the fault variable. According to the definition of the Q statistic in eq 14, the contribution of variable j to the Q statistic at sample time k is

c^Q_{s,jk} = β_s e²_{s,jk}    (22)

Equation 15 can be rewritten as

R_s² = tr(λ_s A^T X_s P_s F^(−T) G̅_s^T) − 2 Σ_i α_i tr(λ_s A^T X_s P_s F^(−T) G̅_i^T) + Σ_{i,j} α_i α_j tr(G̅_i G̅_j^T)
    = Σ_{j,k} x_{s,jk} λ_s tr(a_j^T b_k P_s F^(−T) G̅_s^T) − 2 Σ_{i,j,k} x_{s,jk} λ_s α_i tr(a_j^T b_k P_s F^(−T) G̅_i^T) + Σ_{i,j} α_i α_j tr(G̅_i G̅_j^T)    (23)

where F^(−T) denotes the inverse of F^T, a_j is the jth row of the factor matrix A, and b_k is a row vector whose kth element is 1 and whose other elements are zero. Then, the contribution of variable j to the R² statistic at sample time k is defined as

c^R_{s,jk} = x_{s,jk} λ_s tr(a_j^T b_k P_s F^(−T) G̅_s^T) − 2 Σ_i x_{s,jk} λ_s α_i tr(a_j^T b_k P_s F^(−T) G̅_i^T)    (24)

Equation 20 can be reduced to

T_0² = C_0 S^(−1) C_0^T = C_0 S^(−1) diag(A^T X_0 P_0 F^(−T)) = Σ_{j,k} x_{0,jk} C_0 S^(−1) diag(a_j^T b_k P_0 F^(−T))    (25)

The contribution of variable j to the T² statistic at sample time k is defined as

c^T_{0,jk} = x_{0,jk} C_0 S^(−1) diag(a_j^T b_k P_0 F^(−T))    (26)

3.3. Online Monitoring. The moving data window technique is combined with GTucker2 for online monitoring to track dynamic changes of the batch process and to avoid predicting future measurement data. To some extent, this technique can also handle the multiphase characteristic of some batch processes; for monitoring multiphase batch processes, however, a better choice is to use phase-division methods and phase-separated modeling techniques.10,34 Assuming the size of the moving data window is d, the window data of the ith reference batch up to sample time k is

X_i(J × d | k) =
[ x^i_{1,k−d+1}  x^i_{2,k−d+1}  ···  x^i_{J,k−d+1} ]
[ x^i_{1,k−d+2}  x^i_{2,k−d+2}  ···  x^i_{J,k−d+2} ]
[ ⋮              ⋮               ⋱    ⋮             ]
[ x^i_{1,k}      x^i_{2,k}      ···  x^i_{J,k}     ]

After normalizing X_i(J × d|k), a window mode GTucker2 (WMGTucker2) model can be built from the window data of all batches X_i(J × d|k), i = 1...I. For an even-length batch process, the window data of all batches have the same size at any sample time k, and thus batch-wise normalization can be used. For an uneven-length batch process, however, the window data of all batches have different sizes after a certain sample time k_c. In this case, batch-wise normalization is used before k_c, while variable-wise normalization is adopted after k_c. In online monitoring, as the window slides along the sample time by adding the newest sample data and excluding the oldest, the WMGTucker2 model is updated periodically.

The window size is the key parameter of the moving data window technique, because it strongly affects the fault sensitivity and the accuracy of the WMGTucker2 model. This parameter is mainly selected based on experience or trials, for lack of quantitative calculation methods.35 In this paper, the window size is chosen according to the accuracy and rapidity with which the WMGTucker2 model detects known faults in a test data set. Two quantitative indexes, the fault detection rate (FDR) and the false alarm rate (FAR),31 are used to evaluate the fault detection performance.

3.4. Complete Monitoring Procedure. A complete GTucker2-based monitoring procedure for a batch process consists of two phases:

Phase I: Build the offline WMGTucker2 model
Step 1: Select reference batches, and construct the training data set.
Step 2: Determine the size d of the moving data window.
Step 3 (for an even-length batch process): For all sample times k ≥ d, normalize the window data of all reference batches using batch-wise normalization.
Step 3′ (for an uneven-length batch process): Find the reference batch with the smallest length, and record its total sample time as k_c. Normalize the window data of all reference batches using batch-wise normalization for sample times d ≤ k ≤ k_c, and variable-wise normalization for sample times k > k_c.
Step 4: Perform the GTucker2 algorithm, and build WMGTucker2 models for all sample times k ≥ d.
Step 5: Calculate the control limits of the Q and R² (or T²) statistics for all sample times k ≥ d.

Phase II: Online monitoring
Step 1: For a new batch, record its window data at sample time k ≥ d as

X_new(J × d | k) =
[ x^new_{1,k−d+1}  x^new_{2,k−d+1}  ···  x^new_{J,k−d+1} ]
[ x^new_{1,k−d+2}  x^new_{2,k−d+2}  ···  x^new_{J,k−d+2} ]
[ ⋮                ⋮                 ⋱    ⋮               ]
[ x^new_{1,k}      x^new_{2,k}      ···  x^new_{J,k}     ]

and normalize X_new(J × d|k) with the mean and variance obtained in phase I.
Step 2: Fit the existing WMGTucker2 model to the new batch data.
Step 3: Compute the Q and R² (or T²) statistics of the new batch using eqs 14 and 15 (or eq 20).
Step 4: Check whether the Q or R² (or T²) statistic exceeds its control limit. If yes, go to step 6; else go to step 5.
Step 5: Return to step 1 and monitor the next sample time k + 1.
Step 6: Diagnose which process variable causes the fault using contribution plots.
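The alternating least-squares fitting of section 2.2.2 (steps 1−3), which phase I reuses at every window position, can be sketched in NumPy as follows. This is a simplified sketch under my own assumptions: a fixed sweep count replaces the εσ3 stopping rule of step 2c, the inner Tucker2-style update keeps F orthonormal, and the function name is illustrative.

```python
import numpy as np

def fit_gtucker2(X_slices, M, N, n_iter=50, seed=0):
    """Fit X_i ~ A G_i F^T P_i^T (eqs 6-10) by alternating least squares."""
    rng = np.random.default_rng(seed)
    # Step 1: A from the left singular vectors of sum_i X_i X_i^T,
    # G_i random, F = identity
    S = sum(Xi @ Xi.T for Xi in X_slices)
    A = np.linalg.svd(S)[0][:, :M]
    G = [rng.standard_normal((M, N)) for _ in X_slices]
    F = np.eye(N)
    for _ in range(n_iter):
        # Step 2a: P_i is the polar factor of the SVD of X_i^T A G_i F^T (eq 9)
        P = []
        for Xi, Gi in zip(X_slices, G):
            U, _, Vt = np.linalg.svd(Xi.T @ A @ Gi @ F.T, full_matrices=False)
            P.append(U @ Vt)
        # Step 2b: Tucker2-style update of A, F, and G_i on the slices
        # Y_i = X_i P_i (simplified: orthonormal F from an SVD)
        Y = [Xi @ Pi for Xi, Pi in zip(X_slices, P)]
        A = np.linalg.svd(np.hstack(Y))[0][:, :M]
        F = np.linalg.svd(np.hstack([Yi.T for Yi in Y]))[0][:, :N]
        G = [A.T @ Yi @ F for Yi in Y]
    B = [Pi @ F for Pi in P]
    err = sum(np.linalg.norm(Xi - A @ Gi @ Bi.T) ** 2
              for Xi, Gi, Bi in zip(X_slices, G, B))
    return A, G, B, err
```

On noise-free data generated from a true GTucker2 model, the residual err drops to numerical zero within a few sweeps; on real batch data, err plays the role of σ2 in eq 5.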

4. CASE STUDY

Birol et al.36 developed a simulator (PenSim v2.0) for a benchmark fed-batch penicillin fermentation process. This simulator has been widely used to test various batch process monitoring methods.3,37 In this study, PenSim v2.0 is adopted to generate the simulation data. Two types of training data sets were generated by running the simulator repeatedly under normal operating conditions with small variations: even-length and uneven-length training data sets. Each type contains five training subdata sets consisting of 8, 10, 30, 50, and 80 reference batches, respectively. For the even-length training data sets, the duration of each batch is 200 h. For the uneven-length training data sets, the durations of all batches are random between 150 and 200 h. The information on all training data sets is summarized in Table 1. In all cases, the sampling interval is set to 1 h, and the 17 process variables listed in Table 2 are used for monitoring. Moreover, as shown in Table 3, a total of 10 fault batches were generated considering different batch durations, fault variables, magnitudes, and directions. Fault nos. 1−4 and 6−9 are used as test data to choose the window sizes for the two types of training data sets, respectively. Fault nos. 5 and 10 are applied to evaluate the performance of the GTucker2-based monitoring method.
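The case studies score fault detection with the Q statistic of eq 14, its χ²-type control limit of eq 17, and the FDR/FAR indexes. A minimal NumPy sketch follows; the function names are mine, the per-sample FDR/FAR definition is one common convention, and the χ² quantile itself would come from a statistics library and is left out.

```python
import numpy as np

def q_statistic(E):
    # eq 14: Q_s = ||E_s||^2 / K_s, with beta_s = 1/K_s and K_s time samples
    J, K = E.shape
    return float(np.sum(E ** 2) / K)

def q_limit_params(q_ref):
    # eq 17: Q_alpha = g * chi2_{h,alpha} with g = v/(2m) and h = 2m^2/v,
    # where m, v are the mean and variance of the reference Q statistics
    q_ref = np.asarray(q_ref, dtype=float)
    m, v = q_ref.mean(), q_ref.var()
    return v / (2 * m), 2 * m ** 2 / v

def fdr_far(stats, limit, faulty):
    # FDR: fraction of faulty samples alarmed; FAR: fraction of normal samples alarmed
    stats, faulty = np.asarray(stats), np.asarray(faulty, dtype=bool)
    alarms = stats > limit
    return alarms[faulty].mean(), alarms[~faulty].mean()
```

For example, reference statistics q_ref = [1, 2, 3, 4] give g = 0.25 and h = 10, so the limit is 0.25·χ²_{10,α}.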


Table 1. Summary of All Training Data Sets

data set no.   type            total number of reference batches   durations of each batch
1              even-length      8                                  200 h
2              even-length     10                                  200 h
3              even-length     30                                  200 h
4              even-length     50                                  200 h
5              even-length     80                                  200 h
6              uneven-length    8                                  150−200 h
7              uneven-length   10                                  150−200 h
8              uneven-length   30                                  150−200 h
9              uneven-length   50                                  150−200 h
10             uneven-length   80                                  150−200 h

Table 2. Monitoring Variables in the Fed-Batch Penicillin Fermentation Process

number   process variable
1        aeration rate (L/h)
2        agitator power input (W)
3        substrate feeding rate (L/h)
4        substrate feeding temperature (K)
5        substrate concentration (g/L)
6        DO saturation (%)
7        biomass concentration (g/L)
8        penicillin concentration (g/L)
9        culture volume (L)
10       CO2 concentration (mmol/L)
11       pH
12       bioreactor temperature (K)
13       generated heat (kcal/h)
14       acid feeding rate (mL/h)
15       base feeding rate (mL/h)
16       cold water flow rate (L/h)
17       hot water flow rate (L/h)

Table 3. Summary of 10 Fault Batches in the Fed-Batch Penicillin Fermentation Process

fault no.   fault type                                    occurrence time (h)   termination time (h)
1           10% step increase in aeration rate            100                   200
2           10% step increase in agitator power input     100                   200
3           10% step increase in substrate feeding rate   100                   200
4           5 W ramp decrease in agitator power input     100                   200
5           10% step decrease in substrate feeding rate   100                   200
6           10% step increase in aeration rate            100                   170
7           10% step increase in agitator power input     100                   160
8           10% step increase in substrate feeding rate   100                   174
9           5 W ramp decrease in agitator power input     100                   198
10          10% step decrease in substrate feeding rate   100                   random, 150−200

4.1. Case I: Even-Length Batch Process. The even-length training data sets were used in this case study. To select an appropriate window size, five window sizes (2, 3, 5, 10, and 20) were tested for fault detection. The fault detection rates (FDRs) and false alarm rates (FARs) of the Q and R2 statistics for four faults (fault numbers 1−4 in Table 3) under different window sizes are summarized in Tables 4 and 5. Clearly, WMGTucker2 shows perfect monitoring performance once the window size reaches 5. Because a larger window size results in a heavier computational burden, the window size was set to 5.

Table 4. FDR (%) and FAR (%) of Q Statistic Corresponding to Different Window Sizes for Four Faults

               fault no. 1    fault no. 2    fault no. 3    fault no. 4
window size    FDR    FAR     FDR    FAR     FDR    FAR     FDR    FAR
2               99     0       99     0      100     0       99     0
3              100     0      100     0      100     0      100     0
5              100     0      100     0      100     0      100     0
10             100     0      100     0      100     0      100     0
20             100     0      100     0      100     0      100     0

Table 5. FDR (%) and FAR (%) of R2 Statistic Corresponding to Different Window Sizes for Four Faults

               fault no. 1    fault no. 2    fault no. 3    fault no. 4
window size    FDR    FAR     FDR    FAR     FDR    FAR     FDR    FAR
2               99     0       98     0       93     0       97     0
3               99     0      100     0       99     0      100     0
5              100     0      100     0      100     0      100     0
10             100     0      100     0      100     0      100     0
20             100     0      100     0      100     0      100     0

Fault number 5 in Table 3 was used for testing. Window-mode MPCA (WMMPCA), PARAFAC2 (WMPARAFAC2), PARAFAC (WMPARAFAC), and Tucker3 (WMTucker3) were used for comparison, all with the same window size. For WMMPCA, the number of retained principal components was chosen by the cumulative percent variance (CPV) method.38 The Q and T2 statistics were used with WMPARAFAC2. Monitoring methods based on PARAFAC and Tucker3 have been proposed by Louwerse and Smilde;9 the Q and T2 statistics reported there were used with WMPARAFAC and WMTucker3. Parameter settings for the different trilinear models are listed in Table 6.

Table 6. Parameter Settings for Different Trilinear Models*

data set no.   GTucker2        PARAFAC2      Tucker3                 PARAFAC
1              M = 3, N = 3    M = N = 3     M = 3, N = 3, H = 4     M = N = H = 3
2              M = 3, N = 3    M = N = 3     M = 3, N = 3, H = 4     M = N = H = 3
3              M = 3, N = 3    M = N = 3     M = 10, N = 4, H = 6    M = N = H = 12
4              M = 3, N = 3    M = N = 3     M = 10, N = 4, H = 6    M = N = H = 14
5              M = 3, N = 3    M = N = 3     M = 10, N = 4, H = 6    M = N = H = 14

*Note: M, N, and H are the retained numbers of components for the three different modes.

The monitoring performances of the different methods were tested with the five even-length training data sets (numbers 1−5 in Table 1). Control limits of all monitoring statistics were set at the 99% confidence level, corresponding to a significance level of 0.01. Fault detection results are listed in Tables 7 and 8, and the monitoring charts of WMGTucker2 for data set number 1 and fault number 5 are shown in Figure 4.

Table 7. FDR (%) and FAR (%) of Q Statistic of Five Methods for Different Training Data Sets

               GTucker2      MPCA          PARAFAC2      Tucker3       PARAFAC
data set no.   FDR   FAR     FDR   FAR     FDR   FAR     FDR   FAR     FDR   FAR
1              100   0       100   0       100   0       100   0       100   0
2              100   0       100   0       100   0       100   0       100   0
3              100   0       100   0       100   0       100   0       100   0
4              100   0       100   0       100   0       100   0       100   0
5              100   0       100   0       100   0       100   0       100   0

Table 8. FDR (%) and FAR (%) of R2 or T2 Statistic of Five Methods for Different Training Data Sets

               GTucker2      MPCA          PARAFAC2      Tucker3       PARAFAC
data set no.   FDR   FAR     FDR   FAR     FDR   FAR     FDR   FAR     FDR   FAR
1              98    0       75    0       80    1       88    0       75    0
2              99    0       75    0       84    0       88    0       75    0
3              100   0       100   0       100   0       100   0       100   6
4              100   0       100   0       100   0       100   0       100   10
5              100   0       100   0       100   0       100   0       100   12

Figure 4. Monitoring charts of WMGTucker2 for data set number 1 and fault number 5. (a) Q statistic, (b) R2 statistic.

Clearly, for the training data sets with larger total numbers of reference batches, all monitoring statistics perfectly detect the fault with no false alarms, except the T2 statistic of PARAFAC. However, as the total number of reference batches in the training data set decreases to 10 and 8, the monitoring performances of the other four methods deteriorate markedly. MPCA and PARAFAC degrade most severely, followed by PARAFAC2 and Tucker3: the FDRs of the T2 statistics of MPCA and PARAFAC decrease to 75%, those of the PARAFAC2-based T2 statistic fall to around 80%, and those of the Tucker3-based R2 statistic decrease to 88%. In contrast, the performance of GTucker2 hardly degrades, since its Q and R2 statistics still detect the fault with FDRs of at least 98%. In a word, GTucker2 shows the best monitoring performance, because both its Q and R2 statistics have the highest FDRs.

4.2. Case II: Uneven-Length Batch Process. In this case study, the uneven-length training data sets were considered. Using the same method as in case I, the window size was again chosen as 5. Fault number 10 in Table 3 was used for testing, and WMGTucker2 was compared with WMPARAFAC2. Their monitoring performances were tested with the five uneven-length training data sets (numbers 6−10 in Table 1), with the parameter settings of Table 6. Fault detection results are listed in Table 9, and the monitoring charts of WMGTucker2 are shown in Figure 5.

Table 9. Fault Detection Results (FDR and FAR, %) of GTucker2 and PARAFAC2 for Different Training Data Sets

               GTucker2 Q    GTucker2 R2   PARAFAC2 Q    PARAFAC2 T2
data set no.   FDR   FAR     FDR   FAR     FDR   FAR     FDR   FAR
6              100   0       97    0       100   0       76    1
7              100   0       100   0       100   0       81    0
8              100   0       100   0       100   0       100   0
9              100   0       100   0       100   0       100   0
10             100   0       100   0       100   0       100   0

Figure 5. Monitoring charts of WMGTucker2 for data set number 6 and fault number 10. (a) Q statistic, (b) R2 statistic.

The results indicate that PARAFAC2 shows almost the same monitoring performance as GTucker2 when the training data set contains a larger total number of reference batches. For training data sets with a small total number of reference batches, however, the monitoring performance of GTucker2 is significantly better than that of PARAFAC2, because the GTucker2-based R2 statistic has higher FDRs than the PARAFAC2-based T2 statistic. Moreover, as mentioned in section 2.2.3, GTucker2 achieves much higher modeling precision than PARAFAC2 for all training data sets. These results indicate that GTucker2 outperforms PARAFAC2 in terms of both modeling precision and fault detection performance.

After the fault was detected, the contribution plot was applied to diagnose the faulty variable. For fault number 10 in Table 3 and training data set number 9 in Table 1, the contribution plots of the GTucker2-based Q and R2 statistics at 100 h are shown in Figure 6. Obviously, variable 3, i.e., the substrate feeding rate, provides the largest contribution to both monitoring statistics. Hence, the substrate feeding rate is identified as the faulty variable. This result is consistent with the preset fault variable (see Table 3), which validates the effectiveness of the GTucker2-based contribution plot for fault diagnosis.

Figure 6. Contribution plots of GTucker2-based Q and R2 statistics at 100 h for fault number 10 and data set number 9. (a) Q statistic, (b) R2 statistic.
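A contribution plot of the kind just described ranks variables by their share of a monitoring statistic. The sketch below shows the Q-statistic case, in the spirit of the generalized contribution plots of Westerhuis et al.:33 the residual matrix `E` here is synthetic (a fault on variable 3 is injected by hand), standing in for the part of a data window not explained by the GTucker2 model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical residual matrix E for one data window: rows are time
# samples, columns are the 17 monitored variables (Table 2).
E = rng.standard_normal((5, 17)) * 0.1
E[:, 2] += 2.0            # inject a large residual on variable 3

# Q statistic = total squared residual of the window; the contribution
# of variable j is the column-wise share of that sum, which is exactly
# what a Q contribution plot displays.
Q = float(np.sum(E**2))
contrib = np.sum(E**2, axis=0)

fault_var = int(np.argmax(contrib)) + 1   # 1-based variable index
print(fault_var)   # -> 3, the substrate feeding rate in Table 2
```

The same decomposition idea applies to the R2 statistic, with the model residuals replaced by the corresponding score-space quantities.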

5. CONCLUSION
In this paper, GTucker2 has been proposed as a trilinear model for three-way data arrays, and a direct fitting algorithm has been developed for it. Unlike Tucker2, GTucker2 can be used not only on even-length data arrays but also on uneven-length data arrays whose size varies in one direction. PARAFAC2 can be viewed as a constrained version of GTucker2 that further restricts the core slice to be a diagonal matrix. GTucker2 is therefore expected to have a better modeling capability, i.e., higher modeling efficiency and precision, than PARAFAC2 in most cases, which makes it the more practical model. An online batch process monitoring method was then developed by combining GTucker2 with a moving data window technique, and Q, R2, and T2 statistics were constructed for fault detection and diagnosis. This monitoring method is applicable to both even-length and uneven-length batch processes. The biggest advantage of GTucker2 is its ability to solve the uneven-length problem in a "natural" way, without batch trajectory synchronization. The other advantage is that GTucker2 avoids the potential information loss and "curse of dimensionality" induced by data unfolding, because it performs tensor decomposition directly on the three-way batch data array. The performance of the GTucker2-based monitoring method was tested by two case studies in a benchmark fed-batch penicillin fermentation process. The results indicate that GTucker2 outperforms MPCA, Tucker3, and PARAFAC for monitoring an even-length batch process and that, for both even-length and uneven-length batch processes, GTucker2 outperforms PARAFAC2 in terms of modeling precision and fault detection rate.



AUTHOR INFORMATION

Corresponding Author

*Tel.: +86-0571-88320349. Fax: +86-0571-88320842. E-mail: [email protected].

Notes
The authors declare no competing financial interest.



ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (No. 61304116) and the Zhejiang Provincial Natural Science Foundation of China (No. LQ13B060004).

REFERENCES
(1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361−1375.
(2) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37, 41−59.
(3) Hu, K. L.; Yuan, J. Q. Batch process monitoring with tensor factorization. J. Process Control 2009, 19, 288−296.
(4) Luo, L. J. Tensor global-local preserving projections for batch process monitoring. Ind. Eng. Chem. Res. 2014, 53, 10166−10176.
(5) Yoo, C. K.; Lee, J. M.; Vanrolleghem, P. A.; Lee, I. B. On-line monitoring of batch processes using multiway independent component analysis. Chemom. Intell. Lab. Syst. 2004, 71, 151−163.
(6) Hu, K. L.; Yuan, J. Q. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18, 797−807.
(7) Wise, B. M.; Gallagher, N. B.; Butler, S. W.; White, D. D., Jr.; Barna, G. G. A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process. J. Chemom. 1999, 13, 379−396.
(8) Meng, X.; Morris, A. J.; Martin, E. B. On-line monitoring of batch processes using a PARAFAC representation. J. Chemom. 2003, 17, 65−81.
(9) Louwerse, D. J.; Smilde, A. K. Multivariate statistical process control of batch processes based on three-way models. Chem. Eng. Sci. 1999, 55, 1225−1235.
(10) Yao, Y.; Gao, F. A survey on multistage/multiphase statistical modeling methods for batch processes. Annu. Rev. Control 2009, 33, 172−183.
(11) Rothwell, S. G.; Martin, E. B.; Morris, A. J. Comparison of methods for dealing with uneven length batches. In Proceedings of the 7th International Conference on Computer Application in Biotechnology (CAB7); Elsevier Science: New York, 1998; pp 387−392.
(12) Kourti, T. Multivariate dynamic data modeling for analysis and statistical process control of batch processes, start-ups and grade transitions. J. Chemom. 2003, 17, 93−109.
(13) Kassidas, A.; MacGregor, J. F.; Taylor, P. A. Synchronization of batch trajectories using dynamic time warping. AIChE J. 1998, 44, 864−875.
(14) Fransson, M.; Folestad, S. Real-time alignment of batch process data using COW for on-line process monitoring. Chemom. Intell. Lab. Syst. 2006, 84, 56−61.
(15) Lu, N.; Gao, F.; Yang, Y.; Wang, F. PCA-based modeling and on-line monitoring strategy for uneven-length batch processes. Ind. Eng. Chem. Res. 2004, 43, 3343−3352.
(16) Zhao, C. H.; Mo, S. Y.; Gao, F. R.; Lu, N. Y.; Yao, Y. Statistical analysis and online monitoring for handling multiphase batch processes with varying durations. J. Process Control 2011, 21, 817−829.
(17) Zhao, L. P.; Zhao, C. H.; Gao, F. R. Inner-phase analysis based statistical modeling and online monitoring for uneven multiphase batch processes. Ind. Eng. Chem. Res. 2013, 52, 4586−4596.
(18) Wise, B. M.; Gallagher, N. B.; Martin, E. B. Application of PARAFAC2 to fault detection and diagnosis in semiconductor etch. J. Chemom. 2001, 15, 285−298.
(19) Tucker, L. R. Some mathematical notes on three-mode factor analysis. Psychometrika 1966, 31, 279−311.
(20) Kolda, T.; Bader, B. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455−500.
(21) Kroonenberg, P. M. Applied Multiway Data Analysis; Wiley: New York, 2008.
(22) Harshman, R. A. Foundations of the PARAFAC procedure: Models and conditions for an "exploratory" multimode factor analysis. UCLA Working Papers in Phonetics 1970, 16, 1−84.
(23) De Lathauwer, L.; De Moor, B.; Vandewalle, J. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 2000, 21, 1253−1278.
(24) Kroonenberg, P. M.; De Leeuw, J. Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 1980, 45, 69−97.
(25) De Lathauwer, L.; De Moor, B.; Vandewalle, J. On the best rank-1 and rank-(R1,R2,...,RN) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 2000, 21, 1324−1342.
(26) Elden, L.; Savas, B. A Newton−Grassmann method for computing the best multilinear rank-(r1, r2, r3) approximation of a tensor. SIAM J. Matrix Anal. Appl. 2009, 31, 248−271.
(27) Kiers, H. A. L.; Ten Berge, J. M. F.; Bro, R. PARAFAC2 - Part I. A direct fitting algorithm for the PARAFAC2 model. J. Chemom. 1999, 13, 275−294.
(28) Harshman, R. A. PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics 1972, 22, 30−47.
(29) Bro, R.; Smilde, A.; De Jong, S. On the difference between low-rank and subspace approximation: improved model for multilinear PLS regression. Chemom. Intell. Lab. Syst. 2001, 58, 3−13.
(30) Camacho, J.; Pico, J.; Ferrer, A. Bilinear modeling of batch processes. Part I: theoretical discussion. J. Chemom. 2008, 22, 299−308.
(31) Luo, L. J.; Bao, S. Y.; Gao, Z. L.; Yuan, J. Q. Batch process monitoring with tensor global-local structure analysis. Ind. Eng. Chem. Res. 2013, 52, 18031−18042.
(32) Martin, E. B.; Morris, A. J. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control 1996, 6, 349−358.
(33) Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Generalized contribution plots in multivariate statistical process monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95−114.
(34) Zhao, C. H.; Sun, Y. X. Step-wise sequential phase partition (SSPP) algorithm based statistical modeling and online process monitoring. Chemom. Intell. Lab. Syst. 2013, 125, 109−120.
(35) Singhal, A.; Seborg, D. E. Pattern matching in multivariate time series databases using a moving-window approach. Ind. Eng. Chem. Res. 2002, 41, 3822−3838.
(36) Birol, G.; Undey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26, 1553−1565.
(37) Alvarez, C. R.; Brandolin, A.; Sanchez, M. C. Batch process monitoring in the original measurement's space. J. Process Control 2010, 20, 716−725.
(38) Jolliffe, I. T. Principal Component Analysis; Springer: New York, 2002.