Hidden Semi-Markov Probability Models for Monitoring Two

Feb 8, 2011 - The repetitive batch operation has a two-dimensional dynamic behavior, including a finite time interval of each batch run in the time do...
0 downloads 0 Views 3MB Size
ARTICLE pubs.acs.org/IECR

Hidden Semi-Markov Probability Models for Monitoring Two-Dimensional Batch Operation Junghui Chen* and Yan-Cheng Jiang R&D Center for Membrane Technology, Department of Chemical Engineering, Chung-Yuan Christian University, Chung-Li, Taiwan, 320, Republic of China ABSTRACT: The repetitive batch operation has a two-dimensional dynamic behavior, including a finite time interval of each batch run in the time domain and infinite repetitions along the batch domain. In this Article, a novel monitoring method that combines dynamic multiway principal component analysis (DMPCA) and hidden segmental semi-Markov models (HSMM) is proposed to resolve the problem caused by the two-dimensional behavior of batch processes. DMPCA utilizes the batch-to-batch dynamic characteristics and eliminates the batch correlation among process variables. HSMM is used to construct the temporal behavior among process variables during each batch run. By constructing a two-dimensional model, the proposed method can generate simple probability monitoring charts and monitor the progress in each batch run. The proposed method has the temporal property of HSMM and the batch-to-batch dynamic characteristics of DMPCA. Its advantages are demonstrated through a simulated fedbatch penicillin cultivation process characterized by fault sources.

1. INTRODUCTION The online monitoring of batch operation performance is necessary. It helps ensure the safety and the quality of products. Most of the batch adjustment strategies deal with repetitive tracking control or disturbance rejection in dynamic systems to produce proper outputs and reduce variability in final products. Because of the well-equipped computerization in the model of the industrial process, the process data can be accessible from the stored data at the touch of a button during any time period. The data can be systematically studied so that the variability of these sources and the correlation between the measurement profiles and the final product quality can be well understood. In recent years, several statistical techniques for mining process information of the operating measurement profiles have been developed. Nomikos and MacGregor (1994) first used multiway principal component analysis (MPCA) to analyze three-way batch data.1 Since then, batch process monitoring based on the multivariable statistical control process had been developed increasingly because of easy implementation of associated theoretical concepts.2-4 Comprehensive reviews on this area have been found in the literature.5,6 However, most of the existing multivariate statistical process control methods for batch monitoring are used to identify assignable or special causes of variability based on the assumption that each batch data are independent and identically distributed. As the same task is repeated many times, the system returns to the initial operation condition before the next batch run or the trial begins. In fact, the batch system is considered as a two-dimensional model that represents the functions of the operation number in the trial domain and of the operation time in the time domain. This assumption of the entire stage bounded as a single one does not necessarily hold true for the behavior in a two-dimensional manner. Hence, the conventional MPCA-based method is not appropriate for the events occurring in the batch operation process. To enhance safety and profits in the batch operation, the development of online monitoring for the two-dimensional r 2011 American Chemical Society

batch operation is critical to the successful implementation of the batch process monitoring scheme. Because of high dimensionality in time space and variable space, multiphase characteristics of each batch, and batch-tobatch dynamic variation, MPCA is not able to deal with the twodimensional batch operation because it assumes that the batch data are independent and identically distributed. Despite the characteristics of the two-dimensional batch process, less effort has been devoted to improving the monitoring scheme of batch processes in the past. The moving window MPCA approach was a strategy that generated a model by incorporating batch-tobatch information.7 The measurements of the past batch runs were included in the model with the measurements of the current batch run. Next, the conventional single MPCA was applied. The entire batch (including the current and the past) data were still treated as a single batch without considering the phase changes during the multiphase operation. Another two-dimensional PCA (2DPCA) approach representing the batch process in a twodimensional space was presented.8 Although the 2-D dynamic modeling techniques were applied, the determination of 2-D support region was very difficult for a batch operation with irregular and complicated process dynamics. However, the above two methods assumed that the statistical indices followed Gaussian distribution for easy construction of the control limits. The assumption is not realistic in the practical situation. Thus, the monitoring methods for the two-dimensional batch process need to be improved. In the past, the hidden Markov model (HMM) has been widely applied to the time series operation data. HMM has been a dominant method in speech recognition and character recognition in the past decade.9-11 The rich mathematical structure and Received: May 30, 2010 Accepted: December 30, 2010 Revised: November 26, 2010 Published: February 08, 2011 3345

dx.doi.org/10.1021/ie101189g | Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research proven accuracy on critical application increased the popularity of HMM.9 It provides a realistic representation of a dynamic system. HMM is a parametric model, which can be trained using the observation sequences. It characterizes the probability model with spatial phenomena and time-scale distances. It also provides an appropriate solution by means of its modeling and learning capability even though it may not have the knowledge of the exact mechanisms involved to solve the problem. However, the modeling of HMM is limited by the Markov property in processing realistic duration of different stages. It cannot accurately provide representation of the structure of multiphase operations due to the limitation of state duration of HMM to one with exponential nature.12 An extended model of HMM, called the hidden semi-Markov model (HSMM), has been developed. As it is flexible enough to describe the time spent in a given state, it has better modeling and analysis capabilities of sorting precision in state recognition.13 After this pioneering work, several problems related to HSMM were further investigated.14-18 To our best knowledge, the full potential of semiMarkov models has not yet been recognized in the development of the batch monitoring. With the integration of the HSMM framework, a statistical model of two-dimensional dynamic process variables for the batch process is developed using HSMM and dynamic MPCA (DMPCA) in this Article. The dynamic data array is constructed by incorporating both static and dynamic process characteristics using the prior and the current batches. The strongest relations of the scores are extracted by DMPCA. The HSMM model is trained with the extracted scores obtained from MPCA method to enhance the capability of statistical process monitoring. Thus, the control charts for a two-dimensional batch are developed in this research. The charts can not only analyze the two-dimensional spatial-temporal measurements but also capture the statistical characteristics of the practical data. This Article is structured as follows. The next section provides the background for the proposed method by presenting a brief overview (1) of MPCA as a tool for extracting the spatial information by decomposing the two-dimensional batch process variables, and (2) of HSMM as a tool for modeling uncertain multiphase of temporal information by training a sequential batch operation. The proposed monitoring method of batch units is developed in section 3. Its monitoring statistics is also established to build up the confidence limit at each sampling point to monitor the online operating profiles. In section 4, the effectiveness of the proposed method is demonstrated through a simulated fedbatch penicillin cultivation process. The example investigates the performance of the proposed method and makes a comparison with the conventional algorithms. Finally, concluding remarks are made.

2. BACKGROUND TECHNIQUES 2.1. Multiway Principal Component Analysis (MPCA). The data set of a batch process can be arranged as a three-dimensional array as shown in Figure 1. Assume the operation batch has J variables measured K times throughout one batch run. Similar runs will be repetitive at I batches. Thus, a three-dimensional array X with I  J  K can be constructed. In MPCA, this threeway matrix must be unfolded to obtain a two-way matrix on which PCA can perform. Figure 1 shows an approach to unfolding the three-way matrix. Each vertical slice I  J is put

ARTICLE

Figure 1. The unfolding structure of MPCA.

side by side to the right, starting with the slice corresponding to the first time interval. The resulting two-dimensional matrix (X) has dimensions I  JK. To get the information on the variability among the batch variables, PCA is performed: X ¼

R X a¼1

ta pTa þ E ¼ TPT þ E

ð1Þ

where R is the number of principal components, E is a residual matrix, T = [t1 t2 ... tR] represents the score matrix, and P = [p1 p2 ... pR] is the loading matrix. Several examples have been used to analyze batch data and monitor batch processes.3,4,19,20 However, the above model structure takes the entire batch data as a single object to extract the various features constituting the process trend. Morever, a common practice in the batch process is that the initial condition of each batch may not be exactly reset to the same. The dynamic behavior in the serial batch runs is inherent. This causes MPCA to be unable to differentiate the operation features at different temporal batches. 2.2. Hidden Semi-Markov Model (HSMM). HSMM illustrated in Figure 2 is a double stochastic model. It can not only capture the serial correlation in the observation sequence but also take process random factors into account. The operation process with different time duration may often lead to different behaviors. HSMM is a generalization of Markov models. It models the distribution over sojourn time. At each time epoch, there are two probabilities that need to be calculated. One is the probability of being in one discrete state transition from one state (hl) to another (hlþ1), and the other is the duration in that discrete state ({Sk}). At the bottom of Figure 2, the HSMM model consists of a pair of discrete-time stochastic processes, {Sk} and {yk}, k ∈ {1... K}. For simple explanation and convenience of notation reasons, the observation process {yk} is assumed to be one-dimensional in the above expression. The extension to the multivariable case will be discussed later. The observed process {yk} is linked to the hidden or nonobserved state process {Sk} by the conditional distribution bsk(yk) depending on the state process, bsk(yk) = P(yk| Sk). This means that the output process at time k depends only on the state of the underlying semi-Markov chain at time k. Because of the sojourn in each of the state, the entire sequence of 3346

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Figure 2. The structure of hidden semi-Markov models.

microstates {S1,S2,...SK} can be reduced into the sequences of macrostate {h1,h2,...hL}: h1  fs1 , s2 , :::, sq1 g h2  fsq1 þ 1 , sq1 þ 2 , :::, sq1 þ q2 g hL  fsq1 þ 3 3 3 þ qL - 1 þ 1 , sq1 þ 3 3 3 þ qL - 1 þ 2 , :::, sK - 1 g

ð2Þ

where the state h1, h2, ..., hL represents the macrostate shown in Figure 2. Each macrostate consists of several microstates ({Sql-1þ1Sql-1þ2...Sql-1þql}). Figure 2 shows the aggregation of a macrostate and the corresponding microstates. hl has a finite set of states, say N. Transitions occur from state hl-1 to state hl when transition probability ahl-1hl is positive. When the system enters state hl, state sojourn time ql has an occupancy distribution phl(ql). Because of its nature in modeling of time series, the leftto-right models are applied, that is, ahlhj = 0 for all j e l. The advantages of this model are the sparsity of the transition matrix and explicit start and end points of the Markov chains. Thus, each state with N(N - 1) possible transitions is reachable from every other state. HSMM is like an HMM except that each macrostate can lead to a sequence of observations. To estimate the HSMM model λ, the expectation-maximization (EM) procedure can be used to obtain the probability model efficiently. λ ¼ arg max PðyK1 jλÞ ð3Þ λ

where the parameters of the HSMM model λ include the initial state π = {πh1}, the state transition probability A = {ahl-1hl}, the observation probability B ={bsk}, and the state duration probability D = {phl(ql)}. In this study, the state output is a mixture of Gaussian distribution characterized by the mean μhl,v, variance P 2 , and mixing weight chl,v for each node (v), and the duration hl,v distribution is a single Gaussian distribution characterized by the mean mhl and variance σhl: V X ð4Þ chl , v Nðyk ; μhl , v , Σ2hl , v Þ bsk ðyk Þ ¼ v¼1

phl ðql Þ ¼ Nðql ; mhl , σ2hl Þ The training consists of two steps: finding the HSMM parameters λ that maximize the probability and computing the joint probability. It is a recursive algorithm. Each iteration of the EM algorithm increases P(yK1 |λ), and, generally, the sequence of re-estimated parameters λ converges to a local maximum of P(yK1 |λ). Note that for many practical applications, the two-state Gaussian mixture model for the state output has been applied because it is simple, robust, and easy to use. The details of the training algorithm can be found in refs 14,16,17.

3. TWO-DIMENSIONAL BATCH MODEL: DMPCA-HSMM APPROACH In many batch processes, the same run is carried out repeatedly. The same time-varying trajectories are used batch after batch. The conventional MPCA monitoring method implicitly assumes that the entire batch data of each batch are statistically independent of the observation data at the previous batch runs. This assumption is valid only for the same fixed starting condition for all the batch runs, such as a repeated pick-and-place motion of a robot. In fact, the starting condition of the operation system at the current batch run is the result of the input actions at the previous batch runs. Moreover, as compared to continuous processes, batch process operations are by nature much more dynamic, and they involve several transitions that cover large operating envelopes of phase durations. This means that the multiplicity of the operation phases is an inherent nature of many batch processes and each phase exhibits significantly different underlying behaviors. However, the conventional MPCA model takes the entire batch data as a single object. It is unable to differentiate the local behaviors of the batch processes. In this section, a twodimensional batch model is developed for online monitoring of two-dimensional batch processes. 3.1. HSMM Modeling Based on Dynamic Batch Data. To capture information in the previous batches, a dynamic batch model should be constructed. It is not only concerned with the correlation in the sequence batch but also with the autocorrelation of the process variables. The serial correlation is taken into account when all the variables and their time histories of the previous batches are augmented into the data matrix for the current batch. Figure 3 shows an approach to an incorporation of batch-to-batch information. The serial batch correlations are taken into account by augmenting each observation vector with the observations of the previous d batches. Each horizontal slice K  J is put side by side to the right, starting with the slice corresponding to the first batch run until d þ 1 batch window lengths. The resulting two-dimensional matrix has dimensions of K  (d þ 1)J. Likewise, the second two-dimensional matrix is constructed starting with the second batch run. Keep doing the same procedures until I - d two-dimensional matrices are completed. Concatenate these two-dimensional matrices vertically, padding the matrices automatically to make a large twodimensional array: 3 2 X D1 7 6 6 l 7 7 6 D 7 XD ¼ 6 ð5Þ 6 Xl 7 6 l 7 5 4 XDI- d 3347

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Figure 3. Time-wise unfolding used for constructing the batch lagged window.

where

is now considered as

X Dl ¼ ½ Xl

Xl þ 1 ::: Xl þ d , l 2 xl1, 1 xl1, 2 ::: 6 l 6 x2, 1 xl2, 2 ::: Xl ¼ 6 6 : 4 xlK , 1 xlK , 2 :::

¼ 1, 2, :::, I - d 3 xl1, J 7 xl2, J 7 7 7 5 l xK , J

XD ¼ TD PT þ explained

ð6Þ

where the observation of variable j at the time point k in batch run l is represented as xij,k. Here, all the batches have equal duration and are synchronized. If the batch duration varies within a series of batches, the batch length must be aligned prior to performing the next analysis. Several papers have addressed the problem of scaling the time axis.21,22 Like the single variable procedure of HSMM discussed above, although the multivariable batch data array (eq 5) can be directly applied to training HSMM, it is not always an easy task due to its complex, interactive, and time-varying nature among the monitored variables. Huge storage and computational load are also required. To reduce the variables, PCA is applied in this Article to decompose the process variables into a new set of loading variables in the lower dimensional space. It extracts the independent principal components, which could capture most of the characteristics of the system pattern in a multivariable process behavior. After the data array is decomposed, the DMPCA model

Ε

unexplained

ð7Þ

where P is the loading matrix, which retains the first R principal component loadings. E is a residual vector, which cannot be explained by the DMPCA model. The score matrix is 2 3 TD1 6 6 TD2 7 7 7 TD ¼ 6 ð8Þ 6 l 7 4 5 TDI- d and

2

3 tDl, 1 6 D 7 6 tl, 2 7 7 TDl ¼ 6 6 l 7, 4 5 tDl, K

l ¼ 1, 2, :::, I - d

ð9Þ

Note that the window size, d, is an important factor to capture dynamic batch information. However, the window length is not known a priori. To properly select the window size to reflect the real static and dynamic relations among batches, initially MPCA is carried out on the XD matrix with d = 0 only to obtain the 3348

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Figure 4. Tying in the HSMM model across different batch sets.

residual matrix E. Both the prior batch data and the current batch data are incorporated to get a matrix with a new window size, and then PCA is performed. If the variation of the difference of the above residual matrices is larger than the prespecified threshold, data of one more previous batch are padded until convergence of the variation of the difference of the residual matrices for the two adjacent window sizes is achieved. Next, as shown in Figure 4, each TD l in the batch matrix D K T is extracted to form an observation sequence [TD l ]1 = D D D D K {tl,1tl,2...tl,K}, l = 1,2...I - d. Now the new variables [Tl ]1 that are linear combinations of the original ones are uncorrelated with each other. The above procedure enables us to easily model HSMM because it is only P necessary to perform the diagonal covariance matrices ( hl2,v) in eq 4, in contrast to the standard ones. Furthermore, instead of building HSMM upon the training data of one batch set only, data tying across all simultaneous batch sets is applied to training the HSMM model in the reduced space. Tying increases the amount of training data to avoid overfitting the HSMM model and achieve the robust training result. 3.2. DMPCA-HSMM-Based Online Control Charts. Once the HSMM model is trained, the most likely state sequence for an observed sequence should be found out. In a sequence batch operation, each observed sequence infers a corresponding hidden state path. However, there are potentially many state paths that can generate the same sequence. For the description of the process operation, the best state path of HSMM for each batch should be found in the trained model. The best state path should also be able to maximize the path probability for a given observation. The optimal state sequence is used instead of all

the possible state sequences for more efficient classification. For a D given model and an observation score sequence [TD]K1 = {tD 1 t2 ... D L tK }, the most likely underlying state sequence ((h1 )* = {h1*h2*... can be found. h*}) 3 



ðhL1 Þ ¼ f h1



h2

:::



Pð½TD K1 , hL1 jλ Þ hL g ¼ arg max L h1

ð10Þ A recursive Viterbi algorithm with dynamic programming can be used to find out the most likely state sequence.23 Repeat the same procedure for all the collected sets to compute the most likely state sequences for each set. Figure 5 shows the single best state path for each set is found. Not all of the sets follow the same best path. Different selected states for different sets account for the fact that the observed values of the process variables do not perfectly conform to the deterministic model. This variability is resulted from the uncertain variations and disturbances among the hidden variables that affect the system. An example of a 4  4 dynamic process with two state variables based on Chou and Verhaegen’s problem24 is revised as xði, k þ 1Þ ¼ Axði, kÞ þ B½uði, kÞ - vði, kÞ þ pði, kÞ yði, kÞ ¼ Cxði, kÞ þ D½uði, kÞ - vði, kÞ þ Eyði - 1, kÞ þ oði, kÞ ð11Þ where y = [y1y2y3y4]T and x = [x1x2]T. In eq 11, the process outputs y(i,k) are affected by the dynmaic system of the current batch as well as by the dynmaic system of the previous batch outputs y(i - 1,k). The process inputs are perturbed with a 3349

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Figure 5. Finding the best path for each set with the Viterbi algorithm. Note that not all the sets follow the same best path.

multitude of sinusoidal signals with various frequencies, u = [10 sin(k) 10 sin(2k) 10 sin(3k) 10 sin(4k)]T. Both input and output variables are subject to measurement noise, and the process is subject to process noise. The noises (v(k), p(k), and o(k)) are disturbances subjected to a beta distribution. Assume that the system contains four different phase operations but the duration of each phase is not fixed. The system model matrices (A, B, C, and D) for the corresponding phases are different. First, a total of successful 50 batches of data sets are collected. Each batch run has the duration of 500 time intervals. The four measurement variables (y) at each time point are scaled to zero mean and unit variance. Utilizing the measurements of the current and previous batches, five principal components are selected by cross validation for DMPCA modeling. With these five principal components, DMPCA explains 91.0% of the variance of the normal batch data. With the trained model, the best state path with the D L highest probability (max P(tD l,1...tl,K|(h1 ),λ)) is found using eq 10. Figure 6 shows a pictorial display of the output score distribution K K (P(tD l,k(s1 )|s1 ,λ)) based on the selected optimal state path at each time point for all the batches. The distribution can help assess what the status of the current operating batch is and if the operation batch is in control. For online monitoring of the measurement variables, the control limits should be built up at each sampling time point. As the new batch unfolds, only the process values of the past batches and the current batch from the beginning until the current time (k) of the batch are available (xD c (k)). x Dc ðkÞ ¼ ½ x c - d ðkÞ x c - d þ 1 ðkÞ x c ðkÞ ¼ ½ xck, 1

xck, 2

:::

::: x c ðkÞ 

ð12Þ

Because the statistic distribution does not follow the Gaussian distribution, the conventional statistical control limits based on the mean and variance would not be applied. With the developed probability distribution that reflects the normal operation, the control limit for the selected state at each time point is required to detect any departure of the process from its standard behavior: Z  PðtDc ðkÞjðhL1 Þ , λÞ dt e 0:95  L D th Pðtc ðkÞjðh1 Þ , λÞ>P ðkÞ k ¼ 1, 2, :::K

xck, J , c ¼ 1, 2, :::I - d

The score matrix by projecting the data (xD c (k)) mentioned above onto the loading matrix (P) can be computed: tDc ðkÞ ¼ xDc ðkÞP

Figure 6. The output distribution based on the selected optimal state path at each time point.

ð13Þ

ð14Þ

This integral is approximated by bootstrapping the original batch data set to generate enough data and truly obtain quality-related variables from the batch process data. The control limit can be calculated. For example, the 95% control limit defines a region that encompasses 95% of the nominal process data. When the 3350

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

threshold value (0.4 g/L). Because of input variations, the batch/ fed-batch switch times shown in Figure 7 are quite different for different batch runs. The process performs the above prescribed tasks repeatedly. However, in the iterative operation, the initial condition of each batch may not be reset to the desired initial condition or inside a neighborhood of the desired initial condition, because small amounts of biomass and substrate in the previous batch run are remained in the tank. Thus, the initial concentrations of biomass (Cm(i,k = 1)) and substrate (Cs(i,k = 1)) at the starting point (k = 1) are the mixing of the biomass in the feed (Cfs(i,k = 1)) and the previous batch (Cs(i - 1,k = K)) as well as the mixing of the substrate in the feed (Cfm(i,k = 1)) and the previous batch (Cm(i - 1,k = K)), respectively: Cs ði, k ¼ 1Þ ¼ Cfs ði, k ¼ 1Þ þ ςs Cs ði - 1, k ¼ KÞ

ð17Þ

Cm ði, k ¼ 1Þ ¼ Cfm ði, k ¼ 1Þ þ ςm Cm ði - 1, k ¼ KÞ ð18Þ Figure 7. The switch time distribution of the penicillin cultivation process from the batch mode to the fed-batch mode.

new data fall outside of the 95% control limit, the operating process is classified as abnormal. For more information on the bootstrapping, refer to work by Efron and Tibshirani (1993).25 The data corresponding to 95% of the confidence limit are taken to get a likelihood threshold (Pth(k)) at time point k. The corresponding unexplained part can be obtained by applying DMPCA: eDc ðkÞ ¼ ½ ec - d ðkÞ

ec - d þ 1 ðkÞ :::

¼ x Dc ðkÞ - tDc, k PT

ec ðkÞ  ð15Þ

The SPE statistics for the deviations from the model behavior (residuals) of the process variation at sample k is defined as SPEðkÞ ¼ ec ðkÞeTc ðkÞ

ð16Þ

Note that the row vector ec(k) represents the residuals at time point k of the current batch. The confidence limits of SPE(k) cannot be determined directly from a particular distribution. The kernel density estimation is used to determine the confidence limits.26

4. ILLUSTRATION AND DISCUSSION A benchmark simulation of fed-batch penicillin production is used here to demonstrate the application of the proposed method. The system has nonlinear dynamics and multiphase characteristics. During the initial preculture phase, the necessary cell mass is generated. Penicillin then starts to be generated at the exponential growth rate and continues to be produced until the stationary phase. In general, the pH value and the fermentor temperature are controlled to provide the best conditions for making the desired product. The simulation condition and the relevant parameters are referred to in Birol et al.,27 but the batch data are generated on the basis of our own codes. The developed codes are based on the mechanistic model of Birol et al. to easily collect data for analysis. In all runs, a batch culture is followed by a fed-batch operation by depletion of carbon source. The system switching itself from batch modes to the fed-batch modes of operation depends on whether glucose concentration reaches its

where ςs and ςm are weighting factors. Equations 17 and 18 show the effect of the batch-to-batch interaction in the sense that carryover from the previous batch will affect the performance of the next batch. Also, because the substrate flow rate (Fs(i,k)) will affect the desired product, iterative learning control is conducted to adjust the flow rate to improve the process performance: Fs ði, kÞ ¼ Fs ði - 1, kÞ þ ω½Csp s ðkÞ - Cs ði, kÞ

ð19Þ

Cssp(k)

where is the desired substrate concentration at each time point k and ω is the adjusting ratio. Under the normal operation condition, small variations are added to the simulation input data to mimic the process variation under the normal operating condition. The duration of each batch is 400 h. The sampling interval is 45 min. Also, measurement noise is added to each of the monitoring variables. To build up a DMPCA-HSMM-based monitoring system, the normal operating condition of the penicillin fermentation is simulated in this study. In the normal condition, a total of 11 process variables (J) with six states (L) and two Gaussian functions (V) are considered to train DMPCA-HSMM. Before applying HSMM, DMPCA should be constructed. In DMPCA, a total of 11 measurement variables at each time point are mean centered and scaled to unit variance. Based on the proposed method, the batch window size (d) is one. That is, the current and the previous batch data are arranged in a batch lagged data matrix form (eq 5). The number of variables in the data matrix is more than 5800. Then this matrix is decomposed using PCA and cross-validation. It is found that 14 principal components are needed to describe the data set. This model captures 91.1% of the variation in the process data set. Among the selected principal components, the weights, which are assigned to variables and which represent the relative variances of the variables, are pretty similar. That is, the proposed model can easily extract the withinbatch and the batch-to-batch dynamic behaviors in the highly correlated batch measurements from the high-dimensionally measured space. Because of integration of DMPCA into HSMM, the number of parameters is significantly reduced as compared to HSMM alone. It only takes three to four iterations to train the DMPCA-HSMM model and achieve convergence. It is practical for industrial use. With these 36 batches, the upper control limits (95%) for the SPE-chart and the log P-chart values are computed at each time point. The control chart with the control limit of 95% for the normal batch is shown in Figure 8. 3351

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Figure 9. Normal distribution plot of (a) score 1 and (b) score 2 from DMPCA.

Figure 8. Control charts for online monitoring of the normal batch in the example: (a) DMPCA, (b) 2DPCA, and (c) DMPCA-MSMM. Each chart contains 95% (solid line) control limit. The dashed line represents the normal batch.

Three additional batches, comprised of one normal batch and two abnormal batches, are generated for testing. In the first abnormal batch, it is assumed that the substrate feed rate is linearly decreased from 0.04 to 0.028 from the 60th hour until the

end of the batch. The second abnormal batch is the agitator power rate, which is decreased from 30 to 25.8 W from the 60th hour until the end of the batch. The first batch (normal) is monitored at each time point. Three different models, DMPCA, 2DPCA, and DMPCA-HSMM, are used to monitor the operating batch for a comparison. In DMPCA, the scheme of the data matrix by augmenting each batch data array with previous batch arrays is applied. The result of each time point shown in Figure 8a is mostly acceptable, and it is considered to be “in control”, but there are some false alarms at the transition of two different phases. In 2DPCA, the dynamic batch data of time-wise dimension and batch-wise dimension are used for batch process modeling. Appling 2DPCA gives the results presented in Figure 8b. There is still some false detection between the 40th hour and the 50th hour when the transition of two different phases occur. Again, DMPCA-HSMM is applied to the same test batch. In Figure 8c, the SPE and log P(tk|Sk,λ) charts at each time point are obtained from the proposed DMPCA-HSMM. It is evident from the charts that DMPCA-HSMM can capture the dominant behavior without getting wrong conclusions. In DMPCA, the statistic distributions of the first two scores shown in Figure 9 do not conform to a Gaussian distribution. The reliability of the monitoring charts of DMPCA (Figure 8b) is deteriorated due to the significant effect of autocorrelation on the statistic distribution 3352

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Figure 10. Normal distribution plot of (a) score 1 and (b) score 2 from 2DPCA.

Figure 11. Control charts for online monitoring of the normal batch in the example: HSMM.

among batches. In 2DPCA, Figure 10 also shows the normal probability plot of the first scores of the stage PCA model. It indicates that there is a problem with the normality

Figure 12. Control charts for online monitoring of the normal batch in the example: (a) DMPCA, (b) 2DPCA, and (c) DMPCA-MSMM. Each chart contains 95% (solid line) control limits. The dashed line represents the abnormal batch.

assumption. In other words, PCA-based statistical monitoring methods assume that the observed variables are normally distributed in T2 test and the residuals are normally distributed in SPE test. As a result, DMPCA (Figure 8b) and 2DPCA 3353

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research

ARTICLE

Table 1. Overall Successful Detection Rates for the Individual SPE and T2, and the Combined SPE and T2 of DMPCA and 2DPCA, and for the Individual SPE and log P, and the Combined SPE and log P of DMPCA-HSMM under the 95% Control Limits in the Example methods DMPCA

normal 79.28

87.03

85.23

T2

89.08

41.43

30.45

SPE T2 SPE ∪ T2

DMPCA-HSMM

agitation power rate linearly decreases

SPE SPE ∪ T2

2DPCA

substrate feed rate decreases linearly

70.24

82.62

80.37

100.00

51.04

89.85

95.86 95.86

10.34 48.40

10.53 87.97

SPE

98.87

85.87

93.22

log P

100.00

67.23

96.61

98.87

88.14

96.61

SPE ∪ log P

(Figure 8c) have false detections when applying the T2and SPE test statistics. If the complete training data structure is directly applied to HSMM without variable reduction, the control chart of log P(yk| Sk,λ) is shown in Figure 11. HSMM has apparently identified certain observations to be outliers during the whole batch run, because it depends on the whole and intact data structure, excluding the dynamic window of the past batches. Furthermore, the number of parameters in DMPCA-HSMM and HSMM is 255 and 1355, respectively; the former is significantly less than the latter. Thus, the compact structure of DMPCA-HSMM still exhibits good detection capabilities. For the first abnormal batch, the monitoring outcomes of three models are shown in Figure 12. The initial time of the fault is induced at the 50th hour and until the end of cultivation. The decrease in the feed rate results in a reduction of penicillin production. As shown in Figure 12a and b, the SPE or T2 monitoring charts of DMPCA indicate that several false alarms occur before the 60th hour, while those of 2DPCA cannot detect this fault in the whole operating duration. Moreover, there are false detections of 2DPCA around time point 40. The detection abilities of DMPCA-HSMM have been significantly improved, and the fault is able to be detected earlier after applying DMPCA-MSMM to characterizing the temporal dynamics between batches. In Figure 12c, after the 150th hour, SPE has increased remarkably and fallen outside of the 95% confidence limit, and log P(tk|Sk,λ) also fallen outside of the 95% confidence limit after the 220th hour. The second abnormal batch is generated by introducing a decrease into the agitation power at the 60th hour until the end of the batch. The degree of agitation directly results in a decrease in the dissolved oxygen level, leading to reduced biomass growth and lower penicillin concentration. The proposed method can still reflect the dynamic relations of the batch process and detect the fault. For the sake of brevity, Table 1 shows the performance of the fault detection of the proposed method. For making comparison with other methods, the successful detection rate of each fault, including the normal condition, is achieved in Table 1. The proposed method has a good ability to capture the actual process relationship between the variables in the noise environment. The incidences of false alarms have been significantly decreased, and the control charts clearly indicate the occurrences of the actual events. Thus, the abnormal condition of this system can be detected earlier through making correlations of deviation before the completion of the batch.

5. CONCLUSION Monitoring the operations to check if there are abnormalities and limiting the incidence of these abnormalities have been the emphasis of many online monitoring batch operation studies. Although a skilled engineer can interpret and manually detect the fault pattern off-line at the end of batch operation, in the real operation manufacturing environment, the fault must be detected online within the batch run when the sensor data are measured. So, online process monitoring has been attracting increasing interest in the field of process safety and quality control. In the past research, most of the tools used for the detection of disturbance assumed that the process data follow a Gaussian distribution. The assumption is not necessarily satisfied in batch operation processes with complex and dynamic characteristics. In this research, the conventional multivariate SPC based on MPCA are extended by incorporating the HSMM structure to solve the detection problem for the two-dimensional batch operation. The approach consists of three major stages: (1) enriching information, augmenting the batch measurements with the previous batch ones; (2) extracting features, compressing the data information in the low dimensional space using MPCA; and (3) modeling probability distribution, training the probability distribution model to represent the two-dimensional batch behavior using HSMM. At the first stage, the current and the past batch measurements are stacked together to take into account the batch-to-batch serial correlations. In this stage, the temporal evolution of the twodimensional batch operation behavior can be tracked. At the extracting feature stage, the MPCA scheme is used to determine the correlation between the measurements within the adjacent batch runs as well as among other nonadjacent batch runs. The objective of the feature extraction stage is to increase the robustness of the two-dimensional pattern by reducing the dimensionality of the data and retaining most of the essential features. To quantitatively discriminate the probability distribution of multiphase operations, HSMM is applied at the modeling probability distribution stage. The two-dimensional patterns can be trained. The corresponding control limits that evaluate the current process at each time point can be estimated. To investigate the feasibility of the proposed DMPCAHSMM method, the comparisons of quantitative and qualitative results for the test problems are done. The proposed method based on the statistical distribution in the two-dimensional 3354

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355

Industrial & Engineering Chemistry Research domain is shown to address the limitations of the existing MSPC methods. The proposed method for cross-fertilization between the dynamic MPCA data structure and the temporal HSMM enables engineers to assess the current operation batch system so that measure can be done to improve the system performance. Discriminating the disturbance types and estimating the missing data will be our potential research topics in the future.

’ AUTHOR INFORMATION Corresponding Author

*E-mail: [email protected].

’ ACKNOWLEDGMENT This work is supported by the National Science Council, R.O.C. and the Center-of-Excellence Program on Membrane Technology from the Ministry of Education, R.O.C. ’ REFERENCES (1) Nomikos, P.; MacGregor, J. F. Monitoring Batch Processes Using Multiway Principal Component Analysis. AIChE J. 1994, 40, 1361. (2) Chen, J.; Liu, K.-C. On-Line Batch Process Monitoring Using Dynamic PCA and Dynamic PLS Models. Chem. Eng. Sci. 2002, 57, 63. (3) Lennox, B.; Hiden, H. G.; Montague, G. A.; Kornfeld, G.; Goulding, P. R. Process Monitoring of an Industrial Fed-Batch Fermentation. Biotechnol. Bioeng. 2001, 74, 125. (4) Louwerse, D. J.; Smilde, A. K. Multivariate Statistical Process Control of Batch Processes Based on Three-way Models. Chem. Eng. Sci. 2000, 55, 1225. (5) Qin, S. J. Statistical Process Monitoring: Basics and beyond. J. Chemom. 2003, 17, 480. (6) Cinar, A.; Parulekar, S. J.; Undey, C.; Birol, G. Batch Fermentation: Modeling, Monitoring and Control; Marcel Dekker: New York, 2003. (7) Flores-Cerrillo, J.; MacGregor, J. F. Multivariate Monitoring of Batch Processes Using Batch-to-Batch Information. AIChE J. 2004, 50, 1219. (8) Lu, N.; Yao, Y.; Gao, F.; Wang, F. Two-dimensional Dynamic PCA for Batch Process Monitoring. AIChE J. 2005, 51, 3300. (9) Rabiner, L. R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 1989, 76, 257. (10) Schenkel, M.; Guyon, I.; Henderson, D. On-Line Cursive Script Recognition Using Time Delay Neural Networks and Hidden Markov Models. Machine Vision Appl. 1995, 8, 215. (11) Ge, M.; Du, R.; Xu, Y. Hidden Markov Model Based Fault Diagnosis for Stamping Processes. Mechanical Syst. Signal Process. 2004, 18, 391. (12) Camci, F.; Chinnam, R. B. Dynamic Bayesian Networks for Machine Diagnostics: Hierarchical Hidden Markov Models vs Competitive Learning. Proc. of the Internal Joint Conf. on Neural Networks; Montreal, Canada, 2005; Vol. 3, p 1752. (13) Ferguson, J. D. Variable Duration Models for Speech. Symp. Appl. Hidden Markov Models to Text and Speech 1980, 143. (14) Guedon, Y. Estimating Hidden Semi-Markov Chains from Discrete Sequences. J. Comput. Graphical Stat. 2003, 12, 604. (15) Sansom, J.; Thomson, P. Fitting Hidden Semi-Markov Models to Breakpoint Rainfall Data. J. Appl. Probability 2001, 38A, 142. (16) Yu, S.-Z.; Kobayashi, H. An Efficient Forward-backward Algorithm for an Explicit-Duration Hidden Markov Model. IEEE Signal Process. Lett. 2003, 10, 11. (17) Dong, M.; He, D. Hidden Semi-Markov Models for Machinery Health Diagnosis and Prognosis. Trans. NAMRI/SMEXXXII 2004, 32, 199. (18) Dong, M.; He, D. A Segmental Hidden Semi-Markov Model (HSMM)-Based Diagnostics and Prognostics Framework and Methodology. Mechanical Syst. Signal Process. 2007, 21, 2248.

ARTICLE

(19) Neogi, D.; Schlags, C. E. Multivariate Statistical Analysis of an Emulsion Batch Process. Ind. Eng. Chem. Res. 1998, 37, 3971. (20) Nomikos, P.; MacGregor, J. F. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 1995, 37, 41. (21) Neogi, D.; Schlags, C. E. Multivariate Statistical Analysis of an Emulsion Batch Process. Ind. Eng. Chem. Res. 1998, 37, 3971. (22) Kassidas, A.; MacGregor, J. F.; Htaylor, P. A. Synchronization of Batch Trajectories Using Dynamic Time Warping. AIChE J. 1998, 49, 864. (23) Moore, M. D. Most Likely State Sequence Speech Reconstruction Using a Generalized Hidden Semi-Markov Model with Two Distinct Regeneration Times Applied to English. Ph.D. Dissertation, Rensselear Polytechnic Institute, 2004. (24) Chou, C. T.; Verhaegen, M. Subspace Algorithms for the Identification of Multivariable Dynamic Errors-in-Variables Models. Automatica 1997, 33, 1857. (25) Efron, B.; Tibshirani, R. J. An Introduction to the Bootstrap; Chapman & Hall: New York, 1993. (26) Bishop, C. M. Neural Networks for Pattern Recognition; Clarendon Press: New :: York, 1995. (27) Birol, G.; Undey, C.; Cinar, A. A Modular Simulation Package for Fed-Batch Fermentation: Penicillin Production. Comput. Chem. Eng. 2002, 26, 1553.

3355

dx.doi.org/10.1021/ie101189g |Ind. Eng. Chem. Res. 2011, 50, 3345–3355