Article pubs.acs.org/IECR

Multiway Gaussian Mixture Model Based Adaptive Kernel Partial Least Squares Regression Method for Soft Sensor Estimation and Reliable Quality Prediction of Nonlinear Multiphase Batch Processes

Jie Yu*

Department of Chemical Engineering, McMaster University, Hamilton, Ontario, Canada L8S 4L7

ABSTRACT: The predictive model based soft sensor technique has become increasingly important to provide reliable online measurements, facilitate advanced process control and optimization, and improve product quality in process industries. Conventional soft sensors are normally single-model based and thus may not be appropriate for processes with shifting operating phases or conditions and the underlying changing dynamics. In this study, a multiway Gaussian mixture model (MGMM) based adaptive kernel partial least-squares (AKPLS) method is proposed to handle online quality prediction of batch or semibatch processes with multiple operating phases. The three-dimensional measurement data are first preprocessed and unfolded into a two-dimensional matrix. Then, the multiway Gaussian mixture model is estimated in order to identify and isolate different operating phases. Further, the process and quality measurements are separated into multiple segments corresponding to those identified phases, and localized kernel PLS models are built in the high-dimensional nonlinear feature space to characterize the shifting dynamics across different operating phases. Using a Bayesian inference strategy, each process measurement sample of a new batch is classified into the phase with the maximal posterior probability, and thus the local kernel PLS model representing the same phase can be adaptively chosen for online quality variable prediction. The presented soft sensor modeling method is applied to a simulated multiphase penicillin fermentation process, and the computational results demonstrate that the proposed MGMM-AKPLS approach is superior to the conventional kernel PLS model in terms of prediction accuracy and model reliability.

1. INTRODUCTION

Batch or semibatch processes have been widely used to produce low-volume and high-value-added products in different industrial sectors including the chemical, materials, food, pharmaceutical, biotechnology, and semiconductor industries. Batch operations often face the challenge of lacking accurate real-time measurements of key product quality variables, which are essential for implementing advanced process control and optimization in the plant for continuously improving process efficiency and product quality.1−3 In recent years, soft sensor techniques have witnessed increasing popularity in process industries to provide reliable online measurements of critical product quality or environmental variables based upon predictive models instead of hardware instruments or off-line laboratory analysis.4−6 In the literature, soft sensor driven quality estimation typically relies on either first-principle mechanistic models or data-driven statistical models.6 The former class of approaches requires in-depth knowledge and understanding of process mechanisms, which may not always be available in complex industrial applications. Furthermore, mechanistic model construction can be quite tedious and time-consuming, so that soft sensor development becomes a challenging task in practice.4,7,8 In contrast, data-driven modeling techniques are more desirable for industrial applications, as minimal process knowledge is needed while the plant historians provide abundant process measurement data for empirical model development.6,9

© 2012 American Chemical Society

Received: July 28, 2012; Revised: September 9, 2012; Accepted: September 13, 2012; Published: September 13, 2012
Industrial & Engineering Chemistry Research, dx.doi.org/10.1021/ie3020186 | Ind. Eng. Chem. Res. 2012, 51, 13227−13237

The most common data-driven soft sensor methods are based on multivariate statistical techniques such as principal component analysis (PCA),10−13 partial least-squares (PLS),14−17 Fisher discriminant analysis (FDA),18,19 and independent component analysis (ICA).20 This class of methods usually projects the original measurement data onto a linear subspace to extract variable features and then identifies the predictive model within the lower-dimensional subspace. In addition to multivariate statistical analysis, machine learning methods such as artificial neural networks (ANN) and support vector regression (SVR) have witnessed some success in soft sensor modeling of dynamic batch processes.8,21−24 Though different kinds of soft sensor modeling techniques have been developed for quality variable prediction in batch processes, they are usually based upon a single regression model given the underlying assumption of a constant operating phase and conditions throughout the entire duration of the batch process. In practice, however, batch processes often encounter shifting operation phases, which may further lead to switching process dynamics across different phases. Thus, the accuracy and reliability of quality variable prediction can significantly degrade as the operating phase and underlying dynamics change. Though some effort has been made to design model updating strategies in order to handle state shifts, the different operating phases or conditions in batch processes are not specifically identified for adaptive model development and selection.20,24,25 Moreover, batch or semibatch processes typically exhibit strong inherent nonlinearity even within each individual operating phase.

In this paper, a novel multiway Gaussian mixture model based adaptive kernel partial least-squares (MGMM-AKPLS) method is proposed for soft sensor development and online quality variable prediction of nonlinear and dynamic batch or semibatch processes with multiple phases. The multiway process measurement data matrix is first preprocessed and converted into a two-dimensional data matrix through hybrid batchwise and variablewise unfolding. Then the Gaussian mixture model is adopted to identify the multiple phases throughout batch operation. Within each phase, the subset of measurement data is further projected onto the high-dimensional kernel feature space so that a localized KPLS regression model is built between the process variables and quality variables. Further, the Bayesian inference based posterior probabilities for each test sample with respect to different phases are estimated, and the localized KPLS model corresponding to the batch phase with the maximal posterior probability is adaptively chosen for online quality variable prediction.

The rest of the paper is organized as follows. Partial least-squares regression and multiway Gaussian mixture models are briefly described in section 2. Then section 3 presents the MGMM based adaptive kernel PLS method for soft sensor modeling and prediction in multiphase batch processes. The proposed approach is applied to the fed-batch penicillin fermentation process, and the results are compared with those of the conventional multiway kernel PLS model in section 4. Finally, concluding remarks are drawn in section 5.

2. PRELIMINARIES

2.1. PLS Regression Model. Consider input and output data matrices $X(I \times J_X)$ and $Y(I \times J_Y)$, where $I$ represents the number of observations and $J_X$ and $J_Y$ denote the numbers of input and output variables. The input and output data can be projected onto the $S$-dimensional latent variable subspace as follows

$$X = T_X P_X^{T} + E_X \tag{1}$$

and

$$Y = T_Y P_Y^{T} + E_Y \tag{2}$$

where $T_X(I \times S)$ and $T_Y(I \times S)$ are the score matrices, $P_X(J_X \times S)$ and $P_Y(J_Y \times S)$ represent the loading matrices, and $E_X(I \times J_X)$ and $E_Y(I \times J_Y)$ are the residual matrices in the input and output spaces. The objective of the PLS model is to maximize the correlation between the score vectors of the input and output data, and the PLS based regression model is expressed as

$$Y = X\Gamma + \Gamma_0 \tag{3}$$

with

$$\Gamma = W (P_X^{T} W)^{-1} P_Y^{T} Y \tag{4}$$

where $\Gamma$ is the regression coefficient matrix, $\Gamma_0$ denotes the bias matrix, and $W$ represents the input weighting matrix. The above PLS model can be solved with the nonlinear iterative partial least-squares (NIPALS) algorithm.26,27

2.2. Multiway Gaussian Mixture Model for Phase Isolation of Batch Processes. The multiway Gaussian mixture model has proven effective in isolating multiple phases of batch operation for process monitoring and fault detection.28 Consider a three-way input data matrix $X^{(B)}(I^{(B)} \times J^{(B)} \times L^{(B)})$ with $I^{(B)}$, $J^{(B)}$, and $L^{(B)}$ representing the numbers of batches, process variables, and sampling instants, respectively. First, it is converted into a two-dimensional matrix $\bar{X}^{(B)}(I^{(B)} \times J^{(B)}L^{(B)})$ via batchwise unfolding. Then each column vector of $\bar{X}^{(B)}$ can be mean-centered, and the resulting matrix $\hat{X}^{(B)}$ is further rearranged to $\tilde{X}^{(B)}(I^{(B)}L^{(B)} \times J^{(B)})$ through variablewise unfolding, as illustrated in Figure 1. The multiway output data matrix of quality variables can be unfolded in the same fashion.

Figure 1. Illustration of three-way data matrix unfoldings.

For the unfolded data matrix $\tilde{X}^{(B)}$, each row vector $\tilde{x}_{j\cdot}^{(B)}$ can be assumed to follow a Gaussian mixture distribution as shown below

$$p(\tilde{x}_{j\cdot}^{(B)} \mid \theta) = \sum_{i=1}^{C} \omega_i \, p(\tilde{x}_{j\cdot}^{(B)} \mid \theta_i) \tag{5}$$

where $C$ is the number of Gaussian components, $\omega_i$ is the prior probability of the $i$th Gaussian component, and the corresponding distribution parameter $\theta_i$ includes the mean vector $\mu_i$ and covariance matrix $\Sigma_i$. The above Gaussian mixture model can be estimated from the modified expectation-maximization (E-M) algorithm,29 which involves the following two-step iterations


• E-step:

$$P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)}) = \frac{P_i^{(s)} \, p(\tilde{x}_{j\cdot}^{(B)} \mid \mu_i^{(s)}, \Sigma_i^{(s)})}{\sum_{l=1}^{C} P_l^{(s)} \, p(\tilde{x}_{j\cdot}^{(B)} \mid \mu_l^{(s)}, \Sigma_l^{(s)})} \tag{6}$$

and

• M-step:

$$\mu_i^{(s+1)} = \frac{\sum_{j=1}^{I^{(B)}L^{(B)}} P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)}) \, \tilde{x}_{j\cdot}^{(B)}}{\sum_{j=1}^{I^{(B)}L^{(B)}} P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)})} \tag{7}$$

$$\Sigma_i^{(s+1)} = \frac{\sum_{j=1}^{I^{(B)}L^{(B)}} P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)}) \, (\tilde{x}_{j\cdot}^{(B)} - \mu_i^{(s+1)})(\tilde{x}_{j\cdot}^{(B)} - \mu_i^{(s+1)})^{T}}{\sum_{j=1}^{I^{(B)}L^{(B)}} P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)})} \tag{8}$$

$$\omega_i^{(s+1)} = \frac{\max\left\{0, \left(\sum_{j=1}^{I^{(B)}L^{(B)}} P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)})\right) - \frac{V}{2}\right\}}{\sum_{k=1}^{C} \max\left\{0, \left(\sum_{j=1}^{I^{(B)}L^{(B)}} P^{(s)}(\theta_k \mid \tilde{x}_{j\cdot}^{(B)})\right) - \frac{V}{2}\right\}} \tag{9}$$

where $s$ is the index of the E-M iterations, $P_i^{(s)}$ is the prior probability of the $i$th Gaussian component at the $s$th iteration (the mixing weight $\omega_i$ of eq 5), $V$ represents the total number of scalar parameters specifying each Gaussian component, and $P^{(s)}(\theta_i \mid \tilde{x}_{j\cdot}^{(B)})$ denotes the posterior probability of the $j$th training sample within the $i$th Gaussian component at the $s$th iteration. After the different Gaussian components are identified from the above E-M procedure, the different batches at the same sampling point first need to be realigned to the same operation phase, and then the outlier clusters should be merged into the most probable neighboring phase.24

Figure 2. Schematic diagram of the proposed MGMM-AKPLS approach.

Table 1. Input and Output Variables of Soft Sensors in the Fed-Batch Penicillin Fermentation Process

no.   input variable                no.   output variable
1     substrate feed rate           1     penicillin concentration
2     agitator power                2     substrate concentration
3     aeration rate                 3     biomass concentration
4     substrate feed temperature
5     fermenter temperature
6     pH
7     dissolved oxygen
8     CO2 concentration
9     culture volume
10    generated heat

Figure 3. Flow sheet of fed-batch penicillin fermentation process.

Table 2. Initial Conditions and Set Points of Operational Parameters in the Fed-Batch Penicillin Fermentation Process

Initial Condition
substrate concentration           13−17 (g/L)
dissolved oxygen concentration    1.05−1.25 (g/L)
biomass concentration             0.05−0.15 (g/L)
penicillin concentration          0 (g/L)
culture volume                    99−104 (L)
carbon dioxide concentration      0.5−1.0 (g/L)
pH                                4.5−5.5
fermenter temperature             293−303 (K)
generated heat                    0 (kcal)

Set Point
aeration rate                     8−9 (L/h)
agitator power                    28.5−31.5 (W)
substrate feed flow rate          0.035−0.048 (L/h)
substrate feed temperature        294−298 (K)
fermenter temperature             296−300 (K)
pH                                4.8−5.2

Figure 4. Trend plots of process variables in fed-batch penicillin fermentation process.

Figure 5. Phase identification of fed-batch penicillin fermentation process.
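The E-M iterations of eqs 6−9 can be illustrated with a minimal sketch. It is not the author's implementation: it assumes scalar samples with one variance per component in place of the full mean vectors and covariance matrices, an evenly spaced initialization, and the removal of any component whose weight reaches zero. The names `gaussian_pdf`, `fit_mixture`, `n_start`, and `min_var` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian density, standing in for p(x | mu_i, Sigma_i)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(samples, n_start, n_iter=200, v=2, min_var=1e-6):
    """Run E-M iterations patterned on eqs 6-9 for scalar samples.

    v plays the role of V: the number of scalar parameters per component
    (one mean and one variance in this one-dimensional sketch).
    """
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    # Assumed initialization: means spread evenly over the data range.
    means = [lo + span * (i + 0.5) / n_start for i in range(n_start)]
    variances = [(span / n_start) ** 2] * n_start
    weights = [1.0 / n_start] * n_start

    for _ in range(n_iter):
        # E-step (eq 6): posterior of every component for every sample.
        post = []
        for x in samples:
            joint = [w * gaussian_pdf(x, m, s)
                     for w, m, s in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            post.append([q / total for q in joint])

        # Soft counts: sum over j of P(theta_i | x_j), shared by eqs 7-9.
        counts = [sum(p[i] for p in post) for i in range(len(weights))]
        # Numerators of eq 9; weakly supported components drop to zero.
        shrunk = [max(0.0, n - v / 2.0) for n in counts]
        keep = [i for i, s in enumerate(shrunk) if s > 0.0]
        if not keep:
            raise ValueError("all components were annihilated")
        norm = sum(shrunk[i] for i in keep)

        new_weights, new_means, new_vars = [], [], []
        for i in keep:
            mu = sum(p[i] * x for p, x in zip(post, samples)) / counts[i]  # eq 7
            var = sum(p[i] * (x - mu) ** 2
                      for p, x in zip(post, samples)) / counts[i]          # eq 8
            new_weights.append(shrunk[i] / norm)                            # eq 9
            new_means.append(mu)
            new_vars.append(max(var, min_var))
        weights, means, variances = new_weights, new_means, new_vars

    return weights, means, variances


# Two well-separated groups of scalar samples, as a stand-in for two phases.
samples = [0.1 * k for k in range(-10, 11)] + [10 + 0.1 * k for k in range(-10, 11)]
weights, means, variances = fit_mixture(samples, n_start=2)
print(weights, means, variances)
```

The only departure from a textbook E-M loop is the weight update: subtracting V/2 from each soft count before normalizing lets a component with too little support fall to zero weight, which is how the number of phases can shrink during estimation.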

3. MGMM BASED ADAPTIVE KERNEL PLS METHOD

The process and quality variables in batch processes are often characterized by a significantly nonlinear relationship. Therefore, the regular PLS model may not be effective in capturing the local nonlinearity within the different operating phases identified by MGMM. In this study, the nonlinear kernel PLS model is adopted for the local operating phases, and kernel function based mapping is employed to project the unfolded input matrix $\tilde{X}^{(B)}$ from the original measurement space onto the high-dimensional feature space for further PLS regression. Then, the Bayesian inference based posterior probabilities are estimated with respect to different phases and used to adaptively select the corresponding KPLS model from the multiple localized models for online quality variable prediction.

Let $\phi: R^{I^{(B)}L^{(B)}} \rightarrow F$ be a nonlinear mapping function, where $F$ represents the high-dimensional feature space. Thus the projected measurement sample of the batch process can be expressed as $\phi(\tilde{x}_{j\cdot}^{(B)})$. Due to the curse of dimensionality, it is not feasible to calculate the nonlinear mapping of each unfolded sample from batch processes and then conduct the PLS regression in the feature space. To tackle this issue, a nonlinear kernel function $K$ can be defined as the inner product of two mapped samples

$$K_{ij} = K(\tilde{x}_{i\cdot}^{(B)}, \tilde{x}_{j\cdot}^{(B)}) = \langle \phi(\tilde{x}_{i\cdot}^{(B)}), \phi(\tilde{x}_{j\cdot}^{(B)}) \rangle \tag{10}$$

Before nonlinear kernel projection, mean centering is conducted on the above kernel matrix as follows

$$\bar{K} = K - L_{I^{(B)}L^{(B)}} K - K L_{I^{(B)}L^{(B)}} + L_{I^{(B)}L^{(B)}} K L_{I^{(B)}L^{(B)}} \tag{11}$$

where $L_{I^{(B)}L^{(B)}}$ is an $I^{(B)}L^{(B)} \times I^{(B)}L^{(B)}$ matrix with each element equal to $1/(I^{(B)}L^{(B)})$. In this study, the following radial basis function (RBF) is selected as the kernel function

$$K(\tilde{x}_{i\cdot}^{(B)}, \tilde{x}_{j\cdot}^{(B)}) = \exp\left(-\frac{\|\tilde{x}_{i\cdot}^{(B)} - \tilde{x}_{j\cdot}^{(B)}\|^{2}}{2\sigma^{2}}\right) \tag{12}$$

with $\sigma$ representing the width of the RBF kernel. For the linear regression model in the high-dimensional kernel feature space

$$Y = \phi(X)\tilde{\Gamma} + \tilde{\Gamma}_0 \tag{13}$$

the corresponding regression coefficient matrix is expressed as

$$\tilde{\Gamma} = \phi^{T} W (P_X^{T} \bar{K} W)^{-1} P_Y^{T} Y \tag{14}$$

Thus the KPLS based prediction for test points $X_t$ is given by30

$$\hat{Y}_t = \bar{K}_t W (P_X^{T} \bar{K}_t W)^{-1} P_Y^{T} Y = P_Y P_Y^{T} Y \tag{15}$$

where $\bar{K}_t$ is the centered kernel matrix between the training and test points and $P_Y$ satisfies

$$P_Y = \phi\phi^{T} W (P_X^{T} \bar{K}_t W)^{-1} = \bar{K}_t W (P_X^{T} \bar{K}_t W)^{-1} \tag{16}$$

After the Gaussian mixture model is estimated from the Bayesian inference based E-M algorithm with the different operating phases identified as $\{P_1, P_2, ..., P_C\}$, the unfolded input and output data matrices from the training set can be split into $C$ blocks corresponding to the various phases as follows

$$\tilde{X}^{(B)} = [\tilde{X}^{(1)T} \; \tilde{X}^{(2)T} \; \cdots \; \tilde{X}^{(C)T}]^{T} \tag{17}$$

and

$$\tilde{Y}^{(B)} = [\tilde{Y}^{(1)T} \; \tilde{Y}^{(2)T} \; \cdots \; \tilde{Y}^{(C)T}]^{T} \tag{18}$$

With each pair of input and output block matrices $\{\tilde{X}^{(i)}, \tilde{Y}^{(i)}\}$, the corresponding $i$th KPLS model can be built as

$$\tilde{Y}^{(i)} = \phi(\tilde{X}^{(i)})\tilde{\Gamma}^{(i)} + \tilde{\Gamma}_0^{(i)} \tag{19}$$

and the $i$th operating phase based local prediction is estimated as follows

$$\hat{Y}_t^{(i)} = \bar{K}_t^{(i)} W^{(i)} (P_X^{(i)T} \bar{K}_t^{(i)} W^{(i)})^{-1} P_Y^{(i)T} Y^{(i)} \tag{20}$$

For the $C$ identified operating phases, the corresponding localized KPLS models are expressed as

$$\{\mathrm{KPLS}_1, \mathrm{KPLS}_2, ..., \mathrm{KPLS}_C\} \tag{21}$$

For the $s$th measurement sample $\tilde{x}_s$ of a test batch, its posterior probabilities with respect to different operating phases can be computed as follows

$$P(\theta_j \mid \tilde{x}_s) = \frac{P_j \, p(\tilde{x}_s \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{C} P_l \, p(\tilde{x}_s \mid \mu_l, \Sigma_l)} \tag{22}$$

Then the measurement sample can be classified into the $c(\tilde{x}_s)$th Gaussian component as

$$c(\tilde{x}_s) = \arg\max_{j} P(\theta_j \mid \tilde{x}_s) \tag{23}$$

For this test sample, the $c(\tilde{x}_s)$th KPLS model is automatically retrieved to estimate the quality variable as follows

$$\tilde{Y}_s = \bar{K}_t^{(c(\tilde{x}_s))} W^{(c(\tilde{x}_s))} (P_X^{(c(\tilde{x}_s))T} \bar{K}_t^{(c(\tilde{x}_s))} W^{(c(\tilde{x}_s))})^{-1} P_Y^{(c(\tilde{x}_s))T} Y^{(c(\tilde{x}_s))} \tag{24}$$

It should be noted that the presence of disturbances might cause the maximum posterior probability value to be very small and, subsequently, the phase identification to be biased. The preventive strategy is to ignore the outlier sample if its corresponding maximum posterior probability is less than the statistical significance level. The schematic diagram of the proposed MGMM-AKPLS approach is shown in Figure 2, and the detailed step-by-step procedure is listed below:

(i) Collect the input and output data of the batch process.
(ii) Perform batch synchronization using the dynamic time warping strategy.
(iii) Unfold and scale the three-dimensional input data matrix.
(iv) Estimate the finite Gaussian mixture model using the unfolded input data to identify the total C operating phases.
(v) Unfold the three-dimensional output data matrix.
(vi) Split the input and output matrices into the identified C segments.
(vii) With all pairs of input and output block matrices, build the C different localized KPLS models.
(viii) For a new batch process measurement sample, compute its posterior probability with respect to the different operating phases and determine which phase it belongs to.
(ix) Estimate the quality attribute value of each test sample using the localized KPLS model from step vii that corresponds to the identified phase.

Figure 6. Trend plots of biomass concentration predictions using KPLS and MGMM-AKPLS methods.

Figure 7. Trend plots of substrate concentration predictions using KPLS and MGMM-AKPLS methods.
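Two pieces of the procedure above lend themselves to a short sketch: the RBF kernel with mean centering (eqs 11 and 12) and the posterior based phase assignment of step viii (eqs 22 and 23). The sketch below is illustrative only. It assumes diagonal covariance matrices for brevity, it does not build the KPLS models of eq 24, and all function names and the `threshold` parameter are hypothetical.

```python
import math


def rbf_kernel_matrix(rows, sigma):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), as in eq 12."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in rows]
            for a in rows]


def center_kernel(k):
    """Eq 11 elementwise: K - LK - KL + LKL with L filled by 1/n."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]                              # (KL)_ij
    col_mean = [sum(k[i][j] for i in range(n)) / n for j in range(n)]   # (LK)_ij
    grand = sum(row_mean) / n                                           # (LKL)_ij
    return [[k[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)]
            for i in range(n)]


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumption here)."""
    return sum(-0.5 * math.log(2.0 * math.pi * s) - 0.5 * (u - m) ** 2 / s
               for u, m, s in zip(x, mean, var))


def phase_posteriors(x, priors, means, variances):
    """Eq 22: posterior probability of each phase given one sample."""
    logs = [math.log(p) + log_gaussian_diag(x, m, s)
            for p, m, s in zip(priors, means, variances)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]  # shift by the max for stability
    total = sum(unnorm)
    return [u / total for u in unnorm]


def select_phase(x, priors, means, variances, threshold=0.0):
    """Eq 23 plus the outlier rule: most probable phase index, or None."""
    post = phase_posteriors(x, priors, means, variances)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= threshold else None


# Hypothetical training rows and phase parameters, for illustration only.
train_rows = [[0.0, 1.0], [0.5, 1.5], [3.0, 0.0]]
k_centered = center_kernel(rbf_kernel_matrix(train_rows, sigma=1.0))

priors = [0.2, 0.5, 0.3]
means = [[0.0], [5.0], [10.0]]
variances = [[1.0], [1.0], [1.0]]
phase = select_phase([4.6], priors, means, variances)
print(phase)
```

In a full implementation the returned index would pick the localized model from the set in eq 21. One point worth keeping in mind when choosing `threshold`: the posteriors over C phases sum to one, so the maximum is never below 1/C, and a threshold at or below 1/C rejects nothing.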

4. APPLICATION EXAMPLE

4.1. Fed-Batch Penicillin Fermentation Process. In this section, the MGMM based adaptive kernel PLS method is compared to the regular kernel PLS model using a simulated fed-batch penicillin fermentation process,31,32 and the soft sensor based online prediction accuracy of different quality variables is evaluated. The batch/fed-batch fermentation process is employed to produce an antibiotic that is a secondary metabolite of microbial cell culture. As the formation of the product penicillin is not associated with cell growth, the bioreactor operation is initiated with a short period of batch mode in order to grow the microorganism and is then followed by penicillin production in fed-batch operating mode. The entire penicillin fermentation process has a duration of 400 h while the initial batch operation lasts only about 40 h. The process flow sheet is shown in Figure 3. The bioreactor includes two cascade controllers to maintain the fermenter pH and temperature by manipulating the acid/base and cold/hot water flow ratios, respectively. Meanwhile, the substrate and air are fed into the fermenter to provide the required glucose and oxygen for cell growth and penicillin formation, though both parameters are operated under open-loop conditions. This cell culture process has strong nonlinear dynamics, multiple operating phases, and system uncertainty. The process dynamics and variable relationships across different phases are subject to significant changes. In this study, the quality related objective variables for online prediction are biomass concentration, substrate concentration, and penicillin concentration. Meanwhile, the selected process input variables include substrate feed rate, agitation power, aeration rate, substrate feed temperature, fermenter temperature, pH, dissolved oxygen concentration, CO2 concentration, culture volume, and generated heat. All the input and output variables are listed in Table 1, and the process operating conditions including the initial and operational parameter settings are summarized in Table 2. A total of 30 training batches are collected for soft sensor model learning while an additional 10 test batches are obtained to assess the model accuracy and reliability. The sampling period of the fed-batch penicillin fermentation process is 0.5 h, and the trend plots of the input variables are shown in Figure 4. Each of the training batches includes pseudo-random binary signals (PRBS) to ensure adequate signal excitation for soft sensor modeling. The following root-mean-square error (RMSE) indices are used to evaluate the performance of online quality prediction

$$\mathrm{RMSE}_j^{(i)} = \sqrt{\frac{\sum_{l=1}^{L^{(t)}} (\hat{y}_{ijl}^{(t)} - y_{ijl}^{(t)})^{2}}{L^{(t)}}} \tag{25}$$

and

$$\mathrm{RMSE}_j = \sqrt{\frac{\sum_{i=1}^{I^{(t)}} \sum_{l=1}^{L^{(t)}} (\hat{y}_{ijl}^{(t)} - y_{ijl}^{(t)})^{2}}{I^{(t)}L^{(t)}}} \tag{26}$$

where $I^{(t)}$ and $L^{(t)}$ denote the number of batches and number of sampling instants in the test set, and $y_{ijl}^{(t)}$ and $\hat{y}_{ijl}^{(t)}$ are the actual and predicted measurements of the $j$th output variable for the $i$th batch and $l$th sampling instant, respectively. The indices $\mathrm{RMSE}_j^{(i)}$ and $\mathrm{RMSE}_j$ represent the root-mean-square errors of the $j$th quality variable during the $i$th test batch and across all different test batches, respectively.

4.2. Comparison of Soft Sensor Based Online Quality Prediction Results. With the multiway Gaussian mixture model applied to the training batches, the entire penicillin fermentation process is identified with three distinct phases that correspond to the lag phase, exponential phase, and stationary phase throughout the microbial culture cycle (see Figure 5). Then, three localized kernel PLS models are built in the high-dimensional nonlinear feature space using the input and output process data from the different segments. Further, the online quality variable prediction is adaptively estimated from the localized KPLS models after each measurement sample of the test batches is categorized into an individual operating phase through Bayesian inference based posterior probability estimation. The trend plots of quality variable prediction of one test batch using the single KPLS model and the MGMM-AKPLS method are shown in Figures 6, 7, and 8 for biomass, substrate, and penicillin concentrations, respectively. Furthermore, the prediction errors of the different quality variables are depicted in Figure 9. As seen in Figure 6, the single KPLS model leads to fairly poor prediction of biomass concentration. There are significant deviations between the actual and predicted values across all three operating phases. During the lag and stationary phases, a vast majority of predicted values are above the actual values. In the exponential phase, however, most of the predictions are below the actual measurements. The inferior accuracy and reliability of the single KPLS model is mainly due to the underlying relationship changes between process and quality variables among the three different operation stages, which cause the online prediction to deviate substantially from the actual measurement. Likewise, the prediction of the other two quality variables, substrate and penicillin concentrations, by the single KPLS model also shows unsatisfactory accuracy. As observed in Figure 7, the predicted values of substrate concentration drift significantly away from the actual measurements, especially within the exponential and stationary phases. In Figure 8, one can readily see that the predictions of penicillin concentration are below the actual values in the exponential phase but above the real measurements during the stationary phase. The pattern changes across different phases in the cell culture process reveal that the averaged process model is unable to characterize the shifting dynamics within the various phases.

As opposed to the poor performance of the single KPLS model, the presented MGMM-AKPLS approach achieves high prediction accuracy on all three quality variables. The online predictions of those quality attributes match the actual measurements well across different phases, which demonstrates the superiority of the multiple localized KPLS models in adaptively capturing the switching relationships over different operating phases. The MGMM method results in a precise phase identification, and then the Bayesian inference based adaptive KPLS models are able to characterize the changing nonlinear dynamic relationships between input and output variables across multiple phases. Therefore, the MGMM-AKPLS based quality predictions coincide well with the actual values of biomass, substrate, and penicillin concentrations with minimal deviation errors. The RMSE values of the single KPLS and MGMM-AKPLS based quality predictions are compared in Table 3. The MGMM-AKPLS method gives lower RMSE values than the single KPLS model in all ten test batches: the average RMSE drops from 0.897 to 0.092 for biomass concentration, from 0.541 to 0.073 for substrate concentration, and from 0.195 to 0.038 for penicillin concentration. Moreover, one can observe that the MGMM-AKPLS method leads to smaller variation in the RMSE values across different batches. Such behavior indicates that the presented approach has more reliable and consistent performance in quality variable prediction. This application example shows that the inherent multiplicity of operating phases in batch bioprocesses makes the single-model approach ill-suited. In comparison, the proposed adaptive multimodel strategy integrates different localized models corresponding to the various operation phases and provides more reliable inferential prediction of key quality variables in batch processes.

Figure 8. Trend plots of penicillin concentration predictions using KPLS and MGMM-AKPLS methods.

Figure 9. Prediction errors of (a) biomass concentration, (b) substrate concentration, and (c) penicillin concentration using KPLS and MGMM-AKPLS methods.

Table 3. Comparison of RMSE Values of Quality Variable Predictions Using Single KPLS and MGMM-AKPLS Methods

               biomass concentration    substrate concentration    penicillin concentration
batch number   KPLS     MGMM-AKPLS      KPLS     MGMM-AKPLS        KPLS     MGMM-AKPLS
1              0.946    0.103           0.693    0.078             0.214    0.047
2              0.735    0.086           0.502    0.082             0.175    0.031
3              1.129    0.117           0.724    0.061             0.226    0.049
4              0.764    0.077           0.439    0.058             0.163    0.034
5              0.835    0.083           0.406    0.065             0.218    0.042
6              1.027    0.102           0.698    0.079             0.206    0.031
7              0.759    0.079           0.433    0.062             0.167    0.027
8              0.903    0.096           0.725    0.080             0.202    0.043
9              0.786    0.101           0.402    0.083             0.182    0.048
10             1.086    0.076           0.388    0.082             0.197    0.028
average        0.897    0.092           0.541    0.073             0.195    0.038
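The RMSE indices of eqs 25 and 26 and the summary rows of Table 3 can be checked with a few lines of code. The sketch below is only an illustration: the function names are hypothetical, the per-batch values are transcribed from Table 3, and the standard deviation is an added summary statistic that the article does not report. The "average" row of Table 3 reproduces as the arithmetic mean of the ten per-batch values, whereas eq 26 pools the squared errors over all batches, so the two quantities need not coincide.

```python
import math
import statistics


def rmse_batch(predicted, actual):
    """Eq 25: RMSE of one quality variable over the samples of one batch."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def rmse_overall(predicted_batches, actual_batches):
    """Eq 26: RMSE pooled over all batches and sampling instants."""
    squared = [(p - a) ** 2
               for pb, ab in zip(predicted_batches, actual_batches)
               for p, a in zip(pb, ab)]
    return math.sqrt(sum(squared) / len(squared))


# Per-batch RMSE values transcribed from Table 3 (batches 1-10).
table3 = {
    ("biomass", "KPLS"): [0.946, 0.735, 1.129, 0.764, 0.835,
                          1.027, 0.759, 0.903, 0.786, 1.086],
    ("biomass", "MGMM-AKPLS"): [0.103, 0.086, 0.117, 0.077, 0.083,
                                0.102, 0.079, 0.096, 0.101, 0.076],
    ("substrate", "KPLS"): [0.693, 0.502, 0.724, 0.439, 0.406,
                            0.698, 0.433, 0.725, 0.402, 0.388],
    ("substrate", "MGMM-AKPLS"): [0.078, 0.082, 0.061, 0.058, 0.065,
                                  0.079, 0.062, 0.080, 0.083, 0.082],
    ("penicillin", "KPLS"): [0.214, 0.175, 0.226, 0.163, 0.218,
                             0.206, 0.167, 0.202, 0.182, 0.197],
    ("penicillin", "MGMM-AKPLS"): [0.047, 0.031, 0.049, 0.034, 0.042,
                                   0.031, 0.027, 0.043, 0.048, 0.028],
}

# Mean and sample standard deviation of the per-batch RMSE for each column.
for (variable, method), values in table3.items():
    print(variable, method,
          round(statistics.mean(values), 3),
          round(statistics.stdev(values), 3))
```

Comparing the standard deviations of the two methods for each quality variable gives a direct numerical reading of the batch-to-batch consistency discussed in the text.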

5. CONCLUSIONS

In this article, an adaptive soft sensor modeling approach is developed for nonlinear multiphase batch or semibatch processes in order to accurately predict the key quality variables. The presented MGMM-AKPLS approach has the attractive merits of automatically identifying multiple phases in batch processes as well as adaptively selecting localized kernel PLS models through a Bayesian inference strategy to characterize the shifting dynamics across different phases. Consequently, the product quality variables can be accurately predicted with adaptive kernel PLS models in real time. The application to the fed-batch penicillin fermentation process has demonstrated that the new MGMM-AKPLS approach achieves much higher prediction accuracy and reliability than the conventional KPLS modeling method. The presented soft sensor technique can be implemented online in batch or semibatch processes to provide accurate measurements of essential quality variables continuously. The capability of the proposed modeling method to capture shifting dynamics throughout multiple operating phases makes this technique a strong candidate for soft sensor development and robust quality prediction in a wide range of batch or semibatch processes with nonlinear dynamics. The predictive model based continuous measurements of quality variables can further enable advanced control and real-time optimization of multiphase batch or semibatch processes. Future research can focus on designing multimodel based predictive control and optimization of multiphase batch chemical and biological processes.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The author appreciates the valuable comments and suggestions of the anonymous reviewers.

REFERENCES

(1) Lennox, B.; Montague, G.; Hiden, H.; Kornfeld, G.; Goulding, P. Process monitoring of an industrial fed-batch fermentation. Biotechnol. Bioeng. 2001, 74, 125−135.
(2) Ündey, C.; Ertunç, S.; Çinar, A. Online batch/fed-batch process performance monitoring, quality prediction, and variable-contribution analysis for diagnosis. Ind. Eng. Chem. Res. 2003, 42, 4645−4658.
(3) Yu, J. Nonlinear Bioprocess Monitoring based Multiway Kernel Localized Fisher Discriminant Analysis. Ind. Eng. Chem. Res. 2011, 50, 3390−3402.
(4) Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, M. Soft Sensors for Monitoring and Control of Industrial Processes; Springer: London, United Kingdom, 2007.
(5) Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12−24.
(6) Kadlec, P.; Gabrys, B.; Strandt, S. Data driven soft sensor in the process industry. Comput. Chem. Eng. 2009, 33, 795−814.
(7) Doyle, F. Nonlinear inferential control for process applications. J. Proc. Cont. 1998, 8, 339−353.
(8) Desai, K.; Badhe, Y.; Tambe, S.; Kulkarni, B. Soft-sensor development for fed-batch bioreactors using support vector regression. Biochem. Eng. J. 2006, 27, 225−239.
(9) Chiang, L.; Russell, E.; Braatz, R. Fault Detection and Diagnosis in Industrial Systems; Advanced Textbooks in Control and Signal Processing; Springer-Verlag: London, Great Britain, 2001.
(10) Kosanovich, K.; Dahl, K.; Piovoso, M. Improved process understanding using multiway principal component analysis. Ind. Eng. Chem. Res. 1996, 35, 138−146.
(11) Piovoso, M.; Hoo, K. Multivariate statistics for process control. IEEE Cont. Syst. Mag. 2002, 22, 8−9.
(12) Zamprogna, E.; Barolo, M.; Seborg, D. Optimal selection of soft sensor inputs for batch distillation columns using principal component analysis. J. Proc. Cont. 2005, 15, 39−52.
(13) Chiang, L.; Leardi, R.; Pell, R.; Seasholtz, M. Industrial experiences with multivariate statistical analysis of batch process data. Chemom. Intell. Lab. Syst. 2006, 81, 109−119.
(14) Qin, S. J. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503−514.
(15) Wang, X.; Kruger, U.; Lennox, B. Recursive partial least squares algorithms for monitoring complex industrial processes. Control Eng. Practice 2003, 11, 613−632.
(16) Zhang, H.; Lennox, B. Integrated condition monitoring and control of fed-batch fermentation processes. J. Proc. Cont. 2004, 14, 41−50.
(17) Facco, P.; Doplicher, F.; Bezzo, F.; Barolo, M. Moving average PLS soft sensor for online product quality estimation in an industrial batch polymerization process. J. Proc. Cont. 2009, 19, 520−529.
(18) Chiang, L.; Kotanchek, M.; Kordon, A. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 2004, 28, 1389−1401.
(19) Yu, J. Localized Fisher Discriminant Analysis Based Complex Process Monitoring. AIChE J. 2011, 57, 1817−1828.
(20) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a new soft sensor method using independent component analysis and partial least squares. AIChE J. 2009, 55, 87−98.
(21) Zhang, J.; Morris, A.; Martin, E.; Kiparissides, C. Prediction of polymer quality in batch polymerisation reactors using robust neural networks. Chem. Eng. J. 1998, 69, 135−143.
(22) Jain, R.; Rahman, R.; Kulkarni, B. Development of a Soft Sensor for a Batch Distillation Column Using Support Vector Regression Techniques. Chem. Eng. Res. Des. 2007, 85, 283−287.
(23) Zhang, Y. Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM. Chem. Eng. Sci. 2009, 64, 801−811.
(24) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144.
(25) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft-sensor development using correlation-based just-in-time modeling. AIChE J. 2009, 55, 1754−1765.
(26) Geladi, P.; Kowalski, B. R. Partial Least-Squares Regression: A Tutorial. Anal. Chim. Acta 1986, 185, 1−17.
(27) Eriksson, L.; Johansson, E.; Kettaneh-Wold, N.; Trygg, J.; Wikstrom, C.; Wold, S. Multi- and Megavariate Data Analysis Basic Principles and Applications; Umetrics: Malmo, Sweden, 2006.
(28) Yu, J.; Qin, S. J. Multiway Gaussian mixture model based multiphase batch process monitoring. Ind. Eng. Chem. Res. 2009, 48, 8585−8594.
(29) Figueiredo, M. A. F.; Jain, A. K. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Machine Intell. 2002, 24, 381−396.
(30) Rosipal, R.; Trejo, L. Kernel partial least squares regression in reproducing kernel Hilbert space. J. Machine Learning Res. 2001, 2, 97−123.
(31) Birol, I.; Ündey, C.; Birol, G.; Tatara, E.; Çinar, A. A web-based simulator for penicillin fermentation. Int. J. Eng. Sim. 2001, 2, 24−30.
(32) Birol, G.; Ündey, C.; Çinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26, 1553−1565.