Research Note pubs.acs.org/IECR
Information-Transfer PLS Model for Quality Prediction in Transition Periods of Batch Processes

Zhiqiang Ge* and Zhihuan Song
State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou, People’s Republic of China

Furong Gao
Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Hong Kong

Peiliang Wang
School of Information and Engineering, Huzhou Teachers College, Huzhou, Zhejiang, People’s Republic of China

ABSTRACT: Compared to the steady phase, quality prediction in the transition periods of batch processes has been recognized as a difficult task in recent years. Unlike the data behavior in the steady phase, there may be significant correlations among the data samples in each transition period. If the relationships among different time slices (time pieces in the batch dataset) are not considered, the performance of the quality prediction model may be degraded. In the present work, an information-transfer PLS model is proposed for quality prediction in transition periods of the batch process. By transferring the main data information of one time slice to the next one, different time slices in the transition period are connected. As a result, most of the available data information in the previous time slices can be efficiently used for quality modeling of a specific time slice in the transition period. For performance evaluation of the developed method, a case study of an industrial injection molding process is provided.
1. INTRODUCTION
Batch and semibatch processes play an important role in modern industries, such as the chemical, biochemical, and food sectors. For batch process modeling, monitoring, and quality prediction, data-driven multivariate analysis methods such as multiway principal component analysis (PCA) and partial least-squares (PLS) have received much attention in recent years.1−5 While successful applications of multiway PCA and multiway PLS have been reported from both industry and academia, it has been found that they cannot efficiently reveal the multiphase characteristic that commonly exists in batch processes. For example, a typical fermentation process contains a preculture phase and a production phase; a plastication batch process may consist of injection, packing-holding, plastication, and cooling phases. When a batch process has several phases, each phase may have its own underlying mechanism. To improve the modeling efficiency, it is therefore desirable to divide a multiphase batch process into its different phases.

So far, various approaches have been developed for multiphase batch process modeling, which can be grouped into two categories, namely, multiblock methods and phase-separated methods.6 While multiblock methods interpret the batch process through a single model with data grouped in several blocks,7−9 the main idea of phase-separated methods is to build a separate model for each phase.10−14 In particular, a phase-based sub-PLS modeling approach has been proposed for quality prediction of multiphase batch processes.15 The first step of sub-PLS is to build a PLS model for each time slice in the phase; a representative PLS model is then constructed by averaging the regression matrices of the time-slice PLS models. Here, the term “time slice” means the data subset collected for all process variables at a specific time during the batch. Only the representative PLS models are then employed in the online quality prediction stage. Compared to the traditional multiway PLS method, more satisfactory performance has been obtained with the sub-PLS model.

So far, however, most multiphase modeling methods have not considered the transition behavior in the batch process well. In practice, batch processes may exhibit significant transition behaviors over the batch duration, e.g., in the starting period of a batch, the ending period of a batch, and the transition periods between two adjacent phases. In those time periods, the data characteristics may change frequently, which may have a great impact on the final product quality of the batch process. Therefore, the data information in those transition periods should be modeled carefully, and predictions of the final product quality should also be made in those periods.

Received: November 27, 2012
Revised: February 10, 2013
Accepted: March 19, 2013
Published: March 19, 2013
© 2013 American Chemical Society
dx.doi.org/10.1021/ie303267u | Ind. Eng. Chem. Res. 2013, 52, 5507−5511
Industrial & Engineering Chemistry Research
Figure 1. Data unfolding and phase division of the batch process.
The objective of this paper is to deal with the quality prediction problem in the transition periods of batch processes. Unlike the data behavior in the steady phase, there may be significant correlations among the data samples in each transition period. In other words, the data information in the previous time slice may influence the quality modeling performance in the following time slice. In this situation, the performance of the representative sub-PLS model may be degraded. In the present paper, a new modeling approach, called information transfer PLS (ITPLS), is proposed for quality modeling in transition periods of the batch process. By transferring the main data information of one time slice to the next one, the different time slices in the transition period are connected with each other. The relationships among the time slices are thus established, so that most of the available data information can be efficiently used for quality modeling of a specific time slice in the transition period.

The remainder of this paper is organized as follows. In section 2, the PLS algorithm is briefly introduced. A detailed description of the proposed ITPLS model is provided in section 3, including the data unfolding and scaling step, development of the ITPLS model, and the online quality prediction algorithm in transition periods. In section 4, an industrial application study of the injection molding process is presented. Finally, conclusions are drawn.

2. PARTIAL LEAST SQUARES (PLS)
Given the input and output datasets {X, Y}, the main idea of PLS is to decompose X and Y into score matrices T and U and loading matrices P and Q; the linear relationships can be given as follows:

$$X = TP^{T} + E, \qquad Y = UQ^{T} + F \tag{1}$$

$$T = XW(P^{T}W)^{-1} \tag{2}$$

where W is the weight matrix of the PLS model, and E and F are the residual matrices of the input and output variables, respectively. For a new data sample $x_{new}$, the predicted output variables can be calculated as

$$\hat{y}_{new} = [W(P^{T}W)^{-1}Q^{T}]^{T} x_{new} \tag{3}$$

3. QUALITY PREDICTION IN TRANSITION PERIODS BASED ON THE ITPLS MODEL
In this section, the main idea of the ITPLS model is presented. First, the three-dimensional (3D) batch process dataset is unfolded into two dimensions, and the mean trajectory is removed for data scaling. Second, the ITPLS models are constructed in each of the transition periods. Based on the developed models, an online quality prediction algorithm is then developed.

3.1. Data Unfolding and Scaling. Typically, the batch process data are collected in a three-way array X (I × J × K), where I is the number of batches, J is the number of variables, and K is the number of sampling intervals during the batch. For multivariate statistical modeling, the three-way dataset is usually unfolded into a two-dimensional (2D) dataset. Although there are six different 2D unfolding strategies, unfolding along the batch direction has been widely used, as illustrated in Figure 1. The data array is thus unfolded with the time slices $\{X_k(I \times J)\}_{k=1,2,\ldots,K}$ placed side by side. The transition periods are highlighted in Figure 1. Data scaling is then carried out before the multivariate statistical modeling step; each time slice is centered by the mean and scaled by the standard deviation of its process variables, as follows:

$$X_k(I \times J) = \frac{X_k - \mathrm{mean}(X_k)}{\sigma(X_k)} \tag{4}$$
where k = 1, 2, ..., K, mean(·) is the mean value of the data matrix, and σ(·) is the standard deviation of the data matrix.

3.2. ITPLS Modeling in Transition Periods. After the batch process dataset has been arranged into 2D time slices, ITPLS models can be developed in each phase. In the present paper, however, we focus only on the transition periods. For a specific transition period in the batch process, the main idea of the ITPLS model is as follows. First, a PLS model is constructed for the initial time slice dataset. Second, the main data information of that time slice, which is represented by its latent variables, is transferred to the next time slice. Third, the information transferred from the previous time slice is combined with the data of the current time slice, and a new PLS model is built on the combined dataset. By repeating the information transfer and PLS modeling steps until the end of the transition period, a series of information-transfer PLS models is constructed. An illustration of the ITPLS modeling process is given in Figure 2.
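The scaling step and the three modeling steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not part of the paper: a single quality variable (so that the non-iterative NIPALS form of PLS can be used), column-wise scaling of each time slice, the same number of latent variables `r` for every slice, a mean-centered quality vector, and synthetic random data. The function names `autoscale`, `pls1`, and `fit_itpls` are hypothetical.

```python
import numpy as np


def autoscale(Xk):
    """Center each column of a time slice and scale it to unit standard deviation."""
    return (Xk - Xk.mean(axis=0)) / Xk.std(axis=0, ddof=1)


def pls1(X, y, r):
    """NIPALS PLS for one quality variable y, with r latent variables.

    Returns the weights W, loadings P, scores T = X W (P^T W)^-1,
    and the regression vector R = W (P^T W)^-1 q.
    """
    Xd, yd = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(r):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)
        t = Xd @ w
        p = Xd.T @ t / (t @ t)
        qa = (yd @ t) / (t @ t)
        Xd = Xd - np.outer(t, p)  # deflate the inputs
        yd = yd - qa * t          # deflate the output
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    Wstar = W @ np.linalg.inv(P.T @ W)
    return {"W": W, "P": P, "R": Wstar @ q, "T": X @ Wstar}


def fit_itpls(slices, y, r):
    """Build one PLS model per time slice, passing the scores forward."""
    models = [pls1(slices[0], y, r)]              # step 1: initial time slice
    for Xk in slices[1:]:
        X_aug = np.hstack([models[-1]["T"], Xk])  # steps 2-3: transfer and combine
        models.append(pls1(X_aug, y, r))          # new PLS model on the combined data
    return models


# Synthetic example: I batches, J variables, K time slices (hypothetical sizes).
rng = np.random.default_rng(0)
I, J, K, r = 40, 5, 4, 2
X3d = rng.normal(size=(I, J, K))
y = X3d[:, 0, :].sum(axis=1) + 0.1 * rng.normal(size=I)
y = (y - y.mean()) / y.std(ddof=1)

slices = [autoscale(X3d[:, :, k]) for k in range(K)]
models = fit_itpls(slices, y, r)
print([m["R"].shape for m in models])  # [(5,), (7,), (7,), (7,)]
```

The first model has J regression coefficients, whereas every later model has r + J, because the r transferred latent variables are placed in front of the J process variables of the current time slice.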
Figure 2. Information transfer PLS modeling.

Suppose the time slice datasets of a transition period are represented as $\{X_k(I \times J), Y\}_{k=1,2,\ldots,T_r}$, where $T_r$ is the duration of the transition period and Y is the dataset of final quality variables. For the initial time slice dataset $X_1(I \times J)$, the first PLS model can be developed as follows:

$$\begin{aligned} X_1(I \times J) &= T_1 P_1^{T} + E_1 \\ T_1(I \times r_1) &= X_1 W_1 (P_1^{T} W_1)^{-1} \\ Y &= U_1 Q_1^{T} + F_1 \\ R_1 &= W_1 (P_1^{T} W_1)^{-1} Q_1^{T} \end{aligned} \tag{5}$$

where $T_1$ is the latent variable matrix of the first PLS model, $r_1$ is the number of latent variables, and $R_1$ is the regression matrix of the PLS model. For the second time slice, the main data information of the first time slice, $T_1$, is transferred and combined with the data of the second time slice. The new dataset is given as

$$X_2^{*}[I \times (r_1 + J)] = [\,T_1(I \times r_1) \;\; X_2(I \times J)\,] \tag{6}$$

Then, the second PLS model is constructed as follows:

$$\begin{aligned} X_2^{*} &= T_2^{*} P_2^{*T} + E_2^{*} \\ T_2^{*} &= X_2^{*} W_2^{*} (P_2^{*T} W_2^{*})^{-1} \\ Y &= U_2^{*} Q_2^{*T} + F_2^{*} \\ R_2^{*} &= W_2^{*} (P_2^{*T} W_2^{*})^{-1} Q_2^{*T} \end{aligned} \tag{7}$$

Generally, the PLS model for the kth time slice dataset can be developed as follows:

$$X_k^{*}[I \times (r_{k-1} + J)] = [\,T_{k-1}(I \times r_{k-1}) \;\; X_k(I \times J)\,] \tag{8}$$

$$\begin{aligned} X_k^{*} &= T_k^{*} P_k^{*T} + E_k^{*} \\ T_k^{*} &= X_k^{*} W_k^{*} (P_k^{*T} W_k^{*})^{-1} \\ Y &= U_k^{*} Q_k^{*T} + F_k^{*} \\ R_k^{*} &= W_k^{*} (P_k^{*T} W_k^{*})^{-1} Q_k^{*T} \end{aligned} \tag{9}$$

where k = 2, 3, ..., $T_r$. For k ≥ 3, $T_{k-1}$ in eq 8 is the score matrix $T_{k-1}^{*}$ of the preceding information-transfer model.

3.3. Online Quality Prediction and Performance Evaluation. Based on the developed ITPLS models, an online quality prediction algorithm can be formulated for a new batch in each transition period. At a specific time interval $k_c$ of the new batch, with the data vector denoted as $x_{k_c}^{new}$, the online prediction of the final quality variables can be made as follows:

$$\hat{y}_{new} = \begin{cases} R_1^{T} x_{k_c}^{new}, & k_c = 1 \\[4pt] \begin{aligned} t_{k_c-1}^{*} &= (P_{k_c-1}^{*T} W_{k_c-1}^{*})^{-T}\, W_{k_c-1}^{*T}\, x_{k_c-1}^{new*} \\ x_{k_c}^{new*} &= [\,t_{k_c-1}^{*T} \;\; x_{k_c}^{new\,T}\,]^{T} \\ \hat{y}_{new} &= R_{k_c}^{*T} x_{k_c}^{new*} \end{aligned} & 2 \le k_c \le T_r \end{cases} \tag{10}$$

For $k_c - 1 = 1$, the starred quantities in eq 10 reduce to those of the first PLS model ($W_1$, $P_1$, and $x_1^{new}$).

For performance evaluation of the quality prediction in each transition period of the batch process, the root-mean-square error (RMSE) criterion can be used, which is defined as follows:

$$\mathrm{RMSE}(s) = \sqrt{\frac{\sum_{k=1}^{T_r} \lVert y_k - \hat{y}_{ks} \rVert^{2}}{T_r}} \tag{11}$$

where s is the index of the transition period in the batch process (s = 1, 2, ..., S), $\hat{y}_{ks}$ (for k = 1, 2, ..., $T_r$) are the predicted quality values obtained in that transition period, and $y_k$ are the corresponding measured values. Similarly, for each specific time interval k during the transition period, an RMSE performance index, RMSE(k), can be defined over the testing batches (eq 12).
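The recursive structure of the online prediction rule (eq 10) and the per-period RMSE (eq 11) may be easier to follow in code. The sketch below assumes a single quality variable and uses two hand-made toy models with J = 2 process variables and r = 1 latent variable; the numbers are purely illustrative and are not taken from the paper, and the names `predict_online` and `rmse` are hypothetical.

```python
import numpy as np


def predict_online(models, x_new):
    """Predict the final quality at every time interval of one new batch.

    models[k] holds the weights W, loadings P, and regression vector R of the
    k-th time-slice model; x_new[k] is the scaled measurement vector at that
    interval. From the second interval on, the scores of the previous interval
    are placed in front of the current measurements before predicting.
    """
    preds, t_prev = [], None
    for m, x in zip(models, x_new):
        x_star = x if t_prev is None else np.concatenate([t_prev, x])
        preds.append(m["R"].T @ x_star)
        # Scores passed on to the next interval: t = (P^T W)^-T W^T x*
        t_prev = np.linalg.solve((m["P"].T @ m["W"]).T, m["W"].T @ x_star)
    return np.array(preds)


def rmse(y_true, y_pred):
    """Root of the mean squared prediction error (single quality variable)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# Toy models for a transition period of two intervals (illustrative values only).
models = [
    {"W": np.array([[1.0], [0.0]]),
     "P": np.array([[1.0], [0.0]]),
     "R": np.array([0.5, 0.5])},
    {"W": np.array([[1.0], [0.0], [0.0]]),
     "P": np.array([[1.0], [0.0], [0.0]]),
     "R": np.array([1.0, 0.0, 1.0])},
]
x_new = [np.array([2.0, 4.0]), np.array([1.0, 3.0])]

y_hat = predict_online(models, x_new)
print(y_hat)  # [3. 5.]

y_meas = 4.0  # hypothetical measured final quality of this batch
print(rmse(np.full(len(y_hat), y_meas), y_hat))  # 1.0
```

At the first interval the prediction uses the measurements alone (0.5·2 + 0.5·4 = 3). At the second interval the score t = 2 from the first model is prepended, giving the augmented vector [2, 1, 3] and the prediction 5.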
$$\mathrm{RMSE}(k) = \sqrt{\frac{\sum_{i=1}^{n_{te}} \lVert y_i^{k} - \hat{y}_i^{k} \rVert^{2}}{n_{te}}} \tag{12}$$

where $n_{te}$ is the number of testing batches, $\hat{y}_i^{k}$ is the predicted quality value made at the kth time interval of the transition period, $y_i^{k}$ is the corresponding measured quality value, and i is the index of the testing batches.

4. CASE STUDY OF AN INJECTION MOLDING PROCESS
As a typical multiphase batch process, the injection molding process contains several different phases, such as injection, plastication, and cooling.15 Figure 3 shows a simplified flowchart of a reciprocating-screw injection molding process. According to the recent phase division method, there are four transition periods in this process: the starting period and the ending period of the batch process, and another two short transition periods between two steady phases. In the present paper, the starting and ending periods of this process are used for the transition quality prediction study. In order to predict the weight of the final product, which serves as the quality variable, some important process variables are measured online, such as temperatures, pressures, and the screw velocity. For quality modeling purposes, a total of 11 key variables have been selected, which are tabulated in Table 1. A dataset that consists of 150 batches has been generated, among which 100 batches are used for model training and the remaining 50 batches are used for testing. The durations of the starting period and the ending period are 18 and 26 time intervals, respectively. The optimal number of latent components in each PLS model is determined through cross-validation.

Figure 3. Simplified schematic flowchart of the injection molding machine.14

Table 1. Variables Selected for Process Monitoring

No.   variable           unit
1     valve 1 opening    %
2     valve 2 opening    %
3     screw stroke       mm
4     screw velocity     mm/s
5     ejector stroke     mm
6     mold stroke        mm
7     mold velocity      mm/s
8     injection press    bar
9     temperature 3      °C
10    temperature 2      °C
11    temperature 1      °C

For comparison, the quality prediction results of the conventional representative PLS model are also provided. In both the starting period and the ending period, the representative PLS model is constructed by averaging all single PLS models that have been developed for the time slices in the corresponding period. The regression matrix of the representative PLS model is therefore the average of the single PLS regression matrices in that period. More detailed information on the representative PLS model can be found in the work by Lu and Gao.15 Figure 4 shows the RMSE results of both methods for the 50 testing batches during the two transition periods. It can be seen that the ITPLS model performs much better than the representative PLS model in both the starting period and the ending period of this process. Except for several sampling intervals in the initial stage of the transition periods, the RMSE values of the ITPLS model are smaller than those of the representative PLS model throughout the two transition periods.

Figure 4. RMSE values of the two transition periods obtained by different methods.

In particular, detailed quality prediction results of two testing batches are shown in Figures 5a and 5b, respectively. Compared to the prediction results of the representative PLS model, which are marked as circles in both figures, the weight prediction values of the ITPLS model are much closer to the actual measured values. The RMSE values of the two testing batches during each of the transition periods can also be calculated for the two methods; the results are tabulated in Table 2. In both transition periods, the RMSE values of the ITPLS model are much smaller than those of the representative PLS model. This means that the quality prediction performance in both transition periods has been improved by the ITPLS model. Therefore, based on the results of this process, it can be inferred that the relationship between different time slices does have an impact on the final product quality. By incorporating this relationship information, the prediction of the final product quality has been improved.

Figure 5. Quality prediction result of the two methods: (a) first batch and (b) second batch.

Table 2. RMSE Values of the Two Testing Batches during the Two Transition Periods

                     First Batch                    Second Batch
transition period    representative PLS   ITPLS     representative PLS   ITPLS
starting period      0.0225               0.0087    0.0222               0.0114
ending period        0.0212               0.0087    0.0203               0.0076

5. CONCLUSIONS
In the present paper, an information-transfer form of the partial least squares (PLS) model has been developed for quality prediction in transition periods of multiphase batch processes. Compared to the steady phase, the correlations among different time slices in the transition period may be more significant, which will influence the performance of the quality prediction model. With the developed information-transfer PLS model, the main data information in the previous time slice can be transferred to the following one; thus, different time slices in the same transition period can be connected. Based on the results of the application case study, the prediction performance of the final product quality has been improved by the new method. Although the idea of the information transfer modeling approach has been implemented through the basic PLS model, it can be extended to more complex processes, such as nonlinear and dynamic processes, large-scale processes, etc.16−19

■ AUTHOR INFORMATION
Corresponding Author
*Tel.: +86-87951442. E-mail: [email protected].
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (NSFC) (No. 61004134), National Project 973 (No. 2012CB720500), the Natural Science Foundation of Zhejiang Province (No. LY12F03008), and the Fundamental Research Funds for the Central Universities.

■ REFERENCES
(1) Nomikos, P.; MacGregor, J. F. Multi-way partial least square in monitoring batch processes. Chem. Intell. Lab. Syst. 1995, 30, 97−108.
(2) Ramaker, H. J.; van Sprang, E. N. M.; Westerhuis, J. A.; Smilde, A. K. Fault detection properties of global, local and time evolving models for batch process monitoring. J. Process Control 2005, 15, 799−805.
(3) van Sprang, E. N. M.; Ramaker, H. J.; Westerhuis, J. A.; Guiden, S. P.; Smilde, A. K. Critical evaluation of approaches for on-line batch process monitoring. Chem. Eng. Sci. 2008, 57, 3979−3991.
(4) Chen, T.; Zhang, J. On-line multivariate statistic monitoring of batch processes using Gaussian mixture model. Comput. Chem. Eng. 2010, 34, 500−507.
(5) Ge, Z. Q.; Zhao, L. P.; Yao, Y.; Song, Z. H.; Gao, F. R. Utilizing transition information in online quality prediction of multiphase batch processes. J. Process Control 2012, 22, 599−611.
(6) Yao, Y.; Gao, F. R. A survey on multistage/multiphase statistical modeling methods for batch processes. Ann. Rev. Control 2009, 33, 172−183.
(7) Westerhuis, J. A.; Kourti, T.; MacGregor, J. F. Analysis of multiblock and hierarchical PCA and PLS models. J. Chemom. 1998, 12, 301−321.
(8) Smilde, A. K.; Westerhuis, J. A.; de Jong, S. A framework for sequential multiblock component methods. J. Chemom. 2003, 17, 323−337.
(9) Choi, S. W.; Lee, I. B. Multiblock PLS-based localized process diagnosis. J. Process Control 2005, 15, 295−306.
(10) Muthuswamy, K.; Srinivasan, R. Phase-based supervisory control for fermentation process development. J. Process Control 2003, 13, 367−382.
(11) Liu, J.; Wang, D. S. H. Fault detection and classification for a two-stage batch process. J. Chemom. 2008, 22, 385−398.
(12) Camacho, J.; Pico, J.; Ferrer, A. Multi-phase analysis framework for handling batch process data. J. Chemom. 2008, 22, 632−643.
(13) Zhao, C. H.; Gao, F. R.; Wang, F. L. Phase-phased joint modeling and spectroscopy analysis for batch process monitoring. Ind. Eng. Chem. Res. 2010, 49, 669−681.
(14) Ge, Z. Q.; Song, Z. H.; Gao, F. R. Nonlinear quality prediction for multiphase batch processes. AIChE J. 2012, 58, 1778−1787.
(15) Lu, N. Y.; Gao, F. R. Stage-based process analysis and quality prediction for batch processes. Ind. Eng. Chem. Res. 2005, 44, 3547−3555.
(16) Ge, Z. Q.; Song, Z. H. Online monitoring of nonlinear multiple mode processes based on adaptive local model approach. Control Eng. Pract. 2008, 16, 1427−1437.
(17) Zhang, Y.; Chai, T.-Y. Y.; Li, Z.; Yang, C. Modelling and monitoring of dynamic processes. IEEE Trans. Neural Networks Learning Syst. 2012, 23, 277−284.
(18) Ge, Z. Q.; Yang, C. J.; Song, Z. H. Improved kernel PCA-based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64, 2245−2255.
(19) Zhang, Y. W.; Zhou, H.; Qin, S. J.; Chai, T.-Y. Y. Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares. IEEE Trans. Ind. Informatics 2010, 6, 3−12.