
Moving-Window GPR for Nonlinear Dynamic System Modeling with Dual Updating and Dual Preprocessing Wangdong Ni,*,†,‡ Soon Keat Tan,§,∥ Wun Jern Ng,‡,∥ and Steven D. Brown⊥ †

DHI-NTU Centre, ‡Nanyang Environment and Water Research Institute, §NTU-MPA Maritime Research Centre, and ∥School of Civil and Environmental Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore; ⊥Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, United States

ABSTRACT: The characteristics of nonlinearity and time-varying behavior in most industrial processes usually cripple the predictive performance of conventional soft sensors. In this article, moving-window Gaussian process regression (MWGPR) is proposed to capture the process dynamics and to model nonlinearity simultaneously. Applications of the proposed MWGPR method to the modeling of the activity of a catalyst and of an industrial propylene polymerization process are presented. The results clearly demonstrate that the MWGPR method effectively tracks the process changes to generate satisfactory predictive performance. Two modeling strategies, namely, dual updating and dual preprocessing, are applied to MWGPR in an attempt to track the process dynamics more efficiently. Dual updating takes into account both the time-varying variance of the process and the bias between the actual measurement and the model prediction; the resulting improvement in performance is illustrated by a case study on modeling catalyst activity. Simultaneous removal of the noise embedded in both the process parameters and the process output variables by dual preprocessing can significantly improve the predictive capability of MWGPR, as illustrated by a modeling study of an industrial propylene polymerization process.

1. INTRODUCTION

Soft sensors such as those based on canonical variate analysis (CVA)1,2 and partial least-squares (PLS)3,4 regression have been established as useful for the online prediction of quality variables of different complex dynamic processes.5−7 These sensors have included many tools, such as ones based on autoregression with exogenous inputs (ARX),8,9 recursive regression,10−12 and bias updating,13,14 to account for time-varying process dynamics. These soft sensors are powerful tools for performing multivariate linear regression analysis and for dealing with the noise and high collinearity that commonly occur in most industrial processes. In fact, these sensors hold the potential to overcome nonlinearities in applications in which nonlinearities are found to be the cause of unreliable performance or even failure of the modeling.15,16 Within the framework of PLS, many nonlinear techniques have been proposed to capture the true nonlinearity of industrial processes by introducing a nonlinear kernel into the PLS model or by performing local linearization of a nonlinear process. Qin12 and Li et al.15 proposed forms of recursive PLS, merging the quadratic kernel and an input matrix augmented by the hidden nodes of a radial basis function (RBF) network to handle the nonlinearity. Dayal and MacGregor11 modeled a nonlinear process by using recursive, exponentially weighted PLS. The performance of these methods, however, depends on the selection of an appropriate nonlinear kernel. Local linearization of a nonlinear process is an alternative approach. Kadlec and Gabrys16 proposed a local learning-based adaptive soft sensor for modeling nonlinear catalyst activity by combining local learning and recursive PLS. Despite reports of successful predictive performance, this approach still suffers from issues with complexity and interpretability, because of the complicated localization and adaptation.

© 2012 American Chemical Society

Other non-PLS-based soft sensors, such as those based on artificial neural networks (ANNs)17 and neuro-fuzzy systems,18 are also widely accepted as useful techniques for online prediction. However, these two methods are handicapped by the requirement of a large number of neurons to model a complex multidimensional system in the case of ANNs17 and by the off-equilibrium problem encountered in neuro-fuzzy systems.18 Soft sensors based on Gaussian process (GP) models have been increasingly applied to different nonlinear dynamic systems,19−22 due in part to the significant advantages of providing a predictive variance and of requiring relatively few optimization parameters.23,24 Gaussian process regression (GPR)25 has been increasingly viewed as an alternative approach to ANNs26 in applications such as spectroscopic calibration27,28 and nonlinear dynamic process modeling.29−31 All reported applications of GPR for nonlinear process modeling involve integration of an ARX structure with GPR. This article presents a new method, namely, moving-window Gaussian process regression (MWGPR), that can handle the nonlinearity of a dynamic system through GPR modeling. The approach proposed in this work is different from that reported in our previous work,32 also based on a recursive GPR, which adapts to the process dynamics through a forgetting factor commonly used in recursive PLS for modeling linear12 or nonlinear15 dynamic processes. The recursive GPR method reported in this study adapts to the process dynamics by using a moving-window33,34 approach. The GPR model is updated by integration of the newly measured sample, to adapt it to the

Received: August 24, 2011
Revised: April 5, 2012
Accepted: April 10, 2012
Published: April 10, 2012

dx.doi.org/10.1021/ie201898a | Ind. Eng. Chem. Res. 2012, 51, 6416−6428

Industrial & Engineering Chemistry Research

Article

Figure 1. Selection of parameters: (A) optimization of ARX for propylene polymerization data, (B) selection of weighting factor for bias update, (C) optimization of the window size of the Savitzky−Golay filter.

current trend, while discarding the oldest sample. A dual-updating strategy that includes updates of the mean and variance and a bias update is included in the proposed model to address problems of instability and potential loss of useful historical information after the oldest sample is discarded.13 Because noise embedded in either the process parameters or the output variables usually corrupts the model's predictive performance, we propose a dual-preprocessing method based on a Savitzky−Golay derivative smoothing filter35 to remove this noise. The predictive performance of MWGPR is illustrated by modeling an industrial propylene polymerization process. The remaining sections of the article are organized as follows: In section 2, a brief overview of GPR modeling is given, and moving-window GPR with dual-updating and dual-preprocessing modeling is introduced. In section 3, two application studies are employed to demonstrate the predictive performance of the MWGPR model. Section 3.1 demonstrates that a dual-updating strategy can significantly improve the predictive performance of MWGPR. The improvement in the predictive performance of MWGPR after implementation of dual preprocessing is emphasized in section 3.2. Finally, section 4 presents the conclusions.

2. MODELING OF DYNAMIC SYSTEMS

2.1. Gaussian Process Regression (GPR) Model. We aim to establish a Gaussian process regression model based on a dynamic system {X, Y} for the estimation of the response variables given any new predictors through Bayesian inference. Considering a single target variable initially, namely, y = (y_1, ..., y_n)^T, GPR can be expressed as the regression function y(x) having a Gaussian prior distribution with zero mean

\mathbf{y} = (y_1, \ldots, y_n)^{T} \sim G(\mathbf{0}, \mathbf{C})   (1)

where C is the n × n covariance matrix of the Gaussian process, with C_ij = cov(x_i, x_j). The following covariance function is commonly reported in the literature

C(\mathbf{x}_i, \mathbf{x}_j) = \mathrm{cov}[f(\mathbf{x}_i), f(\mathbf{x}_j)] = a_0 + a_1\,\mathbf{x}_i^{T}\mathbf{x}_j + v_0 \exp\left[-\frac{1}{2}\sum_{d=1}^{D} w_d (x_{id} - x_{jd})^{2}\right] + \sigma_e^{2}\delta_{ij}   (2)
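As a concrete illustration of this covariance function and of the predictive equations that follow from it (eqs 6 and 7 below), the model can be sketched in a few lines of numpy. This is only an illustrative stand-in for the GPML Matlab implementation used in the study: the hyperparameter values below are arbitrary placeholders rather than values obtained by maximizing the log-likelihood of eq 3, and the toy data are invented.

```python
import numpy as np

def kernel(xi, xj, a0=0.1, a1=0.1, v0=1.0, w=1.0):
    """Covariance function of eq 2 without the white-noise term:
    constant offset + linear term + RBF-like exponential term."""
    return (a0 + a1 * np.dot(xi, xj)
            + v0 * np.exp(-0.5 * np.sum(w * (xi - xj) ** 2)))

def gpr_predict(X, y, x_new, sigma_e2=1e-4, **hyp):
    """Predictive mean (eq 6) and variance (eq 7) of a GPR model;
    the noise term sigma_e2*delta_ij appears on the diagonal of C."""
    n = len(X)
    C = np.array([[kernel(X[i], X[j], **hyp) for j in range(n)]
                  for i in range(n)]) + sigma_e2 * np.eye(n)
    c = np.array([kernel(x, x_new, **hyp) for x in X])
    mean = c @ np.linalg.solve(C, y)                 # eq 6
    prior = kernel(x_new, x_new, **hyp) + sigma_e2   # C(x_{n+1}, x_{n+1})
    var = prior - c @ np.linalg.solve(C, c)          # eq 7
    return mean, var

# toy demonstration: interpolate a smooth 1-D function
X = np.linspace(0.0, 1.0, 12).reshape(-1, 1)
y = np.sin(0.5 * np.pi * X[:, 0])
mean, var = gpr_predict(X, y, np.array([0.55]))
```

Because the query point lies inside the training region, the predicted variance stays small; per eq 7, the variance grows as c(x_{n+1})^T C^{-1} c(x_{n+1}) shrinks, that is, as x_{n+1} moves away from the training data.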


where D is the dimension of the input space of vector x and Θ = (a_0, a_1, v_0, w, σ_e^2)^T is the vector of modeling parameters, referred to in the literature as hyperparameters. The first two terms in eq 2 represent a constant bias (offset) and a linear correlation, respectively. Hyperparameter v_0 signifies the magnitude of the exponential term and accounts for nonlinearity, in a form similar to that of a radial basis function. Hyperparameter w_d represents the relative significance of each component x_d of vector x. The term σ_e^2 δ_ij captures random white noise. The hyperparameters in Θ can be estimated by Bayesian inference, where p(y|Θ, X) = G(0, C), and the log-likelihood to be maximized is

L(\Theta) = \log p(\mathbf{y}|\Theta, \mathbf{X}) = -\frac{1}{2}\log|\mathbf{C}| - \frac{1}{2}\mathbf{y}^{T}\mathbf{C}^{-1}\mathbf{y} - \frac{n}{2}\log(2\pi)   (3)

where C is the training covariance matrix. The prior distributions of the hyperparameters were assigned as cited by Rasmussen:36 the offset, linear, and noise terms of eq 2, that is, log(a_0), log(a_1), and log(σ_e^2), are each taken as Gaussian with mean −3 and standard deviation 3. Two methods are typically employed to optimize the log-likelihood over the hyperparameters Θ. The first is based on the derivative of the log-likelihood with respect to each component θ of Θ

\frac{\partial L(\Theta)}{\partial \theta} = -\frac{1}{2}\mathrm{tr}\left(\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial \theta}\right) + \frac{1}{2}\mathbf{y}^{T}\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial \theta}\mathbf{C}^{-1}\mathbf{y}   (4)

The second is the Markov chain Monte Carlo (MCMC) method, which can be used to approximate the hyperparameter distribution; details of this method are given in the literature.37 In this study, we used a Matlab implementation of GPR based on a conjugate gradient method to optimize the hyperparameters, which is publicly available in the GPML toolbox (http://www.gaussianprocess.org/gpml/code/matlab).37 After the hyperparameters have been generated, prediction with the GPR model for the input x_{n+1} provides the predictive distribution of y_{n+1}

p(y_{n+1}|\mathbf{y}, \mathbf{X}, \mathbf{x}_{n+1}) = \frac{p(\mathbf{y}, y_{n+1})}{p(\mathbf{y}|\mathbf{X})}   (5)

The mean and variance of the resulting Gaussian distribution are

\mu(\mathbf{x}_{n+1}) = \mathbf{c}(\mathbf{x}_{n+1})^{T}\mathbf{C}^{-1}\mathbf{y}   (6)

\sigma^{2}(\mathbf{x}_{n+1}) = C(\mathbf{x}_{n+1}, \mathbf{x}_{n+1}) - \mathbf{c}(\mathbf{x}_{n+1})^{T}\mathbf{C}^{-1}\mathbf{c}(\mathbf{x}_{n+1})   (7)

The vector c(x_{n+1})^T C^{-1} can be viewed as smoothing terms, or weightings, applied to the training outputs y to predict y_{n+1} based on x_{n+1}. The smaller the term c(x_{n+1})^T C^{-1} c(x_{n+1}), the larger the predicted variance σ^2(x_{n+1}), and the farther the new data point is from the training data. Higher variance therefore indicates that the region of input space either contains few data or is corrupted by noise. These two equations provide the model prediction (eq 6) and its confidence level (eq 7) from the perspective of the dynamic system.

2.2. GPR-ARX Model for Dynamic Systems. GPR models were first developed to model static nonlinearity present in a data set. The GPR model has since been extended to nonlinear dynamic systems.19,20,25−27 The dynamic system is modeled using eq 1, where the input vector x can be presented as x = [y(k − 1), ..., y(k − L), u(k − 1), ..., u(k − L)], with the observed data consisting of the past L outputs and the past L control vectors. This is the structure of an autoregressive model of Lth order with exogenous inputs (ARX).8,9 With reference to eq 2, a GPR model is capable of capturing the features of a dynamic system automatically. The number of orders included in the ARX model structure can be optimized using the Akaike information criterion (AIC)38 to achieve optimized dynamic modeling performance. For the propylene polymerization data in section 3.2, for example, Li et al.15 used two back steps of the process variables as the order of the past control vector; therefore, the past L process outputs were considered in this study for the ARX structure, with the number of past process outputs ranging from 0 to 10. After optimization based on the predictive performance depicted in Figure 1A, four past process outputs were added to the input vector x, which can be written as x = [y(k − 1), ..., y(k − 4), u(k), u(k − 1), u(k − 2)], to form the ARX structure. Then, the GPR model can be applied to model nonlinear dynamic systems based on the specified ARX structure. Although this approach was found appropriate for the present study, readers might need to conduct their own evaluations in their applications, as this approach is not universal and might not be appropriate for other dynamic systems.

2.3. Moving-Window GPR Model for Dynamic Systems. Many researchers, such as Helland et al.,10 Mu et al.,13 and Ahmed et al.,14 have modeled linear dynamic processes through application of a moving-window approach without increasing the size of the data matrix (historical data). In this study, moving-window GPR is proposed to model nonlinear dynamic systems by integrating the merits of these two methods and by accounting for the process dynamics. The historical data (training data set) are preset in the moving window, as in other studies.10,13,14 The first step of MWGPR is to train the GPR model on the historical data to generate the set of hyperparameters. When the online data (test data set) are obtained, the GPR model generated earlier is applied to make a prediction. After the process output variable of the new sample has been measured, the window of historical data slides down to include the newly measured sample and to exclude the oldest sample. Then, the GPR model is retrained to re-estimate the hyperparameters for online prediction based on the current window of historical data, and another test sample is obtained. The process repeats until all online data (those in the test data set) have been predicted. To allow a direct comparison with other works in the literature,15,16 this article considers only the "one-step-ahead" prediction feature of moving-window methods; multi-step-ahead prediction performance is not included. The two application studies discussed in section 3 demonstrate the predictive performance of an MWGPR model for nonlinear dynamic process modeling. Section 3.1 demonstrates that the dual-updating strategy can significantly improve the predictive performance of MWGPR, and the improvement in the predictive performance of MWGPR through implementation of dual preprocessing is emphasized in section 3.2.

2.4. Dual-Updating Strategy. 2.4.1. Mean and Variance Updating. A dynamic system is typically modeled using many recorded process parameters that have numerical values of different orders of magnitude. Thus, it is critical to normalize these variables and to use the normalized parameters in the model. Because the process dynamics might not be steady, the means and standard deviations of the process variables might also change over time. The mean and variance


updates proposed by Wang et al.31 and Mu et al.13 are employed in this work to capture the process dynamics; they are summarized briefly next. Consider the training data set and a window that contains n samples. Then, the process variables, in terms of their means (x̄), standard deviations (σ), and normalized forms x̂_i, can be standardized as follows

\hat{x}_i = \frac{x_i - \bar{x}}{\sigma}   (8)

The window of the n training samples is moved along the data set, dropping the oldest sample from one end of the data window while incorporating a new sample at the other end. The new training set is refreshed and renormalized similarly. In this work, mean and variance updates are incorporated within the MWGPR model to enhance the predictive capability of the model. Given that x̄_t and σ_t are the mean and standard deviation of the n training samples when the sample (x_t, y_t) is acquired at time t, the mean and variance of the training set at time t + 1 are generated recursively as

\bar{x}_{t+1} = \frac{n-1}{n}\bar{x}_t + \frac{1}{n}x_{t+1}   (9)

\sigma_{t+1}^{2} = \frac{n-2}{n-1}\sigma_t^{2} + \frac{1}{n-1}(x_{t+1} - \bar{x}_{t+1})^{2}   (10)

2.4.2. Bias Updating. The dynamic model needs to be reliable and robustly adaptive. For this reason, the bias of a dynamic model can be updated13,14 by incorporating an updating scheme to correct the predictive output of the model. Correction is performed by adding an offset bias smoother, defined as

\mathrm{bias}(k) = \omega\,\mathrm{bias}_0(k) + (1 - \omega)\,\mathrm{bias}(k-1)   (11)

which weights and sums the current offset

\mathrm{bias}_0(k) = y_{\mathrm{obs}}(k-1) - y_{\mathrm{mod}}(k-1)   (12)

where y_obs and y_mod are the observed target values (e.g., from an offline laboratory) and the values predicted by the dynamic model (i.e., GPR), respectively. The current offset, bias_0(k), is weighted with the factor ω, and the previous smoothed offset, bias(k − 1), receives the weight 1 − ω; ω ranges from 0.1 to 0.9 in increments of 0.1 and is optimized through the same procedure as employed by Mu et al.13 and Ahmed et al.14 to improve the predictive accuracy of the dynamic model. Figure 1B demonstrates the selection of the weighting factor ω for updated GPR-ARX modeling of the propylene polymerization data (see section 3.2.2) based on the predictive performance; in this case, ω = 0.8. Note that the initial value of the bias is 0. The final output of the dynamic model, y_cor(k), can then be corrected for bias as follows

y_{\mathrm{cor}}(k) = y_{\mathrm{mod}}(k) + \mathrm{bias}(k)   (13)

Any multi-step-ahead system sees large increases in uncertainty as one looks ahead. The dual-updating method proposed in this work does as well, because it needs to update the system step by step, as every new sample is measured, based on eqs 9−13.

Table 1. Variables of the Catalyst Activation Data Set16

no. | description | units
1 | time | days
2 | measured flow of air | kg/h
3 | measured flow of combustible gas | kg/g
4 | measured concentration of combustible component in the combustible gas (mass fraction) | −
5 | total feed temperature | °C
6 | cooling temperature | °C
7 | temperature at 1/20 of reactor length | °C
8 | temperature at 2/20 of reactor length | °C
9 | temperature at 4/20 of reactor length | °C
10 | temperature at 7/20 of reactor length | °C
11 | temperature at 11/20 of reactor length | °C
12 | temperature at 16/20 of reactor length | °C
13 | temperature at 20/20 of reactor length | °C
14 | product concentration of oxygen (mass fraction) | −
15 | product concentration of combustible component (mass fraction) | −

Table 2. Results Predicted by the RPLS and MWGPR Models

model^a | RMSEP | RRMSEP (%)
RPLS | 0.39 | 125
MWGPR | 0.014 | 18.11
MWGPR1 | 0.013 | 16.80
MWGPR2 | 0.010 | 9.74
DUMWGPR | 0.010 | 9.33

^a MWGPR1 represents an MWGPR model with online mean and variance updating; MWGPR2 represents an MWGPR model with bias updating; and DUMWGPR refers to the dual-updated MWGPR model, that is, the MWGPR model with both online mean and variance updating and bias updating.

2.5. Dual Preprocessing. Predictive performance of the moving-window GPR model can be corrupted by noise embedded in a dynamic system, whether in the process parameters, in the target (output) variable, or in both. Smoothing or filtering this kind of noise in both the process parameters x and the target output variable y, that is, dual preprocessing, would definitely improve the accuracy of the dynamic model. Previously, Mu et al. considered only the noise in the target (output) variable.13 Several methods could be employed to filter or condition the noise in the raw signal. A commonly employed signal preprocessing method, the Savitzky−Golay35 filter, was employed in this work. The Savitzky−Golay method is a local polynomial regression implementing a finite impulse response filter, which can be interpreted mathematically as follows: for each point j with value y_j, a weighted sum (a linear combination) of the neighboring values is calculated; the number of neighbors used in the smoothing and the degree of the polynomial fitted to the data control the strength of the smoothing, details of which can be found in ref 35. In this study, for dynamic process modeling with integration of dual preprocessing, the process variables, such as temperatures and flows, and the process output variable were preprocessed by a Savitzky−Golay filter, and then the recursive partial least-squares (RPLS) or MWGPR model was trained on the preprocessed historical data. The generated model was applied to the newly collected data (in the test data set), which were also preprocessed with the aforementioned filter for online prediction. The window size of the Savitzky−Golay filter has a significant effect on the preprocessing performance. Models with Savitzky−Golay filters with window sizes ranging from 3 to 13 in increments of 2 were built to generate predictions, and the predicted results were compared to select an appropriate Savitzky−Golay filter to


Figure 2. Results predicted with different MWGPR models: (A) curve predicted by MWGPR, (B) relative error curve from MWGPR, (C) curve predicted by MWGPR with mean and variance update, (D) relative error curve from MWGPR with mean and variance update, (E) curve predicted by MWGPR with bias update, (F) relative error curve from MWGPR with bias update; (G) curve predicted by dual-updated MWGPR, (H) relative error curve from dual-updated MWGPR. Note: The dotted lines in panels B, D, F, and H represent an ARE metric of 0.1.

effectively remove the embedded noise in the process variables, as depicted in Figure 1C. (The predicted results were generated from GPR-ARX models with Savitzky−Golay filters applied to the propylene polymerization data, which are detailed in section 3.2.2.) In this case, a Savitzky−Golay filter with a window size of 3 was selected.

2.6. Evaluation of Model Performance. The modeling performance can be evaluated by the degree of agreement between


the actual process output and the model-predicted output. We used either the root-mean-square error of prediction (RMSEP) or the relative root-mean-square error of prediction (RRMSEP), as appropriate, to compare the prediction of the model to the output of the process

\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}   (14)

\mathrm{RRMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^{2}}   (15)

Here, y_i and ŷ_i are the actual system output and the model estimate of the output, respectively, at the ith step, and n is the number of samples in the data set. The absolute value of the relative error (ARE) was employed to evaluate the capability of the model to track the trend of an evolving process as follows

\mathrm{ARE} = \mathrm{abs}\left(\frac{y_i - \hat{y}_i}{y_i}\right)   (16)

where abs(·) is the absolute value operator. Both RRMSEP and ARE are calculated from values of the output y in the original measurement units.

Table 3. Results Predicted by GPR-ARX Models

method | RMSEP | RRMSEP (%)
GPR-ARX, original | 0.28 | 7.43
GPR-ARX, bias updated | 0.15 | 3.92
GPR-ARX with Savitzky−Golay filter, original | 0.12 | 2.98
GPR-ARX with Savitzky−Golay filter, bias updated | 0.11 | 2.65

3. APPLICATION STUDY

3.1. Dynamic Modeling of Catalyst Activity Process. In this section, the predictive performance of the proposed moving-window GPR model is compared with that of conventional recursive PLS (RPLS), which was used as a benchmark.16

3.1.1. Data Set. The data set for this example was derived from a polymerization reactor consisting of approximately 1000 tubes filled with catalyst, previously described by Kadlec and Gabrys.16 The variations of the feed and operating conditions

Figure 3. Results predicted with different GPR-ARX models: (A) curve predicted by GPR-ARX, (B) relative error curve from GPR-ARX, (C) curve predicted by GPR-ARX with bias update, (D) relative error curve from GPR-ARX with bias update.


Figure 4. Results predicted by different GPR-ARX models preprocessed using Savitzky−Golay filtering: (A) curve predicted by GPR-ARX modeling, (B) relative error curve from GPR-ARX modeling, (C) curve predicted by GPR-ARX modeling with bias updating, (D) relative error curve from GPR-ARX modeling with bias updating.

were recorded as input variables, and the measurements of concentrations, flows, and several point temperatures were used to identify the state of the process. Many of the input variables in the collected data set, covering one year of operation of the process plant, contained high levels of collinearity and noise. In addition, there were large numbers of outliers, as well as other typical issues, such as missing values, changing sampling rates, and automatic value interpolation performed by the data-recording system. This data set contained 15 input variables; the physical quantities represented by the variables are listed in Table 1. The available data set was divided into two parts, namely, a training set (the first 30% of the data, forming the historical data) and a test set (the remaining 70% of the data, representing the online data and delivered as a stream of samples), the same division as reported in ref 16.

Table 4. Results Predicted by MWGPR Models

method^a | RMSEP | RRMSEP (%)
MWGPR, original | 0.112 | 2.84
MWGPR, updated^b | 0.109 | 2.77
MWGPR1, original | 0.107 | 2.72
MWGPR1, updated^b | 0.105 | 2.65
MWGPR2, original | 0.099 | 2.50
MWGPR2, updated^b | 0.085 | 2.16
DPMWGPR, original | 0.062 | 1.60
DPMWGPR, updated^b | 0.060 | 1.54

^a MWGPR1 represents an MWGPR model with only the X matrix preprocessed with a Savitzky−Golay filter, MWGPR2 represents an MWGPR model with only the y output preprocessed with a Savitzky−Golay filter, and DPMWGPR refers to the dual-preprocessed MWGPR model. ^b Models updated by implementation of bias updating.

3.1.2. Data Preprocessing. The data set contained many samples, about 6000 in all, and because certain input variables involved a large number of outliers and missing values, the data set was preprocessed prior to being used as input for the MWGPR modeling. Specifically, the data set was downsampled by a factor of 10 (one of every 10 samples was retained); input variables 3, 4, and 15 were removed because of the presence of severe outliers and missing values; and all data


Figure 5. Results predicted by different MWGPR models: (A) curve predicted by MWGPR modeling, (B) relative error curve from MWGPR modeling, (C) curve predicted by MWGPR modeling with bias updating, (D) relative error curve from MWGPR modeling with bias updating.

samples with missing target values were removed. All of these steps are similar to those performed by Kadlec and Gabrys in a previous report.16 Down-sampling was used here to strengthen the process changes, because the process was measured over a period of one year, with about 8000 measurements of a process output varying very slowly from 0.05 to 1.01. As a result, the corrected training set contained 176 samples, and the remaining 410 samples were used for the online simulation test set. The value of the output ranged from 0.05 to 1.01, whereas the corresponding output value range used in ref 16 was from −8 to 4.

3.1.3. Dynamic Modeling. Table 2 lists the predicted results based on prediction of the test data set using the conventional recursive PLS (RPLS) model used as a reference benchmark for this study. The predicted results reported in subsequent tables were all generated from the test data set. Details of RPLS modeling can be found in ref 39. Both the RMSEP and RRMSEP values listed in Table 2 demonstrate that the RPLS model provided very poor predictive performance for catalyst activity modeling, a result similar to that reported in ref 16. As mentioned in ref 16, strong nonlinearity in the catalyst activity data was observed between the reaction rate and the temperature. Leveraging the effectiveness of GPR modeling for nonlinear data, we evaluated the modeling of the catalyst

activity of a polymerization reactor with MWGPR in this work. As summarized in Table 2, the MWGPR model generated RMSEP and RRMSEP metrics of 0.014 and 18.11%, respectively. The improvement in predictive performance from the application of MWGPR to the catalyst activity modeling was significant, as the RMSEP value from MWGPR was only about 1/30 of the corresponding value obtained using RPLS, whereas the RMSEP value produced by the method reported in ref 16 was about 1/12 of the RPLS result. A similar conclusion can be drawn from Figure 2A, as the curve predicted by MWGPR tracks the measured curve perfectly, except for the last five samples in the online test set. The most likely reason for the offset between the model predictions and the actual values over the last five samples is the long-term, discontinuous measurement of the output variable. The relative error curve depicted in Figure 2B demonstrates that the absolute relative error (ARE) of most predicted samples was below the threshold indicated by the dotted line (0.1), with some predictions even showing errors close to 0, except for the last five samples. There are two peaks where the absolute relative error is close to 0.1, at about the 340th and 375th samples. Online updating of the mean and variance was applied in MWGPR modeling to take full advantage of the useful


Figure 6. Results predicted by different MWGPR models with the X matrix preprocessed by a Savitzky−Golay filter: (A) curve predicted by MWGPR modeling, (B) absolute relative error curve from MWGPR modeling, (C) curve predicted by MWGPR modeling with bias updating, (D) absolute relative error curve from MWGPR modeling with bias updating.

information on process dynamics, as directly discarding samples in moving-window methods can lead to information loss. As shown in Table 2, the predictive performance of the MWGPR model was improved slightly in terms of both RMSEP and RRMSEP after online updating of the mean and variance. The predicted curve is depicted in Figure 2C. Figure 2D clearly shows that the peak in absolute relative error at about the 340th sample in Figure 2B has disappeared and that the error in the last five samples has decreased slightly. Because there is an obvious bias between the predicted and measured values for the last five samples, the bias update with ω = 0.8 was employed to minimize the gap between the values predicted by the MWGPR model and the measured values. As summarized in Table 2, the predicted results show that the RRMSEP value from the model decreased significantly, to 9.74%. The gap for the last five samples clearly narrowed after application of bias updating, as depicted in Figure 2E. Similar results can be observed in Figure 2F, but the peak in absolute relative error remains. By leveraging the strengths of updating both the mean and variance and using a bias update with ω = 0.8, the MWGPR model was able to generate more robust predicted results (as shown in Table 2, the RRMSEP value from DUMWGPR modeling is only one-half

that obtained from the original MWGPR model), as illustrated in Figure 2G,H. In addition, the RMSEP value of the dual-updated MWGPR model is only 1/39 that of the benchmark RPLS model.

3.2. Dynamic Modeling of a Propylene Polymerization Process. As discussed in this section, the MWGPR model was further enhanced by adopting the strategy of dual signal processing using the Savitzky−Golay smoothing method and bias updating.

3.2.1. Process Description. The data set used in this example was adapted from Li et al.15 The process involved two continuous stirred-tank reactors (CSTRs) and two fluidized-bed reactors (FBRs). The first CSTR was modeled as the target in this example. Eight process parameters were selected to control the system and to predict the melt flow rate (MFR) as the output. A total of 360 data sets were collected. Considering the time lag that exists in the propylene polymerization process, the process parameters involved in the model were arranged as x = [u(k), u(k − 1), ..., u(k − L)], where u is the process input vector and L is the time lag for the input. Here, L was set to 2, similar to what was done in ref 15, according to the process mechanism. Thus, a total of 24 variables were modeled for each

dx.doi.org/10.1021/ie201898a | Ind. Eng. Chem. Res. 2012, 51, 6416−6428

Industrial & Engineering Chemistry Research

Article

Figure 7. Results predicted by different MWGPR models with y output preprocessed by a Savitzky−Golay filter: (A) curve predicted by MWGPR modeling, (B) relative error curve from MWGPR modeling, (C) curve predicted by MWGPR modeling with bias updating, (D) relative error curve from MWGPR modeling with bias updating.
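The MWGPR models compared in these figures share the same mechanics: a fixed-size window of the most recent samples is used to train a fresh GPR for each prediction, and the window then slides forward as new measurements arrive. A minimal NumPy sketch of that idea follows; the squared-exponential kernel, the fixed hyperparameters, and the window size are illustrative assumptions, and unlike a full implementation the hyperparameters here are held constant rather than re-estimated for each window.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance between two sample sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def moving_window_gpr(X, y, window=50, length_scale=1.0, noise=1e-2):
    """Predict each sample after the first `window` samples using a
    GPR trained only on the `window` most recent samples; after each
    prediction the window slides forward, discarding the oldest point."""
    preds = []
    for k in range(window, len(y)):
        Xw, yw = X[k - window:k], y[k - window:k]
        # GP posterior mean: k_*^T (K + noise*I)^{-1} y
        K = rbf_kernel(Xw, Xw, length_scale) + noise * np.eye(window)
        alpha = np.linalg.solve(K, yw)
        k_star = rbf_kernel(X[k:k + 1], Xw, length_scale)
        preds.append((k_star @ alpha).item())
    return np.array(preds)
```

Refitting on every window is what lets the model track time-varying behavior, at the cost of one training solve per prediction.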

sample. The data sets were divided into two parts: the training series consisted of the first 100 sets, and the test series of the remaining 260 sets, again similar to the approach reported in ref 15.
3.2.2. Dynamic Modeling. As depicted in Figure 1A, four previous output MFR values were added to the X matrix to form the ARX structure. Table 3 shows that the GPR-ARX model generated RMSEP and RRMSEP values of 0.28 and 7.43%, respectively. The very poor predictive performance of the GPR-ARX model is further demonstrated in Figure 3A,B, where a significant offset can be seen between the measured and predicted values. Although significant improvement (as shown in Table 3) was achieved by application of bias updating, the GPR-ARX model still did not generate reliable predictive results; the obvious bias between the measured and predicted values was still present, as clearly shown in Figure 3C,D. (The results in Figure 3D were generated from an updated GPR-ARX model with ω = 0.8, which was optimized in Figure 1B.) After preprocessing using the Savitzky−Golay filter, which was optimized in Figure 1C, the GPR-ARX model produced more robust predictive results, as shown in Table 3 and Figure 4A,B. After further updating using bias correction, as shown in Figure 4C,D, the GPR-ARX model produced an RRMSEP value of 2.65%, as listed in Table 3, about one-third of that from GPR-ARX models employed without updating and Savitzky−Golay filtering.
The MWGPR model was next used to model the propylene polymerization process. Table 4 shows the results predicted by MWGPR modeling. The predictive performance was improved significantly, as reflected in the values of both RMSEP and RRMSEP. It can be seen in Figure 5A that the curve predicted by the MWGPR model almost coincides with the measured curve, except for a few samples at about the 100th sample, where the relative error was slightly over 0.1, as demonstrated in Figure 5B. The predictive performance could be improved slightly through application of bias updating with ω = 0.3, as shown in Table 4 and Figure 5C,D.
The noise embedded in both the process parameters and the target (output) variable can corrupt the predictive performance of a dynamic model, so removing the noise, or reducing it to a certain level, might improve the predictive performance. Table 4 shows the results predicted by MWGPR modeling with an x matrix (process parameters) that had been preprocessed using the Savitzky−Golay filter.
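The lagged input arrangement x = [u(k), u(k − 1), ..., u(k − L)] described for this process can be built with a few lines of NumPy; the function and variable names below are illustrative, not from the article. With eight process parameters and L = 2, each row carries 24 variables.

```python
import numpy as np

def make_lagged_inputs(U, L=2):
    """Stack each input vector u(k) with its L previous values, giving
    rows x(k) = [u(k), u(k-1), ..., u(k-L)].
    U has shape (n_samples, n_inputs); the first L rows are dropped
    because they lack a full lag history."""
    n, _ = U.shape
    return np.hstack([U[L - j:n - j] for j in range(L + 1)])
```

For example, with 360 collected samples and 8 inputs, `make_lagged_inputs(U, L=2)` on a `(360, 8)` array yields a `(358, 24)` design matrix, matching the 24 variables per sample quoted above.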


Figure 8. Results predicted by different MWGPR models with dual preprocessing with Savitzky−Golay filters: (A) curve predicted by MWGPR modeling, (B) absolute relative error curve from MWGPR modeling, (C) curve predicted by MWGPR modeling with bias updating, (D) absolute relative error curve from MWGPR modeling with bias updating.
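The dual preprocessing behind Figure 8 amounts to Savitzky−Golay smoothing of both the input matrix and the output series. A SciPy sketch is below; the window length and polynomial order here are arbitrary placeholders, whereas the article optimizes the filter settings (e.g., in Figure 1C).

```python
import numpy as np
from scipy.signal import savgol_filter

def dual_preprocess(X, y, window_length=11, polyorder=3):
    """Apply Savitzky-Golay smoothing to every column of the input
    matrix X (along the sample axis) and to the output series y."""
    X_s = savgol_filter(X, window_length, polyorder, axis=0)
    y_s = savgol_filter(y, window_length, polyorder)
    return X_s, y_s
```

Smoothing x and y with the same filter family keeps the input−output timing aligned, which matters when the smoothed series feed a lagged (ARX-style) design matrix.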

In general, after preprocessing of the data using the Savitzky−Golay filter, the model should generate more reliable predictive results. In this case, however, the deviation between the predicted and measured values at about the 120th sample became large, as shown in Figure 6A,B. The deviation narrowed slightly, as shown in Figure 6C,D, after application of updating with bias correction. When the MWGPR model was instead combined with a y output that had been preprocessed using a Savitzky−Golay filter, significant improvement in predictive performance was obtained, as reflected in both the RMSEP and RRMSEP values (see Table 4). Figure 7D shows that the relative error of all samples was under the tolerance limit of 0.1 in the ARE metric.
For the MWGPR model, merging the two preprocessing strategies resulted in a significant enhancement of predictive performance. Table 4 shows that the MWGPR model with the dual-preprocessing approach generated RMSEP and RRMSEP values of 0.062 and 1.60%, respectively. Figure 8A clearly shows that the predicted curve matches the measured curve very well, and in Figure 8B, all points are below 0.07. Combined with bias updating, the DPMWGPR model produced slightly better predictive performance, with all points in Figure 8D scoring below 0.06. All of these results are much lower than the control limit, 0.1, used in a previous study.15
In our previous study,32 recursive GPR (RGPR), performed both samplewise and blockwise with an ARX structure similar to that used for GPR-ARX in this work, was applied to model this dynamic process. The RGPR models generated RRMSEP values of 2.21% and 2.24% for the samplewise model without and with bias updating, respectively.32 Although an ARX structure was not involved in the method proposed in this work, the newly developed DPMWGPR model and its updated variant produced much better results (shown in Table 4) than those obtained from the RGPR models, as a direct comparison of Figure 8 in this work with Figure 5 in the previous study shows.32
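The bias updating applied throughout can be read as an exponentially weighted correction built from recent prediction residuals, with ω weighting the previous bias estimate against the newest residual. The recursion below is a sketch of that idea under an assumed form, not the article's exact formula.

```python
def bias_updated_predictions(y_meas, y_pred, omega=0.8):
    """Correct each raw model prediction with a recursively updated
    bias term (assumed form):
        b(k) = omega * b(k-1) + (1 - omega) * (y_meas(k-1) - y_pred(k-1)),
    where the bias is applied before the k-th prediction is issued.
    The first prediction receives no correction (b(0) = 0)."""
    bias, out = 0.0, []
    for k in range(len(y_pred)):
        out.append(y_pred[k] + bias)
        bias = omega * bias + (1 - omega) * (y_meas[k] - y_pred[k])
    return out
```

A larger ω makes the correction smoother but slower to react, which is consistent with tuning ω per process (0.8 for the catalyst case, 0.3 for the polymerization case above).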

4. CONCLUSIONS

In this article, a Bayesian nonparametric regression technique, moving-window Gaussian process regression (MWGPR), has been presented and used to model nonlinear dynamic systems. Two modeling strategies, namely, dual updating and dual preprocessing, were proposed and incorporated into the newly developed MWGPR modeling method to further improve the predictive performance of the dynamic model. Two example processes were used as test cases to demonstrate the performance of the MWGPR model. The results showed that the MWGPR model effectively tracks the process dynamics of catalyst activity, without the need for complicated localization or for complex procedures involving adaptation or preselection of an appropriate nonlinear kernel. Modeling of an industrial propylene polymerization process further illustrated that simultaneous removal of the noise embedded in both the process parameters and the process output variable by dual preprocessing can significantly improve the predictive capability of MWGPR models for a nonlinear dynamic process. Furthermore, the methods developed in this study achieved an obvious improvement in predictive performance over the RGPR models applied to the same nonlinear dynamic process in our previous study.32

AUTHOR INFORMATION

Corresponding Author
*Tel.: (65)67905321. Fax: (65)67906620. E-mail: WDNI@ntu.edu.sg.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

Funding support from the Nanyang Environment and Water Research Institute, Nanyang Technological University, and its research partners is gratefully acknowledged. The authors are especially grateful to Dr. Petr Kadlec and Dr. Chunfu Li for providing the catalyst activation data set and the propylene polymerization process data, respectively.

REFERENCES

(1) Larimore, W. E. Canonical variate analysis for system identification, filtering and adaptive control. In Proceedings of the 29th IEEE Conference on Decision and Control; IEEE Press: Piscataway, NJ, 1990; Vol. 1, pp 596−694.
(2) Juricek, B. C.; Seborg, D. E.; Larimore, W. E. Fault detection using canonical variate analysis. Ind. Eng. Chem. Res. 2004, 43, 458−474.
(3) Shi, R.; MacGregor, J. F. Modeling of dynamic systems using latent variable and subspace methods. J. Chemom. 2000, 14, 423−439.
(4) Negiz, A.; Çinar, A. PLS, balanced, and canonical variate realization techniques for identifying VARMA models in state space. Chemom. Intell. Lab. Syst. 1997, 38, 209−221.
(5) Kadlec, P.; Grbić, R.; Gabrys, B. Review of adaptation mechanisms for data-driven soft sensors. Comput. Chem. Eng. 2010, 35, 1−24.
(6) McAvoy, T. Intelligent “control” applications in the process industries. Annu. Rev. Control 2002, 26, 75−86.
(7) Fortuna, L.; Rizzo, A.; Sinatra, M.; Xibilia, M. G. Soft analyzers for a sulfur recovery unit. Control Eng. Pract. 2003, 11, 1491−1500.
(8) Simoglou, A.; Martin, E. B.; Morris, A. J. A comparison of canonical variate analysis and partial least squares for the identification of dynamic processes. In Proceedings of the American Control Conference; IEEE Press: Piscataway, NJ, 1999; pp 832−837.
(9) Simoglou, A.; Martin, E. B.; Morris, A. J. Dynamic multivariate statistical process control using partial least squares and canonical variate analysis. Comput. Chem. Eng. 1999, 23 (Suppl.), S277−S280.
(10) Helland, K.; Berntsen, H. E.; Borgen, O. S.; Martens, H. Recursive algorithm for partial least squares regression. Chemom. Intell. Lab. Syst. 1992, 14, 129−137.
(11) Dayal, B. S.; MacGregor, J. F. Recursive exponentially weighted PLS and its applications to adaptive control and prediction. J. Process Control 1997, 7, 169−179.
(12) Qin, S. J. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503−514.
(13) Mu, S.; Zeng, Y.; Liu, R.; Wu, P.; Su, H.; Chu, J. Online dual updating with recursive PLS model and its application in predicting crystal size of purified terephthalic acid (PTA) process. J. Process Control 2006, 16, 557−566.
(14) Ahmed, F.; Nazier, S.; Yeo, Y. K. A recursive PLS-based soft sensor for prediction of the melt index during grade change operations in HDPE plant. Korean J. Chem. Eng. 2009, 26, 14−20.
(15) Li, C.; Ye, H.; Wang, G.; Zhang, J. A recursive nonlinear PLS algorithm for adaptive nonlinear process modeling. Chem. Eng. Technol. 2005, 28, 141−152.
(16) Kadlec, P.; Gabrys, B. Local learning-based adaptive soft sensor for catalyst activation prediction. AIChE J. 2011, 57, 1288−1301.
(17) Radhakrishnan, V.; Mohamed, A. Neural networks for the identification and control of blast furnace hot metal quality. J. Process Control 2000, 10, 509−524.
(18) Johansen, T. A.; Shorten, R.; Murray-Smith, R. On the interpretation and identification of dynamic Takagi−Sugeno fuzzy models. IEEE Trans. Fuzzy Syst. 2000, 8, 297−313.
(19) Bukkapatnam, S. T. S.; Cheng, C. Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local Gaussian process models. Phys. Rev. E 2010, 82, 056206.
(20) Grancharova, A.; Kocijan, J.; Johansen, T. A. Explicit stochastic predictive control of combustion plants based on Gaussian process models. Automatica 2008, 44, 1621−1631.
(21) di Sciascio, F.; Amicarelli, A. N. Biomass estimation in batch biotechnological process by Bayesian Gaussian process regression. Comput. Chem. Eng. 2008, 32, 3264−3273.
(22) Brahim-Belhouari, S.; Bermak, A. Gaussian process for nonstationary time series prediction. Comput. Stat. Data Anal. 2004, 47, 705−712.
(23) Ažman, K.; Kocijan, J. Application of Gaussian processes for black-box modeling of biosystems. ISA Trans. 2007, 46, 443−457.
(24) Ažman, K.; Kocijan, J. Dynamical systems identification using Gaussian process models with incorporated local models. Eng. Appl. Artif. Intell. 2011, 24, 398−408.
(25) O’Hagan, A. Curve fitting and optimal design for prediction (with discussion). J. R. Stat. Soc. B 1978, 40, 1−42.
(26) Neal, R. M. Bayesian Learning for Neural Networks; Springer-Verlag: New York, 1996.
(27) Chen, T.; Morris, J.; Martin, E. Gaussian process regression for multivariate spectroscopic calibration. Chemom. Intell. Lab. Syst. 2007, 87, 59−71.
(28) Chen, T.; Wang, B. Bayesian variable selection for Gaussian process regression: Application to chemometric calibration of spectrometers. Neurocomputing 2010, 73, 2718−2726.
(29) Gregorčič, G.; Lightbody, G. Nonlinear system identification: From multiple-model networks to Gaussian processes. Eng. Appl. Artif. Intell. 2008, 21, 1035−1055.
(30) Deisenroth, M. P.; Rasmussen, C. E.; Peters, J. Gaussian process dynamic programming. Neurocomputing 2009, 72, 1508−1524.
(31) Gregorčič, G.; Lightbody, G. Gaussian process approach for modeling of nonlinear systems. Eng. Appl. Artif. Intell. 2009, 22, 522−533.
(32) Ni, W.; Tan, S. K.; Ng, W. J. Recursive GPR for nonlinear dynamic process modeling. Chem. Eng. J. 2011, 173, 636−643.
(33) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691−5702.
(34) Choi, S. W.; Martin, E. B.; Morris, A. J.; Lee, I. B. Adaptive multivariate statistical process control for monitoring time-varying processes. Ind. Eng. Chem. Res. 2006, 45, 3108−3118.
(35) Savitzky, A.; Golay, M. J. E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627−1639.
(36) Rasmussen, C. E. Evaluation of Gaussian processes and other methods for non-linear regression. Ph.D. Thesis, University of Toronto, Toronto, Canada, 1996.
(37) Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.
(38) Akaike, H. Canonical correlation analysis of time series and the use of an information criterion. In System Identification: Advances and Case Studies; Mehra, R. K., Lainiotis, D. G., Eds.; Academic Press: New York, 1976; pp 27−96.
(39) Ni, W.; Tan, S. K.; Ng, W. J.; Brown, S. D. A localized, adaptive recursive partial least squares regression for dynamic system modeling. Ind. Eng. Chem. Res., manuscript submitted.