ARTICLE pubs.acs.org/IECR

Development of Soft Sensor Models Based on Time Difference of Process Variables with Accounting for Nonlinear Relationship

Hiromasa Kaneko† and Kimito Funatsu*,†

†Department of Chemical System Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan

ABSTRACT: Soft sensors are widely used to estimate process variables that are difficult to measure online. Although regression models can be reconstructed with new data to adapt to changes in the plants, some problems remain in practice. Soft sensor models have therefore been constructed on the basis of the time difference of the objective variable and that of the explanatory variables, to reduce the effects of deterioration with age such as drift and gradual changes in the state of plants. In this paper, we propose constructing time difference models after modeling the nonlinear relationships between and among process variables. Variables obtained from physical models, or values calculated by statistical nonlinear regression methods, are used to account for the nonlinearity, and a time difference model is then constructed including these variables. We applied these methods to actual industrial data obtained from an industrial polymer process and confirmed their usefulness.

1. INTRODUCTION

In industrial plants, soft sensors have been widely used to estimate process variables that are difficult to measure online.1,2 An inferential model is constructed between the variables that are easy to measure online and those that are not, and an objective variable y is then estimated using that model. In particular, the partial least-squares (PLS) method3,4 has been used as a modeling method for soft sensors. Soft sensors make it possible to estimate the values of objective variables with a high degree of accuracy in real time. In addition, soft sensors can provide useful information for fault detection when operated in parallel with hardware sensors. Their use, however, involves some practical difficulties. One crucial difficulty is that their predictive accuracy gradually decreases because of changes in the state of chemical plants, loss of catalyst performance, sensor and process drift, and the like. To reduce the degradation of soft sensor models, the updating of regression models5 and Just-In-Time (JIT) modeling6 have been proposed. While many excellent results have been reported with these methods, some problems remain for the introduction of soft sensors into practice.7 First, if soft sensor models are reconstructed with data that include abnormal samples, their predictive ability can deteriorate.8 Although such abnormal data must be detected with high accuracy, it is currently difficult to detect all of them. Second, reconstructed models tend to specialize in predictions over a narrow data range.9 Consequently, when the process variables vary, these models cannot predict the resulting data with a high degree of accuracy. Third, when a soft sensor model is reconstructed, its parameters, for example the regression coefficients in linear regression modeling, change dramatically in some cases.
Without the operators' understanding of a soft sensor model, the model cannot be applied in practice. Whenever soft sensor models are reconstructed, operators check the model parameters to ensure that the models are safe for operation. This takes considerable time and effort, because it is not rare for tens of soft sensors to be used in a single plant.10 Fourth, the data used to reconstruct soft sensor models are also affected by the drift of sensors and the process. When a model is constructed, data have to be selected from a database that includes both data affected by the drift and data obtained after the drift has been corrected.

© 2011 American Chemical Society

To solve these problems, it was previously proposed to construct soft sensor models based on the time difference of the explanatory variables X and that of y, in order to reduce the effects of deterioration with age such as drift and gradual changes in the state of plants.7 In other words, models that are not affected by these changes are constructed from the time differences of the process variables rather than from their values. A model constructed from the time difference of X and that of y is referred to as a 'time difference model'. Time difference models can retain high predictive accuracy even after drift correction, because the data are represented as time differences, which are not affected by the drift. Kaneko et al.7 confirmed through the analysis of actual industrial data that the time difference model achieved almost the same predictive accuracy as the updating model over a period of three years, even though the time difference model was never reconstructed. However, the time difference model cannot account for nonlinearity in the process variables. Figure 1 shows a simplified example of this problem. Because the relationship between an explanatory variable x and y is nonlinear, the time differences of x, Δx, are the same (1, 1), whereas those of y, Δy, differ (1, 3). Since a regression model must return the same Δy for the same Δx, an appropriate model cannot be constructed in this case, even with nonlinear regression methods. Studies on modeling the nonlinearity of process variables have been performed, and some excellent results have been obtained. Dai et al.11 constructed a highly accurate nonlinear soft sensor model using process knowledge and the artificial neural network (ANN)12 method.
However, the degradation of soft sensor models is not addressed in that work. Kadlec and Gabrys13 presented an architecture for developing online prediction models that follow the nonlinearity and adapt to changes of operating points and to data drift by using ensemble methods, local learning, and meta learning. Although the constructed models have a high degree of predictive accuracy, they are not appropriate for practical use in some cases because of the soft sensor maintenance problems mentioned above.

Therefore, to account for both the nonlinearity in process variables and the effects of changes with age, we propose modeling the nonlinear relationships between and among process variables before constructing time difference models. The problem shown in Figure 1 can be avoided by describing the nonlinearity between x and y before the time difference is calculated. One way to model the nonlinearity in process variables is to construct physical models from process knowledge. In polymer processes, for example, McAuley and MacGregor14 correlated polymer quality variables with other process variables, such as the reactor temperature and the concentrations of monomer, comonomer, hydrogen, and catalyst, on the basis of reaction kinetics and rheological characteristics. However, appropriate physical models cannot be obtained for every process, and incorrect physical models may be constructed unintentionally. In such cases, statistical nonlinear regression methods such as ANN and support vector regression (SVR)15,16 can be used to model the nonlinear relationship between X and y. If the physical variables, or the variables calculated by statistical nonlinear regression models, account appropriately for the nonlinearity in the process variables, the time difference models can then account for the effects of changes with age. The degradation of soft sensors can thus be addressed without manual or automatic reconstruction of the soft sensor models, which helps process engineers maintain soft sensors.

Figure 1. Nonlinear relationship between x and y.

Received: April 3, 2011. Accepted: August 11, 2011. Revised: August 8, 2011. Published: August 11, 2011.

dx.doi.org/10.1021/ie200692m | Ind. Eng. Chem. Res. 2011, 50, 10643–10651

Industrial & Engineering Chemistry Research

2. METHODS

This section explains the proposed methods, which construct time difference models after modeling the nonlinearity in process variables. We first briefly introduce the time difference modeling method.

2.1. Time Difference Modeling Method. In the traditional procedure, data X(t) and y(t) related to time t are prepared, where t is a real number, and the relationship between the explanatory variables X(t) ∈ ℝ^(m×n), with m rows of data samples and n columns of variables, and the objective variable y(t) ∈ ℝ^(m×1) is modeled by a regression method. For prediction, the constructed model predicts the value of y(t′) from new data x(t′). In time difference modeling, the time difference of X(t), ΔX(t) ∈ ℝ^(m×n), and that of y(t), Δy(t) ∈ ℝ^(m×1), are first calculated between the present values, X(t) and y(t), and the values a time i earlier, X(t − i) ∈ ℝ^(m×n) and y(t − i) ∈ ℝ^(m×1), where i is a real number:

$$ \Delta X(t) = X(t) - X(t-i) \tag{1} $$

$$ \Delta y(t) = y(t) - y(t-i) \tag{2} $$

Then, the relationship between ΔX(t) and Δy(t) is modeled by a regression method such as PLS or SVR:

$$ \Delta y(t) = f(\Delta X(t)) + e(t) \tag{3} $$

where f is a regression model and e(t) ∈ ℝ^(m×1) is a vector of calculation errors. f is a transform from ℝ^(1×n) to ℝ; hence, if the number of data samples is m, an m × n input matrix is transformed into m calculated values. For prediction, the constructed model predicts the time difference of y(t′), Δypred(t′), from the time difference of new data, Δx(t′) ∈ ℝ^(1×n), calculated as follows:

$$ \Delta x(t') = x(t') - x(t'-i) \tag{4} $$

$$ \Delta y_{\mathrm{pred}}(t') = f(\Delta x(t')) \tag{5} $$

ypred(t′) can then be calculated as follows:

$$ y_{\mathrm{pred}}(t') = \Delta y_{\mathrm{pred}}(t') + y(t'-i) \tag{6} $$

because y(t′ − i) is already known. This method can easily be extended to the case in which the interval i is not constant. Time difference models can account for the effects of deterioration with age, such as drift and gradual changes in the state of plants, because the data are represented as time differences, which are not affected by these factors.

2.2. Proposed Method. We propose constructing time difference models after modeling the nonlinear relationships between and among process variables. Variables obtained from physical models and values calculated by statistical nonlinear regression methods can be used to account for the nonlinearity in the process variables.

2.2.1. Time Difference Modeling with Variables Obtained by Physical Models. To describe the nonlinearity between X(t) and y(t), physical models can be constructed from process knowledge, introducing Xphysi(t) ∈ ℝ^(m×p), the variables calculated by the physical models, where p is the number of physical variables. For example, if process knowledge indicates that an explanatory variable x affects y exponentially, exp(x) is a physical variable. McAuley and MacGregor14 also proposed physical variables for polymer qualities in industrial polymer processes. Then, the time difference of Xphysi(t), ΔXphysi(t) ∈ ℝ^(m×p), is calculated as follows:

$$ \Delta X_{\mathrm{physi}}(t) = X_{\mathrm{physi}}(t) - X_{\mathrm{physi}}(t-i) \tag{7} $$

ΔX(t) in eq 1 is combined with ΔXphysi(t) as follows:

$$ \Delta X_{\mathrm{new1}}(t) = [\Delta X(t)\ \ \Delta X_{\mathrm{physi}}(t)] \tag{8} $$

where ΔXnew1(t) ∈ ℝ^(m×(n+p)) is the set of new explanatory variables incorporating process knowledge. The relationship between ΔXnew1(t) and Δy(t) in eq 2 is modeled by a regression method such as PLS or SVR, as in eq 3:

$$ \Delta y(t) = f(\Delta X_{\mathrm{new1}}(t)) + e(t) \tag{9} $$

If parameters in Xphysi(t) have to be determined and/or the number of X-variables is large, the parameters are determined and/or the X-variables are selected using a genetic algorithm (GA).15,16 The q² value, described in Appendix C, is used as the fitness value of a chromosome. For prediction, Δxphysi(t′) ∈ ℝ^(1×p) is first calculated as follows:

$$ \Delta x_{\mathrm{physi}}(t') = x_{\mathrm{physi}}(t') - x_{\mathrm{physi}}(t'-i) \tag{10} $$

The new data Δxnew1(t′) ∈ ℝ^(1×(n+p)) are the combination of Δx(t′), calculated by eq 4, and Δxphysi(t′):

$$ \Delta x_{\mathrm{new1}}(t') = [\Delta x(t')\ \ \Delta x_{\mathrm{physi}}(t')] \tag{11} $$

The constructed model f predicts Δypred(t′) from Δxnew1(t′):

$$ \Delta y_{\mathrm{pred}}(t') = f(\Delta x_{\mathrm{new1}}(t')) \tag{12} $$

ypred(t′) can then be calculated by eq 6. If the nonlinear relationship between X(t) and y(t) can be modeled by Xphysi(t), the time difference model can then account for the effects of deterioration with age.

2.2.2. Time Difference Modeling with Variables Calculated by Statistical Nonlinear Regression Methods. Although the nonlinearity in process variables can be described by constructing appropriate physical models, such models cannot be obtained for every process, and incorrect physical models may be constructed unintentionally. Therefore, a nonlinear regression model is constructed by a statistical method such as ANN or SVR to account for the nonlinear relationship between X(t) and y(t):

$$ y(t) = g(X(t)) + e(t) \tag{13} $$

where g is a nonlinear function. The time difference of the calculated values of y, Δycalc(t) ∈ ℝ^(m×1), is obtained as follows:

$$ \Delta y_{\mathrm{calc}}(t) = y_{\mathrm{calc}}(t) - y_{\mathrm{calc}}(t-i) \tag{14} $$

where

$$ y_{\mathrm{calc}}(t) = g(X(t)), \qquad y_{\mathrm{calc}}(t-i) = g(X(t-i)) \tag{15} $$

ΔX(t) in eq 1 is combined with Δycalc(t) as follows:

$$ \Delta X_{\mathrm{new2}}(t) = [\Delta X(t)\ \ \Delta y_{\mathrm{calc}}(t)] \tag{16} $$

where ΔXnew2(t) ∈ ℝ^(m×(n+1)) is the set of new explanatory variables incorporating the nonlinearity of the process variables. The relationship between ΔXnew2(t) and Δy(t) is modeled by a regression method such as PLS or SVR, as in eq 3:

$$ \Delta y(t) = f(\Delta X_{\mathrm{new2}}(t)) + e(t) \tag{17} $$

If the number of X-variables is large, the parameters of the nonlinear model are determined and the X-variables are selected using GA, as in section 2.2.1. For prediction, Δycalc(t′) is calculated as follows:

$$ \Delta y_{\mathrm{calc}}(t') = y_{\mathrm{calc}}(t') - y_{\mathrm{calc}}(t'-i) \tag{18} $$

where

$$ y_{\mathrm{calc}}(t') = g(x(t')), \qquad y_{\mathrm{calc}}(t'-i) = g(x(t'-i)) \tag{19} $$

The new data Δxnew2(t′) ∈ ℝ^(1×(n+1)) are the combination of Δx(t′), calculated by eq 4, and Δycalc(t′):

$$ \Delta x_{\mathrm{new2}}(t') = [\Delta x(t')\ \ \Delta y_{\mathrm{calc}}(t')] \tag{20} $$

The constructed model f predicts the time difference of the objective variable, Δypred(t′), from Δxnew2(t′):

$$ \Delta y_{\mathrm{pred}}(t') = f(\Delta x_{\mathrm{new2}}(t')) \tag{21} $$

ypred(t′) can then be calculated by eq 6. Because statistical modeling is performed twice, the proposed methods can be sensitive to noise in the data in some cases, but this can be mitigated by using regression methods that are robust to noise, such as PLS and SVR. If the nonlinear relationship between X(t) and y(t) is modeled appropriately and robustly by a nonlinear regression method, the time difference model can then account for the effects of deterioration with age.

3. RESULTS AND DISCUSSION

We first verified that the proposed methods could model the nonlinearity in process variables and handle the effects of deterioration with age even when the data contained noise, and that they were superior to traditional methods. We then applied these methods to actual industrial data obtained from an industrial polymer process.

3.1. Modeling of the Simulation Data. The proposed methods were compared with traditional methods using simulated data. We generated data that include both nonlinearity between X and y and the effects of change with age at a constant rate. First, three vectors p1, p2, and p3 ∈ ℝ^(m×1) of uniform pseudorandom numbers ranging from 0 to 7 were prepared. Then, y was set as follows:

$$ y = p_1^{3} + (p_2 + 5)^{1/2} + p_3 + N(0, 0.1) \tag{22} $$

where N(0, 0.1) denotes random numbers drawn from a normal distribution with a mean of 0 and a standard deviation of 0.1, the cube of p1 is applied element-wise, and the first and second terms on the right-hand side represent the nonlinearity between pi and y. The X-variables were prepared as follows:

$$ X = [x_1\ x_2\ x_3\ x_4] $$

$$ x_i = p_i + 0.005 \times [1\ 2\ 3\ \cdots\ q]^{\mathrm{T}} + N(0, 0.1) \quad (i = 1, 2, 3), \qquad x_4 = N(0, 0.1) \tag{23} $$

where the second term of xi ∈ ℝ^(m×1) represents the effects of change with age at a constant rate, and q is the number of data samples. All N(0, 0.1) series in eqs 22 and 23 are independent. There is strong nonlinearity between X and y, as shown
in eqs 22 and 23. Therefore, the generated data included both nonlinearity between X and y and the effects of change with age at a constant rate. q was set to 201, and the numbers of training and test data were set to 101 and 100, respectively: the first 101 samples were used for training, and the constructed models were tested on the next 100 samples. After the data were prepared, the following ten models were applied:

A: PLS model constructed between the values of X and those of y
B: PLS model constructed between the time difference of X and that of y
C: SVR model constructed between the values of X and those of y
D: SVR model constructed between the time difference of X and that of y
E: PLS model constructed between the values of Xphysi and those of y
F: PLS model constructed between the time difference of Xphysi and that of y
G: PLS model constructed between the values of Xwrongphysi and those of y
H: PLS model constructed between the time difference of Xwrongphysi and that of y
I: PLS model constructed between the time difference of X and ycalc and that of y
J: SVR model constructed between the time difference of X and ycalc and that of y

Xphysi consists of six variables, obtained by adding x5 and x6 ∈ ℝ^(m×1) to X:

$$ X_{\mathrm{physi}} = [x_1\ x_2\ x_3\ x_4\ x_5\ x_6], \qquad x_5 = x_1^{3}, \qquad x_6 = (x_2 + 5)^{1/2} \tag{24} $$

where the cube of x1 is applied element-wise, and x5 and x6 are the variables obtained from physical models of the relationship between p1 and y and that between p2 and y, respectively. Xwrongphysi consists of six variables, obtained by adding x7 and x6 ∈ ℝ^(m×1) to X:

$$ X_{\mathrm{wrongphysi}} = [x_1\ x_2\ x_3\ x_4\ x_7\ x_6], \qquad x_7 = \exp(x_1) \tag{25} $$

where x7 is the variable obtained from a wrong physical model of the relationship between p1 and y. ycalc denotes the values of y calculated by the SVR model between X and y, as represented in eq 13. The details of PLS and SVR are given in Appendixes A and B.

Table 1. Modeling and Prediction Results of the Simulation Data

| Model | r² | q² | r²pred |
|---|---|---|---|
| A: PLS, value | 0.922 | 0.913 | 0.854 |
| B: PLS, time difference | 0.839 | 0.822 | 0.856 |
| C: SVR, value | 0.986 | 0.981 | 0.868 |
| D: SVR, time difference | 0.844 | 0.834 | 0.830 |
| E: PLS, value, Xphysi | 0.986 | 0.984 | 0.847 |
| F: PLS, time difference, Xphysi | 0.988 | 0.986 | 0.927 |
| G: PLS, value, Xwrongphysi | 0.973 | 0.968 | 0.740 |
| H: PLS, time difference, Xwrongphysi | 0.968 | 0.963 | 0.787 |
| I: PLS, time difference, ycalc | 0.982 | 0.980 | 0.938 |
| J: SVR, time difference, ycalc | 0.989 | 0.984 | 0.942 |

The modeling and prediction results are shown in Table 1. The details of the statistics are given in Appendix C; r²pred is the r² for the test data. Simply using the time difference did not improve the predictive accuracy in this case, because the r²pred values of A and B were almost the same. This is probably due to the nonlinearity between X and y. Because the r²pred value of D was not high, the nonlinearity could not be handled by a nonlinear model constructed between the time difference of X and that of y, as illustrated in Figure 1. Without the time difference, the SVR method increased the fitting accuracy, but the predictive accuracy did not increase much, as shown by the r², q², and r²pred values of C. Figure 2 shows plots of the y values and the predicted y values for the test data. Figure 2(a) shows that the nonlinearity between X and y was not modeled by A. Although this nonlinearity could be described by model C, as indicated by the almost linear relationship between the y values and the predicted y values, the prediction errors appear biased in Figure 2(c). This bias derives from the effects of change with age at a constant rate in eq 23. On the other hand, Figure 2(b) shows that the time difference model reduced the bias of the prediction errors; however, the variance of the errors was large, and some nonlinearity between y and ypred appeared to remain. The model constructed between the time differences by SVR, a nonlinear regression method, showed the same tendency in Figure 2(d) as the PLS model. Kaneko et al.7 analyzed simulated data including only the effects of change with age at a constant rate, where the relationship between X and y is linear. In that case, the time difference model constructed with the PLS method outperformed the normal PLS model.
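The linear case described above can be illustrated with a small sketch. It is not the authors' code and rests on stated assumptions: ordinary least squares (OLS, via the hypothetical helpers `solve`, `ols_fit`, and `ols_predict`) stands in for PLS, a single sensor drifts by 0.02 per sample, y depends linearly on the true sensor value, the interval is i = 1, and all data are synthetic. Training follows eqs 1-3 and prediction follows eqs 4-6.

```python
import math
import random


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols_fit(rows, y):
    """OLS with intercept via the normal equations; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    ata = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    aty = [sum(d[j] * v for d, v in zip(design, y)) for j in range(p)]
    return solve(ata, aty)


def ols_predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


random.seed(0)
n, n_train, i = 200, 100, 1
x_true = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [2.0 * x + random.gauss(0.0, 0.1) for x in x_true]  # linear X-y relationship
X = [[x + 0.02 * t] for t, x in enumerate(x_true)]      # sensor drift at a constant rate

# Value model: y(t) = f(x(t)), trained on the first n_train samples.
coef_value = ols_fit(X[:n_train], y[:n_train])
pred_value = [ols_predict(coef_value, X[t]) for t in range(n_train, n)]

# Time difference model, eqs 1-3: regress Δy(t) on ΔX(t) within the training period.
dX = [[a - b for a, b in zip(X[t], X[t - i])] for t in range(i, n_train)]
dy = [y[t] - y[t - i] for t in range(i, n_train)]
coef_td = ols_fit(dX, dy)

# Prediction, eqs 4-6: predict Δy(t') and add the already measured y(t' - i).
pred_td = []
for t in range(n_train, n):
    dx_new = [a - b for a, b in zip(X[t], X[t - i])]
    pred_td.append(ols_predict(coef_td, dx_new) + y[t - i])

rmse_value = rmse(pred_value, y[n_train:])
rmse_td = rmse(pred_td, y[n_train:])
print(f"value model RMSE = {rmse_value:.3f}, time difference model RMSE = {rmse_td:.3f}")
```

Under these assumptions the value model's error should grow with the drift during the test period, whereas a constant-rate drift only adds a constant 0.02·i to every Δx, which the intercept of the time difference model absorbs.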
However, in the analysis of actual industrial data, a time difference model could not account for variations in which the relationship between X and y changed over time. Nonlinearity between X and y is a special case of such variations. Note that the time difference alone cannot deal with nonlinearity between X and y, although it removes a linear trend with time. In the simulation in this paper, methods B and D performed poorly because the effects of the nonlinearity were far stronger than those of the change with age at a constant rate. Next, variables representing the nonlinearity between X and y were added to X. As shown by the results of A and E in Table 1, r²pred did not change when the models were constructed with the values of Xphysi. However, the time difference model with Xphysi increased the predictive accuracy significantly, as shown by the results of F. The nonlinearity in the variables could be modeled by x5 and x6, and the effects of change at a constant rate could then be accounted for by the time difference, as shown in Figure 2(e). In contrast, Xwrongphysi decreased the predictive accuracy, as shown by the results of G and H in Table 1, which indicates that physical models must be constructed appropriately. When the SVR method was used to model the nonlinear relationship between X and y and the time difference model was constructed afterward, the bias of the prediction errors seen in Figure 2(c) was reduced and high predictive accuracy was obtained, as shown in Table 1 and Figure 2(f) and (g). If process knowledge is lacking, nonlinear regression methods such as SVR can substitute for physical models. In addition, as shown in Figure 2(e), (f), and (g), both the bias and the variance of the prediction errors were small, so the predicted y values agreed well with the measured y values.
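The mechanism behind models B, E, and F can be reproduced on a toy problem. The sketch below is a hypothetical illustration and does not use the data of eqs 22 and 23: y depends quadratically on a single variable x, the constant-rate change is placed on y rather than on X (so that the assumed physical variable x² stays exact), the interval is i = 1, and OLS replaces PLS. The hypothetical helper `td_model` applies eqs 1-6 to whatever explanatory columns it receives, so passing [x, x²] corresponds to eqs 7-12.

```python
import math
import random


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols_fit(rows, y):
    """OLS with intercept via the normal equations; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    ata = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    aty = [sum(d[j] * v for d, v in zip(design, y)) for j in range(p)]
    return solve(ata, aty)


def ols_predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def td_model(rows, y, n_train, i):
    """Fit f on training differences (eq 3) and predict y for t >= n_train (eqs 4-6)."""
    def diff(t):
        return [a - b for a, b in zip(rows[t], rows[t - i])]
    coef = ols_fit([diff(t) for t in range(i, n_train)],
                   [y[t] - y[t - i] for t in range(i, n_train)])
    return [ols_predict(coef, diff(t)) + y[t - i] for t in range(n_train, len(y))]


random.seed(1)
n, n_train, i = 200, 100, 1
x = [random.uniform(0.0, 3.0) for _ in range(n)]
# Hypothetical process: quadratic x-y relationship plus a constant-rate change in y.
y = [v ** 2 + 0.01 * t + random.gauss(0.0, 0.05) for t, v in enumerate(x)]
X = [[v] for v in x]                # measured variable only
X_physi = [[v, v ** 2] for v in x]  # plus an assumed physical variable x**2
y_test = y[n_train:]

rmse_b = rmse(td_model(X, y, n_train, i), y_test)        # like model B
coef_e = ols_fit(X_physi[:n_train], y[:n_train])         # like model E
rmse_e = rmse([ols_predict(coef_e, r) for r in X_physi[n_train:]], y_test)
rmse_f = rmse(td_model(X_physi, y, n_train, i), y_test)  # like model F
print(f"B-like {rmse_b:.3f}  E-like {rmse_e:.3f}  F-like {rmse_f:.3f}")
```

One would expect the time difference model on x alone to keep the curvature error illustrated in Figure 1, the value model on [x, x²] to inherit a bias from the change with age, and the time difference model on [x, x²] to be limited mainly by measurement noise.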


Figure 2. The relationship between y value and predicted y value with test data.

3.2. Application to Industrial Polymer Processes. We applied the proposed methods to actual industrial data obtained from an industrial polymer process at Mitsui Chemicals to verify their predictive ability. Industrial polymer processes generally produce many grades of products, so it is important to reduce the quantity of off-grade material when the polymer grade changes. Because it is impossible to measure a large number of polymer quality variables online with hardware sensors, soft sensors are used to judge early and accurately whether the polymer quality is within the given specifications. It is currently difficult, however, to construct models with high predictive performance. One reason is that the relationship between a polymer quality variable y and the other process variables X can be nonlinear.19−21 It is therefore often difficult to judge accurately whether a grade transition is complete. Kaneko et al.22 constructed models that detect the completion of a transition, to ensure that the polymer quality evaluated after the transition conforms to the predicted one. These models are called discriminant models. However, discriminant models cannot predict polymer quality during a transition, nor can they handle new polymer grades or grades with too few data to construct a discriminant model, because each discriminant model is constructed with the data of a single polymer grade. Therefore, to predict the polymer quality of new grades and of grades with few data, and to predict polymer quality during transitions, we attempted to construct nonlinear models between X and y using data from various polymer grades. The constructed models can be applied to data of any polymer grade, provided that the relationship between X and y can be extrapolated to those data. In this study, we collected data measured in the steady state of many grades and modeled the relationship. The objective variables are the melt flow rate (MFR) and density, and the explanatory variables X are 38 variables such as the reactor temperature and the pressure and concentrations of the monomer, comonomer, and hydrogen. Data monitored from January 2005 to April 2007 were used as the training data, and data from May 2007 to May 2008 as the test data. In this case study, the time lag i in eqs 1, 2, and the subsequent equations is not constant, because each time difference was calculated between the values before a transition and those after it.
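Because each time difference in this case study pairs a steady state before a grade transition with one after it, i varies from pair to pair. A minimal sketch of that pairing follows, assuming a hypothetical `SteadyState` record type and records that have already been restricted to steady-state periods (how steady states were identified is not described in this section). The numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SteadyState:
    """One steady-state operating record (hypothetical structure)."""
    time_h: float      # time stamp in hours
    grade: str         # polymer grade label
    x: list[float]     # process variables (38 in the case study, 2 here)
    y: float           # quality variable, e.g. MFR


def transition_differences(records):
    """Return (i, Δx, Δy) for each grade change, pairing the last steady state before
    the transition with the first one after it (eqs 1 and 2 with a varying i)."""
    ordered = sorted(records, key=lambda r: r.time_h)
    pairs = []
    for before, after in zip(ordered, ordered[1:]):
        if after.grade != before.grade:
            dx = [a - b for a, b in zip(after.x, before.x)]
            pairs.append((after.time_h - before.time_h, dx, after.y - before.y))
    return pairs


records = [
    SteadyState(0.0, "A", [80.0, 1.2], 3.0),
    SteadyState(6.0, "A", [80.5, 1.2], 3.1),
    SteadyState(20.0, "B", [82.0, 1.5], 5.0),
    SteadyState(31.0, "C", [78.0, 1.0], 2.0),
]
pairs = transition_differences(records)
for interval, dx, dy in pairs:
    print(f"i = {interval:4.1f} h, dx = {dx}, dy = {dy:+.2f}")
```

Consecutive records of the same grade are skipped, so only the grade changes A to B and B to C produce training pairs, with intervals of 14 h and 11 h here.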


Table 2. Modeling and Prediction Results for MFR

| Model | r² | RMSE | q² | RMSEcv | r²pred | RMSEtest |
|---|---|---|---|---|---|---|
| A: PLS, value | 0.943 | 2.11 | 0.421 | 6.79 | 0.931 | 1.83 |
| B: PLS, time difference | 0.908 | 2.70 | 0.875 | 3.16 | 0.805 | 3.08 |
| C: SVR, value | 0.995 | 0.64 | 0.876 | 3.15 | 0.974 | 1.13 |
| D: SVR, time difference | 0.985 | 1.08 | 0.723 | 4.70 | 0.926 | 1.89 |
| E: PLS, value, Xphysi | 0.997 | 0.48 | 0.992 | 0.75 | 0.969 | 1.21 |
| F: PLS, time difference, Xphysi | 0.995 | 0.62 | 0.993 | 0.77 | 0.988 | 0.75 |
| I: PLS, time difference, ycalc | 0.996 | 0.59 | 0.995 | 0.62 | 0.982 | 0.94 |
| J: SVR, time difference, ycalc | 0.998 | 0.41 | 0.992 | 0.79 | 0.990 | 0.70 |

The models A, B, C, D, E, F, I, and J of section 3.1 were applied, and the results were compared. The physical models for the industrial polymer process were based on (McAuley and MacGregor, 1991). For MFR, the following two variables were calculated by the physical model:

$$ \log\left(k_0 + k_1\frac{[M_2]}{[M_1]} + k_2\frac{[M_3]}{[M_1]} + k_3\frac{[H_2]}{[M_1]}\right), \qquad \frac{1}{T} \tag{26} $$

where [M1] is the concentration of monomer; [M2] and [M3] are the concentrations of comonomers; [H2] is the concentration of hydrogen; T is the reactor temperature; and k0, k1, k2, and k3 are constant parameters. For density, three variables were calculated by the physical model: the two variables in eq 26 and the following variable:

$$ \left(k_4 + k_5\frac{[M_2]}{[M_1]} + k_6\frac{[M_3]}{[M_1]}\right)^{k_7} \tag{27} $$

where k4, k5, k6, and k7 are constant parameters. The parameters in eqs 26 and 27 were optimized by GA, and variable selection was performed simultaneously by adding the information on the X-variables to the bits of the chromosome.

Table 2 shows the modeling and prediction results for MFR. The details of the statistics are given in Appendix C; RMSE, RMSEcv, and RMSEtest correspond to r², q², and r²pred, respectively. The r²pred value of A, a linear regression method, is lower than those of C and E, which account for nonlinearity; this indicates that there is nonlinearity between MFR and the other process variables. Moreover, the r²pred value of C was the highest among A, B, C, and D, and its RMSEtest value was the lowest, which indicates that the contribution of the process nonlinearity is greater than that of the effects of deterioration with age. This is probably why the time difference models alone did not increase the predictive accuracy. Figure 3 shows plots of measured and predicted MFR.
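The physical variables of eqs 26 and 27, and their combination with the measured variables (eqs 7, 8, 10, and 11), can be sketched for a single pair of samples. The parameter values, concentrations, and function names below are hypothetical; in the paper, k0 to k7 are optimized by GA. The natural logarithm is assumed for "log" in eq 26; changing the base only multiplies the variable by a constant, which a linear regression on time differences absorbs into its coefficient.

```python
import math


def mfr_physical_variables(m1, m2, m3, h2, temp, k):
    """Eq 26: log(k0 + k1[M2]/[M1] + k2[M3]/[M1] + k3[H2]/[M1]) and 1/T."""
    k0, k1, k2, k3 = k
    return [math.log(k0 + (k1 * m2 + k2 * m3 + k3 * h2) / m1), 1.0 / temp]


def density_physical_variable(m1, m2, m3, k):
    """Eq 27: (k4 + k5[M2]/[M1] + k6[M3]/[M1]) ** k7."""
    k4, k5, k6, k7 = k
    return (k4 + (k5 * m2 + k6 * m3) / m1) ** k7


def augmented_difference(x_now, x_prev, phys_now, phys_prev):
    """Eqs 7-8 (training) or 10-11 (prediction) for one sample pair: [Δx, Δx_physi]."""
    dx = [a - b for a, b in zip(x_now, x_prev)]
    dphys = [a - b for a, b in zip(phys_now, phys_prev)]
    return dx + dphys


# Hypothetical parameters and operating conditions before and after a change in hydrogen.
k_mfr = (1.0, 2.0, 0.5, 4.0)
before = dict(m1=2.0, m2=1.0, m3=0.0, h2=0.5, temp=400.0)
after = dict(m1=2.0, m2=1.0, m3=0.0, h2=1.0, temp=400.0)
phys_before = mfr_physical_variables(**before, k=k_mfr)
phys_after = mfr_physical_variables(**after, k=k_mfr)

# Measured X here is [temperature, hydrogen concentration]; the row feeds the model f of eq 9.
row = augmented_difference([400.0, 1.0], [400.0, 0.5], phys_after, phys_before)
print(row)
```

The resulting row plays the role of one row of ΔXnew1(t) in eq 8; for density, the value of `density_physical_variable` would be appended to the physical variables before differencing.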
Figure 3. The relationship between measured and predicted MFR with test data.

Although the plot in Figure 3(a) shows an almost linear trend along the diagonal overall, the prediction errors for small MFR values are biased, as with the simulation data in section 3.1; this bias probably comes from the effects of deterioration with age. In contrast, the r²pred and RMSEtest values of F, I, and J in Table 2 show that the proposed methods increased the predictive accuracy. It is conceivable that the variables of eq 26 modeled the nonlinearity in the process variables and that the time difference accounted for the effects of deterioration with age. Moreover, when the nonlinearity was extracted by the SVR method, the SVR-based time difference model increased the predictive accuracy considerably. The plots in Figure 3(c) and (e) show much tighter clustering of the predicted values along the diagonal, reflecting the more accurate prediction of MFR. In addition, the bias of the prediction errors for small MFR values could be eliminated, as shown in Figure 3(d) and (f). Reducing this bias is important for soft sensors, because the bias will increase over time, and if a bias exists inherently, it cannot be distinguished from the variation of the predicted values during prediction. We confirmed that MFR could be predicted with high accuracy over its whole range using the proposed methods. Note that the proposed models are applicable to any polymer grade, whereas the discriminant model for detecting the completion of a transition proposed in (Kaneko et al., 2010b) cannot be applied to new polymer grades or to grades with too few data, because each discriminant model must be constructed with the data of a single polymer grade.

The modeling and prediction results for density are shown in Table 3. As with MFR, the r²pred value of A is lower than those of C and E, and the r²pred value of C was the highest among A, B, C, and D, which indicates that there is also nonlinearity between density and the other process variables and that its contribution is probably greater than that of the effects of deterioration with age. Thus, the time difference models alone did not increase the predictive accuracy. Figure 4 shows the relationship between measured and predicted density. Although the plot in Figure 4(a) shows an almost linear trend along the diagonal overall, the prediction errors are biased, as with the simulation data in section 3.1; this probably comes from the effects of deterioration with age. The r²pred and RMSEtest values of F in Table 3 show that F did not increase the predictive accuracy. It is conceivable that the variables of eqs 26 and 27 could not model the nonlinearity between density and the other process variables, because the physical model of density consists largely of empirical terms; similarly, the predictive accuracy was poor with G for the simulation data in section 3.1. However, I and J increased the predictive accuracy. We can therefore say that SVR modeled the nonlinearity of the process variables and that the time difference then accounted for the effects of changes with age. The plots in Figure 4(b) and (c) show much tighter clustering of the predicted values along the diagonal, reflecting the more accurate prediction of density. Additionally, the bias of the prediction errors was reduced, as shown in Figure 4(b) and (c), which is important for soft sensors as mentioned above. We confirmed that density could be predicted with high accuracy over its whole range using the proposed methods.

Table 3. Modeling and Prediction Results for Density

| Model | r² | RMSE (×10³) | q² | RMSEcv (×10³) | r²pred | RMSEtest (×10³) |
|---|---|---|---|---|---|---|
| A: PLS, value | 0.980 | 1.65 | 0.969 | 2.05 | 0.922 | 3.37 |
| B: PLS, time difference | 0.975 | 1.83 | 0.962 | 2.28 | 0.942 | 2.92 |
| C: SVR, value | 0.991 | 1.11 | 0.980 | 1.66 | 0.958 | 2.48 |
| D: SVR, time difference | 0.977 | 1.77 | 0.956 | 2.45 | 0.948 | 2.75 |
| E: PLS, value, Xphysi | 0.982 | 1.58 | 0.976 | 1.80 | 0.936 | 3.05 |
| F: PLS, time difference, Xphysi | 0.972 | 1.95 | 0.962 | 2.26 | 0.943 | 2.90 |
| I: PLS, time difference, ycalc | 0.983 | 1.53 | 0.981 | 1.63 | 0.968 | 2.20 |
| J: SVR, time difference, ycalc | 0.986 | 1.39 | 0.981 | 1.61 | 0.972 | 2.04 |

Figure 4. The relationship between measured and predicted density with test data.

Furthermore, the SVR-based time difference models constructed after extracting the nonlinearity with SVR achieved high predictive accuracy for both MFR and density, probably owing to the robustness of the SVR method. SVR is based on statistical learning theory and is less likely to overfit noise in the training data. This robustness can improve predictive accuracy, because the proposed method performs statistical modeling twice and can therefore be sensitive to noise in the data. SVR probably prevented overfitting both when the nonlinearity between X and y was modeled and when the time difference models were constructed.
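The two-stage procedure of section 2.2.2 (models I and J) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a k-nearest-neighbour regressor (the hypothetical `knn_regressor`) stands in for the SVR model g of eq 13, OLS stands in for the regression f, and the data are synthetic, with y depending quadratically on a single variable x plus a change at a constant rate of 0.01 per sample, and an interval i = 1.

```python
import math
import random


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols_fit(rows, y):
    """OLS with intercept via the normal equations; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    ata = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    aty = [sum(d[j] * v for d, v in zip(design, y)) for j in range(p)]
    return solve(ata, aty)


def ols_predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def knn_regressor(x_train, y_train, k=5):
    """Hypothetical stand-in for g in eq 13: mean y of the k nearest training samples."""
    def g(x_value):
        nearest = sorted(range(len(x_train)), key=lambda j: abs(x_train[j] - x_value))[:k]
        return sum(y_train[j] for j in nearest) / k
    return g


random.seed(2)
n, n_train, i = 200, 100, 1
x = [random.uniform(0.0, 3.0) for _ in range(n)]
y = [v ** 2 + 0.01 * t + random.gauss(0.0, 0.05) for t, v in enumerate(x)]

g = knn_regressor(x[:n_train], y[:n_train])  # eq 13, fitted on values
y_calc = [g(v) for v in x]                   # eqs 15 and 19


def diff_row(t, with_ycalc):
    """ΔX(t) (eq 1), optionally extended to [ΔX(t) Δycalc(t)] (eqs 14, 16, 18, 20)."""
    row = [x[t] - x[t - i]]
    return (row + [y_calc[t] - y_calc[t - i]]) if with_ycalc else row


def td_rmse(with_ycalc):
    """Fit f on training differences (eq 3 or 17), predict Δy (eq 5 or 21), then y (eq 6)."""
    coef = ols_fit([diff_row(t, with_ycalc) for t in range(i, n_train)],
                   [y[t] - y[t - i] for t in range(i, n_train)])
    pred = [ols_predict(coef, diff_row(t, with_ycalc)) + y[t - i] for t in range(n_train, n)]
    return rmse(pred, y[n_train:])


rmse_x_only = td_rmse(with_ycalc=False)
rmse_with_ycalc = td_rmse(with_ycalc=True)
print(f"time difference on x: {rmse_x_only:.3f}, with ycalc: {rmse_with_ycalc:.3f}")
```

Because g is fitted once on values and only its differences enter the second model, a roughly constant offset that g inherits from the training period cancels in Δycalc, which is the property the proposed method relies on.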

4. CONCLUSION
In this paper, we have proposed constructing time difference models after modeling the nonlinear relationships between and among process variables, in order to build highly predictive soft sensor models when nonlinearity exists among the process variables. We first verified the superiority of the proposed methods over traditional ones using simulation data containing nonlinearity among the variables and noise, and then applied these methods to actual industrial data obtained from an industrial polymer process. The proposed models achieved high predictive accuracy for MFR by adding the variables calculated by the physical model, and for density by describing the nonlinearity with SVR. This also indicates that constructing appropriate physical models is important for improving predictive ability. In addition, the bias of the prediction errors could be eliminated in both cases by using the proposed method, which is of significant importance for soft sensors. The SVR-based time difference models built after extracting the nonlinearity with the SVR method achieved high predictive accuracy for both MFR and density. We therefore confirmed the usefulness of the proposed method without reconstructing the soft sensor models. In this case study, we focused on MFR and density as examples of nonlinearity among process variables, but the proposed methods can easily be applied to predict other polymer qualities, such as the molecular weight distribution. The proposed methods can be combined with other regression methods and can thus be used in various soft sensor applications. We expect that our methods will reduce the maintenance problems of soft sensor models. As future work, we plan to apply the proposed method to predict polymer quality during transitions.
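The workflow summarized above (first add a variable that carries the nonlinearity, then regress time differences) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nonlinear variable z is assumed to be precomputed (for example, from a physical model or from an SVR fit of y on X), ordinary least squares with an intercept stands in for the PLS or SVR regression of the time difference model, the time difference interval is one sample, and all function names are hypothetical. The estimate at time t is taken as the last measured y plus the predicted time difference.

```python
import numpy as np


def time_difference(a, interval=1):
    """Return a[t] - a[t - interval] for every t >= interval (along axis 0)."""
    a = np.asarray(a, dtype=float)
    return a[interval:] - a[:-interval]


def fit_td_model(X, z, y, interval=1):
    """Fit dy = [dX, dz, 1] @ coef by ordinary least squares (stand-in for PLS/SVR).

    X: (N, n) explanatory variables; z: (N,) precomputed nonlinear variable
    (hypothetical physical-model or SVR output); y: (N,) objective variable.
    Returns n + 2 coefficients; the last one is the intercept.
    """
    Z = np.column_stack([X, z])
    dZ = time_difference(Z, interval)
    dy = time_difference(y, interval)
    design = np.column_stack([dZ, np.ones(len(dZ))])
    coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
    return coef


def predict_td(coef, x_now, z_now, x_past, z_past, y_past):
    """Estimate y(t) as the past measured y plus the predicted time difference."""
    dx = np.asarray(x_now, dtype=float) - np.asarray(x_past, dtype=float)
    dz = np.append(dx, z_now - z_past)
    return y_past + dz @ coef[:-1] + coef[-1]


# Synthetic example: nonlinear relation plus a slow linear drift in y.
t = np.arange(200)
X = np.column_stack([2 * np.sin(t / 5), np.cos(t / 7)])
z = X[:, 0] ** 2                      # hypothetical "physical model" output
y = z + 0.5 * X[:, 1] + 0.01 * t      # drift of 0.01 per sample
coef = fit_td_model(X, z, y)          # recovers approximately [0, 0.5, 1, 0.01]
y_hat = predict_td(coef, X[199], z[199], X[198], z[198], y[198])
```

In this synthetic example, differencing turns the slow linear drift into a constant that the intercept absorbs, while the precomputed z carries the nonlinearity that a model linear in X alone could not capture.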
dx.doi.org/10.1021/ie200692m | Ind. Eng. Chem. Res. 2011, 50, 10643–10651

Industrial & Engineering Chemistry Research

AUTHOR INFORMATION
Corresponding Author
*Phone: +81-3-5841-7751. Fax: +81-3-5841-7771. E-mail: funatsu@chemsys.t.u-tokyo.ac.jp.

APPENDIX A: PLS
PLS is a method for relating X and y by a linear multivariate model, but it goes beyond traditional regression methods in that it also models the structures of X and y. In PLS modeling, the covariance between the score vector $\mathbf{t}_i \in \mathbb{R}^{m\times 1}$ and $\mathbf{y}$ is maximized. Generally, PLS models have higher predictive power than multiple linear regression models. A PLS model consists of the following two equations:

$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E} \tag{A.1}$$

$$\mathbf{y} = \mathbf{T}\mathbf{q}' + \mathbf{f} \tag{A.2}$$

where $\mathbf{T} \in \mathbb{R}^{m\times l}$ is the score matrix, $\mathbf{P} \in \mathbb{R}^{n\times l}$ is an X-loading matrix, $\mathbf{q} \in \mathbb{R}^{1\times l}$ is a y-loading vector, $\mathbf{E} \in \mathbb{R}^{m\times n}$ is a matrix of X residuals, $\mathbf{f} \in \mathbb{R}^{m\times 1}$ is the vector of y residuals, and $l$ is the number of components. The PLS regression model is as follows:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathrm{const} \tag{A.3}$$

$$\mathbf{b} = \mathbf{W}(\mathbf{P}'\mathbf{W})^{-1}\mathbf{q}' \tag{A.4}$$

where $\mathbf{W} \in \mathbb{R}^{n\times l}$ is an X-weight matrix, and $\mathbf{b} \in \mathbb{R}^{n\times 1}$ is a vector of regression coefficients.

APPENDIX B: SVR
SVR applies the support vector machine (SVM) to regression analysis and, like SVM, can construct nonlinear models by applying the kernel trick. The primal form of SVR can be written as the following optimization problem:

Minimize
$$\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\sum_{i} \lvert y_i - f(\mathbf{x}_i) \rvert_{\varepsilon} \tag{B.1}$$
subject to
$$\lvert y_i - f(\mathbf{x}_i) \rvert_{\varepsilon} = \max\bigl(0,\ \lvert y_i - f(\mathbf{x}_i) \rvert - \varepsilon\bigr) \tag{B.2}$$

where $y_i$ and $\mathbf{x}_i \in \mathbb{R}^{1\times n}$ are training data, $\mathbf{w} \in \mathbb{R}^{n\times 1}$ is a weight vector, $\varepsilon$ is a threshold, and $C$ is a penalizing factor that controls the trade-off between the training error and the margin. Minimizing eq B.1 yields a regression model with a good balance between fit to the training data and generalization capability. The kernel function used in our application is a radial basis function:

$$K(\mathbf{x}, \mathbf{x}') = \exp\bigl(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\bigr) \tag{B.3}$$

where $\gamma$ is a tuning parameter controlling the width of the kernel function.

APPENDIX C: STATISTICS
To construct a highly predictive model, the number of components in PLS models and the tuning parameters in SVR models must be chosen appropriately. The $r^2$ and $q^2$ values are used as measures for this purpose and are defined as follows:

$$r^{2} = 1 - \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{calc}})^{2}}{\sum (y_{\mathrm{obs}} - \bar{y})^{2}}, \qquad q^{2} = 1 - \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{pred}})^{2}}{\sum (y_{\mathrm{obs}} - \bar{y})^{2}} \tag{C.1}$$

where $y_{\mathrm{obs}}$ is the measured y value, $\bar{y}$ is the mean of $y_{\mathrm{obs}}$, $y_{\mathrm{calc}}$ is the calculated y value, and $y_{\mathrm{pred}}$ is the y value predicted in the procedure of cross-validation, a technique for assessing how the results of a model will generalize. In this study, the leave-one-out method is used to calculate $y_{\mathrm{pred}}$. $r^2$ represents the fitting accuracy of the constructed models, whereas $q^2$ represents their prediction accuracy. Values of both close to unity are favorable. $r^2$ and $q^2$ values should be compared only among models constructed with the same data for the same objective variable. In addition, the root-mean-square error (RMSE) of $y_{\mathrm{calc}}$ and $y_{\mathrm{pred}}$ is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{calc,pred}})^{2}}{n}} \tag{C.2}$$

where $n$ here denotes the number of data points. A lower RMSE value means more accurate prediction by the constructed model.
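As an illustration of eqs A.1 to A.4, the sketch below implements single-response PLS with the NIPALS algorithm in NumPy. NIPALS is one common way to compute W, P, and q and is our assumption, since Appendix A does not state which algorithm was used; the data are only mean-centered (the scaling used in the study is not stated here), and `pls1` is a hypothetical helper name.

```python
import numpy as np


def pls1(X, y, n_components):
    """NIPALS PLS1 on mean-centered data. Returns (b, const, W, P, q)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # residual (deflated) copies
    n = X.shape[1]
    W = np.zeros((n, n_components))          # X-weight matrix
    P = np.zeros((n, n_components))          # X-loading matrix
    q = np.zeros(n_components)               # y-loading vector
    for i in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # unit weight maximizing cov(t, y)
        t = Xr @ w                           # score vector t_i
        tt = t @ t
        p = Xr.T @ t / tt                    # X-loading for component i
        q[i] = yr @ t / tt                   # y-loading for component i
        Xr = Xr - np.outer(t, p)             # deflate X (eq A.1 residual)
        yr = yr - q[i] * t                   # deflate y (eq A.2 residual)
        W[:, i], P[:, i] = w, p
    b = W @ np.linalg.solve(P.T @ W, q)      # eq A.4: b = W (P'W)^-1 q'
    const = y_mean - x_mean @ b              # constant term of eq A.3
    return b, const, W, P, q


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
b, const, W, P, q = pls1(X, y, n_components=3)
```

With as many components as X has independent columns, PLS reproduces ordinary least squares, which gives a simple check; in practice l is smaller and is chosen by cross-validation as described in Appendix C.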
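The quantities in eqs B.1 to B.3 can be evaluated directly, as in the NumPy sketch below. It only computes the ε-insensitive loss, the primal objective for an assumed linear model f(x) = xw + bias, and the RBF kernel matrix; it does not solve the optimization problem, which in practice is handled (usually in the dual, via the kernel) by a dedicated solver. The helper names are hypothetical.

```python
import numpy as np


def eps_insensitive(residual, eps):
    """|r|_eps = max(0, |r| - eps), element-wise (eq B.2)."""
    return np.maximum(0.0, np.abs(residual) - eps)


def svr_primal_objective(w, bias, X, y, C, eps):
    """Value of eq B.1 for a linear model f(x) = x w + bias."""
    residual = y - (X @ w + bias)
    return 0.5 * w @ w + C * eps_insensitive(residual, eps).sum()


def rbf_kernel(A, B, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for all row pairs (eq B.3)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off


X = np.array([[0.0, 0.0], [1.0, 0.0]])
y = np.array([0.0, 1.5])
loss = eps_insensitive(np.array([0.05, -0.3, 0.2]), eps=0.1)              # about [0, 0.2, 0.1]
obj = svr_primal_objective(np.array([1.0, 0.0]), 0.0, X, y, C=2.0, eps=0.1)  # about 1.3
K = rbf_kernel(X, X, gamma=0.5)
```

For fitting, a library implementation can be used instead; for example, scikit-learn's `sklearn.svm.SVR` exposes `C`, `epsilon`, and `gamma` parameters that play the roles of C, ε, and γ here.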
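The statistics of Appendix C can be computed as sketched below. Ordinary least squares stands in for the PLS or SVR model (an assumption made to keep the example short); `loo_predictions` refits the model with each sample left out to obtain y_pred, and the same formula gives r² from y_calc and q² from y_pred. All function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept; returns a predict function (stand-in model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([Xn, np.ones(len(Xn))]) @ coef


def coef_determination(y_obs, y_est):
    """1 - sum((y_obs - y_est)^2) / sum((y_obs - mean)^2), eq C.1."""
    return 1.0 - np.sum((y_obs - y_est) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)


def rmse(y_obs, y_est):
    """Root-mean-square error, eq C.2."""
    return np.sqrt(np.mean((y_obs - y_est) ** 2))


def loo_predictions(X, y, fit=fit_ols):
    """y_pred by leave-one-out cross-validation."""
    y_pred = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i            # train on all samples except i
        model = fit(X[mask], y[mask])
        y_pred[i] = model(X[i:i + 1])[0]        # predict the held-out sample
    return y_pred


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=30)
y_calc = fit_ols(X, y)(X)
y_pred = loo_predictions(X, y)
r2 = coef_determination(y, y_calc)
q2 = coef_determination(y, y_pred)
```

For least squares, the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so q² ≤ r² here; a large gap between the two is a common sign of overfitting.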

ACKNOWLEDGMENT
The authors acknowledge the support of Mitsui Chemical Corporation and the financial support of the Japan Society for the Promotion of Science.

REFERENCES
(1) Kano, M.; Nakagawa, Y. Data-based Process Monitoring, Process Control, and Quality Improvement: Recent Developments and Applications in Steel Industry. Comput. Chem. Eng. 2008, 32, 12.
(2) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven Soft Sensors in the Process Industry. Comput. Chem. Eng. 2009, 33, 795.
(3) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: a Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109.
(4) Lin, B.; Recke, B.; Knudsen, J. K. H.; Jorgensen, S. B. A Systematic Approach for Soft Sensor Development. Comput. Chem. Eng. 2007, 31, 419.
(5) Qin, S. J. Recursive PLS Algorithms for Adaptive Data Modelling. Comput. Chem. Eng. 1998, 22, 503.
(6) Cheng, C.; Chiu, M. S. A New Data-based Methodology for Nonlinear Process Modeling. Chem. Eng. Sci. 2004, 59, 2801.
(7) Kaneko, H.; Funatsu, K. Maintenance-Free Soft Sensor Models with Time Difference of Process Variables. Chemom. Intell. Lab. Syst. 2011, 107, 312.
(8) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a New Soft Sensor Method Using Independent Component Analysis and Partial Least Squares. AIChE J. 2009, 55, 87.
(9) Kaneko, H.; Arakawa, M.; Funatsu, K. Applicability Domains and Accuracy of Prediction of Soft Sensor Models. AIChE J. 2011, 57, 1506.
(10) Ookita, K. Operation and Quality Control for Chemical Plants by Soft Sensors. CICSJ Bull. 2006, 24, 31 (in Japanese).
(11) Dai, X. Z.; Wang, W. C.; Ding, Y. H.; Sun, Z. Y. Assumed Inherent Sensor Inversion Based ANN Dynamic Soft-sensing Method and its Application in Erythromycin Fermentation Process. Comput. Chem. Eng. 2006, 30, 1203.
(12) Qin, S. J.; Yue, H. Y.; Dunia, R. Self-validating Inferential Sensors with Application to Air Emission Monitoring. Ind. Eng. Chem. Res. 1997, 36, 1675.
(13) Kadlec, P.; Gabrys, B. Architecture for Development of Adaptive On-line Prediction Models. Memetic Comput. 2009, 1, 241.
(14) McAuley, K. B.; MacGregor, F. J. Online Inference of Polymer Properties in an Industrial Polyethylene Reactor. AIChE J. 1991, 37, 825–835.


(15) Vapnik, V. N. The Nature of Statistical Learning Theory; Springer: Berlin, 1995.
(16) Yan, W. W.; Shao, H. H.; Wang, X. F. Soft Sensing Modeling Based on Support Vector Machine and Bayesian Model Selection. Comput. Chem. Eng. 2004, 28, 1489.
(17) Whitley, D. A. Genetic Algorithm Tutorial. Stat. Comput. 1994, 4, 65.
(18) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a New Regression Analysis Method Using Independent Component Analysis. J. Chem. Inf. Model. 2008, 48, 534.
(19) Kim, M.; Lee, Y. H.; Han, I. S.; Han, C. Clustering-based Hybrid Soft Sensor for an Industrial Polypropylene Process with Grade Changeover Operation. Ind. Eng. Chem. Res. 2005, 44, 334–342.
(20) Lee, D. E.; Song, J. H.; Song, S. O.; Yoon, E. S. Weighted Support Vector Machine for Quality Estimation in the Polymerization Process. Ind. Eng. Chem. Res. 2005, 44, 2101.
(21) Vijayakumar, S.; D'Souza, A.; Schaal, S. Incremental Online Learning in High Dimensions. Neural Comput. 2005, 17, 2602.
(22) Kaneko, H.; Arakawa, M.; Funatsu, K. Novel Soft Sensor Method for Detecting Completion of Transition in Industrial Polymer Processes. Comput. Chem. Eng. 2011, 35, 1135.
