Ind. Eng. Chem. Res. 2010, 49, 8685–8693


Nonlinear Soft Sensor Development Based on Relevance Vector Machine

Zhiqiang Ge*,† and Zhihuan Song*,†,‡

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Zhejiang University, Hangzhou 310027, Zhejiang, P. R. China, and Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, Zhejiang, China

This paper proposes an effective nonlinear soft sensor based on the relevance vector machine (RVM), which was originally proposed in the machine learning area. Compared to the widely used support vector machine (SVM) and least-squares support vector machine (LSSVM) based soft sensors, RVM gives a sparser model structure, which can greatly reduce the computational complexity of online prediction. While SVM/LSSVM can only provide a point estimate of the prediction result, RVM gives a probabilistic prediction, which is more informative for the soft sensor application. Furthermore, RVM successfully avoids several drawbacks of traditional support vector machine type methods, such as the kernel function limitation and the complexity of parameter tuning. Because of these advantages of RVM, a practical application of this method to soft sensor modeling is made in this paper. To evaluate the performance of the developed soft sensor, two case studies are demonstrated, both of which show that RVM performs much better than other methods for soft sensing.

1. Introduction

To guarantee the performance of control and optimization in chemical engineering processes, important variables such as product quality and key process variables should be measured reliably and accurately.1-3 However, these important variables are often very difficult to measure, especially online. Although several hardware analyzers are available for online measurement, they are very expensive and difficult to maintain. Besides, the measurement delay of a hardware-based sensor may degrade the control performance of the process. Therefore, in recent years, it has become more common to use soft sensors for estimating and predicting important process variables.4-6 Generally, there are two classes of soft sensors, namely model-based and data-based.
The model-based soft sensor mainly relies on first-principle models, which describe the physical and chemical background of the process. However, it is very difficult and costly to build accurate first-principle models for modern chemical processes. On the other hand, data-based soft sensors have gained increasing popularity in the process industry. Probably the most widely accepted technique for soft sensor development is the partial least-squares (PLS) based method.7-15 However, this method can only model linear relations among the process data. Since nonlinear behavior is widespread in modern chemical processes, nonlinear soft sensors are required. Conventionally, the traditional PLS method has been extended to nonlinear counterparts, such as neural network PLS (NNPLS), kernel PLS (KPLS), etc.16-19 Another widely accepted nonlinear soft sensor technique is the artificial neural network (ANN) based method, which can successfully handle the nonlinear relations of the process data. Traditional ANN-based soft sensor techniques include the multilayer perceptron (MLP), feed-forward networks (FFN), the self-organizing map (SOM), the radial basis function network (RBFN), etc.20-24

* Address correspondence to either author. E-mail: [email protected] (G.Z.), [email protected] (S.Z.). † State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control. ‡ Ningbo Institute of Technology.

However, it is reported that the determination of the network

topology and the generalization performance of ANN-based methods remain unsolved problems. Besides, a neural network may get stuck in local minima during the training process. Recently, the support vector machine (SVM) has found an increasing number of applications in modeling nonlinear systems.25 Compared to the traditional PLS and ANN based methods, SVM has better generalization performance and can work well with limited training data samples. A detailed derivation and theoretical justification of SVM can be found in the book of Vapnik.25 Different application aspects of SVM have also been discussed and reported.26-32 In particular, the SVM method has been extended to different counterparts, such as least-squares SVM and regression (LSSVM/LSSVR), support vector data description (SVDD), etc.33-37 Although LSSVM can be optimized more accurately due to its computational simplicity, the sparseness property is lost, since LSSVM uses all training samples in its final model structure. On the other hand, SVM obtains sparsity because it only needs a fraction of the training samples among the whole data set for model construction. In spite of the advantages of SVM, it has some significant drawbacks, which are listed as follows: (1) SVM does not allow the use of an arbitrary kernel function, because the function must satisfy Mercer's conditions; (2) SVM needs to determine the trade-off and insensitivity parameters, which generally entails a cross-validation procedure and thus enlarges the computational burden; (3) the output of SVM is only a point estimate of the predicted variable; the uncertainty of the prediction is not captured. In other words, the SVM method is not probabilistic. However, in our opinion, almost all process measurements are essentially random variables, because they are always contaminated by random white noise.
From the statistical viewpoint, all statistical inferences and decisions should be made in a probabilistic manner. Therefore, it is necessary to express the uncertain characteristics of the process data in both the modeling and prediction steps. Fortunately, to address this limitation of the SVM method, a method named the relevance vector machine (RVM) has recently been proposed.38 RVM shares a similar functional form with SVM, but it is implemented in a probabilistic manner and places no limitation on the form of the kernel function. By introducing a prior

10.1021/ie101146d © 2010 American Chemical Society. Published on Web 08/04/2010.


distribution over the weights, the Bayesian approach is adopted for RVM learning. As a result, the prediction of RVM is probabilistic, and thus the uncertainty of the predicted variable can be characterized. Furthermore, it is reported that the sparsity of RVM is much better than that of SVM.39 Due to the modeling efficiency of the RVM method, it has been used in many different areas, such as machine learning, pattern recognition, etc.39-43 However, to the best of our knowledge, the application of RVM in the process control area has rarely been reported, especially for the purpose of soft sensor development. This paper intends to introduce RVM for the development of a nonlinear soft sensor, through which both the estimation and prediction performance can be enhanced. The rest of this paper is structured as follows. First, the detailed description and advantages of RVM for soft sensor modeling are demonstrated in Section 2, which is followed by two industrial case studies in Section 3. Finally, some conclusions and discussions are given in Section 4.

2. Nonlinear Soft Sensor Development Based on RVM

In this section, a detailed description of the RVM algorithm is given in subsection 2.1, based on which the RVM-based nonlinear soft sensor can be developed. Then, some advantages of the developed nonlinear soft sensor are remarked upon in subsection 2.2.

2.1. RVM for Nonlinear Soft Sensor Modeling. By introducing the Bayesian probabilistic method, RVM can be developed upon the framework of SVM. The idea of RVM was originally proposed by Tipping. Denote the training data set as {u_i, y_i}, i = 1, 2, ..., n; the following generalized linear regression model can be used to describe the relation between the input and output variables of the soft sensor

y = f(u, w) + e    (1)

where e is the random noise, which is assumed to be independent zero-mean Gaussian distributed with variance σ², i.e., e ~ N(0, σ²). As in the LSSVM method, the nonlinear function f(u) can be expressed as a linearly weighted sum of basis functions38

f(u, w) = ∑_{i=1}^{n} w_i K(u, u_i) + w_0 = ∑_{i=0}^{n} w_i ψ_i(u)    (2)

where w = [w_0, w_1, w_2, ..., w_n]^T is the weight vector of the basis functions ψ(u), which is given as ψ(u) = [1, K(u, u_1), K(u, u_2), ..., K(u, u_n)]^T. Since the process noise term has been modeled, the conditional distribution of the output variable y can be given as p(y|u) = N(f(u, w), σ²). Due to the assumed independence of the samples of y, the likelihood of the complete data set can be written as

p(y|w, σ²) = (2πσ²)^{-n/2} exp{ -‖y - Ψ(u)w‖² / (2σ²) }    (3)

where y = (y_1, y_2, ..., y_n)^T and Ψ(u) = [ψ(u_1), ψ(u_2), ..., ψ(u_n)]^T is the n × (n + 1) design matrix. To obtain optimal values of w and σ², maximum-likelihood estimation could be carried out. However, with as many parameters in the model as training data samples, overfitting would probably result. To avoid this, RVM adopts the Bayesian method and constrains w and σ² by defining explicit prior probability distributions over them, which are given as follows39

p(w|α) = ∏_{i=0}^{n} N(w_i | 0, α_i^{-1}) = (2π)^{-(n+1)/2} ∏_{i=0}^{n} α_i^{1/2} exp( -α_i w_i² / 2 )    (4)

p(α) = ∏_{i=0}^{n} gamma(α_i | a, b)    (5)

p(β) = gamma(β | c, d)    (6)

where β = σ^{-2}, α = [α_0, α_1, ..., α_n] is an (n + 1)-dimensional hyperparameter vector, and gamma(α|a, b) is the gamma distribution, which is defined as

gamma(α|a, b) = Γ(a)^{-1} b^a α^{a-1} e^{-bα},  with  Γ(a) = ∫_0^∞ t^{a-1} e^{-t} dt    (7)

To make the defined priors noninformative, the parameters a, b, c, and d can be set to small values, such as a = b = c = d = 10^{-4}. Having defined the prior probability of the parameter set, the posterior parameter distribution can be calculated through the Bayesian rule, thus

p(w|y, α, σ²) = p(y|w, σ²) p(w|α) / p(y|α, σ²) = (2π)^{-(n+1)/2} |Σ|^{-1/2} exp{ -(1/2) (w - µ)^T Σ^{-1} (w - µ) }    (8)

which is also a Gaussian distribution, with covariance and mean given as follows

Σ = (σ^{-2} Ψ^T(u) Ψ(u) + A)^{-1}    (9)

µ = σ^{-2} Σ Ψ^T(u) y    (10)

where A is a diagonal matrix with elements A = diag(α_0, α_1, ..., α_n). However, it is difficult to estimate the optimal hyperparameter values directly, because they are analytically intractable. In the present paper, the RVM model is formulated only through a type-II maximum likelihood procedure; the more general case can be found in Tipping.39 Hence, the most probable point estimates of the hyperparameters, α_op = [α_0,op, α_1,op, ..., α_n,op] and σ²_op, can be found through maximization of the marginal likelihood function with respect to these two hyperparameters; the detailed derivation is provided in Appendix A. An important characteristic of RVM is that many hyperparameters become infinite after the optimal estimate α_op has been obtained. According to eq 4, this implies that the distributions of the corresponding weights w_i are highly peaked at zero, and thus these w_i become zero. The mean and covariance of w given in eqs 9 and 10 can then be reformulated as

Σ_op = (σ^{-2}_op Ψ^T(u) Ψ(u) + A_op)^{-1}    (11)

µ_op = σ^{-2}_op Σ_op Ψ^T(u) y    (12)

where A_op = diag(α_op). Since many elements of w become zero, the optimal mean value µ_op will correspondingly comprise very few nonzero elements. After the optimal parameter set has been determined, it can be used for prediction of a new data sample u_new. The predicted output variable of the RVM-based soft sensor can be calculated as


p(ŷ_new | u_new, y, α_op, σ²_op) = ∫ p(ŷ_new | u_new, w, σ²_op) p(w | y, α_op, σ²_op) dw    (13)

which is also Gaussian distributed, with mean and variance given as follows

µ_y,new = µ_op^T ψ(u_new)    (14)

σ²_y,new = σ²_op + ψ^T(u_new) Σ_op ψ(u_new)    (15)

where ψ(u_new) = [1, K(u_new, u_1), K(u_new, u_2), ..., K(u_new, u_n)]^T.

2.2. Some Remarks. Compared to the terminology "support vectors" in the traditional SVM method, the training data samples associated with the nonzero elements of w are called "relevance vectors" in the RVM method. It has been reported that the sparseness of RVM is much better than that of SVM/LSSVM, while the generalization performance of the methods is comparable. Therefore, the computational efficiency of online soft sensor prediction can be greatly improved by the RVM method. Different from the traditional SVM/LSSVM method, which can only give a point prediction of the real value y_new, the probability distribution of the output variable is provided in detail by the RVM method, which is very important for statistical judgment and decision making. In summary, the advantages of the developed RVM-based nonlinear soft sensor can be listed as follows: (1) the uncertainty of the predicted variable can be characterized; (2) RVM has better sparseness than SVM and thus can efficiently reduce the online prediction complexity; (3) RVM does not need to tune the regularization parameter C of the SVM method during the model training phase. However, to be frank, compared to SVM and

LSSVM, there is one drawback of the RVM method: RVM needs more time to train the model, because it involves a highly nonlinear optimization step. In our opinion, if the soft sensor model is developed offline and only used for online prediction, then the high computational complexity of the offline modeling phase is acceptable, because it does not influence the online prediction phase. However, if the process condition changes frequently, the soft sensor model should be kept adaptive to the changes of the process. In this case, online modeling of the RVM soft sensor is required. While the high complexity of the basic RVM algorithm may cause a time delay in the online prediction step, an incremental version of the RVM method can be incorporated, which is much more efficient for online and adaptive modeling.

3. Case Studies

This section gives two industrial examples for performance evaluation of the developed nonlinear soft sensor. For comparison, the traditional PLS and LSSVM models are also constructed. In order to compare the modeling performance of the three methods more accurately, the root-mean-square error (RMSE) criterion is defined as follows

RMSE = √( (1/n) ∑_{i=1}^{n} (ŷ_i - y_i)² )    (16)

where y_i and ŷ_i (i = 1, 2, ..., n) are the real and predicted values, respectively, and n is the total number of test data samples.

3.1. TE Benchmark Process. As a benchmark simulation plant, the Tennessee Eastman process has been widely used for

Figure 1. Control system of the Tennessee Eastman process.

Table 1. Selected Input Variables of TE Process

no.  variable                                    no.  variable                                          no.  variable
1    A feed                                      12   product separator level                           23   D feed flow valve
2    D feed                                      13   product separator pressure                        24   E feed flow valve
3    E feed                                      14   product separator underflow                       25   A feed flow valve
4    total feed                                  15   stripper level                                    26   total feed flow valve
5    recycle flow                                16   stripper pressure                                 27   compressor recycle valve
6    reactor feed rate                           17   stripper underflow                                28   purge valve
7    reactor pressure                            18   stripper temperature                              29   separator pot liquid flow valve
8    reactor level                               19   stripper steam flow                               30   stripper liquid product flow valve
9    reactor temperature                         20   compressor work                                   31   stripper steam valve
10   purge rate                                  21   reactor cooling water outlet temperature          32   reactor cooling water flow
11   product separator temperature               22   separator cooling water outlet temperature        33   condenser cooling water flow


Figure 2. Highlighted relevance vectors of RVM for the TE process.

Table 2. RMSE Values of Three Different Soft Sensors for TE Process

methods   PLS      LSSVM    RVM
RMSE      0.1077   0.1052   0.1043

algorithm testing. This process consists of five major unit operations: a reactor, a condenser, a compressor, a separator, and a stripper. The control structure is shown schematically in Figure 1, which is the second structure listed in Lyman and Georgakis.44 There are 41 measured variables (22 continuous process measurements and 19 composition measurements) and 12 manipulated variables in the TE process. The details of the process description are well explained in Downs and Vogel.45 Among all 53 process variables, the agitator speed is not manipulated and is thus excluded in this case study. Compared to the other 33 variables, the sampling rates of the 19 composition measurements are much lower. Therefore, these variables should be predicted online, and the soft sensor is developed for them. In this study, one of the 19 composition variables is selected for soft sensor development: the flow value of component B in the purge. Five hundred data samples are generated for training the RVM, LSSVM, and PLS soft sensor models. To test the prediction performance of the three soft sensors, a total of 960 data samples are also provided. To build the PLS, RVM, and LSSVM soft sensors, several parameters should be selected first. The number of retained components of PLS is selected as 10, which can explain most of the process data information. The kernel parameter of RVM is determined as 12.5, whereas the trade-off and kernel parameters of LSSVM are chosen as 25 and 1, respectively, all of which give the best prediction performance. Notably, nine relevance vectors have been selected by the RVM model, which are highlighted among the training samples in Figure 2. Corresponding values of the weighted parameters for all of the training data samples

are given in Figure 3(a). In accordance with the nine selected relevance vectors, only these weighted parameters have significant values, and they are used for soft sensor prediction. To examine the determination performance of the weighted parameters, the gamma value of each relevance vector is shown in Figure 3(b), from which it can be found that most weighted parameters have been well determined, since their gamma values are much bigger than zero. To test the prediction performance of the three developed soft sensors, the 960 data samples of the testing data set are utilized. The prediction results of the three soft sensors are given in Table 2, from which it can be found that RVM gives the best prediction performance, because it has the smallest RMSE among the three soft sensors. Detailed prediction results and estimation errors of RVM are given in Figure 4. It can be found that the testing data samples have been well fitted. In order to examine the online computational efficiency of the

Figure 4. Prediction results and estimation errors of RVM for the TE process.

Figure 5. Predictive variances of testing data samples in TE process.

Figure 3. Weighted and gamma parameter values of RVM for the TE process, (a) weighted parameter values; (b) gamma values of the selected relevance vectors.
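The relevance vectors highlighted in Figure 2 and the weighted/gamma values of Figure 3 are produced by the iterative re-estimation procedure of Appendix A (eqs 9, 10 and 21-23). A minimal NumPy sketch of this training procedure, assuming a Gaussian (RBF) kernel; the function names, the alpha cap, and the pruning threshold are our own illustrative choices, not the authors' code:

```python
import numpy as np

def design_matrix(U, centers, width):
    """Basis matrix of eq 2: a leading column of ones (bias w0)
    followed by RBF kernel evaluations K(u, u_i)."""
    d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.hstack([np.ones((len(U), 1)), np.exp(-d2 / (2.0 * width ** 2))])

def rvm_fit(Psi, y, n_iter=200, alpha_cap=1e9):
    """Type-II ML re-estimation (eqs 21-23): iterate the weight
    posterior of eqs 9-10 and the alpha / sigma^2 updates; weights
    whose alpha diverges are effectively pruned to zero."""
    m, p = Psi.shape
    alpha = np.ones(p)
    sigma2 = 0.1 * np.var(y) + 1e-6
    for _ in range(n_iter):
        Sigma = np.linalg.inv(Psi.T @ Psi / sigma2 + np.diag(alpha))   # eq 9
        mu = Sigma @ Psi.T @ y / sigma2                                # eq 10
        gamma = 1.0 - alpha * np.diag(Sigma)                           # eq 22
        alpha = np.minimum(gamma / np.maximum(mu ** 2, 1e-12),
                           alpha_cap)                                  # eq 21
        sigma2 = np.sum((y - Psi @ mu) ** 2) / max(m - gamma.sum(),
                                                   1e-12)              # eq 23
    relevant = alpha < 1e6  # heuristic threshold: surviving relevance vectors
    return mu, Sigma, alpha, sigma2, relevant
```

On a toy data set, the basis functions surviving the pruning play the role of the nine relevance vectors highlighted in Figure 2.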


Table 4. RMSE Values of Three Different Soft Sensors for Predicted Variables

methods      PLS      LSSVM    RVM
RMSE of y1   0.0528   0.0401   0.0397
RMSE of y2   0.0563   0.0512   0.0511

Table 5. CPU Time Comparison of RVM and LSSVM

methods          LSSVM      RVM
CPU time of y1   2.3906 s   1.1719 s
CPU time of y2   2.3594 s   1.1416 s

Figure 6. T2 monitoring results of the PCA method for the testing data set.

Table 3. Input Variables of the Soft Sensor in SRU Process

input variable number   variable description
u1                      MEA gas flow
u2                      first air flow
u3                      second air flow
u4                      gas flow in SWS zone
u5                      air flow in SWS zone

RVM soft sensor, the CPU times of both RVM and LSSVM for prediction of all 960 testing samples have been calculated, which are 0.2500 s and 1.2625 s for RVM and LSSVM, respectively. Therefore, the online prediction efficiency has been greatly improved by the RVM soft sensor. To examine the uncertainty information characterized by the RVM-based soft sensor, the predictive variance of each testing data sample is shown in Figure 5. From this result, it can be found that the predictive variance suddenly grows after sample number 800. Based on the uncertainty information of the RVM, it can be inferred that the data samples after 800 have come out of the modeling space. Actually, the TE process has been widely used for process monitoring case studies. For simplicity, the traditional PCA-based monitoring method can be utilized to examine the operating condition of the testing data samples. The monitoring results of the T2 statistic of the PCA-based method for the testing data set are given in Figure 6, from which it can be clearly seen that the normal operating condition of the process has been violated. Therefore, the predictive variance can also express the condition of the process very well.

3.2. SRU Process. The sulfur recovery unit (SRU) removes environmental pollutants from acid gas streams before they are released into the atmosphere. Meanwhile, the recovered sulfur becomes a valuable byproduct. The SRU process takes in two kinds of acid gases as its inputs. The first one is called MEA gas, which comes from the gas washing plant and is rich in H2S. The second input gas is called SWS gas, which comes from the sour water stripping (SWS) plant and is rich in H2S and NH3. The input acid gases are first burnt in reactors, where H2S is transformed into pure sulfur; the gaseous combustion products from the furnaces are then cooled, which generates liquid sulfur. The liquid sulfur is further passed through high temperature converters, through which the final

Figure 7. A simplified scheme of the SRU process.4
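The RVM soft sensors in both case studies are trained by maximizing the marginal likelihood derived in Appendix A (eqs 17 and 18) with respect to α and σ². As an illustrative sketch of that objective (our code with assumed names, not the authors' implementation):

```python
import numpy as np

def log_marginal_likelihood(Psi, y, alpha, sigma2):
    """L(alpha, sigma^2) of eq 18:
    -0.5 * [ n ln(2 pi) + ln|C| + y^T C^{-1} y ],
    with C = sigma^2 I + Psi A^{-1} Psi^T as in eq 17."""
    n = len(y)
    # (Psi / alpha) divides column j by alpha_j, i.e. Psi @ diag(1/alpha)
    C = sigma2 * np.eye(n) + (Psi / alpha) @ Psi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
```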

Figure 8. Data characteristics of both input and output variables, (a) input variables; (b) output variables.
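The predictive variances reported for the testing samples (Figures 5 and 13) follow from eqs 14 and 15. A sketch (ours) of the per-sample predictive mean and variance:

```python
import numpy as np

def rvm_predict(psi_new, mu_op, Sigma_op, sigma2_op):
    """Predictive mean (eq 14) and variance (eq 15) for one new sample.
    psi_new is the basis vector [1, K(u_new, u_1), ..., K(u_new, u_n)]."""
    mean = float(mu_op @ psi_new)                             # eq 14
    var = float(sigma2_op + psi_new @ Sigma_op @ psi_new)     # eq 15
    return mean, var
```

The variance splits into the noise floor σ²_op plus a term that grows as u_new leaves the training region, which is exactly the behavior exploited after sample 800 in Figure 5.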


Figure 9. Highlighted relevance vectors of RVM, (a) First output variable; (b) Second output variable.

Figure 10. Weighted and gamma parameter values of RVM for the first output variable, (a) weighted parameter values; (b) gamma values of the selected relevance vectors.

Figure 11. Weighted and gamma parameter values of RVM for the second output variable, (a) weighted parameter values; (b) gamma values of the selected relevance vectors.

sulfur can be produced.4,46 A simplified scheme of the SRU process is shown in Figure 7. It is noted that the air which supplies oxygen for the reaction is an important parameter in the conversion of H2S. An excessive air flow will increase the concentration of SO2, while a low air flow could possibly reduce the concentration of SO2. In order to monitor the performance of the conversion process and improve the sulfur extraction rate, online soft sensors are needed for measuring the concentrations of both H2S and SO2 in the tail gas of each sulfur line. For the prediction of these two important variables, the five process variables listed in Table 3 are used, which can be measured more easily. For the construction of the soft sensors, a total of 1680 data samples have been used. Data characteristics of both input and output variables are shown in Figure 8. Three components are selected for the PLS soft sensor, the kernel parameter of RVM is chosen as 0.95, and the trade-off and kernel parameters of LSSVM are chosen as 150 and 2, respectively. As a result, 64 relevance vectors are selected by the first RVM model, whereas the number is 63 for the second RVM model. The relevance vectors of the two RVM models are highlighted in Figure 9(a) and (b), respectively. Similarly, the corresponding values of the weighted and gamma parameters are given in Figures 10 and 11 for the two RVM soft sensors. As can be seen from these two figures, most of the selected relevance vectors have been well determined. For testing purposes, a new data set with the same number of samples as the training data set has also been collected. The prediction results of all three soft sensors are provided in Table 4. It can be seen that the prediction performance of RVM is similar to that of LSSVM for both output variables. However, compared to the PLS soft sensor, the prediction performance has been greatly improved, especially for the first


output variable of this process. Detailed prediction results and estimation errors of RVM for the two output variables are shown in Figure 12(a) and Figure 12(b), respectively. Again, we can find that the testing data set has been fitted very well. Although LSSVM has prediction performance similar to that of RVM, its computation is less efficient, which can be analyzed through Table 5. In this table, the CPU times of LSSVM and RVM for predicting both testing data sets are listed. Compared to the results of LSSVM, the CPU times of the two RVM soft sensors have been improved. Since RVM has a sparse nonlinear model structure, only a fraction of the training data samples is selected as relevance vectors. Due to this sparse behavior of the RVM model, the computational complexity of the online prediction step can be greatly reduced. Similarly, the predictive variance of each testing data sample in this process is given in Figure 13, from which it can be seen that there are many abrupt changes of the predictive variance for both output variables. In our opinion, this is because the testing data samples have moved out of the training region at some specific data points. In particular, comparing the output variables of the training data set and the testing data set, which are given in Figure 8(b) and Figure 14, one can find that the true operating conditions at these abruptly changed points have violated the modeling space. Therefore, based on the predictive variance, these changes can be expressed.

4. Conclusions and Discussions

Figure 12. Prediction results and estimation errors of RVM, (a) First output variable; (b) Second output variable.

In the present paper, the recently developed RVM model has been introduced for nonlinear soft sensor development. Compared to the widely used SVM and LSSVM methods, RVM can overcome several of their drawbacks and has much lower computational complexity for online prediction. Besides, RVM has a

Figure 13. Predictive variances of testing data samples in SRU process, (a) First output variable; (b) Second output variable.

Figure 14. Output variables of testing data set in SRU process, (a) First output variable; (b) Second output variable.


probabilistic prediction result, while SVM and LSSVM can only give a point estimate of the predicted variable. Since most process variables are contaminated by random noises, probabilistic prediction is very important for the soft sensor, as it can efficiently extract the uncertainty information of the testing data sample. Depending on the predictive variance, the reliability of the soft sensor for a specific data point can be measured. Besides, the operating condition of the process can also be expressed by the predictive variance. Therefore, besides its prediction function, the RVM-based soft sensor can also give a monitoring result for engineer reference. To evaluate the performance of the developed soft sensor, two industrial case studies have been carried out. For comparison, the traditional PLS and LSSVM based soft sensors have also been developed. As a result, both examples have demonstrated that the RVM-based soft sensor performs better than the other two soft sensors. Due to the probabilistic prediction performance of the developed soft sensor, it can provide more useful information about the process, compared to traditional soft sensors.

Appendix A: Derivation of the Two Optimal Parameters

From eqs 3 and 4, the marginal likelihood of the complete data set can be reformulated as follows:

p(y|u, α, σ²) = ∫ p(y|u, w, σ²) p(w|α) dw = (2π)^{-n/2} |σ²I + Ψ(u) A^{-1} Ψ^T(u)|^{-1/2} exp{ -(1/2) y^T [σ²I + Ψ(u) A^{-1} Ψ^T(u)]^{-1} y }    (17)

where I is an identity matrix of appropriate dimensionality. Changing the likelihood function to its logarithmic form gives

L(α, σ²) = ln{p(y|u, α, σ²)} = -(n/2) ln(2π) - (1/2) ln|σ²I + Ψ(u) A^{-1} Ψ^T(u)| - (1/2) y^T [σ²I + Ψ(u) A^{-1} Ψ^T(u)]^{-1} y    (18)

To maximize L(α, σ²) with respect to α and σ², the following two derivatives are set to zero:

∂L(α, σ²)/∂α = 0    (19)

∂L(α, σ²)/∂σ² = 0    (20)

where α = [α_0, α_1, ..., α_n] is the hyperparameter vector. However, the α and σ² values cannot be obtained in closed form through eqs 19 and 20. Therefore, an iterative re-estimation method has been developed. The re-estimation of α can be given as39

α_i^new = γ_i / µ_i²    (21)

where i = 0, 1, 2, ..., n, µ_i is the ith posterior mean value calculated in eq 10, and γ_i is defined by39

γ_i = 1 - α_i Σ_ii    (22)

where Σ_ii is the ith diagonal element of the posterior weight covariance calculated in eq 9. Notice that the value of γ_i lies between zero and one, and it can be interpreted as a measure of how well the corresponding parameter w_i is determined. When α_i has a large value, the corresponding w_i is highly constrained by the prior, thus γ_i ≈ 0. On the other hand, when the α_i value is small, w_i fits the data well, and γ_i ≈ 1. Similarly, the noise variance σ² can be re-estimated as follows39

(σ²)^new = ‖y - Ψ(u)µ‖² / ( n - ∑_i γ_i )    (23)

Repeating the re-estimation steps given in eqs 21-23 and updating the mean and covariance of the posterior weights until a suitable convergence criterion is satisfied, we finally obtain the optimal values of α and σ².

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (60974056), the National High Technology Research and Development Program of China (863 Program, 2009AA04Z154), the China Postdoctoral Science Foundation (20090461370), and the Zhejiang Provincial Natural Science Foundation of China (Y1080871).

Literature Cited

Literature Cited

(1) Tham, M. T.; Morris, A. J.; Montague, G. A. Soft-sensors for process estimation and inferential control. J. Process Control 1991, 1, 3–14.
(2) Kresta, J. V.; Marlin, T. E.; MacGregor, J. F. Development of inferential process models using PLS. Comput. Chem. Eng. 1994, 18, 597–611.
(3) Hartnett, M. K.; Lightbody, G.; Irwin, G. W. Dynamic inferential estimation using principal components regression (PCR). Chemom. Intell. Lab. Syst. 1998, 40, 215–224.
(4) Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, M. G. Soft Sensors for Monitoring and Control of Industrial Processes; Springer: New York, 2007.
(5) Kano, M.; Nakagawa, Y. Data-based process monitoring, process control and quality improvement: recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12–24.
(6) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven soft sensors in the process industry. Comput. Chem. Eng. 2009, 33, 795–814.
(7) Qin, S. J. Recursive PLS algorithm for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503–514.
(8) Lu, N. Y.; Yang, Y.; Gao, F. R.; Wang, F. L. Multirate dynamic inferential modeling for multivariable processes. Chem. Eng. Sci. 2004, 59, 855–864.
(9) Li, C. F.; Ye, H.; Wang, G. Z.; Zhang, J. A recursive nonlinear PLS algorithm for adaptive nonlinear process modeling. Chem. Eng. Technol. 2005, 28, 141–152.
(10) Lee, Y. H.; Kim, M. J.; Chu, Y. H.; Han, C. H. Adaptive multivariate regression modeling based on model performance assessment. Chemom. Intell. Lab. Syst. 2005, 78, 63–73.
(11) Liu, J. L. On-line soft sensor for polyethylene process with multiple production grades. Control Eng. Pract. 2007, 15, 769–778.
(12) Zhao, C. H.; Wang, F. L.; Mao, Z. Z.; Lu, N. Y.; Jia, M. X. Quality prediction based on phase-specific average trajectory for batch processes. AIChE J. 2008, 54, 693–705.
(13) Facco, P.; Doplicher, F.; Bezzo, F.; Barolo, M. Moving average PLS soft sensor for online product quality estimation in an industrial batch polymerization process. J. Process Control 2009, 19, 520–529.
(14) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft-sensor development using correlation-based just-in-time modeling. AIChE J. 2009, 55, 1754–1765.
(15) Zhao, C. H.; Wang, F. L.; Gao, F. R. Improved calibration investigation using phase-wise local and cumulative quality interpretation and prediction. Chemom. Intell. Lab. Syst. 2009, 95, 107–121.
(16) Adebiyi, O. A.; Corripio, A. B. Dynamic neural networks partial least squares (DNNPLS) identification of multivariable processes. Comput. Chem. Eng. 2003, 27, 143–155.
(17) Bylesjo, M.; Rantalainen, M.; Nicholson, J. K.; Holmes, E.; Trygg, J. K-OPLS package: Kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space. BMC Bioinf. 2008, 9, 106.
(18) Sun, Q.; Wang, J. H.; Han, D. H. Improving the prediction model of protein in milk powder using GA-PLS combined with PC-ANN arithmetic. Spectrosc. Spectral Anal. 2009, 29, 1818–1821.
(19) Zhang, Y. W.; Zhang, Y. Complex process monitoring using modified partial least squares method of independent component regression. Chemom. Intell. Lab. Syst. 2009, 98, 143–148.
(20) Willis, M. J.; Montague, G. A.; Massimo, C. D.; Tham, M. T.; Morris, A. J. Artificial neural networks in process estimation and control. Automatica 1992, 28, 1181–1187.
(21) Mandic, D. P.; Chambers, J. A. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability; Wiley: New York, 2001.
(22) Lee, M. W.; Joung, J. Y.; Lee, D. S.; Park, J. M.; Woo, S. H. Application of a moving-window-adaptive neural network to the modeling of a full-scale anaerobic filter process. Ind. Eng. Chem. Res. 2005, 44, 3973–3982.
(23) Himmelblau, D. M. Accounts of experience in the application of artificial neural networks in chemical engineering. Ind. Eng. Chem. Res. 2008, 47, 5782–5796.
(24) Gonzaga, J. C. B.; Meleiro, L. A. C.; Kiang, C.; Filho, R. M. ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput. Chem. Eng. 2009, 33, 43–49.
(25) Vapnik, V. N. The Nature of Statistical Learning Theory; Springer-Verlag: New York, 1995.
(26) Scholkopf, B.; Smola, A. J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, 2002.
(27) Agrawal, M.; Jade, A. M.; Jayaraman, V. K.; Kulkarni, B. D. Support vector machine: A useful tool for process engineering applications. Chem. Eng. Prog. 2003, 98, 57–62.
(28) Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, U.K., 2004.
(29) Yan, W. W.; Shao, H. H.; Wang, X. F. Soft sensing modeling based on support vector machine and Bayesian model selection. Comput. Chem. Eng. 2004, 28, 1489–1498.
(30) Laskov, P.; Gehl, C.; Kruger, S.; Muller, K. R. Incremental support vector learning: analysis, implementation and application. J. Mach. Learn. Res. 2006, 7, 1909–1936.
(31) Jain, P.; Rahman, I.; Kulkarni, B. D. Development of a soft sensor for a batch distillation column using support vector regression techniques. Chem. Eng. Res. Des. 2007, 85, 283–287.
(32) Zhang, Y. W. Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM. Chem. Eng. Sci. 2009, 64, 801–811.


(33) Suykens, J. A. K.; Van Gestel, T.; De Brabanter, J.; De Moor, B.; Vandewalle, J. Least Squares Support Vector Machines; World Scientific: Singapore, 2002.
(34) Liu, Y.; Hu, N. P.; Wang, H. Q.; Li, P. Soft chemical analyzer development using adaptive least-squares support vector regression with selective pruning and variable moving window size. Ind. Eng. Chem. Res. 2009, 48, 5731–5741.
(35) Tax, D. M. J.; Duin, R. P. W. Support vector domain description. Pattern Recognit. Lett. 1999, 20, 1191–1199.
(36) Liu, X.; Xie, L.; Kruger, U.; Littler, T.; Wang, S. Q. Statistical-based monitoring of multivariate non-Gaussian systems. AIChE J. 2008, 54, 2379–2391.
(37) Ge, Z. Q.; Xie, L.; Kruger, U.; Lamont, L.; Song, Z. H.; Wang, S. Q. Sensor fault identification and isolation for multivariate non-Gaussian processes. J. Process Control 2009, 19, 1707–1715.
(38) Tipping, M. E. The relevance vector machine. In Advances in Neural Information Processing Systems 12; MIT Press: Cambridge, MA, 2000; pp 652–658.
(39) Tipping, M. E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244.
(40) Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.
(41) Tzikas, D. G.; Wei, L. Y.; Likas, A.; Yang, Y.; Galatsanos, N. P. A Tutorial on Relevance Vector Machines for Regression and Classification with Applications; University of Ioannina: Ioannina, Greece; Illinois Institute of Technology: Chicago, IL, 2006.
(42) Hernandez, N.; Talavera, I.; Dago, A.; Biscay, R. J.; Ferreira, M. M. C.; Porro, D. Relevance vector machines for multivariate calibration purposes. J. Chemom. 2008, 22, 686–694.
(43) Lima, C. A. M.; Coelho, A. L. V.; Chagas, S. Automatic EEG signal classification for epilepsy diagnosis with relevance vector machines. Expert Syst. Appl. 2009, 36, 10054–10059.
(44) Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Comput. Chem. Eng. 1995, 19, 321–331.
(45) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255.
(46) Fortuna, L.; Rizzo, A.; Sinatra, M.; Xibilia, M. G. Soft analyzers for a sulfur recovery unit. Control Eng. Pract. 2003, 11, 1491–1500.

Received for review March 15, 2010. Revised manuscript received June 24, 2010. Accepted July 18, 2010.

IE101146D