Ind. Eng. Chem. Res. 2008, 47, 1120-1131

Nonlinear Multivariate Quality Estimation and Prediction Based on Kernel Partial Least Squares

Xi Zhang,* Weiwu Yan, and Huihe Shao

Department of Automation, Shanghai Jiaotong University, Shanghai 200240, China

A novel nonlinear multivariate quality estimation and prediction method based on kernel partial least squares (KPLS) is proposed in this article. KPLS is a promising regression method for tackling nonlinear problems because it can efficiently compute regression coefficients in a high-dimensional feature space by means of a nonlinear kernel function. It estimates and predicts quality variables in nonlinear processes by mapping the data from the original space into a high-dimensional feature space. It requires only linear algebra, making it as simple as linear multivariate projection methods, and it can handle a wide range of nonlinearities because of its ability to use different kernel functions. Application results for a simple simulation example and for real data from an industrial oil refinery show that the proposed method can effectively capture the nonlinear relationships among variables and that it has better estimation performance than partial least squares (PLS) and other linear approaches.

1. Introduction

It is well-known that some important quality variables in process control systems (e.g., product composition in a large distillation column, biomass concentration in mycelial fermentation, and concentration of reaction mass in a chemical reactor) are difficult or even impossible to measure online because of the limitations of process or measurement techniques. These variables, which are key indicators of process performance, are normally determined by offline analyses in the laboratory or by an online quality analyzer, which is often expensive and requires frequent, high-cost maintenance. Furthermore, a significant delay (often several hours) is usually incurred during laboratory testing, so the analyzed values cannot be used as feedback signals for quality control systems. Such limitations can severely affect product quality and production safety. Joseph and Brosilow1 introduced inferential control to solve this problem. The basic idea of inferential estimation (soft sensing) is to estimate the primary variables from easily measured secondary variables that are correlated with them. In recent years, online estimation and prediction approaches have been widely studied and used in industrial processes.2

Empirical models, such as neural networks3,4 and multivariate statistical methods,5,6 are used to derive regression models. Multiple linear regression suffers from numerical problems as well as degraded models when a data set is strongly collinear. Principal component regression, partial least squares (PLS), and canonical variate analysis address this issue by projecting the original process variables onto a small number of orthogonal latent variables. Multivariate estimation and prediction methods developed for batch/fed-batch processes have been reviewed extensively in refs 7-9. Other methods based on models10 and hybrid methods11,12 have also been developed.

However, in complex chemical and physical systems, linear methods like PLS are inappropriate for describing the underlying data structure, because these systems may exhibit significant nonlinear characteristics. To address this issue, a number of strategies have been proposed within the PCA and PLS

* To whom correspondence should be addressed. Tel.: +86-21-3420 4264. Fax: +86-21-3420 4264. E-mail: [email protected].

frameworks.13 The quadratic PLS model was developed to fit the functional relationship between latent variables.13-16 Other nonlinear approaches, such as PLS combined with neural networks, have also been proposed.3,17,18 In recent years, a new nonlinear PLS method called kernel PLS (KPLS) has been developed.19 The basic idea of KPLS is to first map each point from the original space into a feature space via a nonlinear kernel function and then develop a linear PLS model in the mapped feature space. According to Cover's theorem, nonlinear data in the original space are more likely to be linear after a high-dimensional nonlinear mapping.20 KPLS can efficiently compute latent variables in the feature space by means of integral operators and the nonlinear kernel function. Compared with other nonlinear methods, the main advantage of KPLS is that it does not involve nonlinear optimization; it requires only linear algebra, making it as simple as conventional linear PLS. In addition, because of its ability to use different kernel functions, KPLS can handle a wide range of nonlinearities.20,21 On the basis of these merits, KPLS has been shown to perform better than linear PLS in regression and classification.13,19,22 The prediction ability of KPLS was touched upon in refs 13 and 21 but was not thoroughly analyzed.

In this article, a novel multivariate quality estimation and prediction approach based on kernel PLS is proposed. It first maps the data from the original space into a high-dimensional feature space and then performs linear PLS regression and estimation in the feature space. The performance of the proposed strategy is analyzed extensively using two examples. The application results show that the KPLS-based prediction method can effectively capture the nonlinear relationships among variables and significantly outperforms PLS-based estimation methods.

This article is organized as follows. The concepts of PLS and KPLS are introduced in section 2, along with the detailed online estimation and prediction procedure. The superiority of the KPLS-based estimation method is then illustrated through two examples in section 3. In section 4, we further discuss the influence of a number of factors on the estimation performance. Finally, we present our conclusions.



2. Multivariate Quality Estimation and Prediction Based on KPLS

2.1. Partial Least-Squares. Partial least squares (PLS) is a robust multivariate regression algorithm based on the PCA idea of decomposing data matrices into a series of abstract latent variables or principal components. PLS decomposes both the input block X and the output block Y and builds a relationship between the two sets of components. In PLS, the scaled matrices X and Y are decomposed into score vectors (t and u), loading vectors (p and q), and residual error matrices (E and F).23,24 The PLS algorithm is as follows.

Algorithm 1. Partial least-squares algorithm.13,21,23
1. Randomly initialize u (u can be set equal to any column of Y).
2. Calculate

w = X^T u  (1)

3. Calculate

t = Xw  (2)

and normalize t by t ← t/||t||.
4. Calculate

c = Y^T t  (3)

where c is a weight vector.
5. Calculate

u = Yc  (4)

and normalize u by u ← u/||u||.
6. Repeat steps 2-5 until convergence.
7. Deflate the X and Y matrices as follows:

X = X - t t^T X  (5)

Y = Y - t t^T Y  (6)

The scores (t and u) contain the information regarding the objects and their similarities/dissimilarities with respect to the given problem and model. The weight vector w shows how the X variables combine to form the quantitative relation between X and Y and thus provides an interpretation of the scores t and u. Hence, these weights are essential for understanding which X variables are important and which X variables carry the same information.

Consider the data matrices X = [x_1, ..., x_P] ∈ R^(N×P) and Y = [y_1, ..., y_Q] ∈ R^(N×Q), where N stands for the number of samples, P the number of input variables, and Q the number of response variables. The PLS decomposition of X and Y results in the following:

X = T P^T + E = ∑_{i=1}^{m} t_i p_i^T + E  (7)

Y = Û Q^T + F = ∑_{i=1}^{m} û_i q_i^T + F  (8)

where m is the number of latent variables, T = [t_1, ..., t_m] assembles the scores

t_i = E_{i-1} w_i  (9)

P = [p_1, ..., p_m] are the loading vectors for X, Q = [q_1, ..., q_m] are the loading vectors for Y, and Û = [û_1, ..., û_m] are the predictions of the Y-block scores from t_i for i = 1, ..., m. It is trivial to verify that

Û = [û_1, ..., û_m] = [t_1 b_1, ..., t_m b_m] = TB  (10)

Y = Û Q^T + F = T B Q^T + F = T C^T + F  (11)

where B = diag(b_1, ..., b_m) is the regression coefficient matrix. On the other hand, the score vector t_k can also be computed directly from the original predictor matrix X as follows:25

t_k = X r_k  (12)

with

r_1 = w_1  (13)

r_k = [ ∏_{i=1}^{k-1} (I - w_i p_i^T) ] w_k  (14)

where I is the identity matrix of appropriate dimension. The score matrix T can therefore be formulated as26

T = XR  (15)

R = W (P^T W)^{-1}  (16)

where R = [r_1, ..., r_m] and W = [w_1, ..., w_m].23,24,27

PLS is a projection method and thus has a simple geometric interpretation: it projects the X matrix onto an m-dimensional hyperplane in such a way that the coordinates of the projection (t_i, i = 1, ..., m) are good predictors of Y.23 This is indicated in Figure 1.

Figure 1. Geometric representation of PLS.

As shown in Figure 1, the X matrix can be represented as N points in a space in which each column of X defines one coordinate axis. The PLS model defines an m-dimensional hyperplane, which in turn is defined by one line, one direction, per component. The direction coefficients of these lines are p_i. The coordinates of each object i, obtained when its data (row i of X) are projected down onto this plane, are t_i. These positions are related to the values of Y. The direction of the plane is expressed as slopes, the angles between each PLS direction and the coordinate axes.23
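To make Algorithm 1 concrete, the following is a minimal NumPy sketch of the deflation loop of eqs 1-6. It is our illustration, not code from this work: it assumes X and Y are already mean-centered and scaled, and it returns the matrices T, W, P, and C used in eqs 7-16.

```python
import numpy as np

def pls_nipals(X, Y, m, tol=1e-10, max_iter=500):
    """Minimal PLS per Algorithm 1 (eqs 1-6); X is N x P, Y is N x Q, both scaled."""
    X, Y = X.astype(float).copy(), Y.astype(float).copy()
    T, W, P, C = [], [], [], []
    for _ in range(m):
        u = Y[:, [0]]                            # step 1: initialize u from a column of Y
        for _ in range(max_iter):                # steps 2-5, iterated until convergence
            w = X.T @ u                          # eq 1
            t = X @ w
            t /= np.linalg.norm(t)               # eq 2, normalized
            c = Y.T @ t                          # eq 3
            u_new = Y @ c
            u_new /= np.linalg.norm(u_new)       # eq 4, normalized
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = X.T @ t                              # X loading, used in eqs 7 and 14
        X -= t @ (t.T @ X)                       # eq 5: deflate X
        Y -= t @ (t.T @ Y)                       # eq 6: deflate Y
        T.append(t); W.append(w); P.append(p); C.append(c)
    return np.hstack(T), np.hstack(W), np.hstack(P), np.hstack(C)
```

When Y has a single column, the inner loop converges after one pass, and the procedure reduces to the familiar single-response PLS algorithm.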


2.2. Kernel Partial Least-Squares. As noted above, PLS works well when the system is linear; however, it fails to capture nonlinear relationships among the variables because of its intrinsic linearity. When the variations are nonlinear, the data can be mapped onto a high-dimensional space in which they vary linearly. This high-dimensional space is referred to as the feature space, F. KPLS extends linear PLS to its nonlinear kernel form by formulating PLS in the feature space.13 KPLS can handle a wide range of nonlinear problems, although it consumes more time than the PLS method because the kernel matrix must be computed.

Assume a nonlinear transformation of the input variables {x_i}_{i=1}^{N} onto a feature space F, that is, a mapping Φ: x_i → Φ(x_i) ∈ F. Our goal is to construct a linear PLS regression model in F; this corresponds to a nonlinear regression model in the space of the original input variables. Let Φ denote the (N × M) matrix of regressors whose i-th row is the vector Φ(x_i). Depending on the nonlinear transformation Φ(·), the feature space can be high- and even infinite-dimensional, as when the Gaussian kernel function is used. Because we work with only N observations, we restrict ourselves to finding the solution of the linear regression problem in the span of the points {Φ(x_i)}_{i=1}^{N}.19

The KPLS algorithm can be derived directly from the PLS algorithm by modifying some steps of its procedure.13,19 Good generalization properties of the KPLS model can be achieved through appropriate estimation of the regression coefficients in F and through the selection of the kernel function. The KPLS algorithm is as follows.

Algorithm 2. Kernel partial least-squares algorithm.13,19,21
1. Randomly initialize u (u can be set equal to any column of Y).
2. Calculate

t = ΦΦ^T u = Ku  (17)

and set t ← t/||t||.
3. Calculate

c = Y^T t  (18)

where c is a weight vector.
4. Calculate

u = Yc  (19)

and normalize u by u ← u/||u||.
5. Repeat steps 2-4 until convergence.
6. Deflate the K and Y matrices as follows:

K = (I - t t^T) K (I - t t^T) = K - t t^T K - K t t^T + t t^T K t t^T  (20)

Y = Y - t t^T Y  (21)

Remark 1. Applying the so-called kernel trick, that is,

Φ(x_i)^T Φ(x_j) = K(x_i, x_j)  (22)

we can see that ΦΦ^T represents the (N × N) kernel Gram matrix K of the cross dot products between all mapped input data points {Φ(x_i)}_{i=1}^{N}.19

2.3. Outline of the Online Estimation and Prediction Strategy Based on KPLS.

1. Developing the Estimation and Prediction Model.
Step 1. Acquire the training data and normalize them using the mean and standard deviation of each variable.
Step 2. Given the set of m-dimensional scaled normal operating data x_k ∈ R^m, k = 1, ..., N, compute the kernel matrix K ∈ R^(N×N) by

[K]_ij = ⟨Φ(x_i), Φ(x_j)⟩ = k(x_i, x_j)  (23)

Step 3. Carry out centering in the feature space so that ∑_{k=1}^{N} Φ(x_k) = 0:

K̃ = (I - (1/N) 1_N 1_N^T) K (I - (1/N) 1_N 1_N^T)  (24)

where I ∈ R^(N×N) is the identity matrix and 1_N is the length-N column vector whose elements are all equal to 1.
Step 4. Run the KPLS algorithm and obtain the regression coefficient matrix

B = Φ^T U (T^T K U)^{-1} T^T Y  (25)

Step 5. Calculate the predictions for the training data as follows:

Ŷ = ΦB = ΦΦ^T U (T^T K U)^{-1} T^T Y = K U (T^T K U)^{-1} T^T Y  (26)

Step 6. Rescale Ŷ to the original units Ỹ by multiplying by the standard deviation and adding the mean acquired in step 1 of the modeling procedure, that is,

Ỹ = Ŷ S_r + Y_r,mean  (27)

where S_r and Y_r,mean are the standard deviation and mean of the training data, respectively.

2. Online Estimation and Prediction Procedure.
Step 1. Obtain the new data and scale them using the mean and standard deviation of each variable.
Step 2. Given the m-dimensional scaled test data x_t ∈ R^m, t = 1, ..., N_t, compute the kernel matrix K_t ∈ R^(N_t×N) by

[K_t]_tj = ⟨Φ(x_t), Φ(x_j)⟩ = k(x_t, x_j)  (28)

where x_j ∈ R^m, j = 1, ..., N, are the training data.
Step 3. Carry out mean centering on the test kernel matrix as follows:

K̃_t = (K_t - (1/N) 1_Nt 1_N^T K) (I - (1/N) 1_N 1_N^T)  (29)

where K and 1_N are obtained from steps 2 and 3 of the modeling procedure, and 1_Nt is the length-N_t column vector of ones, so that 1_Nt 1_N^T ∈ R^(N_t×N) is a matrix of ones.
Step 4. Calculate the estimates Ŷ_t using the regression coefficient matrix B obtained in step 4 of the modeling procedure:

Ŷ_t = Φ_t B = Φ_t Φ^T U (T^T K U)^{-1} T^T Y = K_t U (T^T K U)^{-1} T^T Y  (30)

Step 5. Rescale Ŷ_t to the original units:

Ỹ_t = Ŷ_t S_t + Y_t,mean  (31)

where S_t and Y_t,mean are the standard deviation and mean used in the scaling of step 1, respectively.

3. Application Studies

3.1. Case Study 1: A Nonlinear Numerical Simulation Example. We use the nonlinear multidimensional simulation example devised in refs 28 and 29 to assess the efficiency of the proposed method. It is defined as follows:

x_1 = t^2 - t + 1 + ε_1  (32)

x_2 = sin t + ε_2  (33)

x_3 = t^3 + t + ε_3  (34)

y = x_1^2 + x_1 x_2 + 3 cos x_3 + ε_4  (35)

where t is uniformly distributed over [-1, 1] and ε_i, i = 1, 2, 3, 4, are noise components uniformly distributed over [-0.1, 0.1]. The generated data of 300 samples are divided into training and testing data sets, illustrated in Figures 2 and 3: the first 200 samples are selected for training, and the subsequent 100 samples are used as the testing data set. It is apparent that the input variables are driven by only one latent variable, t, in this case, and from Figure 4 we can easily see that the response variable is nonlinearly correlated with the input variables.28,29

Figure 2. Data set used for training and testing in the nonlinear numerical simulation example.

Figure 3. Data set used for training and testing in the nonlinear numerical simulation example.

Figure 4. Relationship between the input variables and the response variable in the nonlinear numerical simulation example.
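For reproducibility, the simulation data of eqs 32-35 can be generated along the following lines; the random seed and the generator are our choices, since the reference does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
t = rng.uniform(-1, 1, n)
eps = rng.uniform(-0.1, 0.1, (4, n))              # noise terms eps_1, ..., eps_4

x1 = t**2 - t + 1 + eps[0]                        # eq 32
x2 = np.sin(t) + eps[1]                           # eq 33
x3 = t**3 + t + eps[2]                            # eq 34
y = x1**2 + x1 * x2 + 3 * np.cos(x3) + eps[3]     # eq 35

X = np.column_stack([x1, x2, x3])
X_train, y_train = X[:200], y[:200]               # first 200 samples for training
X_test, y_test = X[200:], y[200:]                 # remaining 100 samples for testing
```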

The number of latent variables is chosen in this article on the basis of the percentage of the variance of matrix Y captured by the latent variables. The ratios of variance captured in PLS and KPLS are shown in Figures 5 and 6, respectively. When the ratios are required to be equal to or larger than 0.90, the numbers of latent variables retained in the PLS and KPLS models are 2 and 11, respectively.

Figure 5. Ratio of variable variance captured in PLS (nonlinear numerical simulation example).

Figure 6. Ratio of variable variance captured in KPLS (nonlinear numerical simulation example).

Figures 7 and 8 show the prediction results for the training and testing data using the PLS approach. The upper part of Figure 7 shows the actual and predicted values of the training and testing data, and the lower part shows the absolute error between the actual and estimated values. Figure 8 shows the estimation results as a parity plot, together with the residuals of the training and testing data; in such plots, the data fall on the diagonal (the predicted value equals the actual value) if the model fits the data perfectly. From Figures 7 and 8, we can see that the estimation results are not very satisfactory, because there is a significant mismatch between the model predictions and the measured values.

Figure 7. Estimation results via the PLS method in the nonlinear numerical simulation example.

Figure 8. Estimation parity plot from the PLS method in the nonlinear numerical simulation example.

The estimation performance can also be evaluated in terms of the root-mean-square error (RMSE) criterion. The RMSE values of the training and testing data using the PLS method are 0.2125 and 0.2233, respectively, which also indicates that the linear PLS model is not very appropriate for fitting these data.

Remark 2. The RMSE index is defined as follows:21

RMSE = sqrt[ ∑_{i=1}^{n} (ŷ_i - y_i)^2 / n ]  (36)

where y denotes the measured values, ŷ the corresponding predicted values, and n the number of samples.

When the KPLS estimation model is constructed, an appropriate kernel must be chosen. The radial basis function (RBF)

K(x, y) = exp(-||x - y||^2 / c)  (37)

is used as the kernel function, with c = 0.06 for this case.

The estimation and prediction results using KPLS are shown in Figures 9 and 10, respectively. The RMSE values for the training and testing data using KPLS are 0.0539 and 0.1661, respectively, smaller than those obtained with the PLS method. From Figure 9, we can see that the KPLS model predicts the actual values with relatively good accuracy.

Figure 9. Estimation results via the KPLS method in the nonlinear numerical simulation example.

Figure 10. Estimation parity plot from the KPLS method in the nonlinear numerical simulation example.
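Two quantities are used repeatedly in what follows: the RMSE of eq 36 and the number of latent variables needed to reach the 0.90 captured-variance threshold for Y. The article does not spell out how the captured ratio is computed, so the sum-of-squares definition in the sketch below should be read as an assumption of ours, not the authors' exact formula.

```python
import numpy as np

def rmse(y_hat, y):
    """Root-mean-square error, eq 36."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

def n_latent_for_y_variance(Y, Y_hat_per_lv, threshold=0.90):
    """Smallest number of latent variables whose cumulative captured Y-variance
    ratio reaches the threshold; Y_hat_per_lv[k] is the fitted Y using k + 1
    latent variables (one common definition, assumed here)."""
    total = np.sum((Y - Y.mean(axis=0)) ** 2)
    for k, Y_hat in enumerate(Y_hat_per_lv):
        captured = 1.0 - np.sum((Y - Y_hat) ** 2) / total
        if captured >= threshold:
            return k + 1
    return len(Y_hat_per_lv)
```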


The absolute error is also smaller than that of the linear PLS method. The same result can be seen in Figure 10, in which the predicted values are plotted against the observed data: the data shift to a more diagonal distribution on the plot, the prediction residuals are closer to zero, and no significant outliers are observed. Therefore, the KPLS approach shows better estimation and prediction performance than the PLS method.

As already noted, the input variables and the response variable are nonlinearly correlated with one another (as shown in Figure 4). The better prediction performance of KPLS compared with PLS suggests that a nonlinear correlation structure should not be modeled using a linear approach, because of the risk of including noise in the model while trying to account for the nonlinearity. PLS, which projects the nonlinear input data onto a linear subspace, cannot model the nonlinearity of the relationship properly; in contrast, KPLS models such nonlinear relationships via a nonlinear kernel mapping onto a feature space.

3.2. Case Study 2: Fluid Catalytic Cracking Unit. The fluid catalytic cracking unit (FCCU) is the core unit of secondary oil processing, and its running conditions strongly affect the yield of light oil in petroleum refining. In general, an FCCU consists of the reactor-regenerator subsystem, the fractionator subsystem, the absorber-stabilizer subsystem, and the gas-sweetening subsystem. The main aim of the fractionator subsystem is to split crude oil by fractional distillation; its prime products include crude gasoline, light diesel oil, and slurry.

It is important to estimate the freezing point of light diesel oil online in order to control its quality. The yield rate of gasoline is analyzed offline every 8 h, and a significant delay (often several hours) is usually incurred, so the analyzed values cannot be used as feedback signals for control systems; therefore, it is also necessary to estimate it online. Moreover, if the estimation is accurate enough, it can entirely substitute for the expensive analyzer and cut production costs. The KPLS-based online estimation method is applied here to the estimation of the freezing point of light diesel oil in the fractionator subsystem and of the yield rate of gasoline. In these studies, the source samples come from the process data records and the corresponding daily laboratory analyses of the Shijiazhuang Oil Refinery Factory, China. Figure 11 shows the simplified flow path of the fractionator subsystem.

Figure 11. Schematic diagram of the fractionator subsystem.

First, the input variables for the freezing-point estimation are chosen according to the principle that the variables that most affect the response variable are selected first. There are 32 trays in the fractionator subsystem of the Shijiazhuang Oil Refinery Factory. From the analysis of the fractional distillation process, the most important variables contributing to the freezing point of light diesel oil are its extraction temperature, the vapor temperature of the 19th tray, the reflux quantity of the first intermediate-section circulation, the extraction temperature of the first intermediate-section circulation, and the reflux temperature of the first intermediate-section circulation. Therefore, these five parameters are employed as inputs of the estimation model, and the freezing point of light diesel oil is the output of the estimation and prediction model.2

The collected data of 300 samples for the freezing-point estimation are divided into training and testing data sets, illustrated in Figures 12 and 13: the first 200 samples are selected as training samples, and the subsequent 100 are used as testing data. There are complex nonlinear relationships among the input and response variables, which makes this case particularly suitable for testing nonlinear estimation methods.

Figure 12. Data set used for training and testing in freezing-point estimation.

Figure 13. Data set used for training and testing in freezing-point estimation.

The ratios of variance captured for selecting the number of latent variables are shown in Figures 14 and 15. When the ratios are required to be equal to or larger than 0.90, the numbers of latent variables retained in the PLS and KPLS models are 4 and 14, respectively.

Figure 14. Ratio of variable variance captured in PLS (freezing-point estimation).

Figure 15. Ratio of variable variance captured in KPLS (freezing-point estimation).

The estimation results for the freezing point based on the PLS approach are presented in Figures 16 and 17. From Figure 16, we observe that the prediction results are not very accurate; the absolute error is large and, in some cases, even greater than the actual value.

The RMSE values for the training and testing data are 2.1959 and 2.3616, respectively. The same results can also be seen in Figure 17, in which the predicted values are plotted against the observed data. It is obvious that the PLS model is not able to predict the freezing point very well: the training and testing data have a scattered distribution, and the residuals are large.

Figure 16. Freezing-point estimation results via the PLS method.

Figure 17. Freezing-point estimation parity plot from the PLS method.

When the KPLS-based estimation approach is used in this case, a radial basis function (RBF) kernel K(x, y) = exp(-||x - y||^2 / c) is again selected as the kernel function, with c = 0.35. Comparing Figure 18 with Figure 16 shows that KPLS gives much better prediction and estimation results than the linear PLS model: the estimates are more accurate, and the error is smaller than that of the PLS method. Figure 19 shows the same result; the data are distributed more closely about the diagonal, and the residuals are lower than those of PLS. The RMSE values of the training and testing data using KPLS are 0.9568 and 1.1126, respectively, again much smaller than those of the linear PLS approach.

Figure 18. Freezing-point estimation results via the KPLS method.

Figure 19. Freezing-point estimation parity plot from the KPLS method.

The input variables for the yield-rate estimation are selected according to the same principle as for the freezing-point estimation. From the analysis of the process, the main variables that contribute to the yield rate of gasoline are the flow rate of fresh oil, the flow rate of reflux oil, the temperature of the catalytic reaction, the overhead temperature of the main fractionating tower, the extraction temperature of the light diesel oil, and the bottom temperature of the stabilizer column. Hence, these six variables are used as the inputs, and the yield rate is used as the output of the estimation model.

We collected 200 samples for this case. The first 100 samples are used as training samples, and the subsequent 100 are employed as testing data. The data for training and testing are shown in Figures 20 and 21, respectively. The ratios of variance captured for selecting the number of latent variables are shown in Figures 22 and 23; the numbers of latent variables retained in the PLS and KPLS models are 4 and 36, respectively, when the ratios are required to be equal to or larger than 0.90.

Figure 20. Data set used for training and testing in yield-rate estimation.

Figure 21. Data set used for training and testing in yield-rate estimation.

Figure 22. Ratio of variable variance captured in PLS (yield-rate estimation).

Figure 23. Ratio of variable variance captured in KPLS (yield-rate estimation).

The estimation results for the yield rate of gasoline based on PLS are shown in Figures 24 and 25. From the figures, we notice that the prediction results are again not very satisfactory: the RMSE values for the training and testing data are 1.4458 and 1.5136, respectively.

Figure 24. Yield-rate estimation results via the PLS method.

Figure 25. Yield-rate estimation parity plot from the PLS method.

For comparison with the PLS estimation method, the KPLS prediction results are shown in Figures 26 and 27. We can see from the figures that KPLS gives better estimation results: the RMSE values of the training and testing data using KPLS are 0.4381 and 0.4536, respectively, much smaller than those of the PLS method. Furthermore, the data are distributed more closely along the diagonal, and the residuals are smaller than those of PLS.

Figure 26. Yield-rate estimation results via the KPLS method.

Figure 27. Yield-rate estimation parity plot from the KPLS method.

Remark 3. In this case, the RBF kernel is again selected as the kernel function, with c = 0.35.

From the above discussion, we know that, for nonlinear quality-variable estimation, the linear PLS method performs poorly because of its intrinsic linearity. The KPLS-based method can effectively capture nonlinear relationships among the input and response variables through high-dimensional feature mapping and thus performs better than linear approaches such as PLS.
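As a usage illustration only, the freezing-point soft sensor of this section could be assembled from the KPLS sketch of section 2.3. The loader `load_fccu_data` is hypothetical (the plant data are not public), the array names are ours, and the component count and kernel width follow the values reported above.

```python
import numpy as np

# Hypothetical loader: X holds the five selected inputs (extraction temperature
# of light diesel oil; vapor temperature of the 19th tray; and reflux quantity,
# extraction temperature, and reflux temperature of the first intermediate-
# section circulation); y is the laboratory freezing point.
X, y = load_fccu_data()                       # assumed shapes (300, 5) and (300, 1)

mu, sd = X[:200].mean(0), X[:200].std(0)      # scale with training statistics only
ymu, ysd = y[:200].mean(0), y[:200].std(0)
Xs, ys = (X - mu) / sd, (y - ymu) / ysd

model = KPLS().fit(Xs[:200], ys[:200], m=14, c=0.35)   # 14 LVs and c = 0.35, as in the text
y_hat = model.predict(Xs[200:]) * ysd + ymu            # rescale to original units (eq 31)
print("testing RMSE:", rmse(y_hat, y[200:]))
```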

4. Further Discussions

The performance of the KPLS-based estimation method is influenced by many factors. In this section, we use the nonlinear numerical simulation example of case study 1 to discuss them extensively; the training and testing data sets are the same as those used in case study 1.

4.1. Performance Influence of Different Kernels. When quality variables are estimated via the KPLS-based method, the choice of kernel is important, because different kernels influence the prediction performance differently. We compare the linear kernel, the polynomial kernel, and the radial basis function kernel, listed below; a short code sketch of the three is given at the end of this subsection.
(1) Linear kernel: K(x_1, x_2) = x_1^T x_2
(2) Polynomial kernel: K(x_1, x_2) = (1 + x_1^T x_2)^C
(3) Radial basis function kernel: K(x_1, x_2) = exp(-||x_1 - x_2||^2 / c)

The KPLS model estimation results using the linear and polynomial kernels (the parameter c is chosen as 0.06) are shown in Figures 28-31. It can be seen from the figures that the linear and polynomial kernels do not achieve estimation results as good as those of the RBF kernel (shown in Figures 9 and 10). The reason is that the input variables of the data sets are nonlinearly correlated with one another and with the response variable, and the linear and polynomial kernels cannot model this nonlinear correlation structure properly without the risk of including noise in the model while trying to account for the nonlinearity. In our experience, with an appropriate choice of the parameter c, the RBF kernel is more suitable than the other kernel functions for modeling such nonlinear relationships.

Figure 28. Estimation results with the linear kernel.

Figure 29. Estimation parity plot with the linear kernel.

Figure 30. Estimation results with the polynomial kernel.

Figure 31. Estimation parity plot with the polynomial kernel.
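A short sketch of the three kernels as used in this comparison (our code; the polynomial degree C below is a placeholder, since the text does not report the value used):

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T                                   # K(x1, x2) = x1^T x2

def polynomial_kernel(A, B, C=2):                    # degree C is an assumed placeholder
    return (1.0 + A @ B.T) ** C                      # K(x1, x2) = (1 + x1^T x2)^C

def rbf_kernel(A, B, c=0.06):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / c)                           # K(x1, x2) = exp(-||x1 - x2||^2 / c)
```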

4.2. Performance Influence of the Parameter c in the RBF Kernel. When the RBF kernel is used in the KPLS-based estimation method, the value of the parameter c has a significant influence on the prediction performance. Usually, c is selected using the equation c = rmσ^2, where r is a constant determined by considering the process to be predicted, m is the dimension of the input space, and σ^2 is the variance of the data.13,30,31 Figures 32 and 33 present the estimation results of the response variable for the training and testing data with the parameter c equal to 0.01, 0.06, 0.6, and 6.

Figure 32. Estimation parity plot with different c (RBF) (training data).

Figure 33. Estimation parity plot with different c (RBF) (testing data).

It can be seen from the figures that, for the training data, the data shift toward a more diagonal distribution on the plot as c becomes smaller (i.e., the apparent predictive ability increases). For the testing data, however, the rule is different: when c becomes larger or smaller than a certain range, the estimation performance deteriorates. Therefore, an appropriate choice of the parameter c is important for good prediction performance; in this case, c = 0.06 gives good estimation performance. The same conclusion can be drawn from the RMSE values for different c (Table 1): when c is equal to 0.06, the testing RMSE is 0.1661, the smallest among the four values.

Table 1. RMSE for Different Values of the Parameter c

  c       KPLS training   KPLS testing   PLS training   PLS testing
  0.01    0.0070          0.4558         0.2125         0.2233
  0.06    0.0539          0.1661
  0.6     0.1849          0.1899
  6.0     0.1870          0.1801

(The PLS RMSE values do not depend on c and are listed once for reference.)

4.3. Influence of Different c on the Latent Variables. As is already known, the number of latent variables retained has a significant influence on the estimation performance of KPLS. In this article, the percentage of the variance of matrix Y captured by the latent variables is used as the criterion for selecting the number of retained latent variables. As shown in Figure 34, as c becomes smaller, the number of latent variables needed to capture the same percentage of variance becomes larger, and the computation time increases with the number of latent variables. Thus, to achieve good estimation results, an appropriate choice of the parameter c, which determines the number of retained latent variables, is also important.
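The width heuristic c = rmσ^2 and the scan over c in Table 1 can be sketched as follows, using the simulation arrays generated in section 3.1's sketch. This is our illustration; in practice, the number of latent variables would be re-selected for each c, since, as noted above, smaller c requires more latent variables to capture the same fraction of the Y variance.

```python
import numpy as np

def rbf_width(X, r):
    """Heuristic c = r * m * sigma^2 (refs 13, 30, 31): m is the input dimension,
    sigma^2 the variance of the scaled data, and r a process-dependent constant."""
    return r * X.shape[1] * X.var()

# Scan the candidate widths of Table 1 on the simulation data of section 3.1.
mu, sd = X_train.mean(0), X_train.std(0)           # training statistics only
ymu, ysd = y_train.mean(), y_train.std()
for c in (0.01, 0.06, 0.6, 6.0):
    model = KPLS().fit((X_train - mu) / sd, ((y_train - ymu) / ysd)[:, None], m=11, c=c)
    y_hat = model.predict((X_test - mu) / sd)[:, 0] * ysd + ymu
    print(f"c = {c:>4}: testing RMSE = {rmse(y_hat, y_test):.4f}")
```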

Figure 34. Ratio of variable variance captured for different c in the KPLS method.

4.4. Computation Time Comparison Between the PLS- and KPLS-Based Methods. The computation time for online estimation and prediction is important as well. The time consumed for modeling and estimation is shown in Figure 35. From the figure, we observe that, as the number of samples increases, the modeling and prediction time of KPLS grows beyond that of the PLS method. The reason is that the KPLS approach must compute the kernel matrix, which is not required in the PLS method. The KPLS-based method thus enhances the prediction accuracy at the expense of speed. In many cases, however, accuracy is more important than speed, so the KPLS-based method can be widely used for quality-variable estimation, especially in chemical and biological processes.

Figure 35. Computation time comparison between the PLS and KPLS methods.
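A rough timing harness in the spirit of Figure 35 (our sketch; the model settings and array names are placeholders, and the PLS sketch of section 2.1 can be timed the same way for comparison):

```python
import time
import numpy as np

# Fit-plus-predict wall time versus training-set size for the KPLS sketch.
# Xs_all and ys_all are assumed pre-scaled arrays, ys_all of shape (n, 1).
for n in (100, 200, 400, 800):
    Xn, yn = Xs_all[:n], ys_all[:n]
    t0 = time.perf_counter()
    KPLS().fit(Xn, yn, m=10, c=0.06).predict(Xn)
    print(f"N = {n:>4}: KPLS fit + predict = {time.perf_counter() - t0:.3f} s")
```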

5. Conclusions

A novel nonlinear multivariate quality estimation and prediction approach based on KPLS has been proposed. The basic idea of this method is to map the data from the original space onto a higher-dimensional feature space and then to perform linear PLS prediction in that feature space. The proposed method can effectively capture nonlinear relationships among input and response variables. Compared with PLS and other nonlinear methods, KPLS avoids the nonlinear optimization procedure and involves calculations as simple as those of linear PLS. The improvement in prediction performance from PLS to KPLS suggests that nonlinear correlation structures should not be modeled using linear approaches, because of the risk of including noise in the model while trying to account for the nonlinearity. The successful applications of the proposed approach to the nonlinear numerical simulation example and to the industrial fluid catalytic cracking unit (FCCU) demonstrate the feasibility and effectiveness of the KPLS-based method.

A number of factors deserve special consideration when using KPLS for estimation. One is how to choose the kernel function and identify the kernel parameters; in this study, they were chosen on the basis of experience, and a systematic approach remains a challenging issue. Another drawback of KPLS is the difficulty of relating the latent variables in the high-dimensional feature space to the variables in the original space, because it is difficult or even impossible to find an inverse mapping from the feature space back to the original space. We believe that, once these problems are solved, quality estimation based on KPLS will give even more promising prediction results for nonlinear multivariate systems.

Acknowledgment

The authors gratefully acknowledge the support of the Natural Science Foundation of China (grant 60504033). We also gratefully acknowledge the anonymous referees for their constructive advice.

Literature Cited

(1) Joseph, B.; Brosilow, C. B. Inferential Control of Processes: Steady-State Analysis and Design. AIChE J. 1978, 24, 485-508.
(2) Yan, W.; Shao, H.; Wang, X. Soft Sensing Modeling Based on Support Vector Machine and Bayesian Model Selection. Comput. Chem. Eng. 2004, 28, 1489-1498.
(3) Qin, S. J.; McAvoy, T. J. Nonlinear PLS Modeling Using Neural Networks. Comput. Chem. Eng. 1992, 16, 379-391.
(4) Radhakrishnan, V. R.; Mohamed, A. R. Neural Networks for the Identification and Control of Blast Furnace Hot Metal Quality. J. Process Control 2000, 10, 509-524.
(5) Kresta, J. V.; Marlin, T. E.; MacGregor, J. F. Development of Inferential Process Models Using PLS. Comput. Chem. Eng. 1994, 18, 597-611.
(6) Park, S.; Han, C. A Nonlinear Soft Sensor Based on Multivariate Smoothing Procedure for Quality Estimation in Distillation Columns. Comput. Chem. Eng. 2000, 24, 871-877.
(7) Dochain, D.; Perrier, M. Dynamical Modeling, Analysis, Monitoring and Control Design for Nonlinear Bioprocesses. Adv. Biochem. Eng. Biotechnol. 1997, 56, 147-198.
(8) James, S. C.; Legge, R. L.; Budman, H. On-Line Estimation in Bioreactors: A Review. Rev. Chem. Eng. 2000, 16, 311-340.
(9) Lin, B.; Recke, B.; Knudsen, J. K. H.; Jørgensen, S. B. A Systematic Approach for Soft Sensor Development. Comput. Chem. Eng. 2007, 31, 419-425.
(10) Chen, S.; Billings, S. A. Practical Identification of NARMAX Models Using Radial Basis Functions. Int. J. Control 1990, 52, 1327-1350.
(11) Ljung, L. System Identification: Theory for the User; Information and System Sciences Series; Prentice Hall: Englewood Cliffs, NJ, 1987.
(12) Wang, X.; Luo, R.; Shao, H. Designing a Soft Sensor for a Distillation Column with the Fuzzy Distributed Radial Basis Function Neural Network. Proceedings of the 35th IEEE Conference on Decision and Control, Kobe, Japan, 1996; pp 1714-1719.
(13) Kim, K.; Lee, J. M.; Lee, I. B. A Novel Multivariate Regression Approach Based on Kernel Partial Least Squares with Orthogonal Signal Correction. Chemom. Intell. Lab. Syst. 2005, 79, 22-30.
(14) Wold, S.; Kettaneh-Wold, N.; Skagerberg, B. Nonlinear PLS Modeling. Chemom. Intell. Lab. Syst. 1989, 7, 53-65.
(15) Frank, I. E. A Nonlinear PLS Model. Chemom. Intell. Lab. Syst. 1990, 8, 109-119.
(16) Wold, S. Nonlinear Partial Least Squares Modelling: II. Spline Inner Relation. Chemom. Intell. Lab. Syst. 1992, 14, 71-84.
(17) Holcomb, T. R.; Morari, M. PLS/Neural Networks. Comput. Chem. Eng. 1992, 16, 393-411.
(18) Malthouse, E. C.; Tamhane, A. C.; Mah, R. S. H. Nonlinear Partial Least Squares. Comput. Chem. Eng. 1997, 21, 875-890.
(19) Rosipal, R.; Trejo, L. J. Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space. J. Mach. Learn. Res. 2001, 2, 97-123.
(20) Haykin, S. Neural Networks; Prentice Hall: Englewood Cliffs, NJ, 1999.
(21) Lee, D. S.; Lee, M. W.; Woo, S. H.; Kim, Y.-J.; Park, J. M. Multivariate Online Monitoring of a Full-Scale Biological Anaerobic Filter Process Using Kernel-Based Algorithms. Ind. Eng. Chem. Res. 2006, 45, 4335-4344.
(22) Rosipal, R.; Trejo, L. J.; Matthews, B. Kernel PLS-SVC for Linear and Nonlinear Classification. Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington, DC, 2003; pp 640-647.
(23) Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109-130.
(24) Geladi, P.; Kowalski, B. R. Partial Least-Squares Regression: A Tutorial. Anal. Chim. Acta 1986, 185, 1-17.
(25) Lindgren, F.; Geladi, P.; Wold, S. The Kernel Algorithm for PLS. J. Chemom. 1993, 7, 45-59.
(26) Höskuldsson, A. PLS Regression Methods. J. Chemom. 1988, 2, 211-228.
(27) Zhao, S. J.; Zhang, J.; Xu, Y. M. Performance Monitoring of Processes with Multiple Operating Modes through Multiple PLS Models. J. Process Control 2006, 16, 763-772.
(28) Wilson, D. J. H.; Irwin, G. W.; Lightbody, G. RBF Principal Manifolds for Process Monitoring. IEEE Trans. Neural Networks 1999, 10, 1424-1434.
(29) Zhao, S. J.; Zhang, J.; Xu, Y. M.; Xiong, Z. H. Nonlinear Projection to Latent Structures Method and Its Applications. Ind. Eng. Chem. Res. 2006, 45, 3843-3852.
(30) Lee, J. M.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I. B. Nonlinear Process Monitoring Using Kernel Principal Component Analysis. Chem. Eng. Sci. 2004, 59, 223-234.
(31) Wold, S. Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models. Technometrics 1978, 20, 397-405.

Received for review May 25, 2007. Revised manuscript received November 6, 2007. Accepted November 16, 2007.

IE070741+