
Locally Weighted Kernel Principal Component Regression Model for Soft Sensing of Nonlinear Time-Variant Processes

Xiaofeng Yuan, Zhiqiang Ge,* and Zhihuan Song*

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, P. R. China

ABSTRACT: The principal component regression (PCR) based soft sensor modeling technique has been widely used for process quality prediction in the last decades. However, most industrial processes are characterized by nonlinearity and time variance, for which a global linear PCR model is no longer applicable; its nonlinear and adaptive forms should be adopted instead. In this paper, a just-in-time learning (JITL) based locally weighted kernel principal component regression (LWKPCR) is proposed to solve the nonlinear and time-variant problems of the process. The soft sensing performance of the proposed method is validated on an industrial debutanizer column and a simulated fermentation process. Compared to the JITL-based PCR, KPCR, and LWPCR soft sensing approaches, the root-mean-square errors (RMSE) of JITL-based LWKPCR are the smallest and its predictions match best with the actual outputs, which indicates that the proposed method is more effective for quality prediction in nonlinear time-variant processes.

1. INTRODUCTION
In many industrial processes, it is important to measure and control key product quality in order to produce high-quality products. Yet on some occasions, due to reasons such as extreme environments, the unacceptable expense of analytical instruments, and time delays of measurements, online measurement of those critical process variables is difficult. Thus, soft sensor techniques have become an important tool in industrial processes. Particularly, data-driven soft sensors have emerged since large amounts of data can be collected within process plants. In data-driven soft sensor approaches, difficult-to-measure key process variables are estimated from other easy-to-measure secondary process variables using statistical and machine learning techniques. In the last decades, various soft sensor methods have been proposed for online prediction, process monitoring, and sensor fault detection.1−7 By far, soft sensors have been successfully applied to many industrial processes in fields such as chemical engineering, biochemical engineering, metallurgy, and the pharmaceutical industry.8−15 Among soft sensor methods, principal component regression16 and partial least-squares regression17 are the most common linear regression models. For other cases, artificial neural networks and support vector regression are popular for their strong ability to cope with nonlinear processes.18,19

Typically, the data measured in a process plant are strongly colinear as a result of the redundancy of on-site sensors and the correlation of variables. To deal with the colinearity of variables, principal component analysis (PCA),20 the most popular and useful method for data dimensionality reduction, is extensively used in many fields. According to a recent review paper on data-driven soft sensors,21 a total of 23% of soft sensor models have been constructed on the basis of PCA, which is usually used as a preprocessing step. In principal component regression (PCR), the original input variables are replaced by their principal components in the least-squares regression algorithm.22,23 Compared to ordinary least-squares, PCR can orthogonalize the regression problem and deal with colinearity as well as high dimensionality.

Although PCA can handle high dimensionality and colinearity, it can only extract linear information from data. Nevertheless, many processes are characterized by nonlinearity in nature, so it is desirable to develop an effective method which can solve the nonlinear problem. Thus, the nonlinear form of PCA, kernel PCA (KPCA),24 is extensively applied in nonlinear processes. By mapping the original inputs into a high-dimensional nonlinear space and performing PCA in the new space, nonlinear features are extracted by KPCA. From the viewpoint of soft sensor regression, PCR cannot approximate well the nonlinear relation between output and input variables in nonlinear processes. Kernel principal component regression (KPCR)25,26 can be obtained by using the kernel principal components in least-squares regression. Using KPCA as a data preprocessing step, KPCR creates a linear model based on the extracted nonlinear principal components.

Besides the nonlinear problem, another common characteristic of industrial processes is time variance. The process characteristics often change gradually or frequently as a result of process drift, equipment aging, catalyst deactivation, etc. Therefore, the performance of the soft sensor model will deteriorate as time goes on. In order to deal with the changes of the process plant and maintain high performance, the soft sensor models should be updated automatically and frequently. There are many kinds of adaptive soft sensors, such as recursive modeling27,28 and moving window29,30 techniques. While those methods can adapt the soft sensor to gradual process changes, they have difficulty coping with abrupt changes of the process. Hence, a new online method called just-in-time learning (JITL)31−33 has been proposed.

The JITL, which is also called lazy learning34 or instance-based learning,35 selects training samples from the historical data set that are relevant to a query sample and constructs a local model whenever an estimate for the query sample is required. Different from traditional soft sensor methods, the JITL-based model is built locally and online, so it can track the process state well and can be used in nonstationary processes. Another advantage of the JITL method is that it can deal with the nonlinearity of the process because of its local model structure. Even a linear local model such as JITL-based PCR can achieve acceptable predictive accuracy. However, when strong nonlinear variable relationships widely exist in the process, a single linear local model may not always function very well, so a JITL-based nonlinear local model such as KPCR may be a better choice. Meanwhile, locally weighted regression (LWR),36,37 a nonparametric method, is another kind of JITL-based local nonlinear model. In LWR, the samples are treated with different weights according to their distances to the estimated sample, so LWR can cope with the nonlinearity and time variance of the process simultaneously. Applying the locally weighted technique to PCR, we can obtain the locally weighted principal component regression (LWPCR). The LWPCR obtains the projection directions using the weighted input samples; a detailed interpretation of this algorithm will be given later in this paper.

Though the LWR form of linear PCR, LWPCR, has a certain ability to handle nonlinear regression relations, its input features are still the original process variables. The nonlinearity of the regression model only lies in the different treatment of samples, that is, the locally weighted trick. For processes with strong nonlinearity, LWPCR sometimes cannot meet the demand of prediction accuracy. Naturally, nonlinear features obtained from kernel methods, instead of the original variable features, are effective in dealing with nonlinearity. In previous research, nonlinear mappings of the original features have hardly been taken into consideration in LWR. By incorporating nonlinear features into locally weighted regression, the model can adapt to more processes with much stronger nonlinearity. With regard to PCR, we propose an improved nonlinear regression method called locally weighted kernel principal component regression (LWKPCR), which takes the nonlinear feature transformation and the different treatment of samples into consideration simultaneously. Because of the just-in-time technique, this method is also able to tackle the time-variance problem. Thus, LWKPCR can be expected to achieve higher predictive accuracy than linear PCR, nonlinear KPCR, and LWPCR. To verify this, comparative studies of the predictive accuracy of PCR, KPCR, LWPCR, and LWKPCR will be carried out later. For the purpose of fairness, the four different regression models are all constructed on the basis of the JITL algorithm, as PCR and KPCR are global models while LWPCR and LWKPCR are local models. This means that JITL-PCR, JITL-KPCR, JITL-LWPCR, and JITL-LWKPCR will be used for online soft sensor modeling to estimate important process variables in this study.

The remaining parts of this paper are organized as follows. In section 2, some preliminaries about PCR and KPCR are briefly introduced. Then, on the basis of these two algorithms, LWPCR and LWKPCR are derived in section 3. Section 4 presents the strategy of the JITL-based online soft sensor modeling techniques.
Following this, in section 5, two case studies of soft sensor models are carried out on a debutanizer column and a fermentation process. Finally, section 6 summarizes the main contributions of this paper.

2. PRELIMINARIES OF PCR AND KPCR
First of all, the notations and symbols that will be used in this paper are interpreted here. The key symbols and their corresponding meanings are listed in Table 1. Other symbols that are not listed will be explained at their first appearance.

Table 1. Key Symbols List
L: number of historical data samples
N: number of relevant samples
x_n ∈ R^m: input vector of dimension m
y_n ∈ R^l: output vector of dimension l
X ∈ R^{N×m}: relevant input data set
Y ∈ R^{N×l}: relevant output data set
x_q: input vector of the query sample
y_q: actual output of the query sample
w_n: weight assigned to the nth sample
d: number of principal components
σ: parameter used to calculate the weights
δ: parameter of the Gaussian kernel
C, C^K, C^W, C^{W,K}: covariance matrices in PCR, KPCR, LWPCR, and LWKPCR
T, T^K, T^W, T^{W,K}: score matrices in PCR, KPCR, LWPCR, and LWKPCR
T_q, T_q^K, T_q^W, T_q^{W,K}: score vectors of the query sample in PCR, KPCR, LWPCR, and LWKPCR
θ, θ^K, θ^W, θ^{W,K}: regression coefficients in PCR, KPCR, LWPCR, and LWKPCR
ŷ_q, ŷ_q^K, ŷ_q^W, ŷ_q^{W,K}: estimated outputs of the query sample in PCR, KPCR, LWPCR, and LWKPCR

2.1. Principal Component Regression (PCR). The PCR algorithm consists of two steps.22,23 First, the feature components are extracted by PCA. Then, a linear regression model is built between the output variables and the feature components. The procedure is as follows. First, the covariance matrix of the data is calculated as

C = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T    (1)
Implementing the eigenvalue decomposition, we obtain

C U = U \Lambda    (2)

where U and \Lambda are the eigenvector matrix and eigenvalue matrix of C, sorted in descending order of the eigenvalues. That is,

U = [u_1, u_2, ..., u_m]    (3)

\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, ..., \lambda_m)    (4)

The columns of U represent the new orthogonal coordinates. For dimensionality reduction, only the first d dimensions are retained, i.e., U_d = [u_1, u_2, ..., u_d]. Then the data X are projected onto the new coordinates U_d as

T = X U_d    (5)

T ∈ R^{N×d} is called the principal component matrix, also called the score matrix. For the query sample, the score vector is

T_q = x_q U_d    (6)

Then, using multivariate linear regression, the regression coefficient matrix θ is

\theta = (T^T T)^{-1} T^T Y    (7)

Here it should be noted that the superscript T indicates the transpose of a matrix, and Y is mean-removed. So the estimated output of the query sample is

\hat{y}_q = T_q \theta + \bar{y}    (8)

where \bar{y} is the mean row vector of the outputs.
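To make the PCR steps concrete, a minimal NumPy sketch of eqs 1-8 is given below. It is only an illustrative sketch: the function name and the use of np.linalg.eigh and lstsq are our choices, not part of the original algorithm description.

```python
import numpy as np

def pcr_fit_predict(X, Y, x_q, d):
    """Minimal PCR sketch of eqs 1-8: mean removal, PCA projection onto the
    first d eigenvectors, then least-squares regression on the scores."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean                 # mean removal
    C = Xc.T @ Xc / len(Xc)                         # covariance matrix, eq 1
    lam, U = np.linalg.eigh(C)                      # eigendecomposition, eq 2
    Ud = U[:, np.argsort(lam)[::-1][:d]]            # first d eigenvectors, eqs 3-4
    T = Xc @ Ud                                     # training scores, eq 5
    Tq = (x_q - x_mean) @ Ud                        # query score vector, eq 6
    theta = np.linalg.lstsq(T, Yc, rcond=None)[0]   # regression coefficients, eq 7
    return Tq @ theta + y_mean                      # estimated output, eq 8
```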


2.2. Kernel Principal Component Regression (KPCR). KPCR is a nonlinear variant of PCR.25,26 By mapping the original input variables into a high-dimensional space \phi: R^m \to F, x \to \phi(x), PCA is performed in F. The idea is simple and straightforward but problematic at first sight because \phi(x) does not have an explicit form. Fortunately, this can be addressed by the use of kernel functions, originally used in support vector machines. As in PCA, the covariance matrix is also used here; note that the superscript K is used to distinguish it from the PCR algorithm:

C^K = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T    (9)

Then we should also find the eigenvalue matrix and eigenvector matrix of C^K. So the next step is actually to find eigenvalues \lambda^K \geq 0 and their corresponding eigenvectors \nu^K satisfying \lambda^K \nu^K = C^K \nu^K. From a different point of view, the eigenvector \nu^K can be regarded as a linear combination of \phi(x_1), \phi(x_2), ..., \phi(x_N):

\nu^K = \sum_{i=1}^{N} \alpha_i \phi(x_i)    (10)

Let \alpha = [\alpha_1, \alpha_2, ..., \alpha_N]^T. In order to calculate \alpha, multiply both sides of \lambda^K \nu^K = C^K \nu^K from the left by \phi(x_k)^T; then we arrive at

\lambda^K (\phi(x_k)^T \nu^K) = \phi(x_k)^T (C^K \nu^K),  k = 1, 2, ..., N    (11)

that is,

\lambda^K \Big( \phi(x_k)^T \sum_{i=1}^{N} \alpha_i \phi(x_i) \Big) = \phi(x_k)^T \Big( \frac{1}{N} \sum_{j=1}^{N} \phi(x_j) \phi(x_j)^T \sum_{i=1}^{N} \alpha_i \phi(x_i) \Big),  k = 1, 2, ..., N    (12)

Define the kernel gram matrix K with entries K(i, j) = \phi(x_i)^T \phi(x_j). There are several kinds of kernel functions to choose from, such as the linear, polynomial, sigmoid, and Gaussian kernel functions. Commonly, the Gaussian kernel has favorable generalization and smooth estimation ability. Moreover, its value lies between 0 and 1, which makes the calculation easier and faster.38 Therefore, we will use the Gaussian function as the kernel function. The Gaussian kernel function is defined as

K(i, j) = \phi(x_i)^T \phi(x_j) = e^{-\| x_i - x_j \|^2 / 2\delta^2}    (13)

Substituting the gram matrix into eq 12, we have

N \lambda^K K \alpha = K^2 \alpha    (14)

which is equivalent to

N \lambda^K \alpha = K \alpha    (15)

So \alpha corresponds to an eigenvector of the gram matrix K. Usually, the coordinate vector is normalized as \|\nu^K\| = 1, so \alpha should be normalized to satisfy \|\alpha\|^2 = 1/(N\lambda^K). After that, the projection of \phi(x_k) is calculated as

t_k^K = \phi(x_k)^T \nu^K = \phi(x_k)^T \sum_{i=1}^{N} \alpha_i \phi(x_i) = \sum_{i=1}^{N} \alpha_i K(k, i) = K(k, :) \alpha,  k = 1, 2, ..., N    (16)

Implementing the eigenvalue decomposition of the gram matrix K, similar to PCR,

K A^K = A^K \Lambda^K    (17)

Choosing the first d normalized columns of A^K for dimension reduction, we get A_d^K = [\alpha_1^K, \alpha_2^K, ..., \alpha_d^K], and then the projections of the training data are

T^K = K A_d^K    (18)

For the query sample \phi(x_q), the gram matrix K_q has entries K_q(q, i) = \phi(x_q)^T \phi(x_i), and the projection is

T_q^K = K_q A_d^K    (19)

Then we get the regression coefficient matrix \theta^K from eq 7 by replacing T with T^K. At last, the predicted outputs of the query samples are calculated by substituting T_q^K and \theta^K into eq 8.
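The following NumPy sketch illustrates eqs 9-19 with the Gaussian kernel. As in the text above, the feature-space mean centering is omitted; all function names are our own illustrative choices, under the assumption of a single output column in Y.

```python
import numpy as np

def gaussian_gram(A, B, delta):
    """Gaussian kernel matrix, eq 13: entry (i, j) is exp(-||a_i - b_j||^2 / (2 delta^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * delta ** 2))

def kpcr_fit_predict(X, Y, x_q, d, delta):
    """Minimal KPCR sketch of eqs 9-19 (feature-space centering omitted)."""
    y_mean = Y.mean(axis=0)
    K = gaussian_gram(X, X, delta)                  # gram matrix K
    mu, A = np.linalg.eigh(K)                       # eigendecomposition, eq 17
    idx = np.argsort(mu)[::-1][:d]                  # keep the d largest eigenvalues
    A_d = A[:, idx] / np.sqrt(mu[idx])              # scale so that ||nu^K|| = 1 (eq 15 note)
    T = K @ A_d                                     # training projections, eq 18
    Kq = gaussian_gram(np.atleast_2d(x_q), X, delta)
    Tq = Kq @ A_d                                   # query projection, eq 19
    theta = np.linalg.lstsq(T, Y - y_mean, rcond=None)[0]   # eq 7 with T^K
    return Tq @ theta + y_mean                      # eq 8 with T_q^K
```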

3. LOCALLY WEIGHTED MODELING FOR PCR AND KPCR
In this part, the derivations of the locally weighted principal component regression and the locally weighted kernel principal component regression are given, respectively.

3.1. Locally Weighted Principal Component Regression (LWPCR). The idea of locally weighted modeling37 was proposed about two decades ago; here, the details of LWPCR are introduced. As mentioned before, the input matrix and output matrix of the N training samples for the local model are X and Y. The weight vector of the training samples for the query sample is denoted as w = [w_1, w_2, ..., w_N]^T, where the ith entry represents the weight between the ith training sample and the query sample. The definition and calculation of the weight vector will be introduced in section 4. In order to construct the locally weighted PCR, all the variables are weighted-mean removed. That is,

\bar{x}^w = \frac{\sum_{i=1}^{N} x_i w_i}{\sum_{i=1}^{N} w_i},  \bar{y}^w = \frac{\sum_{i=1}^{N} y_i w_i}{\sum_{i=1}^{N} w_i}

x_q := x_q - \bar{x}^w,  X := X - 1_N \bar{x}^w,  Y := Y - 1_N \bar{y}^w    (20)

where 1_N represents the N × 1 vector whose elements are all ones. It can easily be seen in the following that when all the entries of the weight vector w = [w_1, w_2, ..., w_N]^T are scaled by a common factor, the result of LWPCR does not change. So in the following LWPCR and LWKPCR, the weight vector is normalized so that the sum of its entries equals 1. After the weighted-mean centering, we have new input samples x_i, y_i (i = 1, 2, ..., N) and query sample x_q; here we still use the same notation to denote the weighted-mean centered variables. Then the weighted samples x_i^w = w_i x_i (i = 1, 2, ..., N) are used to calculate the covariance matrix C^W:

C^W = \frac{1}{N} \sum_{i=1}^{N} (x_i^w)(x_i^w)^T    (21)

Also, the eigenvalue matrix \Lambda^W and eigenvector matrix U^W are obtained as

C^W U^W = U^W \Lambda^W    (22)

The first d columns of U^W are utilized for dimensional reduction, which we denote as U_d^W. So the projections of the training samples and the query sample onto these coordinates are

T^W = X U_d^W    (23)

T_q^W = x_q U_d^W    (24)

Note that the data used for projection are not the weighted samples x_i^w = w_i x_i (i = 1, 2, ..., N) but the weighted-mean centered samples x_i (i = 1, 2, ..., N). By applying eqs 7 and 8 with the corresponding parameters of LWPCR, we finally get the regression coefficient and the predicted outputs. In a word, compared to PCR, the difference lies in the following two aspects. First, at the centralization step, PCR uses the mean-removal method for both the input and output variables while LWPCR uses the weighted-mean-removal method. Second, to calculate the covariance matrix and determine the projection directions, PCR uses the mean-removed input samples while LWPCR uses the weighted-mean-removed input samples multiplied by their corresponding similarity weights.
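A minimal NumPy sketch of eqs 20-24 follows; the names and the use of eigh/lstsq are our illustrative choices, assuming w contains the similarity weights defined in section 4.

```python
import numpy as np

def lwpcr_fit_predict(X, Y, x_q, w, d):
    """Minimal LWPCR sketch of eqs 20-24: weighted-mean removal, PCA on the
    weight-scaled samples, projection of the centered (unweighted) data."""
    w = w / w.sum()                                 # normalize the weights
    xw_mean, yw_mean = w @ X, w @ Y                 # weighted means, eq 20
    Xc, Yc = X - xw_mean, Y - yw_mean               # weighted-mean centering
    Xw = Xc * w[:, None]                            # x_i^w = w_i x_i
    Cw = Xw.T @ Xw / len(X)                         # covariance C^W, eq 21
    lam, U = np.linalg.eigh(Cw)                     # eigendecomposition, eq 22
    Ud = U[:, np.argsort(lam)[::-1][:d]]            # first d directions
    T = Xc @ Ud                                     # training scores, eq 23
    Tq = (x_q - xw_mean) @ Ud                       # query score, eq 24
    theta = np.linalg.lstsq(T, Yc, rcond=None)[0]   # eqs 7-8 with the new scores
    return Tq @ theta + yw_mean
```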

3.2. Locally Weighted Kernel Principal Component Regression (LWKPCR). Though LWPCR can achieve better performance than PCR, it still may not meet the prediction accuracy demand for strongly nonlinear processes. We can expect the algorithm to get better results if the Hilbert space features are utilized. So the algorithm of locally weighted kernel principal component regression (LWKPCR) is proposed as follows. Just like KPCR, in LWKPCR the original input variables are first mapped into the high-dimensional space, that is, \phi: R^m \to F, x \to \phi(x). To construct the locally weighted model, the new feature vector \phi(x) should also be weighted-mean centered, so the procedure is similar to LWPCR. For simplicity, we assume this is satisfied, and then LWKPCR can be derived; in the Appendix, the detailed derivation of LWKPCR with the weighted-mean removal procedure is given. We should emphasize that the \phi(x) used in this subsection is weighted-mean removed. To determine the projection directions, KPCR uses the mapped high-dimensional feature inputs. Different from that, LWKPCR uses the mapped high-dimensional feature inputs multiplied by their corresponding weights. Here we have \phi^w(x_i) = w_i \phi(x_i), i = 1, 2, ..., N. Then the covariance matrix C^{W,K} in LWKPCR can be calculated as

C^{W,K} = \frac{1}{N} \sum_{i=1}^{N} (\phi^w(x_i))(\phi^w(x_i))^T    (25)

Formally, the eigenvalue matrix \Lambda^{w,K} and eigenvector matrix V^{w,K} satisfy

C^{W,K} V^{w,K} = V^{w,K} \Lambda^{w,K}    (26)

Just like in KPCR, the eigenvalue matrix and eigenvector matrix of C^{W,K} are difficult to obtain directly because the mapping function \phi(x) does not have an explicit form. So the nonzero eigenvalues \lambda^{w,K} and their corresponding eigenvectors \nu^{w,K} of C^{W,K} that satisfy \lambda^{w,K} \nu^{w,K} = C^{W,K} \nu^{w,K} will be found by the following procedure. The characteristic of the eigenvector \nu^{w,K} is that it can be viewed as a linear combination of \phi^w(x_i), i = 1, 2, ..., N:

\nu^{w,K} = \sum_{i=1}^{N} \alpha_i^w \phi^w(x_i)    (27)

Let \alpha^{w,K} = [\alpha_1^w, \alpha_2^w, ..., \alpha_N^w]^T. It can be computed by multiplying both sides of \lambda^{w,K} \nu^{w,K} = C^{W,K} \nu^{w,K} from the left by \phi^w(x_k)^T; we have

\lambda^{w,K} (\phi^w(x_k)^T \nu^{w,K}) = \phi^w(x_k)^T (C^{W,K} \nu^{w,K}),  k = 1, 2, ..., N    (28)

that is,

\lambda^{w,K} \Big( \phi^w(x_k)^T \sum_{i=1}^{N} \alpha_i^w \phi^w(x_i) \Big) = \phi^w(x_k)^T \Big( \frac{1}{N} \sum_{j=1}^{N} \phi^w(x_j) \phi^w(x_j)^T \sum_{i=1}^{N} \alpha_i^w \phi^w(x_i) \Big),  k = 1, 2, ..., N    (29)

Using the kernel trick K(i, j) = \phi(x_i)^T \phi(x_j) and K^W(i, j) = \phi^w(x_i)^T \phi^w(x_j), it can easily be seen that

K^W(i, j) = w_i K(i, j) w_j    (30)

Next we get

N \lambda^{w,K} K^W \alpha^{w,K} = (K^W)^2 \alpha^{w,K}    (31)

that is,

N \lambda^{w,K} \alpha^{w,K} = K^W \alpha^{w,K}    (32)

So \alpha^{w,K} corresponds to an eigenvector of the weighted gram matrix K^W. To normalize the projection direction as \|\nu^{w,K}\| = 1, \alpha^{w,K} should be normalized to satisfy \|\alpha^{w,K}\|^2 = 1/(N\lambda^{w,K}). Then the projection of the weighted-mean removed \phi(x_k), k = 1, 2, ..., N can be carried out:

t_k^{w,K} = \phi(x_k)^T \nu^{w,K} = \phi(x_k)^T \sum_{i=1}^{N} \alpha_i^w \phi^w(x_i) = \sum_{i=1}^{N} \alpha_i^w \phi(x_k)^T \phi^w(x_i) = \sum_{i=1}^{N} \alpha_i^w w_i \phi(x_k)^T \phi(x_i) = \sum_{i=1}^{N} \alpha_i^w K^w(k, i) = K^w(k, :) \alpha^{w,K},  k = 1, 2, ..., N    (33)

where K^w(i, j) = w_j K(i, j), i.e., K^w = K \cdot \mathrm{diag}(w_1, w_2, ..., w_N); here, we use the lowercase superscript w to distinguish it from K^W defined above. So K^w is the training projection matrix. For dimension reduction, the first d eigenvectors \nu^{w,K} are used for the projection coordinates, which amounts to choosing the first d normalized eigenvectors of K^W. The eigenvalue decomposition of K^W is

K^W A^{W,K} = A^{W,K} \Lambda^{W,K}    (34)

Each column of A^{W,K} is such an eigenvector \alpha^{w,K} of K^W. Selecting the first d columns of A^{W,K}, denoted A_d^{W,K}, the projection of the training data is

T^{W,K} = K^w A_d^{W,K}    (35)

For the query sample \phi(x_q), the projection vector K_q^w has entries K_q^w(q, i) = \phi(x_q)^T \phi(x_i) w_i = K(q, i) w_i, and the projection is

T_q^{W,K} = K_q^w A_d^{W,K}    (36)

At last, the regression coefficient matrix \theta^{W,K} is calculated by applying T^{W,K} to eq 7. The final output of the query sample is estimated by the following formula:

\hat{y}_q^{W,K} = T_q^{W,K} \theta^{W,K} + \bar{y}^w    (37)

The output matrix Y is weighted-mean removed, and \bar{y}^w is the weighted mean of the training outputs.
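A minimal sketch of eqs 25-37 is given below, omitting the feature-space weighted-mean centering derived in the Appendix. It reuses the gaussian_gram helper from the KPCR sketch in section 2.2; the function name is ours.

```python
import numpy as np

def lwkpcr_fit_predict(X, Y, x_q, w, d, delta):
    """Minimal LWKPCR sketch of eqs 25-37 (feature-space weighted-mean
    centering omitted; gaussian_gram as in the KPCR sketch above)."""
    w = w / w.sum()                                 # normalized weights
    yw_mean = w @ Y                                 # weighted output mean
    K = gaussian_gram(X, X, delta)                  # plain gram matrix K
    KW = (w[:, None] * K) * w[None, :]              # K^W(i,j) = w_i K(i,j) w_j, eq 30
    mu, A = np.linalg.eigh(KW)                      # eigendecomposition, eq 34
    idx = np.argsort(mu)[::-1][:d]
    A_d = A[:, idx] / np.sqrt(mu[idx])              # ||nu^{w,K}|| = 1 scaling
    Kw = K * w[None, :]                             # K^w = K diag(w), below eq 33
    T = Kw @ A_d                                    # training projections, eq 35
    Kqw = gaussian_gram(np.atleast_2d(x_q), X, delta) * w[None, :]
    Tq = Kqw @ A_d                                  # query projection, eq 36
    theta = np.linalg.lstsq(T, Y - yw_mean, rcond=None)[0]  # eq 7 analogue
    return Tq @ theta + yw_mean                     # eq 37
```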


4. JITL-BASED ONLINE SOFT SENSOR MODEL
In general, a global linear PCR model or a global nonlinear KPCR model cannot function well when the process is time-variant, as it is difficult for one global model to capture the complexity and structure of the whole process. So it is necessary to use local models that divide the process region into small regions. Moreover, to test the importance of feature nonlinearity and sample nonlinearity, a performance comparison of the soft sensor algorithms PCR, KPCR, LWPCR, and LWKPCR should be carried out. As the latter two are local methods, the four algorithms should be tested under the same conditions to be fair. So the just-in-time learning (JITL)31−35 strategy is adopted here. For each query sample, training samples are first selected from the historical data set, and then soft sensor models of PCR, KPCR, LWPCR, and LWKPCR are constructed within the selected training samples.

4.1. Similarity Selection and Local Soft-Sensor Model. To select relevant training samples, the Euclidean distance is used to evaluate the similarity between the query sample x_q and the nth historical sample x_n, denoted d_n:

d_n = \sqrt{(x_n - x_q)^T (x_n - x_q)},  n = 1, 2, ..., L    (38)

where L is the number of total samples in the historical data set. Then a weight w_n is assigned to sample x_n as

w_n = \exp(-d_n^2 / \sigma^2),  n = 1, 2, ..., L    (39)

where \sigma is a tunable parameter. The weight w_n decreases rapidly when \sigma is small and gradually when \sigma is large. This can easily be interpreted from Figure 1. For small \sigma, the variance of the weights is large. Thus, those samples that are near to the query sample are assigned large weights, and they play more important roles in predicting the output of the query sample. But if \sigma is chosen too large, the variance of the weights is small, and the weights of the samples near the query sample are not distinguishable from those of the samples far away from it. In an extreme case, the weight of every sample equals 1 when \sigma is infinite; in this case, LWPCR degenerates into PCR.

Figure 1. Characteristic curves of the similarity function.

The above discussion chooses the similar samples using the original inputs. However, the algorithms discussed in this paper include KPCR and LWKPCR, which both map the inputs to the new feature Hilbert space \phi: R^m \to F, x \to \phi(x). Thus, for them the distance d_n in eq 38 is calculated in the mapped space, which can also be handled with the kernel trick, since \|\phi(x_n) - \phi(x_q)\|^2 = K(n, n) - 2K(n, q) + K(q, q). For all the L historical data samples, we sort them by their weights to the query sample in descending order. Then the first N samples that are nearest to the query sample are selected as the relevant data. To construct a predictive soft sensor model for this query sample, these N samples are treated as the training samples of the local model. After the training samples are selected, a soft sensor model is constructed for each query sample within the selected training samples using PCR, KPCR, LWPCR, and LWKPCR, respectively. The schematic diagram of the four soft sensor modeling methods is shown in Figure 2.

4.2. Parameter Optimization and Performance Evaluation. Among the above methods, except for PCR, which is a linear regression, all the other three are nonlinear techniques. To obtain a good estimation model, parameter optimization is critical. Typically, a validation set is introduced for this procedure: the training set is used to fit a model with given parameters, the trained model is validated on the validation set to find the optimal parameters, and at last, online prediction is carried out by the model with the optimized parameters. The parameters involved in each algorithm are shown in Table 2; \sigma is the parameter of similarity selection, \delta denotes the parameter of the Gaussian kernel function, N is the number of training samples selected for local model construction, and d is the number of components for dimension reduction. By carrying out some trial-and-error experiments, we obtained the following optimization strategy: parameters N and d are set to proper values while parameters \sigma and \delta are optimized by cross-validation. As will be discussed in the case studies, the four algorithms are not very sensitive to parameter N over a relatively wide range, so N is fixed to a proper value in each case study. Generally, the first few principal components of PCA-based algorithms can adequately represent the main features of the data. Moreover, in order to compare the four regression algorithms fairly, the same component number is applied for all of them. Considering these factors, the principal component number d is set to about half of the number of original variables. So parameters N and d are fixed to proper values, and the other two parameters are optimized for each algorithm. What is more, in the PCR and KPCR algorithms, different values of parameter \sigma do not alter the results, because \sigma is only used for selecting relevant samples for the query sample in those two algorithms, while in LWPCR and LWKPCR, \sigma is not only used for selecting relevant samples but the weights calculated from it are also used in the subsequent algorithms. So the parameters involving optimization in each algorithm are listed in Table 3. To obtain the optimal parameters, the root-mean-square error (RMSE) criterion is applied on the cross validation set, defined as

\mathrm{RMSE}_{CV} = \sqrt{\frac{\sum_{i=1}^{N_{CV}} (\hat{y}_i - y_i)^2}{N_{CV}}}    (40)

where N_{CV} is the number of samples in the cross validation set, and \hat{y}_i and y_i are the estimated and real values of the outputs, respectively. Also, to compare the performance of the four different algorithms, the RMSE criterion is applied on the testing data set as well, which will be denoted as RMSET.
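The similarity-selection step of eqs 38-39 can be sketched as follows; the returned indices and weights feed the local models of section 3. The function name and return convention are our illustrative choices.

```python
import numpy as np

def select_and_weight(X_hist, x_q, N, sigma):
    """JITL relevance selection following eqs 38-39: Euclidean distances to the
    query, Gaussian similarity weights, and the N most similar samples kept."""
    dist = np.sqrt(((X_hist - x_q) ** 2).sum(axis=1))   # d_n, eq 38
    w = np.exp(-dist ** 2 / sigma ** 2)                 # w_n, eq 39
    idx = np.argsort(w)[::-1][:N]                       # N largest weights
    return idx, w[idx]
```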


Figure 2. Schematic diagram of the JITL-based soft sensor modeling.

Table 2. Parameters Involved in the Algorithms

algorithm | involved parameters
PCR, LWPCR | σ, N, d
KPCR, LWKPCR | σ, δ, N, d

Table 3. Optimized Parameters in Each Algorithm

algorithm | optimized parameters
PCR | none
KPCR | δ
LWPCR | σ
LWKPCR | σ, δ
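The cross-validation search of Table 3 can be sketched as below for LWKPCR; the grid mirrors the one reported in section 5.1.2, and the loop reuses the select_and_weight and lwkpcr_fit_predict helpers from the earlier sketches. This is an assumed implementation of the tuning procedure, not code from the paper.

```python
import numpy as np

# Search grid reported in section 5.1.2; RMSE follows eq 40.
GRID = [0.03, 0.06, 0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 3.0, 5.0, 10, 50, 100]

def rmse(y_hat, y):
    """Root-mean-square error, eq 40."""
    y_hat, y = np.ravel(y_hat), np.ravel(y)
    return np.sqrt(np.mean((y_hat - y) ** 2))

def tune_lwkpcr(X_hist, Y_hist, X_cv, Y_cv, N, d, grid=GRID):
    """Pick the (sigma, delta) pair minimizing RMSE_CV on the validation set."""
    best_err, best_pair = np.inf, None
    for sigma in grid:
        for delta in grid:
            preds = []
            for x_q in X_cv:
                idx, wq = select_and_weight(X_hist, x_q, N, sigma)
                preds.append(lwkpcr_fit_predict(X_hist[idx], Y_hist[idx],
                                                x_q, wq, d, delta))
            err = rmse(preds, Y_cv)
            if err < best_err:
                best_err, best_pair = err, (sigma, delta)
    return best_pair, best_err
```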

Figure 3. Flowchart of the debutanizer column (Adapted with permission from ref 40. Copyright 2007 Springer).

5. CASE STUDIES
In this section, two case studies are provided to compare the prediction precision of the four regression methods. One is the debutanizer column and the other is the fermentation process for penicillin production.

5.1. Debutanizer Column. 5.1.1. Description of the Debutanizer Column. The debutanizer column is part of an industrial refinery process for desulfuration and naphtha splitting, which aims to remove propane and butane from the naphtha stream.39 A flowchart of this process is given in Figure 3. Here the output variable is the concentration of butane at the bottom of the debutanizer. There are in total seven input variables for the soft sensor, including pressures, temperatures, and flows in the plant. Table 4 gives a detailed description of these variables.

Table 4. Input Variables in the Debutanizer Column

input variable | variable description
x1 | top temperature
x2 | top pressure
x3 | reflux flow
x4 | flow to next process
x5 | 6th tray temperature
x6 | bottom temperature
x7 | bottom pressure

A total of 2393 data samples have been collected from this process. For model training, parameter optimization, and model testing, the data set is divided into three parts: about half of the data for the historical data set, one-sixth for the cross validation set, and one-third for the testing data set. In order to examine the data characteristics, Figure 4 shows the trend plots of the normalized input and output variables for the training data. Figure 4a−g show the trends of the input variables, and part h shows the characteristic of the output variable. From these subfigures, it is easy to see the nonlinear relations between the output variable and the input variables. In addition, the process changes gradually or rapidly as the process variables vary. To build a highly accurate soft sensor model, the JITL-based modeling methods are necessary.

Figure 4. Trend plots of input and output variables for the training data on the debutanizer column.

5.1.2. Results of Soft Sensor Models. To build the soft sensor model, several parameters are determined first. The number of samples for the local model is chosen as 30, because too few samples will result in underfitting problems, while too many samples improve the performance little but increase the complexity of the model. The number of principal components is determined as 3, which can explain most of the process information. As the similarity parameter σ does not affect the selected similar samples in PCR and KPCR, its value is set to 0.6 for these two algorithms. In KPCR, the search range of the kernel function parameter δ is [0.03, 0.06, 0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 3.0, 5.0, 10, 50, 100]. In LWPCR, the only optimized parameter is the similarity parameter σ, and its search range is the same as that of δ in KPCR. In LWKPCR, the similarity parameter σ and the kernel function parameter δ are both optimized, each over the same search range.

First, the local JITL-based models of PCR, KPCR, LWPCR, and LWKPCR are built using different combinations of the parameters involving optimization in each algorithm. Then the RMSEs on the cross validation set are calculated and the optimal parameters that minimize the RMSEs are obtained. At last, the models with the optimal parameters are used online to predict the output variable of the query samples.

For the modeling results, Table 5 shows the selected optimal parameters and the corresponding RMSEs on the cross validation set and the testing set. According to the RMSE criteria, the RMSE value of the JITL-based LWKPCR is the smallest and that of the JITL-based PCR is the largest, on both the cross validation set and the testing set. As JITL-based KPCR takes the kernel trick into consideration while JITL-based LWPCR takes the locally weighted trick into consideration, they can increase the predictive ability compared to JITL-based PCR. JITL-based LWKPCR comprehensively integrates the feature nonlinearity and the different weights of samples; thus, it gives the best predictive result.

Table 5. Optimal Parameters and RMSEs of the Four Algorithms on the Debutanizer Column

method | σ | δ | RMSECV | RMSET
PCR | | | 0.1054 | 0.1041
KPCR | | 5.0 | 0.0914 | 0.0933
LWPCR | 0.3 | | 0.0739 | 0.0670
LWKPCR | 0.1 | 0.6 | 0.0622 | 0.0577

A more detailed comparison of the quality prediction results of the four soft sensor methods on the testing set is shown in Figure 5. Furthermore, the prediction errors of these four methods are depicted in Figure 6. It is easy to see that the online prediction of the JITL-based LWKPCR matches well with the actual measurement of the bottom butane concentration, while the simple JITL-based PCR leads to poor prediction results; there are significant deviations between the actual and predicted values across the whole process using JITL-based PCR. However, the JITL-based KPCR and JITL-based LWPCR perform better than the JITL-based PCR for the reasons mentioned above. All the results indicate the superiority of the presented LWKPCR method in soft sensor modeling for quality variable prediction.

Figure 5. Prediction results of the four algorithms on the debutanizer column.

Figure 6. Error results of the four algorithms on the debutanizer column.

Since the proposed algorithm constructs local models in the online procedure, its real-time capability needs to be considered. Here the computational running time of LWKPCR is analyzed. Figure 7 gives the online average CPU running time for each query sample as the size N of the local model samples varies. The configuration of the computer is as follows: OS Windows 7 (32 bit); CPU AMD Athlon 64 X2 Dual Core Processor 4600+ (2.4 GHz); RAM 3.5 GB; MATLAB version 2013a. From the results, the prediction time for a query sample is far smaller than 1 s, which can meet actual requirements in practice. Moreover, the average CPU running time for a query sample is 0.0148 s when the size N of the local model samples is 30.

Figure 7. Average CPU running time for query samples of LWKPCR for the debutanizer column.

To test the parameter sensitivity of LWKPCR, we also provide the prediction RMSE results obtained by varying parameters N and d, as exhibited in Figure 8. Figure 8a and b show the results of the sensitivity analysis of N and d, respectively. Clearly, the RMSEs change little when N and d vary. These results demonstrate that the algorithm is not very sensitive to these two parameters. The results also demonstrate that N and d have been set properly from the viewpoint of prediction error. Additionally, an analogous parameter sensitivity study for the other methods is carried out to check the superiority of LWKPCR; here we take parameter d as an example. The results are displayed in Table 6. From Table 6, we can see that under the same value of d, LWKPCR achieves better prediction results than the other three methods.

Figure 8. Parameter sensitivity of LWKPCR for the debutanizer column.

Table 6. RMSEs of Each Algorithm under Different d Values on the Debutanizer Column

d | PCR | KPCR | LWPCR | LWKPCR
1 | 0.1117 | 0.1080 | 0.0661 | 0.0581
2 | 0.1088 | 0.1016 | 0.0657 | 0.0583
3 | 0.1041 | 0.0933 | 0.0670 | 0.0577
4 | 0.0965 | 0.0900 | 0.0644 | 0.0559
5 | 0.0913 | 0.0870 | 0.0626 | 0.0562
6 | 0.0865 | 0.0817 | 0.0615 | 0.0563

5.2. Fermentation Process for Penicillin Production. 5.2.1. Description of the Fermentation Process. The fermentation process for penicillin production is a biochemical batch benchmark for soft sensor and fault diagnosis algorithms. A simulation tool of this process can be found at the Web site http://simulator.iit.edu/web/pensim/simul.html. The PenSim simulator contains a fermenter where the biological reaction takes place. Figure 9 shows a flow sheet of the process; a detailed description of the process can be found in ref 41. For different demands, the simulator provides several settings including the controller, process duration, sampling rates, etc. The settings in this study are the defaults given on the webpage, except that the sampling interval is 1 h and the simulation time is 1000 h. There are in total 16 measured variables in the simulation plant. For soft sensor model construction, the penicillin concentration is chosen as the output variable. Eleven variables highly related to the output are selected as the input variables. The description of each variable is listed in Table 7.

Figure 9. Flow sheet of the process (Adapted with permission from ref 7. Copyright 2012 Elsevier).

Table 7. Input Variables in the Fermentation Process

input variable | variable description | units
x1 | aeration rate | L/h
x2 | agitator power | W
x3 | substrate feed rate | L/h
x4 | substrate feed temperature | K
x5 | substrate concentration | g/L
x6 | dissolved oxygen concentration | g/L
x7 | biomass concentration | g/L
x8 | culture volume | L
x9 | carbon dioxide concentration | g/L
x10 | pH |
x11 | fermentor temperature | K

In this study, a total of 1000 samples are generated from this process. To construct the local models, half of the data are used as the historical data set, about one-sixth of the whole data are used as the validation data set, and the rest are used for testing purposes. To show the characteristics of the input and output variables, Figure 10 gives detailed information on them in the training data set. Figure 10a−k describe the trend plots of the input variables, while part l shows the trend plot of the output variable. As shown in the figure, the process is divided into three phases: the lag phase, the exponential phase, and the stationary phase. Thus, strong nonlinearity and time variance exist in this process, and to construct a precise soft sensor model, the JITL-based algorithm should be applied.

Figure 10. Trend plots of input and output variables of the training data on the fermentation process.

5.2.2. Results of Soft Sensor Models. Similarly, the number of samples for the local model is chosen as 30. The component number is selected as six to explain most of the process data information. Besides, the optimized parameters and their corresponding search ranges in each algorithm are similar to those used in the case of the debutanizer column. For the soft sensor procedure, the local JITL-based models of PCR, KPCR, LWPCR, and LWKPCR are first built using different combinations of the parameters that involve optimization in each algorithm. Then the RMSE on the cross validation set is calculated and the optimal parameters that minimize the RMSE are obtained. Lastly, the models with the optimal parameters are used online to predict the output variable of the testing data set.

The results of parameter selection and the RMSEs of the algorithms on the fermentation process for penicillin production are shown in Table 8. Similarly, in the comparison of RMSEs, JITL-based PCR gives the worst predictive accuracy and JITL-based LWKPCR shows the best result. Compared to the JITL-based PCR, JITL-based KPCR and JITL-based LWPCR can increase the predictive ability as they each incorporate one form of nonlinear trick, but the comprehensive JITL-based LWKPCR performs better than both.

Table 8. Optimal Parameters and RMSEs of the Four Algorithms on the Fermentation Process

method | σ | δ | RMSECV | RMSET
PCR | | | 4.72 × 10−2 | 5.63 × 10−2
KPCR | | 50 | 1.95 × 10−2 | 1.85 × 10−2
LWPCR | 10 | | 2.59 × 10−2 | 3.41 × 10−2
LWKPCR | 0.1 | 0.9 | 0.80 × 10−2 | 1.10 × 10−2

In detail, Figures 11 and 12 present the prediction and error results on the testing set, respectively. From these figures, there are deviations between the predicted and actual output values using JITL-based PCR, KPCR, and LWPCR. But for JITL-based LWKPCR, nearly no deviations can be found between the predicted and true output values; the two curves match each other well. Just like the conclusion obtained in the comparison of RMSEs, the JITL-based PCR approach shows inferior predictive accuracy while the JITL-based KPCR and JITL-based LWPCR give relatively better predictive results, but a superior predictive result is obtained by JITL-based LWKPCR. This is because the proposed method integrates the two forms of feature nonlinear mapping and different weighting of training samples into the model simultaneously.

But there is an interesting phenomenon between these two case studies. In the debutanizer column, the JITL-based LWPCR performs better than the JITL-based KPCR in quality prediction, while the situation is the opposite in the fermentation process. This can be interpreted as follows: in the debutanizer column, the nonlinearity mainly lies in the different weights of the training samples while the nonlinear mapping of features takes a secondary role, hence the JITL-based LWPCR gets a better result; in the fermentation process, the condition is the opposite. But regardless of the debutanizer column or the fermentation process, the JITL-based LWKPCR performs the best, as it takes both forms of nonlinearity into account simultaneously. All these demonstrate that the proposed approach has an advantage in dealing with stronger nonlinearity in soft sensor quality prediction for nonlinear processes.

Figure 11. Prediction results of the four algorithms on the fermentation process.

Figure 12. Error results of the four algorithms on the fermentation process.

Similarly, the evaluation of the real-time capability of the LWKPCR algorithm is carried out on the penicillin process. Figure 13 gives the online average CPU running time for each query sample as the size N of the local model samples varies. As N varies from 10 to 100, the average CPU running time changes from 0.0079 to 0.0279 s, which can also meet industrial requirements. So it is suitable to set N as 30 for online prediction.

Figure 13. Average CPU running time for query samples of LWKPCR for the penicillin process.

Again, to investigate the parameter sensitivity of LWKPCR, Figure 14 provides the prediction RMSE results obtained by varying parameters N and d. Figure 14a and b show the results of the sensitivity analysis of N and d, respectively. Similarly, the RMSEs change little when the two parameters vary. These also indicate that the LWKPCR algorithm is not very sensitive to the two parameters. From the viewpoint of prediction error, the results also demonstrate that N and d are set properly for the soft sensor model.

Figure 14. Parameter sensitivity of LWKPCR for the penicillin process.

Similarly, a comparative study of the parameter sensitivity of the other methods is carried out to check the superiority of LWKPCR. Table 9 gives the sensitivity results for parameter d. It is clear that LWKPCR is superior to the other three methods under different values of d.

Table 9. RMSEs of Each Algorithm under Different d Values on the Fermentation Process

d | PCR | KPCR | LWPCR | LWKPCR
1 | 5.63 × 10−2 | 11.51 × 10−2 | 5.71 × 10−2 | 1.07 × 10−2
2 | 6.05 × 10−2 | 4.00 × 10−2 | 5.63 × 10−2 | 1.14 × 10−2
3 | 5.28 × 10−2 | 3.48 × 10−2 | 5.10 × 10−2 | 1.11 × 10−2
4 | 5.61 × 10−2 | 2.42 × 10−2 | 4.24 × 10−2 | 1.11 × 10−2
5 | 5.03 × 10−2 | 2.21 × 10−2 | 4.58 × 10−2 | 1.10 × 10−2
6 | 5.63 × 10−2 | 1.95 × 10−2 | 3.41 × 10−2 | 1.10 × 10−2
7 | 5.56 × 10−2 | 1.91 × 10−2 | 3.38 × 10−2 | 1.10 × 10−2
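The end-to-end procedure used in both case studies can be summarized in a short sketch. It reuses the select_and_weight, lwkpcr_fit_predict, and rmse helpers from the earlier illustrative snippets, and the loop structure (one local model per query sample) follows the scheme of Figure 2; it is our reading of the procedure, not code released with the paper.

```python
import numpy as np

def jitl_lwkpcr_test(X_hist, Y_hist, X_test, Y_test, N, d, sigma, delta):
    """Online testing loop: build one local LWKPCR model per query sample,
    then report RMSE_T over the whole testing set (e.g., N = 30, d = 3,
    and the optimal sigma/delta of Table 5 for the debutanizer column)."""
    preds = []
    for x_q in X_test:
        idx, wq = select_and_weight(X_hist, x_q, N, sigma)
        preds.append(lwkpcr_fit_predict(X_hist[idx], Y_hist[idx],
                                        x_q, wq, d, delta))
    return rmse(preds, Y_test)
```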

6. CONCLUSION
In this study, a JITL-based adaptive soft sensor modeling method is presented for nonlinear and time-variant processes to precisely estimate key quality variables. As a JITL-based method, the proposed locally weighted kernel principal component regression (LWKPCR) has the merit of coping with the time variance problem. Meanwhile, as the kernel trick and the locally weighted trick are simultaneously integrated into the model, this method has another advantage in handling strong nonlinearity over KPCR and LWPCR, each of which only takes one form of nonlinearity. Consequently, the quality variables can be accurately predicted with the LWKPCR approach. To test the validity of this method, two case studies on the debutanizer column and the fermentation process for penicillin production were carried out. Meanwhile, comparative studies of LWKPCR with PCR, KPCR, and LWPCR have demonstrated that the JITL-based LWKPCR approach is of much higher prediction accuracy and reliability than the others. Thus, the presented soft sensor technique can be implemented online for quality prediction in other processes. Also, this method can be extended to other linear regression methods like ridge regression (RR) and partial least-squares (PLS) regression, etc.

APPENDIX: DETAILED DERIVATION OF WEIGHTED-MEAN CENTERED LWKPCR
In section 3.2, the form of LWKPCR was given under the premise that the mapped features \phi(x_i), i = 1, 2, ..., N are weighted-mean removed in advance. The complete form of the weighted-mean centered LWKPCR is presented in this section. Each training and query input vector is first mapped to the nonlinear space as x \to \phi(x). Here the weighted means of the input and output variables are calculated, and the mapped inputs are weighted-mean removed:

\bar{\phi}^w = \frac{\sum_{i=1}^{N} \phi(x_i) w_i}{\sum_{i=1}^{N} w_i},  \bar{y}^w = \frac{\sum_{i=1}^{N} y_i w_i}{\sum_{i=1}^{N} w_i}    (41)

Just as mentioned in LWPCR, the weight vector w = [w_1, w_2, ..., w_N]^T is normalized to satisfy \sum_{i=1}^{N} w_i = 1, as normalization does not change the result of the algorithm. So the normalized weight vector will be used in LWKPCR.

\phi_c(x_i) = \phi(x_i) - \bar{\phi}^w,  i = 1, 2, ..., N
\phi_c(x_q) = \phi(x_q) - \bar{\phi}^w
Y_c = Y - 1_N \bar{y}^w    (42)

where the subscript c represents a vector or matrix that is weighted-mean centered. Then the weighted vectors of the training samples after weighted-mean centering are

\phi_c^w(x_i) = w_i \phi_c(x_i) = w_i (\phi(x_i) - \bar{\phi}^w),  i = 1, 2, ..., N    (43)

Thus, the weighted-mean centered characteristic matrix for eigenvalue decomposition is

K_c^W(i, j) = \phi_c^w(x_i)^T \phi_c^w(x_j)
= \{ w_i (\phi(x_i) - \sum_{p=1}^{N} w_p \phi(x_p))^T \} \cdot \{ w_j (\phi(x_j) - \sum_{q=1}^{N} w_q \phi(x_q)) \}
= \{ (\phi^w(x_i) - w_i \sum_{p=1}^{N} \phi^w(x_p))^T \} \cdot \{ (\phi^w(x_j) - w_j \sum_{q=1}^{N} \phi^w(x_q)) \}
= \phi^w(x_i)^T \phi^w(x_j) - w_i \sum_{p=1}^{N} \phi^w(x_p)^T \phi^w(x_j) - w_j \sum_{q=1}^{N} \phi^w(x_i)^T \phi^w(x_q) + w_i w_j \sum_{p=1}^{N} \sum_{q=1}^{N} \phi^w(x_p)^T \phi^w(x_q)
= K^W(i, j) - w_i \sum_{p=1}^{N} K^W(p, j) - w_j \sum_{q=1}^{N} K^W(i, q) + w_i w_j \sum_{p=1}^{N} \sum_{q=1}^{N} K^W(p, q)    (44)


So, in matrix form, define

W = w \cdot 1_{1 \times N} = \begin{bmatrix} w_1 & w_1 & \cdots & w_1 \\ w_2 & w_2 & \cdots & w_2 \\ \vdots & \vdots & \ddots & \vdots \\ w_N & w_N & \cdots & w_N \end{bmatrix}    (45)

i.e., the N × N matrix whose (i, j) entry is w_i; then we have

K_c^W = K^W - W K^W - K^W W^T + W K^W W^T    (46)

Then the normalized eigenvectors of K_c^W can be obtained. Assume one eigenvector is denoted as \alpha_c^{w,K}; then the projection of the kth training sample onto the corresponding projection direction is

t_k^{w,K} = \phi_c(x_k)^T \nu_c^{w,K} = \phi_c(x_k)^T \sum_{j=1}^{N} \alpha_j \phi_c^w(x_j)
= \{ (\phi(x_k) - \sum_{p=1}^{N} w_p \phi(x_p))^T \} \cdot \sum_{j=1}^{N} \{ \alpha_j w_j (\phi(x_j) - \sum_{q=1}^{N} w_q \phi(x_q)) \}
= \sum_{j=1}^{N} \alpha_j \{ w_j \phi(x_k)^T \phi(x_j) - \sum_{p=1}^{N} w_p w_j \phi(x_p)^T \phi(x_j) - w_j \sum_{q=1}^{N} w_q \phi(x_k)^T \phi(x_q) + w_j \sum_{p=1}^{N} \sum_{q=1}^{N} w_p w_q \phi(x_p)^T \phi(x_q) \}
= \sum_{j=1}^{N} \alpha_j \{ K^w(k, j) - \sum_{p=1}^{N} w_p K^w(p, j) - w_j \sum_{q=1}^{N} K^w(k, q) + w_j \sum_{p=1}^{N} \sum_{q=1}^{N} w_p K^w(p, q) \}
= K_c^w(k, :) \cdot \alpha_c^{w,K},  k = 1, 2, ..., N    (47)

Hence we have

K_c^w(k, j) = K^w(k, j) - \sum_{p=1}^{N} w_p K^w(p, j) - w_j \sum_{q=1}^{N} K^w(k, q) + w_j \sum_{p=1}^{N} \sum_{q=1}^{N} w_p K^w(p, q)    (48)

So the centered projection matrix of the training data is

K_c^w = K^w - W^T K^w - K^w W^T + W^T K^w W^T    (49)

Also, the centered projection row vector of the query data is

K_{c,q}^w = K_q^w - w^T K^w - K_q^w W^T + w^T K^w W^T    (50)

Up to now, the preparatory work has been done, as the centered characteristic matrix K_c^W, the training projection matrix K_c^w, and the query projection vector K_{c,q}^w have been obtained. First, the eigenvalue decomposition is performed as

K_c^W A_c^{W,K} = A_c^{W,K} \Lambda_c^{W,K}    (51)

Then the first d columns of A_c^{W,K} are chosen to form the new projection direction matrix A_{c,d}^{W,K}, each column of which is normalized by dividing by the square root of its corresponding eigenvalue. Thus, the projections of the training data and the query data are

T_c^{W,K} = K_c^w A_{c,d}^{W,K}    (52)

T_{c,q}^{W,K} = K_{c,q}^w A_{c,d}^{W,K}    (53)

So the regression procedure is

\theta_c^{W,K} = ((T_c^{W,K})^T T_c^{W,K})^{-1} (T_c^{W,K})^T Y_c    (54)

\hat{y}_{c,q}^{W,K} = T_{c,q}^{W,K} \theta_c^{W,K} + \bar{y}^w    (55)
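As a numerical companion to eqs 44-50, the following sketch builds the two centered matrices from a precomputed Gram matrix K; the function name and return convention are our own, under the assumption that the weights have already been normalized as described above.

```python
import numpy as np

def center_weighted_grams(K, w):
    """Weighted-mean centering of the Gram matrices, eqs 44-50. W has entry
    (i, j) = w_i; returns K_c^W (for the eigenproblem) and K_c^w (projections)."""
    w = w / w.sum()                                  # normalized weights
    W = np.outer(w, np.ones(len(w)))                 # W = w 1_{1xN}, eq 45
    KW = (w[:, None] * K) * w[None, :]               # K^W(i,j) = w_i K(i,j) w_j
    Kw = K * w[None, :]                              # K^w = K diag(w)
    KW_c = KW - W @ KW - KW @ W.T + W @ KW @ W.T     # centered K_c^W, eq 46
    Kw_c = Kw - W.T @ Kw - Kw @ W.T + W.T @ Kw @ W.T # centered K_c^w, eq 49
    return KW_c, Kw_c
```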

AUTHOR INFORMATION

Corresponding Authors
*Tel.: +86-87951442. E-mail: [email protected] (Z.G.).
*E-mail: [email protected] (Z.S.).

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (NSFC) (61370029), Project National 973 (2012CB720500), and the Fundamental Research Funds for the Central Universities (2013QNA5016).

REFERENCES
(1) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543−3562.
(2) Xu, X.; Xiao, F.; Wang, S. Enhanced chiller sensor fault detection, diagnosis and estimation using wavelet analysis and principal component analysis methods. Appl. Therm. Eng. 2008, 28 (2), 226−237.
(3) Liu, Y.; Hu, N. P.; Wang, H. Q.; Li, P. Soft chemical analyzer development using adaptive least-squares support vector regression with selective pruning and variable moving window size. Ind. Eng. Chem. Res. 2009, 48, 5731−5741.
(4) Khatibisepehr, S.; Huang, B.; Khare, S. Design of inferential sensors in the process industry: A review of Bayesian methods. J. Process Control 2013, 23 (10), 1575−1596.
(5) Wang, S.; Cui, J. Sensor-fault detection, diagnosis and estimation for centrifugal chiller systems using principal-component analysis method. Appl. Energy 2005, 82 (3), 197−213.
(6) Ge, Z. Q.; Gao, F. R.; Song, Z. H. Mixture probabilistic PCR model for soft sensing of multimode processes. Chemom. Intell. Lab. Syst. 2011, 105, 91−105.
(7) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144.
(8) Kim, S.; Okajima, R.; Kano, M.; Hasebe, S. Development of soft-sensor using locally weighted PLS with adaptive similarity measure. Chemom. Intell. Lab. Syst. 2013, 124, 43−49.
(9) Ge, Z. Q.; Chen, T.; Song, Z. H. Quality prediction for polypropylene production process based on CLGPR model. Control Eng. Pract. 2011, 19 (5), 423−432.
(10) Undey, C.; Tatara, E.; Cinar, A. Intelligent real-time performance monitoring and quality prediction for batch/fed-batch cultivations. J. Biotechnol. 2004, 108 (1), 61−77.
(11) Facco, P.; Bezzo, F.; Barolo, M. Nearest-neighbor method for the automatic maintenance of multivariate statistical soft sensors in batch processing. Ind. Eng. Chem. Res. 2010, 49 (5), 2336−2347.
(12) Huixin, T.; Zhizhong, M.; Shu, W.; Kun, L. Application of genetic algorithm combined with BP neural network in soft sensor of molten steel temperature. The Sixth World Congress on Intelligent Control and Automation, Dalian, June 21−23, 2006; pp 7742−7745.
(13) Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32 (1−2), 12−24.
(14) Bosca, S.; Fissore, D. Design and validation of an innovative soft-sensor for pharmaceuticals freeze-drying monitoring. Chem. Eng. Sci. 2011, 66 (21), 5127−5136.
(15) Nakagawa, H.; Tajima, T.; Kano, M.; Kim, S.; Hasebe, S.; Suzuki, T.; Nakagami, H. Evaluation of infrared-reflection absorption spectroscopy measurement and locally weighted partial least-squares for rapid analysis of residual drug substances in cleaning processes. Anal. Chem. 2012, 84 (8), 3820−3826.
(16) Yuan, X.; Zhang, H.; Song, Z. A soft-sensor for estimating copper quality by image analysis technology. 2013 10th IEEE International Conference on Control and Automation (ICCA), Hangzhou, June 12−14, 2013; pp 991−996.
(17) Ge, Z.; Song, Z.; Zhao, L.; Gao, F. Two-level PLS model for quality prediction of multiphase batch processes. Chemom. Intell. Lab. Syst. 2014, 130, 29−36.
(18) Rani, A.; Singh, V.; Gupta, J. Development of soft sensor for neural network based control of distillation column. ISA Trans. 2013, 52, 438−449.
(19) Ge, Z. Q.; Song, Z. H.; Gao, F. R. Nonlinear quality prediction for multiphase batch processes. AIChE J. 2012, 58, 1778−1787.
(20) Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24 (6), 417.
(21) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven soft sensors in the process industry. Comput. Chem. Eng. 2009, 33 (4), 795−814.
(22) Hotelling, H. The relations of the newer multivariate statistical methods to factor analysis. Br. J. Stat. Psychol. 1957, 10 (2), 69−79.
(23) Kendall, M. G. A Course in Multivariate Analysis; Charles Griffin and Company, Ltd.: London, England, 1957.
(24) Schölkopf, B.; Smola, A.; Müller, K.-R. Kernel principal component analysis. In Artificial Neural Networks - ICANN'97; Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D., Eds.; Springer: Berlin, Heidelberg, 1997; Vol. 1327, pp 583−588.
(25) Rosipal, R.; Girolami, M.; Trejo, L. J.; Cichocki, A. Kernel PCA for feature extraction and de-noising in nonlinear regression. Neural Comput. Appl. 2001, 10 (3), 231−243.
(26) Rosipal, R.; Trejo, L. J. Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2002, 2, 97−123.
(27) Dayal, B. S.; MacGregor, J. F. Recursive exponentially weighted PLS and its applications to adaptive control and prediction. J. Process Control 1997, 7 (3), 169−179.
(28) Ahmed, F.; Nazir, S.; Yeo, Y. K. A new soft sensor based on recursive partial least squares for online melt index predictions in grade-changing HDPE operations. Chem. Prod. Process Model. 2009, DOI: 10.2202/1934-2659.1271.
(29) Wold, S. Exponentially weighted moving principal components analysis and projections to latent structures. Chemom. Intell. Lab. Syst. 1994, 23 (1), 149−161.
(30) Kaneko, H.; Arakawa, M.; Funatsu, K. Applicability domains and accuracy of prediction of soft sensor models. AIChE J. 2011, 57 (6), 1506−1513.
(31) Bittanti, S.; Picci, G. Identification, Adaptation, Learning: The Science of Learning Models from Data; Springer, 1996; Vol. 153.
(32) Cybenko, G. Just-in-time learning and estimation. NATO ASI Series: Computer and Systems Sciences 1996, 153, 423−434.
(33) Cheng, C.; Chiu, M.-S. A new data-based methodology for nonlinear process modeling. Chem. Eng. Sci. 2004, 59 (13), 2801−2810.
(34) Bontempi, G.; Bersini, H.; Birattari, M. The local paradigm for modeling and control: from neuro-fuzzy to lazy learning. Fuzzy Sets Syst. 2001, 121 (1), 59−72.
(35) Aha, D. W.; Kibler, D.; Albert, M. K. Instance-based learning algorithms. Mach. Learn. 1991, 6 (1), 37−66.
(36) Cleveland, W. S.; Devlin, S. J. Locally weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc. 1988, 83 (403), 596−610.
(37) Schaal, S.; Atkeson, C. G.; Vijayakumar, S. Scalable techniques from nonparametric statistics for real time robot learning. Appl. Intell. 2002, 17 (1), 49−60.
(38) Smola, A. J.; Schölkopf, B. Learning with Kernels; 1998.
(39) Fortuna, L.; Graziani, S.; Xibilia, M. G. Soft sensors for product quality monitoring in debutanizer distillation columns. Control Eng. Pract. 2005, 13 (4), 499−508.
(40) Fortuna, L. Soft Sensors for Monitoring and Control of Industrial Processes; Springer, 2007.
(41) Birol, G.; Ündey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26 (11), 1553−1565.
dx.doi.org/10.1021/ie4041252 | Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX