
Nonlinear PLS Integrated with Error-Based LSSVM and Its Application to NOx Modeling

You Lv,* Jizhen Liu, and Tingting Yang

The State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, North China Electric Power University, Changping District, 102206 Beijing, China

ABSTRACT: This article presents a novel nonlinear partial least-squares (PLS) method to address nonlinear problems with input collinearity in industrial processes. The proposed method integrates an inner least-squares support vector machine (LSSVM) function with an outer linear PLS framework. First, the input and output latent variables are extracted through PLS projection to eliminate the collinearity, and then LSSVM is used to construct the nonlinear relation between each pair of latent variables. Moreover, a weight-updating procedure is incorporated to enhance the prediction accuracy. Training and prediction algorithms based on modified nonlinear iterative partial least-squares (NIPALS) steps are described in detail. The performance of the new method is investigated on a benchmark data set. Finally, the approach is applied to a real industrial process to predict the NOx emissions of a coal-fired boiler: the root-mean-square errors (RMSEs) on the training and testing data decreased to 12.6632 and 37.6609, respectively. Compared with the original linear PLS and other nonlinear PLS methods, a reduction of approximately 40.8−47.4% in the prediction errors is attained. The results reveal that the new approach is capable of modeling the nonlinear relation of NOx emissions with other process parameters and of improving the prediction performance.

1. INTRODUCTION

To ensure control performance and economical and secure operation of chemical engineering processes, key variables such as product quality and emission components should be measured accurately and reliably.1,2 However, these important process variables are often difficult to measure with hardware-based sensors because of economic and technical limitations.3 Therefore, it is necessary to establish a model that predicts the key process variables from other relevant parameters. Moreover, some of these variables, such as product quality, can be optimized by carefully tuning operating parameters such as valve openings using genetic algorithms, expert systems, or other artificial-intelligence techniques. The core task of this type of optimization system is to establish a model that predicts these variables; in other words, the relationships between the variable to be optimized and the related operating parameters must first be known.4

A first-principles physical or mechanistic model can be obtained from detailed knowledge of the chemical and physical phenomena underlying process operation. However, because of the complexity of industrial processes, such mechanistic models either require a great deal of effort and time to develop or are too simplistic to be accurate in practice.5,6 In particular, developing rigorous theoretical models might not be practical for complex processes if it requires solving a large number of complicated differential and algebraic equations with unknown parameters. Data-driven models, on the other hand, have been widely used as an alternative to mechanistic models, because they require less specific knowledge of the process and usually supply precise information for a particular operating region.6 Data-driven modeling techniques require experimental data to be collected for those variables believed to be representative of process behavior.

Statistical regression techniques and artificial neural networks (ANNs) are now routinely used in the process industries for constructing data-driven models. To build an accurate neural network model, large amounts of training data are ideally required. In practice, one is often faced with a small ratio of the number of samples to the number of variables, because the experiments must be well designed, changing and setting individual process parameters, which is time-consuming and expensive. In addition, in real chemical or thermal industrial processes, large numbers of variables are generally highly correlated, and including all of the variables in a model increases the model complexity and can even lead to overfitting.7,8

In recent years, partial least-squares (PLS) regression has become a widely used method for dealing with correlated inputs and limited observations. PLS is a multiple-regression method that projects the input and output information into a latent space and extracts a number of principal components with an orthogonal structure while capturing the vast majority of the variance in the original data.9−11 However, PLS is a linear method and is therefore inappropriate for describing significant nonlinear characteristics.


To address the issue of nonlinearity, a number of methods have been proposed that can be roughly classified into two categories. The first category is kernel-based PLS (KPLS). The main idea of KPLS is first to map each sample from the original input space into a high-dimensional feature space through a nonlinear mapping and then to establish a linear PLS model in that mapped space.3,12−14 The second type of nonlinear PLS builds an inner relationship between the input and output latent variables with a nonlinear function while keeping the outer linear PLS framework unaltered. We mainly focus on the latter type of nonlinear PLS method in this article.

Several versions of nonlinear inner-function methods have been developed, such as quadratic PLS and neural network PLS (NNPLS). Wold et al.,15 in their pioneering study, described a quadratic PLS algorithm in which a quadratic function replaces the linear inner mapping. Qin and McAvoy16 introduced an NNPLS method to identify the relationship between the input and output scores. Wilson et al.17 employed a radial-basis-function neural network to replace the sigmoidal function of NNPLS as the inner mapping. Baffi et al.18,19 developed quadratic and neural network nonlinear PLS algorithms with a weight-updating procedure based on the minimization of the inner mapping errors. Moreover, other nonlinear PLS methods have also been proposed, for example, the nested nonlinear PLS algorithm20 and a series of fuzzy PLS algorithms in which the Takagi−Sugeno−Kang (TSK) fuzzy inference system serves as the inner mapping.21,22

Despite extensive research on nonlinear PLS techniques, some drawbacks still exist. For example, neural network PLS is expected to be flexible enough to fit complex nonlinearity; however, the extra flexibility can become a source of overfitting and thus lead to serious prediction errors.23 The support vector machine (SVM) approach has been successful in mapping complex and highly nonlinear relationships between system inputs and outputs.24 Compared with ANNs and other nonlinear methods, SVM is implemented based on the structural risk minimization principle, which helps it avoid overfitting and gives it excellent generalization properties. The least-squares support vector machine (LSSVM) technique converts the inequality constraints into equality constraints and, at the same time, employs a sum of squared errors as the cost function. Thus, the solution can be obtained directly by solving a linear set of equations, which reduces the computational complexity.25,26

In this work, the LSSVM method is proposed to describe the inner nonlinearity while the outer linear PLS framework remains unchanged. In addition, an error-based weight-updating procedure is integrated to obtain a new error-based LSSVM PLS algorithm (denoted the EBLSSVMPLS algorithm). The new algorithm is first assessed on a benchmark data set and then used to model the relation between NOx emissions and the operating parameters of a coal-fired boiler; performance comparisons with other nonlinear PLS methods are also conducted. The results show that the proposed approach is effective and can predict NOx emissions with high accuracy.

The rest of this article is organized as follows: Section 2 introduces the construction of the new EBLSSVMPLS algorithm in detail. A case simulation is conducted in section 3, and section 4 describes the application to NOx modeling. Section 5 concludes the article.

2. METHODS

2.1. Outer PLS Framework. PLS is applied as the outer framework to extract pairs of latent variables (score vectors) from collinear data while capturing a large portion of the variance. The data set is assumed to consist of an input (predictor) variable matrix X ∈ ℝ^(N×M) and an output (response) variable matrix Y ∈ ℝ^(N×K), both mean-centered and scaled by the standard deviation, where N, M, and K are the numbers of samples, predictors, and response variables, respectively. In PLS, the scaled matrices X and Y are decomposed into input and output score vectors (t and u, respectively), input and output loading vectors (p and q, respectively), and residual error matrices (E and F) as follows

$$X = \sum_{j=1}^{A} t_j p_j^{\mathrm{T}} + E, \qquad Y = \sum_{j=1}^{A} u_j q_j^{\mathrm{T}} + F \quad (1)$$

where A is the number of latent variables. Considering the nonlinear iterative partial least-squares (NIPALS) algorithm,27 the following relations can be obtained

$$p_j^{\mathrm{T}} = t_j^{\mathrm{T}} X / (t_j^{\mathrm{T}} t_j) \quad (2)$$

$$t_j = X w_j \quad (3)$$

where w_j is the weight vector. A corresponding inner model is then constructed to relate each pair of score vectors

$$u_j = f(t_j) + e_j \quad (4)$$

where f(·) denotes the inner relation between the score vectors t and u and e is the residual. For linear PLS, f(t_j) = b_j t_j, where b_j is a constant coefficient; when f(·) is a nonlinear function, nonlinear PLS is formed.

2.2. Inner LSSVM Function. Because nonlinear characteristics inherently exist in many practical industrial processes, a nonlinear function is needed to establish the inner relation between each pair of latent variables. In this article, LSSVM is introduced to describe the nonlinear structure between the score vectors t and u. Compared with standard SVM, which requires solving a time-consuming quadratic programming (QP) problem, LSSVM considers equality constraints instead of inequality constraints; as a result, the solution can be obtained by solving a linear system of equations, and the computational complexity is reduced. Given an extracted pair of latent variables t and u consisting of N training samples {(t_i, u_i)}_{i=1}^{N}, where t_i, u_i ∈ ℝ, t = [t_1, ..., t_N]^T, and u = [u_1, ..., u_N]^T, the optimization problem can be described in the form

$$\min_{\omega, b, \xi} J(\omega, \xi) = \frac{1}{2}\omega^{\mathrm{T}}\omega + \frac{1}{2} c \sum_{i=1}^{N} \xi_i^{2} \quad \text{s.t.} \quad u_i = \omega^{\mathrm{T}} \varphi(t_i) + b + \xi_i, \quad i = 1, 2, ..., N \quad (5)$$

where φ(·): ℝ → ℝ^h is a (generally nonlinear) function mapping the input data into a higher-dimensional feature space, ω ∈ ℝ^h is a weight vector, ξ = [ξ_1, ..., ξ_N]^T is the error variable, b is a constant, and c is the penalty factor. For this optimization problem, the Lagrangian function is introduced as

$$L(\omega, b, \xi, \alpha) = J(\omega, \xi) - \sum_{i=1}^{N} \alpha_i [\omega^{\mathrm{T}} \varphi(t_i) + b + \xi_i - u_i] \quad (6)$$

where α = [α_1, α_2, ..., α_N]^T is the vector of Lagrange multipliers. According to the Karush−Kuhn−Tucker (KKT) optimality conditions, the solution is given by the following set of linear equations after eliminating ω and ξ


$$\begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & K(t_1,t_1)+1/c & K(t_1,t_2) & \cdots & K(t_1,t_N) \\ 1 & K(t_2,t_1) & K(t_2,t_2)+1/c & \cdots & K(t_2,t_N) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & K(t_N,t_1) & K(t_N,t_2) & \cdots & K(t_N,t_N)+1/c \end{bmatrix} \begin{bmatrix} b \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix} = \begin{bmatrix} 0 \\ u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix} \quad (7)$$

which can be expressed more compactly in matrix form as

$$\begin{bmatrix} 0 & \vec{1}^{\mathrm{T}} \\ \vec{1} & \Omega + (1/c)I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ u \end{bmatrix} \quad (8)$$

where 1⃗ = [1, 1, ..., 1]^T is the N × 1 all-ones vector, I is the N × N identity matrix, Ω = {Ω_kl | k, l = 1, 2, ..., N} with Ω_kl = φ(t_k)^T φ(t_l) = K(t_k, t_l), and K(·,·) is a symmetric kernel function satisfying Mercer's conditions. The first block row of eq 8 gives 1⃗^T α = 0, and the second gives α in terms of b, so the solution of eq 8 is easily obtained as

$$\alpha = (\Omega + (1/c)I)^{-1}(u - b\vec{1}), \qquad b = \frac{\vec{1}^{\mathrm{T}}(\Omega + (1/c)I)^{-1}u}{\vec{1}^{\mathrm{T}}(\Omega + (1/c)I)^{-1}\vec{1}} \quad (9)$$

Finally, the inner LSSVM function can be written as

$$f(t) = \sum_{i=1}^{N} \alpha_i K(t, t_i) + b \quad (10)$$

In this article, the radial basis function (RBF) was selected as the kernel because the RBF kernel tends to exhibit good performance under general assumptions.28 It is defined as

$$K(t, t_i) = \exp(-\|t - t_i\|^2/\sigma^2) \quad (11)$$

where σ is a tuning kernel parameter. Substituting the RBF kernel into eq 10 gives the detailed expression of the LSSVM

$$f(t) = \sum_{i=1}^{N} \alpha_i \exp(-\|t - t_i\|^2/\sigma^2) + b \quad (12)$$

In addition, the penalty factor c and the kernel parameter σ must be determined in advance. Here, the optimal pair (c, σ) is obtained by cross-validation with a grid search.29 In k-fold cross-validation, the training samples are partitioned into k subsets of approximately equal size; k − 1 of them are used to train the model, and the remaining one is tested. After this step has been repeated k times, with each subset tested in turn, the average of the testing errors is taken as the performance measure. Pairs of exponentially growing values (c, σ) are tried in the grid search, and the pair with the best evaluation is chosen as the model parameters.

2.3. EBLSSVMPLS Algorithm. The new nonlinear PLS is implemented by combining the outer PLS framework with the inner LSSVM mapping: the outer PLS projection serves as a dimension-reduction tool to remove collinearity, and the inner LSSVM function fits the nonlinear relation in the projected latent space. The structure of the EBLSSVMPLS method is illustrated in Figure 1.

Figure 1. Structure of the EBLSSVMPLS method.

As shown in Figure 1, the input and output score vectors (t and u) are extracted through the outer framework, and the relation between each score-vector pair is then constructed with a single-input−single-output (SISO) LSSVM as follows

$$u = f(t) + e \quad (13)$$

where t = [t_1 ··· t_N]^T, f(t) = [f(t_1) ··· f(t_N)]^T, and f(t_j) is calculated using the inner LSSVM function (eq 12). In this method, the data are not used directly to train the LSSVM model but are first preprocessed and transformed by the outer PLS framework. As a result, the original multivariate regression task is decomposed into several univariate regression problems, which simplifies the LSSVM model.
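To make the inner model concrete, the following is a minimal Python sketch of the inner LSSVM regressor of eqs 7−12, together with a grid-search cross-validation of the kind described above for choosing (c, σ). The helper names (rbf_kernel, lssvm_fit, lssvm_predict, grid_search_cv) and the fold handling are illustrative assumptions rather than the authors' implementation; solving the full linear system of eq 8 with a dense solver is equivalent to the closed-form solution of eq 9.

```python
import numpy as np

def rbf_kernel(ta, tb, sigma):
    # Eq 11: K(t, t_i) = exp(-||t - t_i||^2 / sigma^2) for scalar scores
    d = ta[:, None] - tb[None, :]
    return np.exp(-d ** 2 / sigma ** 2)

def lssvm_fit(t, u, c, sigma):
    # Eq 8: solve [[0, 1^T], [1, Omega + I/c]] [b; alpha] = [0; u]
    n = len(t)
    A = np.empty((n + 1, n + 1))
    A[0, 0] = 0.0
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(t, t, sigma) + np.eye(n) / c
    sol = np.linalg.solve(A, np.concatenate(([0.0], u)))
    return sol[1:], sol[0]                      # alpha, b

def lssvm_predict(t_new, t_train, alpha, b, sigma):
    # Eq 12: f(t) = sum_i alpha_i exp(-||t - t_i||^2 / sigma^2) + b
    return rbf_kernel(t_new, t_train, sigma) @ alpha + b

def grid_search_cv(t, u, c_grid, sigma_grid, k=5):
    # k-fold cross-validation over exponentially spaced (c, sigma) pairs
    folds = np.array_split(np.random.permutation(len(t)), k)
    best, best_err = None, np.inf
    for c in c_grid:
        for sigma in sigma_grid:
            err = 0.0
            for fold in folds:
                train = np.setdiff1d(np.arange(len(t)), fold)
                alpha, b = lssvm_fit(t[train], u[train], c, sigma)
                pred = lssvm_predict(t[fold], t[train], alpha, b, sigma)
                err += np.mean((pred - u[fold]) ** 2)
            if err < best_err:
                best, best_err = (c, sigma), err
    return best
```

A typical call would pass exponentially growing grids, for example c_grid = 2.0 ** np.arange(-2, 10) and sigma_grid = 2.0 ** np.arange(-4, 4), mirroring the grid-search procedure described above.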


Baffi et al.18 pointed out that the weight-updating procedure can be performed when the function used to fit the inner nonlinear relation between a pair of latent variables is continuously differentiable with respect to the weight vector w. The RBF kernel of the LSSVM clearly meets this condition, so the weight updating can be executed within the NIPALS procedure in each iteration, replacing that of linear PLS.18,19 Considering eq 13 and applying the Newton−Raphson linearization technique to the inner nonlinear function, the output latent variable u can be approximated as

$$u = f_0 + [\partial \mathbf{f}(t)/\partial w^{\mathrm{T}}]_{w_0} \cdot \Delta w \quad (14)$$

where f_0 = û = f(t) is the estimate of the output latent variable calculated by the LSSVM function and w_0 is the current value of w. The partial-derivative term can be expanded by the chain rule, using t = Xw, as

$$\frac{\partial \mathbf{f}(t)}{\partial w^{\mathrm{T}}} = \frac{\partial \mathbf{f}(t)}{\partial t^{\mathrm{T}}} \cdot \frac{\partial (Xw)}{\partial w^{\mathrm{T}}} = \mathrm{diag}\{f'(t_1), \; ..., \; f'(t_N)\} \cdot X \quad (15)$$

where f′(t) is the derivative of the LSSVM function f(t) (see eq 12), that is,

$$f'(t) = -\frac{2}{\sigma^{2}} \sum_{i=1}^{N} r_i \alpha_i s_i \exp(-r_i^{2}/\sigma^{2}) \quad (16)$$

with r_i = ||t − t_i|| and s_i = sign(t − t_i). From eqs 13 and 14, defining ∂f(t)/∂w^T as the matrix Z, the error e can be written as

$$e = u - \hat{u} = Z \cdot \Delta w \quad (17)$$

Then, the incremental weight Δw can be calculated by the least-squares method as

$$\Delta w = (Z^{\mathrm{T}}Z)^{-1}Z^{\mathrm{T}}e \quad (18)$$

and the weight vector is updated by letting w = w + Δw. If the matrix Z is rank-deficient, (Z^T Z)^{−1} must be replaced by the Moore−Penrose pseudoinverse (Z^T Z)^+.
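As a sketch of how eqs 15−18 might translate into code, under the same assumptions and hypothetical helper names as above (the pseudoinverse is used throughout to cover the rank-deficient case):

```python
def lssvm_derivative(t, t_train, alpha, sigma):
    # Eq 16, using r_i * s_i * exp(-r_i^2/sigma^2) = (t - t_i) exp(-(t - t_i)^2/sigma^2)
    d = t[:, None] - t_train[None, :]
    return -(2.0 / sigma ** 2) * np.sum(
        alpha[None, :] * d * np.exp(-d ** 2 / sigma ** 2), axis=1)

def weight_increment(X, u, u_hat, f_prime):
    # Eq 15: Z = diag{f'(t_1), ..., f'(t_N)} X; eqs 17-18: dw = (Z^T Z)^+ Z^T e
    Z = f_prime[:, None] * X
    e = u - u_hat
    return np.linalg.pinv(Z.T @ Z) @ (Z.T @ e)
```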

The EBLSSVMPLS algorithm can be derived from a sequence of NIPALS steps and is detailed as follows.

Algorithm 1: EBLSSVMPLS Training Algorithm.
S1. Data preprocessing. Scale X and Y to zero mean and unit variance.
S2. PLS parameter initialization.
  S2.1. Select the output score u: take u as one of the columns of Y.
  S2.2. Calculate the input weight w, and normalize it to unit length: w^T = u^T X/(u^T u), w = w/||w||.
  S2.3. Calculate the input score t: t = Xw.
S3. Inner LSSVM mapping.
  S3.1. Find the inner LSSVM function f(·) that predicts the output score u from the input score t, and obtain the estimate û = f(t).
  S3.2. Calculate the output loading q, and normalize it to unit length: q^T = û^T Y/(û^T û), q = q/||q||.
  S3.3. Calculate the new output score u: u = Yq.
S4. Weight updating.
  S4.1. Calculate the incremental weight Δw: Δw = (Z^T Z)^{−1} Z^T e.
  S4.2. Update the input weight w, and normalize it to unit length: w = w + Δw, w = w/||w||.
  S4.3. Calculate the new input score t: t = Xw.
S5. Convergence checking. Check convergence on δ = ||t − t_old||/||t||; if δ > limit, return to S3; otherwise, go to S6.
S6. Output calculation.
  S6.1. Calculate the input loading p: p^T = t^T X/(t^T t).
  S6.2. Calculate the new prediction of u: û = f(t).
  S6.3. Calculate the input and output residual matrices: E = X − tp^T and F = Y − ûq^T.
  S6.4. Replace X and Y by E and F, respectively, and repeat S2−S6 until the required number of components has been extracted or the results satisfy the precision requirement.

After the necessary latent variables have been calculated, the input loading vectors p, weights w, estimated output score vectors û, and output loading vectors q can be stored in matrices denoted P, W, Û, and Q, respectively. The prediction formulation can then be written as

$$\hat{Y} = \hat{U} Q^{\mathrm{T}} \quad (19)$$

where Û = [û_1 ··· û_A], Q = [q_1 ··· q_A], A is the number of required latent variables, and Ŷ is the predicted output. For a new input data matrix X_t, the output Y_t can be predicted using the EBLSSVMPLS prediction algorithm, detailed next.

Algorithm 2: EBLSSVMPLS Prediction Algorithm.
S1. Scale X_t by the mean and standard deviation of the training matrix X.
S2. Calculate the new input score matrix T: T = X_t W(P^T W)^{−1}, where T = [t_1 ··· t_A], W = [w_1 ··· w_A], and P = [p_1 ··· p_A]; the matrices W and P were calculated by the training algorithm.
S3. Calculate the new estimated output score vectors and form the score prediction matrix Û = [û_1 ··· û_A] using the trained LSSVM functions: û_j = f_j(t_j), j = 1, ..., A.
S4. Predict the centered and scaled output matrix Ŷ_t: Ŷ_t = ÛQ^T.
S5. Rescale Ŷ_t by the mean and standard deviation of the training matrix Y.
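For concreteness, the following condensed sketch strings the hypothetical helpers defined above into the NIPALS-style loop of Algorithm 1 and the prediction steps of Algorithm 2. It assumes a single fixed (c, σ) pair for all latent variables, initialization of u from the first column of Y, pre-autoscaled X and Y, and omission of the scaling steps S1/S5 of Algorithm 2; none of these choices are prescribed by the text beyond the algorithm steps themselves.

```python
def eblssvmpls_train(X, Y, n_comp, c, sigma, limit=1e-6, max_iter=50):
    # S1: X (N x M) and Y (N x K) are assumed already scaled to zero mean, unit variance
    E, F = X.copy(), Y.copy()
    P, W, Q, models = [], [], [], []
    for _ in range(n_comp):
        u = F[:, 0].copy()                                  # S2.1
        w = E.T @ u / (u @ u); w /= np.linalg.norm(w)       # S2.2
        t = E @ w                                           # S2.3
        for _ in range(max_iter):
            alpha, b = lssvm_fit(t, u, c, sigma)            # S3.1: inner LSSVM
            u_hat = lssvm_predict(t, t, alpha, b, sigma)
            q = F.T @ u_hat / (u_hat @ u_hat)               # S3.2
            q /= np.linalg.norm(q)
            u = F @ q                                       # S3.3
            f_p = lssvm_derivative(t, t, alpha, sigma)
            w = w + weight_increment(E, u, u_hat, f_p)      # S4.1-S4.2
            w /= np.linalg.norm(w)
            t_new = E @ w                                   # S4.3
            delta = np.linalg.norm(t_new - t) / np.linalg.norm(t_new)
            t = t_new
            if delta < limit:                               # S5: converged
                break
        p = E.T @ t / (t @ t)                               # S6.1
        alpha, b = lssvm_fit(t, u, c, sigma)
        u_hat = lssvm_predict(t, t, alpha, b, sigma)        # S6.2
        E = E - np.outer(t, p)                              # S6.3: deflation
        F = F - np.outer(u_hat, q)
        P.append(p); W.append(w); Q.append(q)
        models.append((t.copy(), alpha, b))                 # S6.4: next component
    return (np.column_stack(P), np.column_stack(W),
            np.column_stack(Q), models)

def eblssvmpls_predict(Xt, P, W, Q, models, sigma):
    T = Xt @ W @ np.linalg.inv(P.T @ W)                     # Algorithm 2, S2
    U_hat = np.column_stack(
        [lssvm_predict(T[:, j], tj, a, b, sigma)            # S3: u_hat_j = f_j(t_j)
         for j, (tj, a, b) in enumerate(models)])
    return U_hat @ Q.T                                      # S4: Y_hat = U_hat Q^T
```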

3. SIMULATIONS

3.1. Benchmark Data Set Test. In this section, the experimental evaluation of the new model is performed on a benchmark data set, namely, the servo data set,30 obtained from the UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/servo). The servo data set consists of 167 observations, with the task of predicting the rise time of a servomechanism in terms of two continuous gain settings and two discrete mechanical linkages. The data set represents a highly nonlinear phenomenon and has been widely used as a benchmark for modeling algorithms.31,32 All of the data were examined, and two abnormal cases were eliminated because they were identified as outliers. Of the remaining 165 samples, 130 were used for the training data set and 35 for the testing data set in our experiment.
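As an illustration of the preprocessing implied here, the following sketch loads a local copy of the servo data and autoscales it as required by S1 of Algorithm 1; the file name, the column names, and the integer encoding of the two discrete linkage attributes are assumptions for illustration, not details given in the article.

```python
import numpy as np
import pandas as pd

cols = ["motor", "screw", "pgain", "vgain", "rise_time"]
df = pd.read_csv("servo.data", names=cols)
for col in ("motor", "screw"):                    # discrete mechanical linkages
    df[col] = df[col].astype("category").cat.codes  # map letters to integers
X = df[cols[:4]].to_numpy(float)
Y = df[["rise_time"]].to_numpy(float)

def autoscale(A, mean=None, std=None):
    # Zero mean, unit variance; reuse training statistics for the test set
    mean = A.mean(axis=0) if mean is None else mean
    std = A.std(axis=0, ddof=1) if std is None else std
    return (A - mean) / std, mean, std
```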


To obtain a convincing conclusion, the reference data set was processed by five PLS methods: linear PLS (LPLS), error-based quadratic PLS (EBQPLS), error-based RBF neural network PLS (EBRBFPLS), LSSVMPLS, and EBLSSVMPLS (proposed in this article). All five methods use the same outer PLS framework, but each uses a different inner mapping function. The prefix "EB" stands for "error-based", meaning that an error-based weight-updating procedure was added to the algorithm. LPLS is the original linear partial least-squares method. EBQPLS and EBRBFPLS apply a quadratic function and a three-layer RBF neural network, respectively, to model the inner relation with the weight-updating procedure; the RBF neural network was built with Gaussian activation functions and trained with the Neural Network Toolbox V7.0 available in MATLAB 2011.33 LSSVMPLS and EBLSSVMPLS both use LSSVM as the inner mapping function but differ in that LSSVMPLS operates without weight updating.

3.2. Results and Discussion. Two criteria are often used to evaluate the performance of different PLS models.23 The first is the root-mean-square error (RMSE), given by

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2 / N} \quad (20)$$

where y_i is the measured value, ŷ_i is the corresponding predicted value, and N is the number of samples. RMSET and RMSEP denote the RMSE values for the training and testing data sets, respectively. The second criterion is the variance in the input and output variables captured by the latent variables (R2X and R2Y),34 defined as

$$R^2X = 1 - \mathrm{SS}(E)/\mathrm{SS}(X), \qquad R^2Y = 1 - \mathrm{SS}(F)/\mathrm{SS}(Y) \quad (21)$$

where SS(·) is the sum of squares and E and F denote the residual matrices of X and Y, respectively. In this article, the performance of each model was evaluated in terms of both criteria.
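Both criteria are straightforward to compute; a minimal sketch with illustrative function names:

```python
def rmse(y_hat, y):
    # Eq 20: root-mean-square error between predictions and measurements
    return np.sqrt(np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2))

def explained_variance(residual, data):
    # Eq 21: R2 = 1 - SS(residual)/SS(data), SS being the sum of squares
    return 1.0 - np.sum(residual ** 2) / np.sum(data ** 2)
```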

The comparison results are given in Table 1, and the score scatter plot for the first latent-variable pair (t1 and u1) of each model is shown in Figure 2.

Table 1. Comparison of Different Modeling Methods for the Servo Data Set

    method        A    R2X (%)   R2Y (%)   RMSET    RMSEP
    LPLS          3    73.62     50.28     0.9535   1.1034
    EBQPLS        3    78.85     82.63     0.5593   0.6002
    EBRBFPLS      2    63.97     95.05     0.2840   0.5623
    LSSVMPLS      3    73.64     75.01     0.6084   0.4794
    EBLSSVMPLS    2    61.66     94.93     0.2890   0.4107

As shown in Figure 2a, the linear PLS method is unable to describe the nonlinear relation between the score vectors t1 and u1 and can explain no more than 50.28% (see Table 1) of the variance of the response. EBQPLS performs better than the LPLS method, explaining 82.63% of the variance, and Figure 2b shows that it captures the nonlinearity to some extent. As revealed in Figure 2c,e, the EBRBFPLS and EBLSSVMPLS methods are both capable of approximating the nonlinear mapping appropriately, because the neural network and the LSSVM function both have strong abilities to fit complicated nonlinear relations; the explained variance reaches 95.05% and 94.93%, respectively. Moreover, the EBLSSVMPLS method does a much better job than the EBRBFPLS method, because small oscillations appear in the neural network fit (see Figure 2c), which indicates overfitting. The LSSVMPLS method without weight updating explains only 75.01%, which indicates that the updating procedure is beneficial to enhancing the accuracy; this conclusion is reinforced by comparing panels d and e of Figure 2.

The performance of the five methods is reflected not only in the variance captured but also in the RMSET and RMSEP values. As reported in Table 1, for the LPLS method, the RMSE values for the training and testing data sets are 0.9535 and 1.1034, respectively. The EBQPLS method shows a clear improvement, with RMSET = 0.5593 and RMSEP = 0.6002. For EBRBFPLS and EBLSSVMPLS, the RMSET and RMSEP values are much smaller than those of the other three methods, and the EBLSSVMPLS method shows the best prediction ability, with the smallest RMSEP of 0.4107. In contrast, the LSSVMPLS method exhibits much worse approximation capability, with RMSET = 0.6084 and RMSEP = 0.4794. This once again confirms that the weight-updating procedure helps improve the accuracy.


Figure 2. Score scatter plot for the first latent-variable pair: (a) LPLS, (b) EBQPLS, (c) EBRBFPLS, (d) LSSVMPLS, (e) EBLSSVMPLS.

4. INDUSTRIAL APPLICATION: NOX EMISSIONS MODELING FOR A COAL-FIRED BOILER

Nitrogen oxide (NOx) emissions from coal-fired power plants are a worldwide concern and present a significant challenge for environmental protection. Low-NOx combustion optimization, which achieves optimum conditions for the combustion process by tuning operating parameters, is considered an effective way to reduce NOx emissions.35 A reliable and accurate model of the NOx emission characteristics must be established before such an optimization can be performed on a coal-fired boiler. However, NOx formation is a high-dimensional process that exhibits nonlinearity and correlations among the boiler operating parameters, so adequate theoretical knowledge does not exist to build an explicit mathematical model for such a complex system.36

In this section, the EBLSSVMPLS method proposed in this article is used to model and predict the NOx emissions of a coal-fired boiler. The experiments were carried out on a forced-circulation boiler with a capacity of 1099.3 t/h manufactured by Alstom Power Boilers Ltd. The boiler has a double-arch furnace with dimensions of 17.34 × 17.68 m2 in the lower section and 17.34 × 9.08 m2 in the upper section and a height of 51.98 m. Thirty-six direct-current burners are located on the front and back of the arch, and a W-shaped flame is formed at the center of the furnace. The furnace structure is illustrated in Figure 3.

Figure 3. Furnace structure diagram of the investigated boiler.

Anthracite coal from Songzao Mining was burned in this boiler. The contents of ash, volatiles, and moisture in the coal were 30.45, 14.25, and 6.88 wt %, respectively, and the heating value was 21.60 MJ/kg. A series of field tests was performed at typical loads of around 350, 300, and 250 MW. The data were recorded after the operating state had remained steady for 25 min, and a total of 104 test cases was collected. Many more details about the test process can be found in ref 37.

Based on the theoretical mechanisms of NOx formation, 10 parameters affecting the emissions were taken as the input variables of the prediction model: the boiler load, the calorific value of the coal, the volatile matter content of the coal, the primary air pressure, three opening values of the secondary air dampers, two (upper and lower) opening values of the tertiary air dampers, and the oxygen content in the outlet flue gas. The NOx emissions were calibrated to units of milligrams per cubic meter at a base of 6% O2. The 104 cases were split into two parts: 90 samples for model training and the remaining 14 samples for model testing. The simulation results are reported in Table 2.

Table 2. Comparison of Different Modeling Methods for NOx Emissions

    method        RMSET     RMSEP
    LPLS          66.5621   63.5665
    EBQPLS        37.9136   63.5046
    EBRBFPLS      11.9677   71.6188
    LSSVMPLS      59.7543   68.5426
    EBLSSVMPLS    12.6632   37.6609

As in the preceding case, the LPLS, EBQPLS, EBRBFPLS, and LSSVMPLS methods were applied for comparison. Here, only the RMSE criterion is considered; the conclusions drawn from the variance-captured criterion are essentially the same, and the overall picture is similar to that for the servo case. Not surprisingly, the LPLS method shows the worst estimation and prediction abilities, with RMSET = 66.5621 and RMSEP = 63.5665.


EBQPLS provides a slight improvement but still appears to have insufficient ability to deal with the nonlinearity. Meanwhile, EBLSSVMPLS outperformed the other four models, which is particularly apparent for the testing data set. Although the EBRBFPLS method achieves nearly the same training accuracy, its prediction ability is much weaker. LSSVMPLS provided better predictions than the EBRBFPLS model, with RMSEP = 68.5426, but its approximation capability is poor, with RMSET = 59.7543, which is larger than the values for EBQPLS (RMSET = 37.9136), EBRBFPLS (RMSET = 11.9677), and EBLSSVMPLS (RMSET = 12.6632). Especially notable is that the prediction error of EBRBFPLS reaches as high as 71.6188; that is, the EBRBFPLS method exhibits poor generalization capacity and suffers from heavy overfitting, which is even more evident in the comparison of the score scatter plots in Figure 4. Serious oscillation occurs when the neural network (Figure 4a) is used to fit the score vectors, whereas this phenomenon does not occur with the LSSVM function (Figure 4b).

Figure 4. Comparison of score scatter plots for the first latent-variable pair between the (a) EBRBFPLS and (b) EBLSSVMPLS methods.

Figure 5a shows a detailed profile of the errors between the actual and predicted values of NOx emissions for the training data set using the five modeling methods. It is clear that the prediction results of the LPLS model are not satisfactory, because the estimation errors vary over the large range of [−140, 200]; the LPLS model does not fit the data adequately. EBQPLS and LSSVMPLS predict the actual values with much better accuracy and smaller estimation errors than the LPLS method, and the performance of EBQPLS is superior to that of the LSSVMPLS model. In addition, EBRBFPLS and EBLSSVMPLS exhibit better approximation performance and obtain satisfactory accuracy with the smallest estimation errors.

The predictions of NOx emissions of the five models for the testing data set are shown in Figure 5b, from which one can see that EBLSSVMPLS gives the best prediction results, with estimation errors distributed most closely to the zero line. LPLS, EBQPLS, and LSSVMPLS achieve relatively good prediction performance. In contrast, the estimation errors of the EBRBFPLS model fall into a large range from −100 to 150; that is, EBRBFPLS exhibits a poor ability to predict the testing samples even though it gives excellent approximation accuracy for the training data set.

Figure 5. Estimation errors of NOx emissions between the predicted and actual values: (a) training and (b) testing data sets.


5. CONCLUSIONS AND FURTHER DISCUSSION

In this article, a novel nonlinear PLS method has been proposed for industrial process modeling. The basic idea is to use the conventional PLS framework for the outer projection and the LSSVM function to map the inner nonlinear relation; an error-based weight-updating procedure was found to make the modeling results more accurate. Training and prediction algorithms based on modified NIPALS steps with the weight-updating procedure have also been detailed. The performance of the proposed method was evaluated and compared against that of other well-known partial least-squares methods on a benchmark data set. The results showed that, of the five methods, EBLSSVMPLS gave the best performance in terms of reducing the estimation errors and predicting unknown observations. The proposed approach was then applied to establish a NOx emission model for a coal-fired boiler. The RMSE values for the training and testing data sets decreased to 12.6632 and 37.6609, respectively. Compared with the LPLS (RMSET = 66.5621, RMSEP = 63.5665), EBQPLS (RMSET = 37.9136, RMSEP = 63.5046), EBRBFPLS (RMSET = 11.9677, RMSEP = 71.6188), and LSSVMPLS (RMSET = 59.7543, RMSEP = 68.5426) methods, reductions in the prediction errors of approximately 40.8%, 40.7%, 47.4%, and 45.1%, respectively, were achieved. The comparison demonstrates that the new EBLSSVMPLS model provides comparatively high estimation accuracy and, moreover, significantly improves the prediction performance.

However, some factors should be noted when using this type of data-driven method to model an industrial process, one of which is the source of the data set. For example, the data used to predict the NOx emissions in this article were collected through experiments carried out on an operating boiler, which is time-consuming and costly. In addition, no time-consumption analysis was provided for each algorithm, because the process model was established offline from the field experimental data set. Finally, as Baffi et al.18 pointed out, the need to use a pseudoinverse is still a potential drawback, and a more accurate result would be obtained if the calculation of the matrix inversion could be avoided.



AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This project was funded by the National Basic Research Program of China ("973" Project) (No. 2012CB215203), the National Natural Science Foundation of China (No. 60774033), and the Fundamental Research Funds for the Central Universities (12QX15).

NOTATION
A = number of latent variables
b = bias constant
c = penalty parameter
E = input residual error matrix
e = residual error vector
e_j = jth residual error vector
F = output residual error matrix
I = identity matrix
K = number of response variables
M = number of predictors
N = number of samples
P = input loading matrix
p = input loading vector
p_j = jth input loading vector
Q = output loading matrix
q = output loading vector
q_j = jth output loading vector
T = input score matrix
t = input score vector
t_j = jth input score vector
U = output score matrix
Û = prediction of output score matrix
u = output score vector
u_j = jth output score vector
û = estimated output score vector
û_j = jth estimated output score vector
W = weight matrix
w = weight vector
w_j = jth weight vector
X = input data matrix
Y = output data matrix
1⃗ = all-ones column vector

Greek Symbols
α = Lagrange multiplier vector
ξ = error vector of the LSSVM
Ω = kernel matrix
σ = kernel parameter
ω = weight vector of the LSSVM

REFERENCES
(1) Ge, Z.; Song, Z. Nonlinear Soft Sensor Development Based on Relevance Vector Machine. Ind. Eng. Chem. Res. 2010, 49 (18), 8685−8693.
(2) Doymaz, F.; Palazoglu, A.; Romagnoli, J. A. Orthogonal Nonlinear Partial Least-Squares Regression. Ind. Eng. Chem. Res. 2003, 42 (23), 5836−5849.
(3) Zhang, X.; Yan, W.; Shao, H. Nonlinear Multivariate Quality Estimation and Prediction Based on Kernel Partial Least Squares. Ind. Eng. Chem. Res. 2008, 47 (4), 1120−1131.
(4) Zhou, H.; Zhao, J. P.; Zheng, L. G.; Wang, C. L.; Cen, K. F. Modeling NOx emissions from coal-fired utility boilers using support vector regression with ant colony optimization. Eng. Appl. Artif. Intell. 2012, 25 (1), 147−158.
(5) Khatibisepehr, S.; Huang, B. Dealing with Irregular Data in Soft Sensors: Bayesian Method and Comparative Study. Ind. Eng. Chem. Res. 2008, 47 (22), 8713−8723.
(6) Liu, J.; Chen, D.-S.; Shen, J.-F. Development of Self-Validating Soft Sensors Using Fast Moving Window Partial Least Squares. Ind. Eng. Chem. Res. 2010, 49 (22), 11530−11546.
(7) Kim, K.; Lee, J. M.; Lee, I. B. A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction. Chemom. Intell. Lab. Syst. 2005, 79 (1−2), 22−30.
(8) Liu, J. On-line soft sensor for polyethylene process with multiple production grades. Control Eng. Pract. 2007, 15 (7), 769−778.
(9) Singh, K. P.; Ojha, P.; Malik, A.; Jain, G. Partial least squares and artificial neural networks modeling for predicting chlorophenol removal from aqueous solution. Chemom. Intell. Lab. Syst. 2009, 99 (2), 150−160.
(10) Seggiani, M.; Pannocchia, G. Prediction of Coal Ash Thermal Properties Using Partial Least-Squares Regression. Ind. Eng. Chem. Res. 2003, 42 (20), 4919−4926.
(11) Ronen, D.; Sanders, C. F. W.; Tan, H. S.; Mort, P. R.; Doyle, F. J. Predictive Dynamic Modeling of Key Process Variables in Granulation Processes Using Partial Least Squares Approach. Ind. Eng. Chem. Res. 2011, 50 (3), 1419−1426.
(12) Rosipal, R.; Trejo, L. J. Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2002, 2, 97−123.
(13) Zhang, Y.; Zhang, Y. Complex process monitoring using modified partial least squares method of independent component regression. Chemom. Intell. Lab. Syst. 2009, 98 (2), 143−148.
(14) Woo, S. H.; Jeon, C. O.; Yun, Y. S.; Choi, H.; Lee, C. S.; Lee, D. S. On-line estimation of key process variables based on kernel partial least squares in an industrial cokes wastewater treatment plant. J. Hazard. Mater. 2009, 161 (1), 538−544.
(15) Wold, S.; Kettaneh-Wold, N.; Skagerberg, B. Nonlinear PLS modeling. Chemom. Intell. Lab. Syst. 1989, 7 (1−2), 53−65.
(16) Qin, S. J.; McAvoy, T. J. Nonlinear PLS modeling using neural networks. Comput. Chem. Eng. 1992, 16 (4), 379−391.
(17) Wilson, D. J. H.; Irwin, G. W.; Lightbody, G. Nonlinear PLS modelling using radial basis functions. In Proceedings of the American Control Conference; IEEE Press: Piscataway, NJ, 1997; pp 3275−3276.
(18) Baffi, G.; Martin, E. B.; Morris, A. J. Non-linear projection to latent structures revisited: the quadratic PLS algorithm. Comput. Chem. Eng. 1999, 23 (3), 395−411.
(19) Baffi, G.; Martin, E. B.; Morris, A. J. Non-linear projection to latent structures revisited (the neural network PLS algorithm). Comput. Chem. Eng. 1999, 23 (9), 1293−1307.
(20) Li, B.; Hassel, P. A.; Morris, A. J.; Martin, E. B. A non-linear nested partial least-squares algorithm. Comput. Stat. Data Anal. 2005, 48 (1), 87−101.
(21) Bang, Y. H.; Yoo, C. K.; Lee, I.-B. Nonlinear PLS modeling with fuzzy inference system. Chemom. Intell. Lab. Syst. 2002, 64 (2), 137−155.
(22) Kruger, U.; Zhou, Y.; Wang, X.; Rooney, D.; Thompson, J. Robust partial least squares regression: Part I, Algorithmic developments. J. Chemom. 2008, 22 (1), 1−13.
(23) Abdel-Rahman, A. I.; Lim, G. J. A nonlinear partial least squares algorithm using quadratic fuzzy inference system. J. Chemom. 2009, 23 (10), 530−537.
(24) Vapnik, V. Statistical Learning Theory; Wiley-Interscience: New York, 1998.
(25) Suykens, J. A. K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9 (3), 293−300.
(26) Suykens, J. A. K.; Van Gestel, T.; De Brabanter, J.; De Moor, B.; Vandewalle, J. Least Squares Support Vector Machines; World Scientific: Singapore, 2002.
(27) Geladi, P.; Kowalski, B. R. Partial least-squares regression: A tutorial. Anal. Chim. Acta 1986, 185, 1−17.
(28) Keerthi, S. S.; Lin, C. J. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput. 2003, 15 (7), 1667−1689.
(29) van Gestel, T.; Suykens, J.; Baesens, B.; Viaene, S.; Vanthienen, J.; Dedene, G.; de Moor, B.; Vandewalle, J. Benchmarking Least Squares Support Vector Machine Classifiers. Mach. Learn. 2004, 54 (1), 5−32.
(30) Quinlan, J. R. Combining instance-based and model-based learning. In Proceedings of the 10th International Machine Learning Conference; Morgan Kaufmann: Burlington, MA, 1993; pp 236−243.
(31) Merz, C. J.; Pazzani, M. J. A principal components approach to combining regression estimates. Mach. Learn. 1999, 36 (1), 9−32.
(32) Wu, J. X.; Zhou, Z. H.; Chen, Z. Q. Ensemble of GA based selective neural network ensembles. In Proceedings of the 8th International Conference on Neural Information Processing; Fudan University Press: Shanghai, China, 2001; pp 1477−1482.
(33) Beale, M. H.; Hagan, M. T.; Demuth, H. B. Neural Network Toolbox 7 User's Guide; The MathWorks, Inc.: Natick, MA, 2011.
(34) Wold, S.; Trygg, J.; Berglund, A.; Antti, H. Some recent developments in PLS modeling. Chemom. Intell. Lab. Syst. 2001, 58 (2), 131−150.
(35) Zhou, H.; Zheng, L.; Cen, K. Computational intelligence approach for NOx emissions minimization in a coal-fired utility boiler. Energy Convers. Manage. 2010, 51 (3), 580−586.
(36) Zheng, L.-G.; Zhou, H.; Cen, K.-F.; Wang, C.-L. A comparative study of optimization algorithms for low NOx combustion modification at a coal-fired utility boiler. Expert Syst. Appl. 2009, 36 (2), 2780−2793.
(37) Gu, Y.; Zhao, W.; Wu, Z. Combustion optimization for utility boiler based on least square-support vector machine. Proc. CSEE 2010, 30 (17), 91−97.