
Ind. Eng. Chem. Res. 2011, 50, 332–341

Time/Space-Separation-Based SVM Modeling for Nonlinear Distributed Parameter Processes

Chenkun Qi,*,† Han-Xiong Li,‡ Xianxia Zhang,§ Xianchao Zhao,† Shaoyuan Li,| and Feng Gao†

School of Mechanical Engineering, State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Manufacturing Engineering & Engineering Management, City University of Hong Kong, Hong Kong, China; Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics and Automation, Shanghai University, Shanghai 200072, China; and Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China

Modeling of distributed parameter systems (DPSs) is difficult because of their infinite-dimensional spatiotemporal nature and complex nonlinearities. Data-based modeling is necessary because there are usually some unknown uncertainties in first-principles modeling. In practice, a low-dimensional spatiotemporal model is often required for real-time implementations. In this study, a time/space-separation-based support-vector-machine (SVM) model identification approach is proposed for unknown nonlinear DPSs. The spatiotemporal output of the system is measured at a finite number of spatial locations, and for easy implementation, the input is assumed to be a finite-dimensional temporal variable. First, Karhunen-Loève (KL) decomposition is used for time/space separation and dimension reduction. Subsequently, the spatiotemporal output is expanded onto a low-dimensional Karhunen-Loève space with temporal coefficients. Finally, the least-squares support-vector-machine (LS-SVM) approach is used to model the system dynamics in a low-dimensional temporal domain. After the time/space synthesis, the nonlinear spatiotemporal dynamics can be reconstructed. Simulations are presented to demonstrate the effectiveness of this spatiotemporal modeling method.

1. Introduction

Most physical and chemical processes (e.g., fluid processes, thermal processes, and transport-reaction processes) are distributed parameter systems (DPSs) because of their nonuniformly distributed dynamics in space. Their infinite-dimensional spatiotemporal coupling and complex nonlinear behavior make system analysis, modeling, and control very difficult. In practice, a finite-dimensional or low-dimensional spatiotemporal model is often required for real-time implementations. When the first-principles-based partial-differential-equation (PDE) model is known, there are many approaches to model reduction and control problems.1-7 The most popular model reduction approaches are finite-difference and finite-element approaches, the Galerkin method, and the approximate inertial manifold method.3,8 However, the PDE model is often unknown in many situations because of incomplete process knowledge; thus, data-based spatiotemporal modeling from the input and output data has to be used. The parameter estimation problem for a known PDE structure9-11 is not considered here. Recently, the identification of PDE models has been studied.12-14 However, after the PDE model has been recovered from the spatiotemporal data, model reduction methods are still needed for practical applications.

Traditional system identification techniques for lumped parameter systems (LPSs)15 do not consider the spatial information. To model the spatiotemporal dynamics, spatial information processing is an important problem. At present, finite-dimensional spatiotemporal model identification can be classified into local and global approaches.

* To whom correspondence should be addressed. E-mail: chenkqi@sjtu.edu.cn.
† School of Mechanical Engineering, Shanghai Jiao Tong University.
‡ Department of Manufacturing Engineering & Engineering Management, City University of Hong Kong.
§ School of Mechatronics and Automation, Shanghai University.
| Department of Automation, Shanghai Jiao Tong University.

The local approach assumes that the local dynamics are unchanged at different spatial locations and are determined by the neighborhood of the identified spatial location. Utilizing the measurements in small spatiotemporal regions, local models can be established based on the identification theory of lattice dynamical systems.16-21 To achieve significant spatiotemporal prediction capability, the neighborhood regions should be selected properly. It is often a difficult task to determine the proper neighborhood, although various heuristic or prespecified approaches are available, in addition to theoretical analysis.21 The local model at each spatial location might be simple, but the global model in the whole spatial domain is of high dimension because it is determined by the number of spatial locations, which is often large.

The idea of the global method22-24 comes from Fourier series expansion. A spatiotemporal variable (e.g., a temperature or concentration field) can be expressed by an infinite number of basis functions {ψ_i(x)}_{i=1}^{∞}: y(x,t) = ∑_{i=1}^{∞} ψ_i(x) y_i(t), where ψ_i(x) (i = 1, ..., ∞) represents the spatial frequencies from low to high order and y_i(t) (i = 1, ..., ∞) denotes the corresponding temporal coefficients (states). For many industrial processes, the first finite terms often provide a good approximation because of their slow/fast separation properties. Thus, this approach results in a finite-dimensional temporal coefficient model. This study focuses on the global method.

Generally speaking, this class of modeling approach follows three steps: selection of basis functions, estimation of state variables, and identification of the input-state dynamic model. Modeling accuracy and efficiency are highly dependent on the choice of basis functions, such as finite-element bases,22 Fourier series,24 Legendre polynomials, Jacobi polynomials, and Chebyshev polynomials.25 In particular, Karhunen-Loève (KL) decomposition, also called proper orthogonal decomposition and principal component analysis,3-7,26 is a popular approach for finding principal spatial structures and reducing the dimension of the data. Among all linear expansions, the KL expansion is the most efficient in the sense that, for a given approximation error, the number of KL bases required is minimal.



As a result, KL decomposition can help to reduce the model dimension and the number of estimated parameters. Once the spatial basis functions are designed properly, the corresponding states can be determined by projecting the spatiotemporal data onto these spatial basis functions.

To model the input-state dynamics, many traditional approaches, such as the nonlinear state-space model,23,24 the nonlinear autoregressive with exogenous input (NARX) model,22,27 the Volterra model,28 the Wiener model,29 and the Hammerstein models,30,31 have been studied. In the modeling, the unknown nonlinearity is often approximated by polynomials or neural networks. However, the main problem in the previous methods is the difficulty in designing the proper model size and structure, which significantly affects the modeling performance. Trial-and-error methods for selecting the model order27,28,31 are very time-consuming. In addition, parameter learning might require complex nonlinear optimization, which could lead to a local minimum problem. In some works, the model size, structure, and parameters can be determined by the orthogonal least-squares (OLS) algorithm; however, the local spline basis functions used by Coca and Billings22 lead to a high-dimensional model, and the Hammerstein model developed by Qi and Li30 is only for distributed Hammerstein systems, which limits its application to other DPSs.

The support-vector-machine (SVM)32-35 and least-squares support-vector-machine (LS-SVM)36-41 models are widely used and efficient approaches to nonlinear dynamical modeling problems because the weak points of neural networks, namely, local minimum solutions and selection of the number of hidden neurons, can be removed. However, until now, very little research has been reported on the application of the least-squares support-vector-machine approach to distributed parameter processes because SVM and LS-SVM do not have spatial information processing capabilities. Recently, a local modeling of spatiotemporal series was proposed.19 Here, we systematically discuss the global modeling of DPSs using the LS-SVM approach.

In this study, the Karhunen-Loève decomposition-based least-squares support-vector-machine (KL-LS-SVM) modeling approach is developed for nonlinear distributed parameter processes. The estimation of the spatial basis functions and the reduction of the problem dimension are implemented through the Karhunen-Loève decomposition. To identify a nonlinear model from the low-dimensional temporal coefficients, least-squares support vector machines are used. The model dimension is selected by an energy method in the KL decomposition, and the model size, structure, and parameters are determined by the LS-SVM method. The main advantage of the newly proposed approach is that the model size and structure can be effectively determined by SVM modeling. By avoiding the trial-and-error tuning used in the previous methods, the proposed method is efficient and convenient to apply. Moreover, the proposed method can achieve better modeling performance than the previous methods introduced in refs 28, 30, and 31 because the NARX model used can handle more complex dynamics.
To verify the proposed approach, comparisons with three typical methods [i.e., the spline-function-based LS-SVM (SP-LS-SVM), KL-based OLS radial basis function (KL-OLS-RBF), and KL-based Volterra (KL-VOL) models] are studied. The simulation examples clearly demonstrate the better performance achieved by the proposed method.


Figure 1. Time/space-separation-based SVM modeling methodology for nonlinear DPS.

Owing to the Karhunen-Loève decomposition, the dimension of the states can be lower than that obtained with other spatial-basis-function-based models (e.g., the SP-LS-SVM model). The simulations also show that the KL-LS-SVM model can be more accurate than the KL-OLS-RBF and KL-VOL models developed by Li and Qi.28 Moreover, the computation time of the KL-LS-SVM model is less than those of both the SP-LS-SVM and KL-OLS-RBF approaches. Although the KL-VOL model can be faster than the KL-LS-SVM model, it has reduced model accuracy. Although the method proposed by Qi and Li27 can model even more complex systems, its model structure design is too complex and suffers too much in terms of time efficiency. Considering model complexity and accuracy, the proposed method has better potential for industrial applications.

This article is organized as follows: The modeling problem is described in section 2. In section 3, the Karhunen-Loève decomposition is introduced. The least-squares support-vector-machine approach is presented in section 4. Section 5 contains an illustrative simulation example. Finally, a few conclusions are presented in section 6.

2. Problem Description

Consider the nonlinear distributed parameter system in Figure 1, with u(t) ∈ R^m as the temporal input and y(x,t) ∈ R as the spatiotemporal output, where x ∈ Ω is the spatial variable, Ω is the spatial domain, and t is the time variable. Here, for simplicity, suppose that the system is controlled by m actuators with implemented temporal signal u(t) and a certain spatial distribution. The output is measured at the N spatial locations x_1, ..., x_N. Because of its infinite dimensionality, this distributed parameter system might require an infinite number of actuators and sensors over the whole space to implement perfect modeling and control. Because of practical limitations, limited numbers of actuators and sensors have to be used. The numbers of actuators and sensors depend on the process complexity, the desired accuracy of modeling and control, physical and cost considerations, and so on.

The modeling problem is to identify a proper spatiotemporal model from the input {u(t)}_{t=1}^{L} and the output {y(x_i,t)}_{i=1,t=1}^{N,L}, where L is the total time duration. As shown in Figure 1, the modeling methodology includes two stages. The first stage is the Karhunen-Loève decomposition for the time/space separation. The second stage is the model identification (including the structure and parameters) using the LS-SVM approach. With the time/space synthesis, this model can reconstruct the original spatiotemporal dynamics of the system.

3. Karhunen-Loève Decomposition

For simplicity, assume that the process output {y(x_i,t)}_{i=1,t=1}^{N,L} (called snapshots) is uniformly sampled in time and space. Define the inner product, norm, and ensemble average as



[f(x), g(x)] = ∫_Ω f(x) g(x) dx,   ‖f(x)‖ = [f(x), f(x)]^{1/2},   ⟨f(x,t)⟩ = (1/L) ∑_{t=1}^{L} f(x,t)

Motivated by Fourier series, the spatiotemporal variable y(x,t) can be expanded onto an infinite number of orthonormal spatial basis functions {ψ_i(x)}_{i=1}^{∞} with temporal coefficients {y_i(t)}_{i=1}^{∞}

    y(x,t) = ∑_{i=1}^{∞} ψ_i(x) y_i(t)    (1)

Because the spatial basis functions are orthonormal, that is, [ψ_i(x), ψ_j(x)] = ∫_Ω ψ_i(x) ψ_j(x) dx = 0 (i ≠ j) or 1 (i = j), the temporal coefficients can be computed from

    y_i(t) = [ψ_i(x), y(x,t)],   i = 1, ..., ∞    (2)

In practice, the expansion has to be truncated to a finite dimension

    y_n(x,t) = ∑_{i=1}^{n} ψ_i(x) y_i(t)    (3)

where y_n(x,t) denotes the nth-order approximation. The main problem in using Karhunen-Loève decomposition for time/space separation is computing the most characteristic spatial structure {ψ_i(x)}_{i=1}^{n} from the spatiotemporal output {y(x_i,t)}_{i=1,t=1}^{N,L}.

3.1. Spatial Correlation Method. Finding the typical structure {ψ_i(x)}_{i=1}^{n} can be performed by minimizing the objective function

    min_{ψ_i(x)} ⟨‖y(x,t) - y_n(x,t)‖^2⟩   subject to   (ψ_i, ψ_i) = 1, ψ_i ∈ L^2(Ω), i = 1, ..., n    (4)

The orthonormal constraint (ψ_i, ψ_i) = 1 is imposed to ensure that the function ψ_i(x) is unique. The Lagrangian functional corresponding to this constrained optimization problem is

    J = ⟨‖y(x,t) - y_n(x,t)‖^2⟩ + ∑_{i=1}^{n} λ_i [(ψ_i, ψ_i) - 1]    (5)

and the necessary condition of the solution can be obtained as3-7,26,42

    ∫_Ω R(x,ζ) ψ_i(ζ) dζ = λ_i ψ_i(x),   (ψ_i, ψ_i) = 1,   i = 1, ..., n    (6)

where R(x,ζ) = ⟨y(x,t) y(ζ,t)⟩ is the spatial two-point correlation function, ψ_i(x) is the ith eigenfunction, and λ_i is the corresponding eigenvalue. Because R(x,ζ) is symmetric and positive semidefinite, the computed eigenfunctions are orthogonal.

In practice, the data are always discrete in space, so one must solve the integral in eq 6 numerically. Discretizing the integral equation gives an N × N matrix eigenvalue problem. Thus, N eigenfunction estimates at the N sampled spatial locations can be obtained. Then, one can interpolate the eigenfunctions to locations where data are not available.

3.2. Temporal Correlation Method. When L is less than N, a computationally efficient way to obtain the solution of the above integral equation is provided by the method of snapshots.26 The requisite eigenfunction ψ(x) is expressed as a linear combination of snapshots as follows

    ψ(x) = ∑_{t=1}^{L} γ_t y(x,t)    (7)

Substituting eq 7 into eq 6 gives the eigenvalue problem

    ∫_Ω (1/L) ∑_{t=1}^{L} y(x,t) y(ζ,t) ∑_{k=1}^{L} γ_k y(ζ,k) dζ = λ ∑_{t=1}^{L} γ_t y(x,t)    (8)

By defining

    C_{tk} = (1/L) ∫_Ω y(ζ,t) y(ζ,k) dζ    (9)

the N × N eigenvalue problem in eq 6 can be reduced to the L × L problem

    Cγ = λγ

The solution of this eigenvalue problem yields the eigenvector γ = [γ_1 ··· γ_L]^T, which can be used in eq 7 to construct the eigenfunctions ψ(x). Because C is symmetric and positive semidefinite, the computed eigenfunctions are orthogonal.

3.3. Selection of Dimension n. The maximum number of nonzero eigenvalues is K = min(N, L). The eigenvalues λ_1 > λ_2 > ··· > λ_K and the corresponding eigenfunctions ψ_1(x), ψ_2(x), ..., ψ_K(x) are arranged in order of the magnitude of the eigenvalues. In general, an expansion in terms of only the first few temporal coefficients

    y_n(x,t) = ∑_{i=1}^{n} ψ_i(x) y_i(t)    (10)

can be used to represent the dominant dynamics of many spatiotemporal systems. The eigenfunction that corresponds to the first eigenvalue is considered to be the most "energetic". The total "energy" is defined as the sum of the eigenvalues, and each eigenfunction is assigned an energy percentage based on its associated eigenvalue

    E_i = λ_i / ∑_{j=1}^{K} λ_j

Usually, the number of eigenfunctions that captures 99% of the system energy is used to determine the value of n. It can be shown42 that, for an arbitrary set of basis functions {φ_i(x)}_{i=1}^{n},

    ∑_{i=1}^{n} ⟨[y(·,t), ψ_i]^2⟩ = ∑_{i=1}^{n} λ_i ≥ ∑_{i=1}^{n} ⟨[y(·,t), φ_i]^2⟩

where ⟨f(t)⟩ = (1/L) ∑_{t=1}^{L} f(t). This means that the Karhunen-Loève expansion is optimal on average in the class of representations by linear combination.

4. Least-Squares Support Vector Machines Modeling

To obtain the temporal coefficients from the spatiotemporal output, we define the orthogonal projection operator P as

    y(t) = P y(·,t) = [ψ, ∑_{i=1}^{∞} ψ_i(·) y_i(t)] = ∑_{i=1}^{∞} [ψ, ψ_i(·)] y_i(t)    (11)
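The computations in eqs 7-11 reduce to a small eigenvalue problem followed by weighted inner products. The sketch below is a minimal illustration (not the authors' code) of the method of snapshots and of the projection that yields the temporal coefficients; it assumes snapshots on a uniform spatial grid so that the spatial integrals can be approximated by the trapezoidal rule, and all names (kl_decomposition, snapshots, energy) are ours.

    import numpy as np

    def kl_decomposition(snapshots, dx, energy=0.99):
        """Method of snapshots (eqs 7-10) plus projection (eqs 2 and 11).

        snapshots : (N, L) array of y(x_i, t) on a uniform grid with spacing dx.
        Returns the retained basis functions (N, n), temporal coefficients (n, L),
        and the associated eigenvalues.
        """
        N, L = snapshots.shape
        w = np.full(N, dx)
        w[0] = w[-1] = dx / 2.0                              # trapezoidal weights in x
        # C_tk = (1/L) * integral_Omega y(zeta, t) y(zeta, k) dzeta   (eq 9)
        C = (snapshots * w[:, None]).T @ snapshots / L       # (L, L), symmetric
        lam, gamma = np.linalg.eigh(C)                       # ascending eigenvalues
        lam, gamma = lam[::-1], gamma[:, ::-1]               # largest first
        # choose n from the 99% energy criterion of section 3.3
        n = int(np.searchsorted(np.cumsum(lam) / lam.sum(), energy)) + 1
        # psi(x) = sum_t gamma_t y(x, t)   (eq 7), normalized to unit L2 norm
        psi = snapshots @ gamma[:, :n]
        psi /= np.sqrt((w[:, None] * psi**2).sum(axis=0))
        # y_i(t) = [psi_i(x), y(x, t)]   (eqs 2 and 11)
        coeffs = (psi * w[:, None]).T @ snapshots            # (n, L)
        return psi, coeffs, lam[:n]

A computation of this kind is what produces the dominant basis functions such as the four shown in Figure 4 for the case study of section 5 (61 sensors, 800 samples).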

where ψ = [ψ_1 ··· ψ_n]^T. Because the basis functions ψ_i (i = 1, ..., ∞) are orthonormal, the result of this projection is y(t) = [y_1(t) ··· y_n(t)]^T. In practice, y(t) can be computed from the pointwise data using spline integration.

Suppose that the dynamics between u(t) and y(t) can be described by a nonlinear autoregressive with exogenous input (NARX) model

    y(t) = F[y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)]    (12)

where u(t) ∈ R^m and y(t) ∈ R^n represent the temporal input and output at time t, respectively, and n_u and n_y are the maximum input and output lags, respectively.

4.1. Least-Squares Support Vector Machines. Before the dynamic modeling, we first introduce the LS-SVM model for nonlinear function estimation. The LS-SVM model for single-output function estimation is represented as37

    y = w^T φ(z) + b    (13)

where z ∈ R^{n_z} and y ∈ R, the nonlinear function φ(·): R^{n_z} → R^{n_h} maps the input space to a higher-dimensional feature space R^{n_h} (possibly an infinite-dimensional space), w ∈ R^{n_h} is the weight vector, and b ∈ R is a bias term.

Given a set of training data {z_k, y_k}_{k=1}^{L}, the LS-SVM learning finds the unknown function in eq 13 by solving the optimization problem

    min_{w,b,e} J(w,e) = (1/2) w^T w + (γ/2) ∑_{k=1}^{L} e_k^2    (14)

subject to the equality constraints

    y_k = w^T φ(z_k) + b + e_k,   k = 1, ..., L

where γ (> 0) represents the regularization constant for the tradeoff between approximation accuracy and model complexity. The Lagrangian function can be constructed as

    L(w, b, e; α) = J(w, e) - ∑_{k=1}^{L} α_k [w^T φ(z_k) + b + e_k - y_k]    (15)

where α_k ∈ R represents the Lagrange multipliers. The conditions for optimality are given by

    ∂L/∂w = 0 → w = ∑_{k=1}^{L} α_k φ(z_k)
    ∂L/∂b = 0 → ∑_{k=1}^{L} α_k = 0
    ∂L/∂e_k = 0 → α_k = γ e_k,   k = 1, ..., L
    ∂L/∂α_k = 0 → w^T φ(z_k) + b + e_k - y_k = 0,   k = 1, ..., L    (16)

After elimination of w and e, the following linear system is obtained

    A X = R    (17)

where

    A = [ 0, 1_L^T ; 1_L, Q + γ^{-1} I ],   X = [ b ; α ],   R = [ 0 ; Y ]

with Y = [y_1 ··· y_L]^T, 1_L = [1 ··· 1]^T, and α = [α_1 ··· α_L]^T. From application of the Mercer condition, one obtains

    Q_{kl} = φ(z_k)^T φ(z_l) = K(z_k, z_l),   k, l = 1, ..., L    (18)

where K(·,·) is a kernel function. Among the various kernels,33 the radial basis function (RBF) kernel K(z_k, z_l) = exp(-‖z_k - z_l‖_2^2/σ^2) is often used, where σ is the kernel width. The resulting LS-SVM model for function estimation becomes

    y = ∑_{k=1}^{L_SV} α_k K(z, z_k) + b    (19)

where α_k and b are the solution of the linear system in eq 17, L_SV denotes the number of support vectors, and SV represents the set of indexes of the support vectors, which correspond to the nonzero solutions of eq 17.

4.2. Dynamic Modeling. The LS-SVM model for the estimation of the NARX model in eq 12 has a similar representation

    y(t) = w^T φ[z(t)] + b    (20)

Note that the model in eq 20 has multiple outputs and can be equivalently expressed in the single-output form

    y_i(t) = w_i^T φ[z(t)] + b_i,   i = 1, ..., n    (21)

where z(t) = [y(t-1)^T ··· y(t-n_y)^T  u(t-1)^T ··· u(t-n_u)^T]^T ∈ R^{m n_u + n n_y}, y(t) ∈ R^n, φ(·): R^{m n_u + n n_y} → R^{n_h}, w = [w_1 ··· w_n]^T ∈ R^{n × n_h}, and b = [b_1 ··· b_n]^T ∈ R^n.

Given a set of data {z(t), y(t)}_{t=1}^{L}, the model in eq 20 can be found by solving the optimization problem

    min_{w_i, b_i, e_i} J(w_i, e_i) = (1/2) ∑_{i=1}^{n} w_i^T w_i + ∑_{i=1}^{n} (γ_i/2) ∑_{t=2}^{L} e_i(t)^2    (22)

subject to the equality constraints

    y_i(t) = w_i^T φ[z(t)] + b_i + e_i(t),   i = 1, ..., n,   t = 1, ..., L

The Lagrangian function can be constructed as

    L(w_i, b_i, e_i; α_i) = J(w_i, e_i) - ∑_{i=1}^{n} ∑_{t=2}^{L} α_i(t) {w_i^T φ[z(t)] + b_i + e_i(t) - y_i(t)}    (23)

The conditions for optimality are given by

    ∂L/∂w_i = 0 → w_i = ∑_{t=2}^{L} α_i(t) φ[z(t)],   i = 1, ..., n
    ∂L/∂b_i = 0 → ∑_{t=2}^{L} α_i(t) = 0,   i = 1, ..., n
    ∂L/∂e_i(t) = 0 → α_i(t) = γ_i e_i(t),   i = 1, ..., n,   t = 1, ..., L
    ∂L/∂α_i(t) = 0 → w_i^T φ[z(t)] + b_i + e_i(t) - y_i(t) = 0,   i = 1, ..., n,   t = 1, ..., L    (24)

Now, the following solution is obtained

    diag(A_1, ..., A_n) [ X_1 ; ⋯ ; X_n ] = [ R_1 ; ⋯ ; R_n ]    (25)

where

    A_i = [ 0, 1_L^T ; 1_L, Q + γ_i^{-1} I ],   X_i = [ b_i ; α_i ],   R_i = [ 0 ; Y_i ]

with Y_i = [y_i(1) ··· y_i(L)]^T, 1_L = [1 ··· 1]^T, and α_i = [α_i(1) ··· α_i(L)]^T. From the Mercer condition, one obtains

    Q_{tτ} = φ[z(t)]^T φ[z(τ)] = K[z(t), z(τ)],   t, τ = 1, ..., L    (26)

where K(·,·) is a kernel function. The resulting LS-SVM model becomes

    y(t) = ∑_{τ=1}^{L_SV} α(τ) K[z(t), z(τ)] + b    (27)

where α(t) = [α_1(t) ··· α_n(t)]^T and b are the solution of the linear system in eq 25, and the L_SV support vectors correspond to the nonzero solutions of eq 25.

Figure 2. Tubular reactor with a chemical reaction.

After the LS-SVM model has been estimated, the results are used in simulation mode, where future predictions are computed using past predictions

    ŷ(t) = ∑_{τ=1}^{L_SV} α(τ) K[ẑ(t), z(τ)] + b

where ẑ(t) = [ŷ(t-1)^T ··· ŷ(t-n_y)^T  u(t-1)^T ··· u(t-n_u)^T]^T. With the time/space synthesis in eq 10, the spatiotemporal dynamics ŷ_n(x,t) can be recovered.

Remark 1: Sparse LS-SVM by Pruning. Compared with the original SVM approach, the LS-SVM approach might not achieve a very sparse model because of the least-squares error in eq 14 [and also the fact that α_i(t) = γ_i e_i(t)]. However, the sparseness of the LS-SVM model can be enhanced by using a pruning procedure.43 By sorting the support value spectrum |α_i(t)|, the data that contribute most significantly to the LS-SVM model can be identified. Sparseness can then be imposed by gradually omitting the least important data from the training set and retraining the LS-SVM model on the reduced training set.
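To make eqs 25-27 concrete, the sketch below shows one way to train and simulate the LS-SVM NARX model in code: build the regressors of eq 21, solve the bordered linear system of eq 25 (one right-hand side per temporal coefficient), and then run the model in simulation mode. It is a minimal, non-sparse illustration under our own naming conventions (no pruning, a single γ shared by all outputs), not the authors' implementation.

    import numpy as np

    def rbf_kernel(Z1, Z2, sigma):
        """K(z, z') = exp(-||z - z'||_2^2 / sigma^2)   (eq 18)."""
        d2 = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / sigma**2)

    def build_regressors(Y, U, ny, nu):
        """z(t) = [y(t-1)..y(t-ny), u(t-1)..u(t-nu)] and targets y(t)   (eq 21).

        Y : (n, L) temporal coefficients, U : (m, L) inputs.
        """
        lag = max(ny, nu)
        Z, T = [], []
        for t in range(lag, Y.shape[1]):
            Z.append(np.concatenate([Y[:, t - j] for j in range(1, ny + 1)]
                                    + [U[:, t - j] for j in range(1, nu + 1)]))
            T.append(Y[:, t])
        return np.asarray(Z), np.asarray(T)

    def train_lssvm(Z, T, gamma, sigma):
        """Solve the linear system of eq 25 (same gamma for every output here)."""
        Lr = Z.shape[0]
        K = rbf_kernel(Z, Z, sigma)
        A = np.zeros((Lr + 1, Lr + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(Lr) / gamma
        rhs = np.vstack([np.zeros((1, T.shape[1])), T])
        sol = np.linalg.solve(A, rhs)
        return sol[1:], sol[0]                    # alpha (Lr, n), bias b (n,)

    def simulate(alpha, b, Ztrain, U, y_init, ny, nu, sigma):
        """Simulation mode: past predictions feed the regressor."""
        yhat = [y_init[:, j] for j in range(y_init.shape[1])]   # first max(ny, nu) samples
        for t in range(len(yhat), U.shape[1]):
            z = np.concatenate([yhat[t - j] for j in range(1, ny + 1)]
                               + [U[:, t - j] for j in range(1, nu + 1)])[None, :]
            yhat.append(rbf_kernel(z, Ztrain, sigma)[0] @ alpha + b)
        return np.asarray(yhat).T                 # (n, L)

Reconstructing ŷ_n(x,t) then only requires the time/space synthesis of eq 10 with the KL basis functions, and the hyperparameters (n_u, n_y, γ_i, σ) can be chosen by cross-validation as in section 5.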

5. Case Study

The considered tubular reactor (see Figure 2) represents a typical convection-diffusion-reaction process in the chemical industry. A dimensionless model is provided to describe this nonlinear tubular chemical reactor with both diffusion and convection phenomena and a nonlinear heat-generation term5,6

    ∂T/∂t = (1/Pe_h) ∂^2T/∂x^2 - (1/Le) ∂T/∂x + η C exp[γ(1 - 1/T)] + µ[T_w(x,t) - T(x,t)]
    ∂C/∂t = (1/Pe_m) ∂^2C/∂x^2 - ∂C/∂x - Da·C exp[γ(1 - 1/T)]    (28)

subject to the boundary conditions

    x = 0:   -∂T/∂x = Pe_h[T_i(t) - T(x,t)],   -∂C/∂x = Pe_m[C_i(t) - C(x,t)]
    x = 1:   ∂T/∂x = 0,   ∂C/∂x = 0

The state variables T and C are the dimensionless temperature and concentration, respectively. The parameters Pe_h, Pe_m, Le, Da, γ, η, µ, T_i, and C_i denote the Peclet number (energy), Peclet number (mass), Lewis number, Damköhler number, activation energy, heat of reaction, heat-transfer coefficient, inlet temperature, and inlet concentration, respectively. This system of equations can be classified as nonself-adjoint, parabolic PDEs with Neumann boundary conditions. The two PDEs are coupled, and for practical reasons, only temperature measurements are available. The parameter values used in this study are as follows: Pe_h = 5.0, Pe_m = 5.0, Le = 1.0, Da = 0.875, γ = 15.0, η = 0.8375, µ = 13, T_i = 1, and C_i = 1.

The wall temperature T_w is the manipulated variable, for which there are three actuators distributed along the length of the reactor; that is, T_w(x,t) = b(x) u(t), where the temporal input is u(t) = [u_1(t) ··· u_3(t)]^T and the spatial distribution is b(x) = [b_1(x) ··· b_3(x)] with b_i(x) = H[x - (i - 1)π/3] - H(x - iπ/3), H(·) being the standard Heaviside function. There is no unique method for designing the input signals for nonlinear system modeling. In this case, the temporal input u_i(t) = 1.0 + 0.4 sin(-t/10 + i/10) (i = 1, 2, 3) was used to excite the system. This periodic signal, which depends on the spatial location and time instant, can excite the nonlinear spatiotemporal dynamics. There are 61 temperature sensors located at different positions along the reactor length.

A noise-free data set of 800 data points was collected from eq 28. The dimensionless sampling time ∆t was 0.0005, and the simulation period was 0.4. White noise bounded by 0.001 with zero mean was added to the noise-free data to obtain the noisy output.

Figure 3. Measured output for the training.
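The paper does not give the numerical scheme used to generate the snapshots, so the following is only one plausible way to produce comparable data: a finite-difference method of lines for eq 28 with explicit Euler sub-steps, ghost points for the boundary conditions, and equal-width actuator zones on [0, 1]. Every one of these numerical choices, and all names, are assumptions of this sketch rather than details taken from the paper.

    import numpy as np

    # Parameter values from section 5 (dimensionless)
    Peh, Pem, Le, Da, gam, eta, mu = 5.0, 5.0, 1.0, 0.875, 15.0, 0.8375, 13.0
    Ti, Ci = 1.0, 1.0
    N = 61                                   # temperature sensors along the reactor
    dx = 1.0 / (N - 1)
    dt, samples, sub = 0.0005, 800, 10       # sampling time, data points, Euler sub-steps
    x = np.linspace(0.0, 1.0, N)

    def b_matrix(x):
        """Assumed actuator distribution: three equal-width zones on [0, 1]."""
        zone = np.minimum((x * 3).astype(int), 2)
        return np.eye(3)[zone]               # (N, 3) indicator matrix

    def rhs(T, C, Tw):
        """Right-hand side of eq 28 with the boundary conditions via ghost points."""
        Tg = np.empty(N + 2); Cg = np.empty(N + 2)
        Tg[1:-1], Cg[1:-1] = T, C
        # x = 0: -dT/dx = Peh*(Ti - T), -dC/dx = Pem*(Ci - C)   (one-sided difference)
        Tg[0] = T[0] + dx * Peh * (Ti - T[0])
        Cg[0] = C[0] + dx * Pem * (Ci - C[0])
        # x = 1: dT/dx = dC/dx = 0
        Tg[-1], Cg[-1] = T[-1], C[-1]
        d2T = (Tg[2:] - 2 * T + Tg[:-2]) / dx**2
        d2C = (Cg[2:] - 2 * C + Cg[:-2]) / dx**2
        dT = (Tg[2:] - Tg[:-2]) / (2 * dx)
        dC = (Cg[2:] - Cg[:-2]) / (2 * dx)
        react = np.exp(gam * (1.0 - 1.0 / T))
        dTdt = d2T / Peh - dT / Le + eta * C * react + mu * (Tw - T)
        dCdt = d2C / Pem - dC - Da * C * react
        return dTdt, dCdt

    B = b_matrix(x)
    T, C = np.ones(N), np.ones(N)
    Y = np.zeros((N, samples))
    rng = np.random.default_rng(0)
    for k in range(samples):
        t = k * dt
        u = 1.0 + 0.4 * np.sin(-t / 10 + np.arange(1, 4) / 10)   # training excitation
        Tw = B @ u
        for _ in range(sub):                                      # explicit Euler sub-steps
            dTdt, dCdt = rhs(T, C, Tw)
            T = T + (dt / sub) * dTdt
            C = C + (dt / sub) * dCdt
        Y[:, k] = T + 0.001 * (2 * rng.random(N) - 1)             # bounded measurement noise

An implicit or adaptive integrator may be preferable in practice; the point here is only to show how a snapshot matrix Y of the kind used for the KL-LS-SVM identification can be assembled.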



Figure 4. Karhunen-Loève basis functions (n = 4).

Figure 6. Temporal coefficient y4(t) on the training data.

Figure 5. Temporal coefficient y1(t) on the training data.

This set of data was used as the training data, with the first 600 data points as the estimation data and the next 200 data points as the validation data. The validation data were used to monitor the training process and help to determine the parameters n_u, n_y, γ_i, and σ using the cross-validation method. To verify the model performance, a new set of 200 data points was collected as the testing data with the input u_i(t) = 1.0 + 0.3 sin(-t/9 + i/11). The measured temperature y(x,t) = T(x,t) for training the model is shown in Figure 3 (first 200 samples). The first four Karhunen-Loève basis functions, shown in Figure 4, were used for the modeling. Using cross-validation, the LS-SVM parameters were set to n_u = 2, n_y = 3, γ_i = 1000, and σ = 2.5. The measured and predicted temporal coefficients on the training data for y1(t) and y4(t) are plotted in Figures 5 and 6, respectively, where the dotted line corresponds to the measured temporal coefficients and the solid line to the predicted ones. Very good agreement between the measured and predicted dynamics can be observed. The predicted temperature ŷ_n(x,t) and the prediction error e(x,t) = y(x,t) - ŷ_n(x,t) of the KL-LS-SVM model for the training data are shown in Figures 7 and 8, respectively. The measured output for the testing, the predicted output, and the prediction error for the testing data are also shown in Figures 9-11, respectively (first 100 samples). Obviously, the KL-LS-SVM model can satisfactorily approximate this spatiotemporal process. To provide a comparison with modeling using finite-element bases, an LS-SVM model with six third-order spline spatial basis functions (see Figure 12) (SP-LS-SVM) was identified (see refs 22 and 44 for details on the construction of spline functions).

Figure 7. KL-LS-SVM model output on the training data.

Figure 8. Prediction error of the KL-LS-SVM model on the training data.

The performance indexes, namely, the relative L2-norm error RLNE(t) = [∫ e(x,t)^2 dx]^{1/2}/[∫ y(x,t)^2 dx]^{1/2} in Figure 13, the temporal normalized absolute error TNAE(x) = ∑_t |e(x,t)|/∑_t ∆t in Figures 14 and 15, and the root-mean-squared error RMSE = [∫ ∑_t e(x,t)^2 dx/(∫ dx ∑_t ∆t)]^{1/2} in Table 1, over the training and testing data sets show that the modeling with KL basis functions is more accurate than the modeling even with a greater number of spline basis functions. This is because of the optimality of the KL basis functions.
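These indexes are straightforward to evaluate once the prediction error e(x,t) is stored on the sensor grid; the short sketch below (our notation, trapezoidal rule for the spatial integrals) mirrors the three definitions above.

    import numpy as np

    def performance_indexes(e, y, dx, dt):
        """RLNE(t), TNAE(x), and RMSE for arrays e(x,t), y(x,t) of shape (N, L)."""
        w = np.full(e.shape[0], dx)
        w[0] = w[-1] = dx / 2.0                                  # trapezoidal weights in x
        rlne = np.sqrt((w[:, None] * e**2).sum(0) / (w[:, None] * y**2).sum(0))
        tnae = np.abs(e).sum(1) / (e.shape[1] * dt)              # sum_t |e| / sum_t dt
        rmse = np.sqrt((w[:, None] * e**2).sum() / (w.sum() * e.shape[1] * dt))
        return rlne, tnae, rmse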



Figure 9. Measured output for the testing.

Figure 12. Spline basis functions (n ) 6).

Figure 10. KL-LS-SVM model output on the testing data.

Figure 13. RLNE(t) (%) of the SP- and KL-LS-SVM models on the training data.

Figure 11. Prediction error of the KL-LS-SVM model on the testing data.

For comparison with a neural-network model, a KL-based radial basis function model with the orthogonal least-squares algorithm (KL-OLS-RBF) was also obtained (see the Appendix for a brief description). The performance indexes TNAE(x) in Figures 16 and 17 and RMSE in Table 1 over the training and testing data sets show that the KL-LS-SVM model performs even better than the KL-OLS-RBF model in this case. Comparison with the KL-based Volterra (KL-VOL) model in ref 28 was also studied. The performance indexes TNAE(x) in Figures 18 and 19 and RMSE in Table 1 over the training and testing data sets also show that the accuracy of the KL-LS-SVM model is better than that of the KL-VOL model in this case.

Figure 14. TNAE(x) of the SP- and KL-LS-SVM models on the training data.

This is because the KL-LS-SVM model can represent more complex dynamics than the KL-VOL model, which reduces the modeling error. As shown in Table 2, the computation times of the models were also compared. The computation environment was a 1.83 GHz Intel Centrino Duo CPU with 1 GB RAM, running Windows XP and Matlab 7.0. The KL-LS-SVM model estimation required an acceptable time. Because a higher-dimensional model is required with local spline basis functions, the SP-LS-SVM model requires more modeling time than the KL-LS-SVM model.


Figure 15. TNAE(x) of the SP- and KL-LS-SVM models on the testing data.


Figure 18. TNAE(x) of the KL-LS-SVM and KL-VOL models on the training data.

Table 1. RMSEs of the KL-LS-SVM, SP-LS-SVM, KL-OLS-RBF, and KL-VOL Models

    data set    KL-LS-SVM    SP-LS-SVM    KL-OLS-RBF    KL-VOL
    training    0.0030451    0.015414     0.010088      0.0044646
    testing     0.0013632    0.015814     0.016026      0.0090447

The results also show that the KL-LS-SVM modeling is much faster than the KL-OLS-RBF modeling because of the computational efficiency of the LS-SVM algorithm.

Figure 19. TNAE(x) of the KL-LS-SVM and KL-VOL models on the testing data.

Figure 16. TNAE(x) of the KL-LS-SVM and KL-OLS-RBF models on the training data.

Table 2. Comparison of Computation Times

    model          computation time (s)
    KL-LS-SVM      142.75
    SP-LS-SVM      174.47
    KL-OLS-RBF     505.06
    KL-VOL         48.453

The KL-VOL modeling is the fastest because the Volterra model is relatively simple, but at the cost of accuracy. Considering the modeling accuracy and complexity, the proposed modeling approach has better potential for industrial applications.

6. Conclusions

Figure 17. TNAE(x) of the KL-LS-SVM and KL-OLS-RBF models on the testing data.

The spatiotemporal modeling of distributed parameter systems under unknown circumstances is necessary for various applications such as control design. A data-based spatiotemporal modeling approach using input and output measurements was proposed herein. With the help of Karhunen-Loève decomposition, the least-squares support-vector-machine modeling approach is extended to spatially distributed systems. A temporal input was considered only for simplicity, and this modeling method can be easily applied to the case of a spatiotemporal input.



Simulations were also presented to illustrate the effectiveness of this modeling method and its potential for a wide range of distributed industrial processes.

Acknowledgment

The authors thank the associate editor and the anonymous reviewers for their valuable comments and suggestions. This work was partially supported by an RGF project from the RGC of Hong Kong (CityU 117310); an SRG project from the City University of Hong Kong (7002562); and grants from NSF China (61004047, 60804033), the National Basic Research Program of China (2009CB724301), and National S&T Major Projects (2009ZX04013-021, 2009ZX04002-061).

Appendix: OLS-RBF Modeling Algorithm

The following OLS-RBF modeling algorithm is summarized from previous work.45-47 Assume that L training samples {z(t), y(t)}_{t=1}^{L} are available. Then, a radial basis function model can be obtained as

    y(t) = θ^T φ[z(t)]    (29)

or equivalently

    y_i(t) = θ_i^T φ[z(t)],   i = 1, ..., n

With a kernel expansion and training error e_i(t), this becomes

    y_i(t) = ∑_{τ=1}^{L} θ_{iτ} K[z(t), z(τ)] + e_i(t),   i = 1, ..., n    (31)

where e_i(t) is the training error. Equation 31 forms a "full" model because all of the data are used to construct the model. However, the full model will lead to overfitting and require a longer training time. The orthogonal least-squares method48 is an efficient subset-selection algorithm. Equation 31 can be expressed in matrix form as

    Y = PΘ + Ξ    (32)

where Y = [y(1) ··· y(L)]^T, Θ = [θ_1 ··· θ_n], and

    Ξ = [ e_1(1) ⋯ e_n(1) ; ⋮ ; e_1(L) ⋯ e_n(L) ]

    P = [ K[z(1), z(1)] ⋯ K[z(1), z(L)] ; ⋮ ; K[z(L), z(1)] ⋯ K[z(L), z(L)] ]    (33)

The regression matrix P can be decomposed as P = QA, where A is an L × L upper unit triangular matrix

    A = [ 1, a_{12}, ⋯, a_{1L} ; 0, 1, ⋯, a_{2L} ; ⋮ ; 0, 0, ⋯, 1 ]    (38)

and Q is an L × L orthogonal matrix

    Q = [ q_1 ⋯ q_L ]    (39)

with orthogonal columns that satisfy

    q_i^T q_j = 0,   if i ≠ j    (40)

Rearranging eq 32 yields

    Y = QAΘ + Ξ = QB + Ξ    (41)

where

    B = [ b_{11} ⋯ b_{n1} ; ⋮ ; b_{1L} ⋯ b_{nL} ]    (42)

and b_i = [b_{1i} ⋯ b_{ni}] denotes the ith row of B. The total squared error cost function is minimized by the least-squares solution

    b_i = q_i^T Y / (q_i^T q_i),   i = 1, ..., L    (44)

With A and B, Θ can be determined from AΘ = B. Using eqs 41 and 44, after some simple calculation, the cost function can be expressed as

    J = trace(Y^T Y) - ∑_{i=1}^{L} (b_i b_i^T)(q_i^T q_i)    (45)

The error reduction ratio (ERR)49 due to q_i can be expressed as

    ERR_i = (b_i b_i^T)(q_i^T q_i) / trace(Y^T Y)    (46)

Using the ERR, significant model terms can be selected in a forward-selection procedure (see ref 48 for more details). At the ith iteration, the term that gives the largest ERR_i is selected and added to the previously selected (i - 1)-term model. The selection procedure can be terminated when

    1 - ∑_{i=1}^{L_s} ERR_i

falls below a chosen tolerance, where L_s is the number of selected terms.
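For completeness, a compact rendering of the ERR-based forward selection described above is sketched below; it is our own minimal implementation of the standard OLS procedure (classical Gram-Schmidt against the already selected columns), with illustrative names and a tolerance argument rho for the stopping rule.

    import numpy as np

    def ols_forward_selection(P, Y, rho=0.01):
        """Select regressor columns of P by the error reduction ratio (eqs 44-46)."""
        L = P.shape[1]
        total = np.trace(Y.T @ Y)
        selected, Qsel, err_sum = [], [], 0.0
        for _ in range(L):
            best, best_err, best_q = None, 0.0, None
            for j in range(L):
                if j in selected:
                    continue
                q = P[:, j].copy()
                for qs in Qsel:                       # orthogonalize against chosen columns
                    q -= (qs @ P[:, j]) / (qs @ qs) * qs
                if q @ q < 1e-12:
                    continue
                b = (q @ Y) / (q @ q)                 # b_i = q_i^T Y / (q_i^T q_i)   (eq 44)
                err = (b @ b) * (q @ q) / total       # ERR_i                          (eq 46)
                if err > best_err:
                    best, best_err, best_q = j, err, q
            if best is None:
                break
            selected.append(best); Qsel.append(best_q); err_sum += best_err
            if 1.0 - err_sum < rho:                   # termination criterion
                break
        return selected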