Article pubs.acs.org/IECR

Probabilistic PCA-Based Spatiotemporal Multimodeling for Nonlinear Distributed Parameter Processes Chenkun Qi,*,† Han-Xiong Li,‡,⊥ Shaoyuan Li,§ Xianchao Zhao,† and Feng Gao† †

School of Mechanical Engineering, State Key Laboratory of Mechanical System and Vibration and §Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China ‡ Department of Systems Engineering & Engineering Management, City University of Hong Kong, Hong Kong, China ⊥ State Key Laboratory of High Performance Complex Manufacturing, Central South University, China

ABSTRACT: Many industrial processes are nonlinear distributed parameter systems (DPSs). Data-based spatiotemporal modeling is required for analysis and control when the first-principles model is unknown. Because a DPS is infinite-dimensional and time−space coupled, a low-order model is necessary for prediction and control in practice. For low-order modeling, traditional principal component analysis (PCA) is often used for dimension reduction and time−space separation. However, it is a linear method and leads to only one set of fixed spatial basis functions. Therefore, it might not always be effective for nonlinear systems. In this study, a spatiotemporal multimodeling approach is proposed for unknown nonlinear DPSs. First, multimodel decomposition is performed, where probabilistic PCA (PPCA) is used to obtain multiple sets of spatial basis functions from the experimental data by maximizing a likelihood function. Using these multiple sets of PCA spatial bases for time−space separation, the high-dimensionality spatiotemporal data can be reduced to multiple sets of low-dimensionality temporal series. Then, multiple low-order neural models can be easily established to model these local dynamics. Finally, the original spatiotemporal dynamics can be reconstructed by multimodel synthesis. Because the proposed spatiotemporal modeling approach involves a multimodeling mechanism, it can achieve better performance than traditional PCA-based single modeling for nonlinear DPSs, as demonstrated by numerical simulations.

1. INTRODUCTION Distributed parameter systems (DPSs) are a class of spatiotemporal dynamic systems that exist widely in physical, chemical, and biological processes, such as thermal, transport−reaction, and fluid-flow processes. System modeling is often required for analysis, prediction, and control. In particular, an offline model instead of an online model is often required in many cases. Compared with the modeling of traditional lumped parameter systems (LPSs), the modeling of DPSs is very complicated because of the spatiotemporal coupling and infinite-dimensional characteristics. Although partial differential equations (PDEs) can be used to describe the DPSs, the PDEs are also infinite-dimensional. For practical implementation, model reduction methods, such as finite-difference and finite-element methods, the spectral method,1−3 and the approximate inertial manifold,4 have to be used to reduce infinite-dimensional PDEs to finite-dimensional ordinary differential equations (ODEs). However, all of these methods require the PDEs to be known. In many situations, an accurate PDE model cannot be obtained from first principles because of missing process knowledge. Thus, data-based modeling or system identification is often used. When the structure of the PDEs is known and only some parameters are unknown, parameter estimation methods5−7 can be used. If the structure of the PDEs is unknown, the problem becomes a black-box spatiotemporal modeling problem. This kind of modeling becomes very difficult because of complex dynamics, especially spatiotemporal coupling and nonlinear dynamics. Generally, there are two ways to identify unknown DPSs, namely, the infinite-dimensional PDE modeling approach and the finite-dimensional ODE spatiotemporal modeling approach. The identification of PDE models has attracted extensive studies.8−10 After the PDE model has been recovered from the

spatiotemporal data, model reduction methods are still needed for practical applications. Finite-dimensional model identification methods for unknown DPSs are often developed based on model reduction methods for known DPSs. Time−space discretization of a PDE using the finite-difference method leads to a so-called lattice dynamical system, whose identification problem has been studied.11−14 However, this method often requires a large number of spatial discretization points, which results in a high-order model. Another method comes from time−space separation through expansion in a set of spatial basis functions. After the infinite-dimensional expansion is truncated into a finite-dimensional expansion, it becomes a traditional finite-dimensional modeling problem. In general, the model performance is highly dependent on the choice of basis functions. The dimension of a model obtained using the finite-element bases15 and Fourier spectral bases16 might not be sufficiently low for many practical applications. This study focuses on the low-dimensionality modeling of unknown DPSs in complex nonlinear circumstances, where the obtained model can work offline. Principal component analysis (PCA), which is also called Karhunen−Loève decomposition,17−22 is a popular approach to finding principal spatial structures from data. Among all linear expansions, PCA basis functions can provide lower-dimensionality models. PCA has been widely used for identifying DPSs with the help of traditional system identification techniques, such as neural networks,23−27 the Wiener model,28 and the Hammerstein

Received: November 13, 2011. Revised: March 25, 2012. Accepted: March 27, 2012. Published: March 27, 2012.

© 2012 American Chemical Society | dx.doi.org/10.1021/ie202613t | Ind. Eng. Chem. Res. 2012, 51, 6811−6822


model.29 However, because traditional PCA is a linear dimension reduction method, it might not be very efficient for modeling nonlinear dynamics.30,31 In addition, only one set of spatial basis functions can be obtained from PCA. This fixed set of bases might not be very suitable for DPSs with complex nonlinear dynamics, because their dominant spatial bases might change. For nonlinear problems, nonlinear (NL) PCA32−35 has been developed, at the cost of greater complexity. As a nonlinear dimension reduction method, it can retain more information using fewer components. Nonlinear PCA has been used for model reduction of multivariate time series36 and known DPSs,37,38 as well as model identification of unknown DPSs.39 Although it can model more complex nonlinear DPSs, the training of the NL-PCA transformation is not easy. An alternative to the global nonlinear approach is to capture nonlinear complexity by a combination of local linear PCA projections.40−44 Most local PCA methods generally necessitate a two-stage procedure: partitioning of the data space followed by estimation of the principal subspace within each partition. However, this might make the problem nondifferentiable.40 In addition, although several combination methods have been proposed,42 they do not correspond to a probability density, so there is no unique way to combine local PCA models. Adaptive model reduction methods42−44 can be used to deal with nonlinear PDEs, where the spatial bases are updated continually using traditional or adaptive PCA. However, the obtained reduced-order models cannot be used offline because new process data are always required.
Fortunately, PCA can be formulated within a probabilistic framework, based on a Gaussian latent variable model whose parameters can be determined using a standard eigendecomposition of the sample covariance matrix or a more efficient expectation-maximization (EM) algorithm.45 Such probabilistic PCA (PPCA) can be further extended to mixture models,46 where the data are partitioned in a soft way and multiple local principal axes are estimated simultaneously. Because multiple sets of local spatial basis functions can be easily obtained and combined in this probabilistic framework, these sets of basis functions are expected to be suitable for modeling nonlinear DPSs. In the modeling of LPSs, multimodeling47−49 is often used for modeling nonlinear dynamics. Each local model describes the local dynamics at one working point. The local model can be linear or nonlinear according to the complexity of the local dynamics. Using hard switching or soft weighting of multiple local models, an integrated model can better describe the global system. Different multimodeling methods, such as local model networks,50 Takagi−Sugeno fuzzy models,51,52 multiple-neural-network models,53 local polynomial interpolation, and piecewise-linear models, involve the same underlying ideas. Although the multimodeling approach is very efficient for nonlinear systems, it has received little attention for the modeling of DPSs. In this study, a spatiotemporal multimodeling approach is developed for unknown nonlinear DPSs. First, multimodel decomposition is performed, where multiple sets of spatial basis functions are obtained from the spatiotemporal experimental data using probabilistic PCA by maximizing a likelihood function. Using these multiple sets of PCA spatial bases for time−space separation, the high-dimensionality spatiotemporal data can be transformed and reduced into multiple sets of low-dimensionality temporal series.
Then, multiple low-order neural networks can be easily established for modeling these local dynamics with the help of traditional identification techniques. Finally, the original spatiotemporal dynamics can be reconstructed using multimodel synthesis. Unlike models obtained from adaptive model reduction methods,42−44

the model obtained in this study can be used offline because multiple sets of bases are obtained in advance. Numerical simulations presented herein demonstrate that this probabilistic-PCA-based multimodeling approach can achieve better performance than the traditional PCA-based single-modeling approach. Because of the space−time-separated structure of the obtained model, traditional control and optimization methods for ODEs using the multimodel can be easily extended to unknown DPSs. Based on the multimodel, single-model control and optimization methods for PDEs can be extended to control more complex systems. The low-order nature of the multimodel can lead to rapid computation and easy implementation. Moreover, the performance can be better because the multimodel is more accurate than the single model. The rest of this article is organized as follows: The spatiotemporal multimodeling methodology is described in section 2. The implementation is presented in section 3, where probabilistic PCA-based multimodel decomposition is introduced in section 3.1 and dynamic modeling using neural networks is presented in section 3.2. Section 4 reports simulations on a typical distributed parameter system. Finally, a few conclusions are presented in section 5.

2. SPATIOTEMPORAL MULTIMODELING METHODOLOGY In the modeling presented herein, we suppose that u(x,t) ∈ R is the spatiotemporal input and y(x,t) ∈ R is the spatiotemporal output, where x ∈ Ω is the spatial variable, Ω is the spatial domain, and t is the time variable. In practice, the spatiotemporal input u(x,t) is often implemented by m actuators with temporal signals ui(t) ∈ R and a certain spatial distribution bi(x) ∈ R (i = 1, ..., m). Thus, for simplicity, we assume u(x,t) = bT(x) u(t), where u(t) = [u1(t), ..., um(t)] ∈ Rm is the control input and b(x) = [b1(x), ..., bm(x)] ∈ Rm is the spatial distribution. The output is measured at N spatial locations x1, ..., xN. The modeling problem is to identify a proper nonlinear spatiotemporal model from the input {u(t)} (t = 1, ..., L) and the output {y(xi, t)} (i = 1, ..., N; t = 1, ..., L), where L is the time length. Because the system has complex nonlinear dynamics, we expect the model to also be nonlinear and to be able to perform well over a reasonable working range. The proposed spatiotemporal multimodeling methodology is shown in Figure 1. (1) The overall spatiotemporal dynamics of the DPS is first decomposed into multiple local spatiotemporal dynamics through probabilistic PCA by maximizing a likelihood function. (2) Then, local spatiotemporal modeling is performed. Time−space separation is used to decompose the local spatiotemporal dynamics into spatial basis functions with corresponding temporal coefficients. Dynamic modeling is performed to determine the dynamics of the temporal coefficients. After time−space synthesis, the local spatiotemporal dynamics are reconstructed. (3) Finally, to reconstruct the original dynamics of the DPS, these local spatiotemporal models are combined using multimodel synthesis. The combination weights are obtained simultaneously from probabilistic PCA. This approach can model nonlinear DPSs because it provides a degree of freedom to integrate multiple local nonlinear models at different states of the system.
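The three steps above can be sketched in outline. The following is a minimal NumPy sketch, not the authors' code: the function names are illustrative, and the SVD-based bases stand in for the PPCA-derived basis sets described later in the paper.

```python
import numpy as np

def time_space_separation(Y, Phi):
    """Project spatiotemporal snapshots Y (N x L) onto an orthonormal
    spatial basis Phi (N x n), giving low-order temporal coefficients (n x L)."""
    return Phi.T @ Y

def time_space_synthesis(Phi, coeffs):
    """Reconstruct the spatiotemporal field from the temporal coefficients."""
    return Phi @ coeffs

def multimodel_synthesis(pi, fields):
    """Weighted combination of the local model outputs."""
    return sum(p * f for p, f in zip(pi, fields))

# toy data: N = 16 sensor locations, L = 200 snapshots, n = 3 bases, M = 2 sets
rng = np.random.default_rng(0)
N, L, n, M = 16, 200, 3, 2
Y = rng.standard_normal((N, L))

# illustrative basis sets: leading left singular vectors of two halves of the
# data (the paper instead obtains the M sets from a mixture of PPCA)
bases = [np.linalg.svd(Y[:, :L // 2], full_matrices=False)[0][:, :n],
         np.linalg.svd(Y[:, L // 2:], full_matrices=False)[0][:, :n]]
pi = [0.5, 0.5]

coeffs = [time_space_separation(Y, Phi) for Phi in bases]        # n x L each
recon = [time_space_synthesis(Phi, c) for Phi, c in zip(bases, coeffs)]
Y_hat = multimodel_synthesis(pi, recon)
print(Y_hat.shape)  # (16, 200)
```

In the full method, the temporal coefficients are not simply replayed but predicted by low-order dynamic (NARX) models, and the weights come from the probabilistic decomposition.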
The complexity of the multimodeling is reduced because the multimodel decomposition and synthesis are performed in a convenient framework.

Figure 1. Spatiotemporal multimodeling methodology.

Note that, in the reduced-order modeling of the DPS, one should consider not only the temporal scale, but also the spatial scale. The simultaneous identification of the spatial basis functions and temporal dynamics from the data could achieve a more accurate or lower-order model. However, this type of modeling approach is very complicated because it involves a complex nonlinear optimization problem. For nonlinear DPSs, its implementation is too difficult. Therefore, the time−space-separation-based modeling idea (i.e., first time−space separation and then dynamic modeling) is used here because this approach can largely reduce the modeling complexity.

2.1. Multimodel Decomposition. In the modeling presented herein, multimodel decomposition is first performed to obtain multiple local spatiotemporal dynamics. Each of them can be considered as a local approximation of the original nonlinear dynamics. For a nonlinear distributed parameter system

y(x, t) = f[x, {u(τ)}]  (1)

where {u(τ)} = {u(τ)| τ = 1, ..., t} denotes the input and f denotes the system model, the multimodel decomposition can be expressed as

y(x, t) = ∑i=1M πi fi[x, {u(τ)}]  (2)

where M is the number of local models, fi is the ith local model, and πi is the weight of the ith local model (πi ≥ 0 and ∑πi = 1). By defining the output corresponding to the ith local model as

yi(x, t) = fi[x, {u(τ)}]  (3)

the original spatiotemporal output has the decomposition

y(x, t) = ∑i=1M πi yi(x, t)  (4)

Given the original output y(x,t), the main task of multimodel decomposition is to find the individual output yi(x,t) of each local model and its weight πi among all local models. One solution is to project the output y(x,t) onto M sets of state spaces

yi(x, t) = Pi y(x, t),  i = 1, ..., M  (5)

where Pi is the projection operator of the ith state space. Each state space can be formed using a set of spatial bases Φi(x), where each set Φi(·): Ω → Rn consists of n orthonormal spatial basis functions {ϕji(x), j = 1, ..., n}. Thus, by defining Pi(·) = ΦiT(x)[Φi(x), ·], we have

yi(x, t) = ΦiT(x)[Φi(x), y(x, t)]  (6)

where [f(x), g(x)] = ∫Ω f(x) g(x) dx denotes the inner product. By assigning proper weights πi to the state spaces, the multimodel decomposition can be performed. In this study, the projections in eq 5 are assumed to be of the same dimension for simplicity. With a minor revision, the proposed method is also applicable to the case of different dimensions. When the dimensions are different, more complex dynamics can be modeled but with a more complex model structure. In practice, same-dimensionality projections are sufficient for many systems. The same dimension is also convenient for prediction and control applications. Now the problem becomes to find M sets of spatial bases Φi(x) with proper weights πi from the experimental data y(x,t). This problem can be solved through probabilistic PCA by maximizing a likelihood function, as discussed in section 3.1.

2.2. Local Spatiotemporal Modeling. For local spatiotemporal dynamics, time−space separation can be easily performed. Then, multiple local dynamic models can be obtained using traditional modeling methods. By integrating these local models, the original nonlinear spatiotemporal dynamics can be modeled.

2.2.1. Time−Space Separation. For easy understanding and comparison, traditional PCA for time−space separation is presented in Appendix A. Unlike traditional PCA, where only one set of spatial basis functions is used for time−space separation, multimodel decomposition through probabilistic PCA leads to M sets of spatial basis functions Φi(x) (i = 1, ..., M) for time−space separation, where each set Φi(·): Ω → Rn consists of n orthonormal spatial basis functions {ϕji(x), j = 1, ..., n}. Thus, for the ith local dynamics, the time−space separation of the output can be expressed as

yi(x, t) = ∑j=1n ϕji(x) yji(t)  (7)

Correspondingly, the temporal coefficients are obtained as

yji(t) = [ϕji(x), yi(x, t)],  j = 1, ..., n  (8)

where, in practice, the inner product can be computed from the pointwise data using spline integration. For convenience, we define a vector form of the temporal coefficients

yi(t) = [Φi(x), yi(x, t)] = [y1i(t), ..., yni(t)]T ∈ Rn  (9)

Using eq 6 and considering the orthonormality of Φi(x), one can compute yi(t) from y(x,t) directly

yi(t) = [Φi(x), y(x, t)],  i = 1, ..., M  (10)

2.2.2. Dynamic Modeling. The next step of the spatiotemporal modeling is to establish the dynamic relationship between yi(t) ∈ Rn and u(t) ∈ Rm for each local dynamic system. The low-dimensionality time series yi(t) can be obtained by projecting the spatiotemporal data y(x,t) onto the ith set of spatial bases Φi(x) as described in eq 10. Because of the reduced dimensionality of yi(t) (i.e., its dimension is less than the dimension of the spatiotemporal data, n < N), the spatiotemporal modeling becomes a simple low-order temporal modeling problem. Thus, traditional nonlinear system identification techniques can be easily applied. The time series yi(t) are often described by nonlinear autoregressive with exogenous input (NARX) models because they can cover a wide range of nonlinear dynamics54

yi(t) = F[yi(t − 1), ..., yi(t − dy), u(t − 1), ..., u(t − du)] + e(t)  (11)

where du and dy denote the maximum input and output lags, respectively, and e(t) ∈ Rn denotes the model error. The unknown function F: Rndy+mdu → Rn can be estimated from the input−output data {u(t), yi(t)} (t = 1, ..., L) by minimizing a properly defined modeling error criterion using one of many options, such as polynomials, neural networks, wavelets, and kernel functions.55 In this study, the implementation is based on neural networks, as discussed in section 3.2.

2.2.3. Time−Space Synthesis. For the ith model, the spatiotemporal output can be reconstructed using time−space synthesis, where the equation is the same as eq 7 in time−space separation, but the computation is from the right side to the left side. In summary, the ith spatiotemporal model is composed of a low-order temporal model

ŷi(t) = F̂[ŷi(t − 1), ..., ŷi(t − dy), u(t − 1), ..., u(t − du)]  (12)

and a time−space synthesis equation

ŷi(x, t) = ∑j=1n ϕji(x) ŷji(t)  (13)

where ŷi(t) and ŷi(x,t) denote the model predictions. Given the initial conditions, the model in eq 12 can generate the prediction ŷi(t) at any time t. Combined with eq 13, this low-order model can reproduce the ith local spatiotemporal dynamics.

2.3. Multimodel Synthesis. As shown in Figure 2, the original spatiotemporal output can be reproduced using multimodel synthesis

ŷ(x, t) = ∑i=1M πi ŷi(x, t)  (14)

Figure 2. Synthesis of multiple spatiotemporal models.

where ŷi(x,t) denotes the ith model prediction in eq 13. This equation is similar to eq 4 in the multimodel decomposition, but the computation is from the right side to the left side. For simplicity, πi is assumed to be independent of the states in this study. The modeling complexity is largely reduced because the multimodel decomposition, local modeling, and multimodel synthesis become relatively simple. To model more complex nonlinear systems, the proposed method can be improved so that πi depends on the states. In that case, the relationship between the weights πi and the states should be established. Selection or optimization of the model size (or degrees of freedom) is a key issue in the modeling. In general, to avoid underfitting and overfitting, the model complexity should be matched with the system complexity. Complex systems might need complex models. Here, it is very convenient to adjust the model size by changing the dimension and the number of local models according to the final modeling performance.

3. SPATIOTEMPORAL MULTIMODELING IMPLEMENTATION The implementation of the proposed spatiotemporal multimodeling methodology includes three main steps: multimodel decomposition, local spatiotemporal modeling, and multimodel synthesis, where local spatiotemporal modeling is further divided into time−space separation, dynamic modeling, and time−space synthesis. Among these steps, multimodel decomposition and dynamic modeling are the two key issues, as discussed in the following sections, whereas the other steps can be easily implemented as in section 2.

3.1. Mixture of PPCA for Multimodel Decomposition. For multimodel decomposition, the design of multiple sets of spatial basis functions is an essential step that can be performed using statistics-based probabilistic PCA (PPCA). Probabilistic PCA provides an alternative way to implement PCA in a probability framework.45 More importantly, this probability framework can be used to derive a mixture of probabilistic PCA models,46 where multiple local PCA models can be easily combined to model complex data structures. Moreover, all of the parameters can be determined from the maximum likelihood, where both the appropriate partitioning of the data and the determination of the respective principal axes are automatic. For easy understanding, single probabilistic PCA is given in Appendix B. Now, for the mixture of probabilistic PCA model, the log-likelihood of observing the output {yx(t)} (t = 1, ..., L) is46

J = ∑t=1L ln{p[yx(t)]} = ∑t=1L ln{∑i=1M πi p[yx(t)|i]}  (15)

where p[yx(t)|i] ∈ R is a single probabilistic PCA model (see eq 31 in Appendix B) and πi ∈ R is the corresponding mixing proportion, with πi ≥ 0 and ∑πi = 1. For simplicity, here, yx(t) ∈ RN is used to represent the spatiotemporal vector [y(x1,t), ..., y(xN,t)]T. Compared with the single PCA model, now each of the M mixture components has a separate mean vector ỹxi ∈ RN, along with the parameters Wi ∈ RN×n and σi2 ∈ R (see eq 28 in Appendix B). The mixture model requires the combination of all components according to the proportions πi, followed by sampling from the yx ∈ RN and ε ∈ RN distributions and application of eq 28 in Appendix B as in the single-model case, taking care to utilize the appropriate parameters ỹxi, Wi, and σi2. Furthermore, for a given data point yx(t), there is a posterior distribution associated with each PCA model, and the mean of the ith PCA model is given by (σi2I + WiTWi)−1WiT[yx(t) − ỹxi].
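The mixture likelihood of eq 15 and the component responsibilities used by the EM algorithm can be sketched numerically. The following is a hedged NumPy sketch, not the paper's implementation; it assumes the standard PPCA marginal covariance W WT + σ2I for each component, and the helper names are illustrative.

```python
import numpy as np

def ppca_logpdf(Yx, mean, W, sigma2):
    """Column-wise log-density under one PPCA component:
    y_x ~ N(mean, W W^T + sigma2 I)."""
    N = Yx.shape[0]
    C = W @ W.T + sigma2 * np.eye(N)              # component covariance
    diff = Yx - mean[:, None]
    _, logdet = np.linalg.slogdet(C)
    quad = np.sum(diff * np.linalg.solve(C, diff), axis=0)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

def responsibilities(Yx, params, pi):
    """R_ti = pi_i p[y_x(t)|i] / p[y_x(t)], computed stably in log space."""
    logp = np.stack([np.log(p) + ppca_logpdf(Yx, m, W, s2)
                     for p, (m, W, s2) in zip(pi, params)])
    logp -= logp.max(axis=0)                      # numerical stabilization
    R = np.exp(logp)
    return R / R.sum(axis=0)

def mixture_loglik(Yx, params, pi):
    """J = sum_t ln( sum_i pi_i p[y_x(t)|i] )."""
    logp = np.stack([np.log(p) + ppca_logpdf(Yx, m, W, s2)
                     for p, (m, W, s2) in zip(pi, params)])
    mx = logp.max(axis=0)
    return float(np.sum(mx + np.log(np.exp(logp - mx).sum(axis=0))))

def em_step_pi_mean(Yx, R):
    """M-step for the mixing proportions and component means."""
    pi_new = R.mean(axis=1)                       # responsibility-averaged weights
    means = (R @ Yx.T) / R.sum(axis=1)[:, None]   # rows are the component means
    return pi_new, means

# toy demo: N = 16 sensor locations, L = 100 snapshots, n = 3, M = 2
rng = np.random.default_rng(0)
N, L, n, M = 16, 100, 3, 2
Yx = rng.standard_normal((N, L))
params = [(np.zeros(N), rng.standard_normal((N, n)), 0.5) for _ in range(M)]
pi = [0.5, 0.5]
R = responsibilities(Yx, params, pi)
J = mixture_loglik(Yx, params, pi)
```

Iterating these responsibility and parameter updates, together with the updates of Wi and σi2 from the weighted covariance, drives J toward a local maximum, as stated in the text.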

An iterative EM algorithm46 has been developed to optimize all of the model parameters πi, yix̅ , Wi, and σi2. If Rti = p[i|yx(t)] is the posterior responsibility of the mixture i for generating the data point yx(t)

Rti = πi p[yx(t)|i] / p[yx(t)]  (16)

then it can be shown that the parameters can be updated as follows

π̃i = (1/L) ∑t=1L Rti  (17)

ỹxi = [∑t=1L Rti yx(t)] / [∑t=1L Rti]  (18)

Furthermore, the axes Wi and the noise variance σi2 can be determined from the local responsibility-weighted covariance matrix

Si = (1/π̃iL) ∑t=1L Rti [yx(t) − ỹxi][yx(t) − ỹxi]T ∈ RN×N  (19)

by the standard eigendecomposition in the same manner as for a single PPCA model. As shown in ref 46, the above-defined procedure can find a local maximum of the log-likelihood in eq 15 by iterating eqs 16−18 in sequence, followed by computation of Wi and σi2 from eq 19 using eqs 39 and 40 in Appendix B. Each weight matrix Wi spans the principal subspace of its respective Si. Finally, the multiple sets of continuous spatial basis functions Φi(x) (i = 1, ..., M) can be obtained from Wi using spline interpolation.

3.2. Neural Network for Dynamic Modeling. In our studies, for simplicity, the model in eq 11 is assumed to be of the form

yi(t) = A yi(t − 1) + F̅[yi(t − 1)] + B u(t − 1) + e(t)  (20)

where the matrices A ∈ Rn×n and B ∈ Rn×m denote the linear part and F̅: Rn → Rn denotes the nonlinear part. It can easily be seen from eq 22 that the system in the simulation studies can give a linear and nonlinear separated model when linear PCA is used for dimension reduction. This model can still be used in the probabilistic PCA-based multimodeling approach, because it reduces the complexity of the nonlinear function F in eq 11. In the identification procedure, F̅ is modeled by a network of radial basis functions (RBFs) because of its good nonlinear approximation capability. Therefore, the model in eq 20 can be rewritten as

yi(t) = A yi(t − 1) + W K[yi(t − 1)] + B u(t − 1) + e(t)  (21)

where W = [W1, ..., Wl] ∈ Rn×l denotes the weights, K(·) = [K1(·), ..., Kl(·)]T: Rn → Rl denotes the radial basis functions, and l is the number of neurons. The radial basis function is often selected as a Gaussian kernel Kj(yi) = exp[−(yi − cj)T(Σj)−1(yi − cj)/2] (j = 1, ..., l) with a proper center vector cj ∈ Rn and norm matrix Σj ∈ Rn×n. Note that other approximators can also be used here, such as polynomials, multilayer neural networks, and support vector machines.

There are different algorithms to train this hybrid RBF network. Note that the model should not only work on the modeling data, but also work well on the untrained data. To achieve this goal, the experimental data are separated into three sets in the modeling: estimation data, validation data, and testing data. The model is first trained using the estimation data. Then, the prediction error on the validation data provides a guide to determine some design parameters in the modeling (e.g., the model size) and to retrain the model. Cross-validation can guarantee that the model has a certain generalization capability on the untrained data. Finally, the trained model is tested on a new set of testing data. In the training stage, most algorithms determine the parameters cj and Σj first and then estimate the unknown parameters A, B, and W using the recursive least-squares method.56 Note that the center cj is often selected using clustering techniques or placed on a proper uniform grid within the relevant region of the estimation data. The norm matrix Σj is often chosen to be diagonal and contains the variance of the estimation data. These parameters cj and Σj can also be adjusted according to the model prediction error on the validation data. The network size (i.e., the number of neurons l) should be carefully determined, because a model that is too simple leads to the underfitting problem and a model that is too complex leads to the overfitting problem. In this study, the regularization technique in the estimation stage and the cross-validation technique in the validation stage were used to determine a satisfactory network size.

4. APPLICATIONS In the simulations, we consider the temperature distribution in a long, thin nonisothermal catalytic packed-bed reactor as shown in Figure 3.57 A reaction of the form C → D takes place on the catalyst. The reaction is endothermic, and a jacket is used to heat the reactor. This system is a typical DPS in the chemical industry.

Figure 3. Catalytic packed-bed reactor.

A dimensionless model to describe this nonlinear tubular chemical reactor is given by

εp ∂yg/∂t = −∂yg/∂x + αc(y − yg) − αg[yg − b(x)T u(t)]
∂y/∂t = ∂2y/∂x2 + β0 eγy/(1+y) − βc(y − yg) − βp[y − b(x)T u(t)]  (22)

subject to the boundary conditions

at x = 0: yg = 0, ∂y/∂x = 0; at x = 1: ∂y/∂x = 0  (23)

where yg, y, and b(x)T u(t) denote the dimensionless temperatures of the gas, the catalyst, and the jacket, respectively. It is assumed that only catalyst temperature measurements are available. The values of the process parameters are given by εp = 0.01, γ = 21.14, βc = 1.0, βp = 15.62, β0 = −0.003, αc = 0.5, and αg = 0.5. For simplicity, we assume that only one heater u(t) is available with a spatial distribution b(x) = sin(πx), although the proposed modeling approach is also applicable for the case of multiple heaters. To obtain the training data, the input is set as u(t) = 1.1 + 3 × randn × sin(500t), where randn is a normally distributed random function with mean 0 and standard deviation 1. Some samples for the input u(t) with the dimensionless simulation time 0.05 are shown in Figure 4. Sixteen

Figure 5. Three spatial basis functions in the first local model.

Figure 4. Input signal of the actuator, u(t).

Figure 6. Three spatial basis functions in the second local model.

sensors are used to capture the sufficient spatial information (N = 16). A noise-free data set of 2000 data points is collected by simulating the system in eqs 22 and 23. The sampling interval Δt is 0.0001, and the simulation time is 0.2. The initial condition is set to be the steady state with the input u(t) = 1.1. The measured noisy data are obtained by adding Gaussian white noise with mean 0 and standard deviation σ(x) = 0.0001 to the noise-free data. This set of data is used as the training data, with the first 1500 data points as the estimation data and the next 500 data points as the validation data. The validation data are used not to estimate the model parameters but to determine the model size M, n, and l using the cross-validation method. To test the model performance, a new set of 1000 noise-free data points is collected as the testing data. As shown in Figures 5 and 6, two sets of PCA spatial bases (M = 2) are used in the modeling, each of which includes the first three PCA spatial bases (n = 3). The two sets of bases are very similar with only small differences. In fact, depending on different systems, they could be significantly different. It takes about 10 iterations for the EM algorithm to converge. This is acceptable because the computation time is about 5 s. The computation environment is 1.83 GHz Intel Centrino Duo CPU, 1 GB RAM, Windows XP, and Matlab 7.0. Ten radial basis functions (l = 10) are chosen as nonlinear bases in the neural model identification. The measured output for the training, the predicted output, and the prediction error of the model on the training data are shown in Figures 7−9, respectively. On the other hand, the measured output for the testing, the predicted output, and the prediction error of the model on the testing data are shown in Figures 10−12, respectively. It can be seen that the two-PPCA model can capture the dominant dynamics of the original system.

Figure 7. Measured output for training.
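The two error measures reported in this section, SNAE(t) and the RMSE, can be computed directly from the prediction-error matrix. A minimal sketch, assuming a uniform sensor grid and sampling interval (the function names are ours):

```python
import numpy as np

def snae(E):
    """Spatial normalized absolute error per time step,
    SNAE(t) = (1/N) * sum_i |e(x_i, t)|.
    E : (N, L) prediction-error matrix on N sensors over L samples.
    Returns an array of length L."""
    return np.abs(E).mean(axis=0)

def rmse(E):
    """Root-mean-square error over space and time; on a uniform grid the
    space-time integrals reduce to a plain mean over all entries."""
    return np.sqrt(np.mean(E ** 2))
```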

The obtained spatiotemporal multimodel can be used for finite-dimensional controller design. Note that the proposed method is mainly applicable to parabolic PDEs, whose eigenspectrum can be separated into two parts: slow and fast. For some hyperbolic PDEs, the method might also be applicable, but the model order might not be very low because of the difficulty of separating the eigenspectrum.28 The simulation example is a mixture of a hyperbolic PDE and a parabolic PDE. It has a two-time-scale property,57 where the dynamics of the gas temperature in the hyperbolic PDE is much faster than the dynamics of the catalyst temperature in the parabolic PDE. In our study, the catalyst temperature in the parabolic PDE is modeled, and the hyperbolic part is considered to be in steady state because of its fast dynamics. To verify the influence of different sets of PCA bases on the modeling performance, the prediction errors of the traditional PCA model (with only three PCA bases) and of the one-, two-, and three-PPCA models (with one, two, and three sets of PCA bases) on the training data (simulation time 0.2) and the

dx.doi.org/10.1021/ie202613t | Ind. Eng. Chem. Res. 2012, 51, 6811−6822

Industrial & Engineering Chemistry Research

Article

Figure 8. Predicted output of the two-PPCA model on the training data.

Figure 9. Prediction error of the two-PPCA model on the training data.

Figure 10. Measured output for testing.

Figure 11. Predicted output of the two-PPCA model on the testing data.

Figure 12. Prediction error of the two-PPCA model on the testing data.

Figure 13. SNAE(t) of the traditional PCA model and the one-, two-, and three-PPCA models on the training data.

Figure 14. SNAE(t) of the traditional PCA model and the one-, two-, and three-PPCA models on the testing data.

Table 1. RMSEs of the Models on the Estimation, Validation, and Testing Data

model                 | estimation | validation | testing
traditional PCA model | 0.0004088  | 0.0005359  | 0.0008854
one-PPCA model        | 0.0004086  | 0.0005449  | 0.0008896
two-PPCA model        | 0.0001114  | 0.0001140  | 0.0001324
three-PPCA model      | 0.0001040  | 0.0001108  | 0.0001244

testing data (simulation time 0.1) are shown in Figures 13 and 14, respectively, where SNAE(t) = (1/N)Σ_{i=1}^N |e(x_i,t)| is the spatial normalized absolute error. The root-mean-square error, RMSE = {[∫ Σ_t e(x,t)^2 dx]/(∫ dx Σ_t Δt)}^{1/2}, of these models on the estimation, validation, and testing data is also compared in Table 1. It can be seen that the one-PPCA model is similar to the traditional PCA model, and the modeling error becomes


smaller with more sets of PCA bases. However, for this system, two sets of PCA bases are sufficient for a good approximation, because one more set of PCA bases provides little further improvement in the modeling performance. This means that the complexity of two local models can just match the excited system complexity. For more complex systems, more local models might be required. However, a model that is too complex might result in only a slightly better or even worse performance because of possible overfitting. Therefore, in this case, the two-PPCA model was selected as the final model.

The proposed multimodeling approach is further compared with the single PCA modeling approach with more spatial bases. As shown in Figures 15 and 16, the two-PPCA model (with a total of six PCA bases) can achieve better performance than the sixth-order single model (with six PCA bases). Similarly, as shown in Figures 17 and 18, the three-PPCA model (with a total of nine PCA bases) is also better than the ninth-order single model (with nine PCA bases). This is because the nonlinear dynamics can be better captured by the proposed method.

Figure 15. SNAE(t) of the traditional PCA model (with six PCA bases) and the two-PPCA model (with a total of six PCA bases) on the training data.

Figure 16. SNAE(t) of the traditional PCA model (with six PCA bases) and the two-PPCA model (with a total of six PCA bases) on the testing data.

Figure 17. SNAE(t) of the traditional PCA model (with nine PCA bases) and the three-PPCA model (with a total of nine PCA bases) on the training data.

Figure 18. SNAE(t) of the traditional PCA model (with nine PCA bases) and the three-PPCA model (with a total of nine PCA bases) on the testing data.

Figure 19. SNAE(t) of the traditional PCA model and the one-, two-, and three-PPCA models on the training data.

To test the modeling performance under random parameters in the system, β0 is set to β0 = −0.003 with a probability of 0.8 and β0 = −0.0025 with a probability of 0.2. The prediction errors, SNAE(t), of the traditional PCA model and the one-, two-, and three-PPCA models on the training and testing data are shown in Figures 19 and 20, respectively. The performance index RMSE is reported in Table 2. The one-PPCA model has an accuracy similar to that of the traditional PCA model. The modeling performance is better with more sets of PCA bases. Compared with Table 1, it can be observed that the modeling error increases with this random parameter. However, the model error of the traditional PCA model increases more than the model errors of the two- and three-PPCA models. The two-PPCA model accuracy is satisfactory, and the algorithm can work reasonably well because of the multimodeling mechanism.

Note that, although satisfactory modeling performance is achieved in the simulations, there is often some model error


between the estimated model and the true model. In practice, the "true" multimodel is unknown because of complex nonlinearities and unknown dynamics. Thus, it is very difficult to check how well the true model has been recovered. One practical approach is to examine the final modeling performance in simulations, as was done in this study. It is known that the multimodel decomposition is performed by maximizing a likelihood function as in eq 16 and that the obtained multimodel is optimal in this sense. In theory, model error analysis is still an open problem even for the multimodeling of lumped parameter systems. This issue is worth further investigation in the future.

Figure 20. SNAE(t) of the traditional PCA model and the one-, two-, and three-PPCA models on the testing data.

Table 2. RMSEs of the Models with the Random System Parameter β0

model                 | estimation | validation | testing
traditional PCA model | 0.0007001  | 0.0008799  | 0.0011060
one-PPCA model        | 0.0006999  | 0.0008960  | 0.0011120
two-PPCA model        | 0.0001213  | 0.0001223  | 0.0001393
three-PPCA model      | 0.0001159  | 0.0001287  | 0.0001343

5. CONCLUSIONS

In this study, a spatiotemporal multimodeling approach is proposed for unknown nonlinear DPSs. Unlike traditional PCA, in which only one set of spatial basis functions is obtained for time−space separation, a mixture of probabilistic PCA models is used here to obtain multiple sets of spatial bases for multimodel decomposition and time−space separation. With multiple sets of spatial bases, the complex nonlinear dynamics are first decomposed into multiple simple local dynamics. Moreover, with time−space separation, the high-dimensionality spatiotemporal data can be reduced to multiple sets of low-dimensionality temporal series. Then, multiple low-order neural models are constructed to model these local dynamics. Finally, the original spatiotemporal dynamics are reconstructed through multimodel synthesis. Because the proposed modeling approach integrates multiple local nonlinear models, it is more effective for nonlinear DPSs than the traditional PCA-based single-modeling approach. Its effectiveness was verified herein by numerical simulations. Future studies will include multimodel-based control design.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors thank the associate editor and the anonymous reviewers for their valuable comments and suggestions. This work was partially supported by a GRF grant from the RGC of Hong Kong SAR (CityU: 117310), grants from the NSF of China (60825302, 61004047, 60804033, 51175519), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (20100073120045).

APPENDIX A: TRADITIONAL PCA FOR TIME−SPACE SEPARATION

Traditional PCA has one set of orthonormal spatial basis functions Φ(x) = {φ_j(x)}_{j=1}^n: Ω → R^n for time−space separation. The projection of the spatiotemporal output y(x,t) ∈ R onto these PCA spatial bases can be expressed as

y(t) = Py(x,t) = [Φ(x), y(x,t)] = [y_1(t), ..., y_n(t)]^T    (24)

y_j(t) = [φ_j(x), y(x,t)],  j = 1, ..., n    (25)

where P is an orthonormal projection operator, y(t) ∈ R^n is the vector of corresponding temporal coefficients, and [f(x), g(x)] = ∫_Ω f(x)g(x) dx denotes the inner product. On the other hand, the spatiotemporal output can be reconstructed using time−space synthesis as

ŷ(x,t) = Σ_{j=1}^n φ_j(x) y_j(t)    (26)

Thus, the goal of PCA is to find the most characteristic spatial modes {φ_j(x)}_{j=1}^n among the spatiotemporal output {y(x_i,t)}_{i=1,t=1}^{N,L}, which can be performed by minimizing the objective function

min_{Φ(x)} ⟨||e(x,t)||^2⟩    (27)

where e(x,t) = y(x,t) − ŷ(x,t) is the reconstruction error and ⟨f(x,t)⟩ = (1/L)Σ_{t=1}^L f(x,t) and ||f(x)|| = [f(x), f(x)]^{1/2} are the ensemble average and the norm, respectively. PCA can be implemented in many ways, such as singular value decomposition, the spatial correlation method, and the temporal correlation method.22

APPENDIX B: PROBABILISTIC PCA

B.1. Probability Model

Probabilistic PCA45,46 is based on the linear latent variable model

y_x(t) = Wy(t) + ȳ_x + ε(t)    (28)

For simplicity, the spatiotemporal output [y(x_1,t), ..., y(x_N,t)]^T is denoted as y_x(t) ∈ R^N. The matrix W ∈ R^{N×n} contains the spatial basis functions (factor loadings), and ȳ_x ∈ R^N permits a nonzero mean of y_x(t). In the derivation, the latent variables y(t) ∈ R^n are assumed to be independent and Gaussian with unit variance, y(t) ~ N(0, I), and the noise ε(t) ∈ R^N is assumed to be Gaussian with diagonal covariance Ψ ∈ R^{N×N}: ε(t) ~ N(0, Ψ). Then, it can be shown that the output y_x(t) is also normally distributed, that is, y_x(t) ~ N(ȳ_x, C), where the covariance is C = Ψ + WW^T ∈ R^{N×N}. The problem is to determine the unknown parameters ȳ_x, W, and Ψ from the spatiotemporal data {y(x_i,t)}_{i=1,t=1}^{N,L}.

For simplicity, we consider the case of isotropic noise, ε(t) ~ N(0, σ^2 I). Then, for a given y(t), the probability distribution of y_x(t) can be derived from eq 28 as

p[y_x(t)|y(t)] = (2πσ^2)^{−N/2} exp[−||y_x(t) − Wy(t) − ȳ_x||^2/(2σ^2)]    (29)

Because the latent variables are Gaussian, defined by

p[y(t)] = (2π)^{−n/2} exp[−y(t)^T y(t)/2]    (30)

the marginal distribution of y_x(t) can be expressed as

p[y_x(t)] = ∫ p[y_x(t)|y(t)] p[y(t)] dy(t) = (2π)^{−N/2} |C|^{−1/2} exp{−[y_x(t) − ȳ_x]^T C^{−1} [y_x(t) − ȳ_x]/2}    (31)

where the covariance is

C = σ^2 I + WW^T    (32)

Given the variables y_x(t), the posterior distribution of the latent variables y(t) can be derived using Bayes' rule46

p[y(t)|y_x(t)] = (2π)^{−n/2} |σ^{−2}Q|^{1/2} exp(−{y(t) − Q^{−1}W^T[y_x(t) − ȳ_x]}^T (σ^{−2}Q) {y(t) − Q^{−1}W^T[y_x(t) − ȳ_x]}/2)    (33)

where the posterior covariance matrix is given by

Q = σ^2 I + W^T W    (34)

Note that Q ∈ R^{n×n}, whereas C ∈ R^{N×N}. The log-likelihood of observing the data under this model is

J = Σ_{t=1}^L ln{p[y_x(t)]} = −(L/2)[N ln(2π) + ln|C| + tr(C^{−1}S)]    (35)

where

S = (1/L) Σ_{t=1}^L [y_x(t) − ȳ_x][y_x(t) − ȳ_x]^T    (36)

is the sample covariance matrix of the observed {y_x(t)}.

B.2. Solutions of the Maximum-Likelihood Estimators

We now consider the maximum-likelihood estimators for the parameters ȳ_x, W, and σ^2.

Mean ȳ_x. It can be easily shown that the maximum-likelihood estimate of the parameter ȳ_x is given by the mean of the data46

ȳ_x = (1/L) Σ_{t=1}^L y_x(t)    (37)

Weight Matrix W. By taking the derivative of eq 35 with respect to W

∂J/∂W = L(C^{−1}SC^{−1}W − C^{−1}W)    (38)

it can be shown that the log-likelihood in eq 35 is maximized when the columns of W span the principal subspace of the data45,46

W_ML = U_n(Λ_n − σ^2 I)^{1/2} V    (39)

The matrix of spatial basis functions U_n ∈ R^{N×n} comprises the principal eigenvectors of S; the diagonal matrix Λ_n ∈ R^{n×n} contains the corresponding eigenvalues λ_1, ..., λ_n, where the eigenvalues of S are indexed in order of decreasing magnitude; and V is an arbitrary n × n orthonormal rotation matrix.

Noise Variance σ^2. It can also be shown that, for W = W_ML, the maximum-likelihood estimator for σ^2 is given by45,46

σ_ML^2 = [1/(N − n)] Σ_{j=n+1}^N λ_j    (40)

where λ_{n+1}, ..., λ_N are the smallest eigenvalues of S, so that σ_ML^2 has a clear interpretation as the average variance "lost" per discarded dimension. To implement probabilistic PCA, one generally first computes the usual eigendecomposition of S, after which σ_ML^2 is found from eq 40, followed by W_ML from eq 39. The final value of σ^2 can be adjusted by changing n according to the modeling performance.

B.3. Dimensionality Reduction and Optimal Reconstruction

In conventional PCA, the projection of y_x(t) is given by y(t) = U_n^T[y_x(t) − ȳ_x], and its reconstruction is given by ŷ_x(t) = U_n y(t) + ȳ_x. In probabilistic PCA, however, y_x(t) is represented in the latent space not by a single vector but by the Gaussian posterior distribution defined by eq 33. Thus, a convenient representation of y_x(t) is the posterior mean ⟨y(t)⟩ = Q^{−1}W_ML^T[y_x(t) − ȳ_x].45,46 Note that W_ML⟨y(t)⟩ + ȳ_x, derived from eq 28, might not be the optimal linear reconstruction of y_x(t), because it is not an orthonormal projection of y_x(t) when σ^2 > 0. In fact, the optimal least-squares linear reconstruction from the posterior mean ⟨y(t)⟩ can be obtained from45,46

ŷ_x(t) = W_ML(W_ML^T W_ML)^{−1} Q⟨y(t)⟩ + ȳ_x    (41)

with reconstruction error identical to that of conventional PCA. It is not only the posterior mean ⟨y(t)⟩ that can be considered as the reduced-dimensionality representation of the data. In fact, in addition to the conventional PCA representation U_n^T[y_x(t) − ȳ_x], the vectors ŷ(t) = W_ML^T[y_x(t) − ȳ_x] could equally be used without loss of information and reconstructed using ŷ_x(t) = W_ML(W_ML^T W_ML)^{−1} ŷ(t) + ȳ_x.46

REFERENCES

(1) Ray, W. H. Advanced Process Control; McGraw-Hill: New York, 1981.
(2) Christofides, P. D. Nonlinear and Robust Control of PDE Systems: Methods and Applications to Transport-Reaction Processes; Birkhäuser: Boston, 2001.
(3) Li, H.-X.; Qi, C. K. Modeling of distributed parameter systems for applications - A synthesized review from time−space separation. J. Process Control 2010, 20 (8), 891.
(4) Christofides, P. D.; Daoutidis, P. Finite-dimensional control of parabolic PDE systems using approximate inertial manifolds. J. Math. Anal. Appl. 1997, 216 (2), 398.


(5) Banks, H. T.; Kunisch, K. Estimation Techniques for Distributed Parameter Systems; Birkhäuser: Boston, 1989.
(6) Coca, D.; Billings, S. A. Direct parameter identification of distributed parameter systems. Int. J. Syst. Sci. 2000, 31 (1), 11.
(7) Müller, T. G.; Timmer, J. Fitting parameters in partial differential equations from partially observed noisy data. Physica D 2002, 171 (1−2), 1.
(8) Guo, L. Z.; Billings, S. A. Identification of partial differential equation models for continuous spatio-temporal dynamical systems. IEEE Trans. Circuits Syst. II: Express Briefs 2006, 53 (8), 657.
(9) Guo, L. Z.; Billings, S. A.; Wei, H. L. Estimation of spatial derivatives and identification of continuous spatio-temporal dynamical systems. Int. J. Control 2006, 79 (9), 1118.
(10) Bär, M.; Hegger, R.; Kantz, H. Fitting partial differential equations to space−time dynamics. Phys. Rev. E 1999, 59 (1), 337.
(11) Mandelj, S.; Grabec, I.; Govekar, E. Statistical approach to modeling of spatiotemporal dynamics. Int. J. Bifurcation Chaos Appl. Sci. Eng. 2001, 11 (11), 2731.
(12) Parlitz, U.; Merkwirth, C. Prediction of spatiotemporal time series based on reconstructed local states. Phys. Rev. Lett. 2000, 84 (9), 1890.
(13) Coca, D.; Billings, S. A. Identification of coupled map lattice models of complex spatio-temporal patterns. Phys. Lett. A 2001, 287 (1), 65.
(14) Guo, L. Z.; Billings, S. A. State-space reconstruction and spatiotemporal prediction of lattice dynamical systems. IEEE Trans. Autom. Control 2007, 52 (4), 622.
(15) Coca, D.; Billings, S. A. Identification of finite dimensional models of infinite dimensional dynamical systems. Automatica 2002, 38 (11), 1851.
(16) Deng, H.; Li, H.-X.; Chen, G. Spectral-approximation-based intelligent modeling for distributed thermal processes. IEEE Trans. Control Syst. Technol. 2005, 13 (5), 686.
(17) Baker, J.; Christofides, P. D. Finite-dimensional approximation and control of nonlinear parabolic PDE systems. Int. J. Control 2000, 73 (5), 439.
(18) Armaou, A.; Christofides, P. D. Dynamic optimization of dissipative PDE systems using nonlinear order reduction. Chem. Eng. Sci. 2002, 57 (24), 5083.
(19) Park, H. M.; Cho, D. H. The use of the Karhunen−Loève decomposition for the modeling of distributed parameter systems. Chem. Eng. Sci. 1996, 51 (1), 81.
(20) Hoo, K. A.; Zheng, D. Low-order control-relevant models for a class of distributed parameter systems. Chem. Eng. Sci. 2001, 56 (23), 6683.
(21) Newman, A. J. Model Reduction via the Karhunen−Loève Expansion. Part II: Some Elementary Examples; Technical Report 96-33; University of Maryland: College Park, MD, 1996.
(22) Holmes, P.; Lumley, J. L.; Berkooz, G. Turbulence, Coherent Structures, Dynamical Systems, and Symmetry; Cambridge University Press: New York, 1996.
(23) Sahan, R. A.; Koc-Sahan, N.; Albin, D. C.; Liakopoulos, A. Artificial neural network-based modeling and intelligent control of transitional flows. In Proceedings of the 1997 IEEE International Conference on Control Applications; IEEE Press: Piscataway, NJ, 1997; p 359.
(24) Zhou, X.; Liu, L.; Dai, Y.; Yuan, W. Modeling of a fixed bed reactor using KL expansion and neural networks. Chem. Eng. Sci. 1996, 51 (10), 2179.
(25) Smaoui, N.; Al-Enezi, S. Modelling the dynamics of nonlinear partial differential equations using neural networks. J. Comput. Appl. Math. 2004, 170 (1), 27.
(26) Qi, C. K.; Li, H.-X. Hybrid Karhunen−Loève/neural modeling for a class of distributed parameter systems. Int. J. Intell. Syst. Technol. Appl. 2008, 4 (1−2), 141.
(27) Aggelogiannaki, E.; Sarimveis, H. Nonlinear model predictive control for distributed parameter systems using data driven artificial neural network models. Comput. Chem. Eng. 2008, 32 (6), 1225.

(28) Qi, C. K.; Li, H.-X. A Karhunen−Loève decomposition based Wiener modeling approach for nonlinear distributed parameter processes. Ind. Eng. Chem. Res. 2008, 47 (12), 4184.
(29) Qi, C. K.; Li, H.-X. A time/space separation-based Hammerstein modeling approach for nonlinear distributed parameter processes. Comput. Chem. Eng. 2009, 33 (7), 1247.
(30) Malthouse, E. C. Limitations of nonlinear PCA as performed with generic neural networks. IEEE Trans. Neural Netw. 1998, 9 (1), 165.
(31) Wilson, D. J. H.; Irwin, G. W.; Lightbody, G. RBF principal manifolds for process monitoring. IEEE Trans. Neural Netw. 1999, 10 (6), 1424.
(32) Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991, 37 (2), 233.
(33) Saegusa, R.; Sakano, H.; Hashimoto, S. Nonlinear principal component analysis to preserve the order of principal components. Neurocomputing 2004, 61, 57.
(34) Hsieh, W. W. Nonlinear principal component analysis by neural networks. Tellus A 2001, 53 (5), 599.
(35) Hinton, G. E.; Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 2006, 313 (5786), 504.
(36) Hsieh, W. W. Nonlinear multivariate and time series analysis by neural network methods. Rev. Geophys. 2004, 42 (1), RG1003.
(37) Kirby, M.; Miranda, R. The nonlinear reduction of high-dimensional dynamical systems via neural networks. Phys. Rev. Lett. 1994, 72 (12), 1822.
(38) Smaoui, N. Linear versus nonlinear dimensionality reduction of high-dimensional dynamical systems. SIAM J. Sci. Comput. 2004, 25 (6), 2107.
(39) Qi, C. K.; Li, H.-X. Nonlinear dimension reduction based neural modeling for spatio-temporal processes. Chem. Eng. Sci. 2009, 64 (19), 4164.
(40) Kambhatla, N.; Leen, T. K. Dimension reduction by local principal component analysis. Neural Comput. 1997, 9 (7), 1493.
(41) Cappelli, R.; Maio, D.; Maltoni, D. Multispace KL for pattern representation and classification. IEEE Trans. Pattern Anal. 2001, 23 (9), 977.
(42) Broomhead, D. S.; Indik, R.; Newell, A. C.; Rand, D. A. Local adaptive Galerkin bases for large-dimensional dynamical systems. Nonlinearity 1991, 4 (2), 159.
(43) Soli, F. J. Local adaptive Galerkin bases. Nonlinear Anal. 2001, 47, 4961.
(44) Varshney, A.; Pitchaiah, S.; Armaou, A. Feedback control of dissipative PDE systems using adaptive model reduction. AIChE J. 2009, 55, 906.
(45) Tipping, M. E.; Bishop, C. M. Probabilistic principal component analysis. J. R. Stat. Soc. B 1999, 61 (Part 3), 611.
(46) Tipping, M. E.; Bishop, C. M. Mixtures of probabilistic principal component analyzers. Neural Comput. 1999, 11 (2), 443.
(47) Narendra, K. S.; Balakrishnan, J.; Ciliz, M. K. Adaptation and learning using multiple models, switching, and tuning. IEEE Control Syst. Mag. 1995, 15 (3), 37.
(48) Baruch, I. S.; Lopez, R. B.; Guzman, J. L. O.; Flores, J. M. A fuzzy-neural multi-model for nonlinear systems identification and control. Fuzzy Set Syst. 2008, 159 (20), 2650.
(49) Banerjee, A.; Arkun, Y.; Ogunnaike, B.; Pearson, R. Estimation of nonlinear systems using linear multiple models. AIChE J. 1997, 43 (5), 1204.
(50) Göttsche, Th. H.; Hunt, K. J.; Johansen, T. A. Nonlinear dynamics modeling via operating regime decomposition. Math. Comput. Simul. 1998, 46, 543.
(51) Pal, N. R.; Saha, S. Simultaneous structure identification and fuzzy rule generation for Takagi−Sugeno models. IEEE Trans. Syst. Man Cybern. B 2008, 38 (6), 1626.
(52) Boukhris, A.; Mourot, G.; Ragot, J. Non-linear dynamic system identification: A multi-model approach. Int. J. Control 1999, 72 (7−8), 591.
(53) Eikens, B.; Karim, M. N. Process identification with multiple neural network models. Int. J. Control 1999, 72 (7−8), 576.


(54) Leontaritis, I. J.; Billings, S. A. Input−output parametric models for non-linear systems. Part I: Deterministic non-linear systems. Int. J. Control 1985, 41 (2), 303.
(55) Sjöberg, J.; Zhang, Q.; Ljung, L.; Benveniste, A.; Delyon, B.; Glorennec, P.; Hjalmarsson, H.; Juditsky, A. Nonlinear black-box modeling in system identification: A unified approach. Automatica 1995, 31 (12), 1691.
(56) Nelles, O. Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models; Springer-Verlag: Berlin, 2001.
(57) Christofides, P. D. Robust control of parabolic PDE systems. Chem. Eng. Sci. 1998, 53 (16), 2949.
