Hammerstein Modeling with Structure Identification for Multi-input Multi

Aug 2, 2011 - multi-output (MIMO) nonlinear industrial systems will be investigated with several ... fitting problems. A simple model is also suitable...
2 downloads 0 Views 4MB Size
ARTICLE pubs.acs.org/IECR

Hammerstein Modeling with Structure Identification for Multi-input Multi-output Nonlinear Industrial Processes Chenkun Qi,*,† Han-Xiong Li,§ Xianchao Zhao,† Shaoyuan Li,‡ and Feng Gao† †

School of Mechanical Engineering and ‡Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China § Department of Manufacturing Engineering & Engineering Management, City University of Hong Kong, Hong Kong, China ABSTRACT: Hammerstein modeling with structure identification for multi-input multi-output (MIMO) nonlinear industrial processes is investigated in this study. The structure identification of the Hammerstein model is very challenging because the model terms are vectors, and some model terms are inputs of other model terms (i.e., model term coupling). An efficient model structure selection algorithm for the Hammerstein model is proposed with the multi-output locally regularized orthogonal least-squares (LROLS), A-optimality design, and a vector model term selection. To enhance the well-posedness of the regressors, estimation robustness, and model adequacy, the A-optimality criterion is integrated into the model error reduction criterion in the multi-output LROLS. To handle the vector model term coupling problem, a vector model term selection rule is synthesized into the multi-output LROLS. After the model structure is determined, to improve the robustness of the parameter estimation, the regularized leastsquares method with the singular value decomposition (RLS-SVD) is used. The simple or sparse Hammerstein model structure can be determined from the noisy process data. The structure identification algorithm only includes a few user-designed parameters which are easy to select. Therefore, the ability of automatic construction of the Hammerstein model is enhanced. Three application examples are used to illustrate the effectiveness of the proposed modeling approach, including the simple model structure, the satisfactory modeling accuracy, the robustness of the algorithm to the noise, and the easy selection of user-designed parameters.

1. INTRODUCTION Hammerstein models have been widely used in the identification of nonlinear industrial systems due to their significant modeling capability and simple block-oriented nonlinear structure, e.g., pH neutralization processes,1 distillation columns,1 and fluid catalytic cracking processes.2 There are many identification algorithms for Hammerstein models in the literature.310 See ref 11 for a classification of existing algorithms. However, many Hammerstein modeling methods only consider the parameter estimation with known structure and order. The model order selection problem has been discussed in some Hammerstein modeling literature7,10,1215 where the model structure is known. Only a few studies consider the model term and structure selection of a Hammerstein model. Though the nonlinear part with unknown structure can be estimated using least-squares support vector machines (LSSVMs),15 the structure of the linear part is known. Very often, the Hammerstein model structure is determined by trial-error or cross-validation. The systematic model structure identification or term selection of a Hammerstein model with a completely unknown structure is still an open problem. Specifically, a least-squares and singular value decomposition (LS-SVD) based method is proposed for HammersteinWiener systems5 and Hammerstein systems.10 The parameter estimation of LS-SVD becomes a linear regression and convex optimization problem; however the number of unknown parameters is often large, and therefore, well-conditioned estimation becomes an issue. Similarly, it also assumes that the model structure is known and only unknown parameters need to be estimated. In many cases, the structure is unknown, thus the model orders and terms have to be carefully determined that is usually a time-consuming process. r 2011 American Chemical Society

In this study, the Hammerstein modeling for multi-input multi-output (MIMO) nonlinear industrial systems will be investigated with several requirements: (1) Simple model structure. From the parsimonious principle, the model should be made as simple as possible to prevent the ill-conditioned and overfitting problems. A simple model is also suitable for control design. (2) Satisfactory modeling accuracy. (3) Robust modeling algorithm. This is because the modeling is often performed under highly noisy conditions. (4) Easy selection for user-designed parameters. To avoid a time-consuming process to select the model structure or user-designed parameters, a highly automatic model structure selection with easily designed parameters is desirable. The model term selection problem has been extensively studied for the linear regression model.1620 The main motivation is that a small number of important terms may be enough to describe the system, and too many model terms will lead to complex models and ill-conditioned problems. • For single-output regression with scalar model terms, orthogonal least-squares (OLS)21 is a fast and effective model selection algorithm. However, OLS can not entirely avoid the overfitting problem, especially under highly noisy conditions. To overcome this problem, regularized OLS (ROLS)22 has been proposed. To further enhance the model’s sparsity, a locally regularized orthogonal leastsquares (LROLS) algorithm has been proposed with individual regularization for each parameter.23 To enhance the well-posedness of the regressors, estimation robustness, Received: November 9, 2010 Accepted: August 2, 2011 Revised: August 2, 2011 Published: August 02, 2011 11153

dx.doi.org/10.1021/ie102273c | Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 1. MIMO Hammerstein model.

and model adequacy, the optimal experimental design with A- and D-optimality criteria have been used, e.g., A-optimality OLS,24 D-optimality OLS,25 A-optimality LROLS,26 and D-optimality LROLS.26,27 Among them, the A- and D-optimality LROLS are more automatic because there is only one user-specified weighting parameter which has no critical influence on the performance. • For multi-output regression with scalar model terms, there are two methods: multiple single-output regression19,28 or single multi-output regression. The latter one can result in a more sparse model with the same accuracy. Recently, the multi-output LROLS29 and the multi-output LROLS with D-optimality30 have been proposed. The existing model term selection methods for linear regression models only consider scalar model terms and also the model terms can not be inputs of other model terms. However, for the MIMO Hammerstein system, the model terms are vectors and some model terms are inputs of other model terms (i.e., model term coupling). Thus the existing model term selection methods can not be directly used here. The difficulty is that it is a complex structure identification problem of multi-output regression with vector and coupled model terms. Recently, a MIMO Hammerstein model selection has been proposed.31 Though the vector model term problem is considered, only a multiple single-output OLS algorithm is used. Therefore, the ill-conditioned and overfitting problem can not be fully avoided, and further studies to improve the model sparseness, estimation robustness, model adequacy, and automatic modeling capability are required. The standard OLS algorithm is also applied to Hammerstein systems;1 however, this algorithm may be incomplete since the model term coupling and vector model term selection problems are not considered. In this study, a Hammerstein modeling with structure identification is developed for MIMO nonlinear systems. First, to obtain a sparse MIMO Hammerstein model structure automatically from the noisy data, a model selection algorithm with the multi-output LROLS, A-optimality design, and a vector model term selection rule is proposed. Next, to improve the robustness of parameter estimation, the parameters are estimated using the regularized least-squares method and the singular value decomposition (RLS-SVD). Compared with the previous work,31 the main advantage of this method is that the model sparseness, estimation robustness, model adequacy, and automatic model construction can be improved. Though the model is first transformed into a widely studied general linear model (GLR), there are also some special problems for the MIMO Hammerstein modeling. The GLR modeling can not be directly used here. The main contribution is that a model structure identification method for the MIMO Hammerstein model is proposed with the multioutput LROLS, the A-optimality design, and a vector model term selection. The new ideas are the following: • To enhance the well-posedness of the regressors in the linear regression models, the A-optimality criterion is integrated into the model error reduction criterion in the multi-output LROLS for model term selection. Compared

Figure 2. Hammerstein modeling algorithm.

with the existing model selection methods of linear regression models, it is the first time to combine the A-optimality design into the multi-output LROLS. • To handle the vector model term coupling problem in the MIMO Hammerstein modeling (i.e., some model terms are inputs of other model terms), a vector model term selection rule is proposed. It is the first time to synthesize the vector model term selection into the LROLS for model structure identification. The rest of the paper is organized as follows. The MIMO Hammerstein model is described in section 2. In section 3, the modeling algorithm is introduced, where the model parametrization is presented in section 3.1 and the model structure selection and the parameter identification algorithm are given in sections 3.2 and 3.3, respectively. Section 4 contains two simulation examples and one experiment example. Finally, a few conclusions are presented in section 5.

2. MIMO HAMMERSTEIN MODEL The MIMO Hammerstein model considered in this study is shown in Figure 1, where a static nonlinear element f( 3 ): Rm f Rm is followed by a linear time-invariant (LTI) dynamical system G(q1) (a n  m transfer function matrix). The inputoutput relationship is then given by yðtÞ ¼ yðtÞ þ eðtÞ ¼ Gðq1 ÞvðtÞ þ eðtÞ ̅ ¼ Gðq1 Þf ðuðtÞÞ þ eðtÞ

ð1Þ

where u(t) ∈ Rm, y(t) ∈ Rn, v(t) ∈ Rm, e(t) ∈ Rn, and y(t) ∈ Rn represent the input, model (or predicted) output, intermediate variable, model error, and system (or measured) output at time t, respectively. Assume the model error e(t) has a finite variance σ2I with possible nonzero mean. Now the identification problem is to estimate this model including model structure and parameters from a process data set {u(t), y(t)}Lt=1, where L is the number of data points.

3. HAMMERSTEIN MODELING ALGORITHM As shown in Figure 2, the proposed Hammerstein modeling algorithm mainly includes: parametrization, sparse structure design, and parameter estimation, which will be discussed in this section. 3.1. Parameterization. In this study, assume that the LTI system with the transfer function G(q1) can be represented by a linear AutoRegressive with eXogenous input (ARX) model yðtÞ ¼ Aðq1 ÞyðtÞ þ Bðq1 ÞvðtÞ 11154

ð2Þ

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

where A(q1) and B(q1) are n  n and n  m matrix polynomials 1

ny

ð3Þ

Bðq1 Þ ¼ B1 q1 þ 3 3 3 þ Bnu qnu

ð4Þ

Aðq Þ ¼ A1 q

1

þ 3 3 3 þ Any q

Here Ai ∈ Rnn (i = 1, ..., ny) and Bi ∈ Rnm (i = 1, ..., nu) are unknown matrix parameters, nu and ny are the maximum input and output lags, respectively, and q1 is a time-shift operator (q1u(t) = u(t  1)). On the other hand, assume that the nonlinear static element f(u) can be approximated by vðtÞ ¼ f ðuÞ ¼

nf



i¼1

Ci f i ðuÞ

ð5Þ

where fi( 3 ): Rm f Rm (i = 1, ..., nf) are nonlinear basis functions such as polynomials, splines, radial basis functions, and wavelets,32 Ci ∈ Rmm (i = 1, ..., nf) are unknown matrix parameters, and nf is the number of basis functions. This kind of parametrization is used here because the ARX model can represent many linear dynamic systems and the basis functions expansion can approximate many nonlinear functions to an arbitrary accuracy under certain conditions. However, the idea in the modeling could be also applicable to other parametrization schemes. In the modeling, most identification algorithms assume the values of ny, nu, and nf are known. However, their true values are often unknown at the beginning of the identification due to incomplete process knowledge. Only the order selection may not provide a simple model structure since the model term selection is not considered. In fact, many terms in (35) are often redundant and can be removed from the model. This means that there exists the values of nys, nus, and nfs (nys < ny, nus < nu, and nfs < nf), such that the model 1

1

yðtÞ ¼ As ðq ÞyðtÞ þ Bs ðq ÞvðtÞ As ðq1 Þ ¼ Ai1 qi1 þ 3 3 3 þ Ainys qinys

v ¼ f s ðuÞ ¼ Ck1 f k1 ðuÞ þ 3 3 3 þ Cknfs f knfs ðuÞ

ð9Þ

can provide a satisfactory representation of the system, where ir ∈ {1, ..., ny}, jw ∈ {1, ..., nu} and kv ∈ {1, ..., nf}. Note that unlike traditional model selection, this MIMO Hammerstein model in (25) has multiple outputs and vector model terms. To determine a simple structure for the MIMO Hammerstein system, here a multi-output A-optimality LROLS algorithm with a vector model term selection is proposed. 3.2. Sparse Structure Design. 3.2.1. Multioutput Regression Model. Substituting (25) into (1), the inputoutput relationship can be written as nu

ny

nu

nf

∑ Aiy̅ ðt  iÞ þ j∑¼ 1 k∑¼ 1 Djk f kðuðt  jÞÞ þ eðtÞ i¼1

ð11Þ

Equation 11 can be further expressed as a linear regression form y̅ T ðtÞ ¼ ϕðtÞΘ þ eT ðtÞ

ð12Þ

where Θ ¼ ½A1 , :::, Any , D11 , :::, D1nf , :::, Dnu 1 , :::, Dnu nf T ∈ R nM n , ϕðtÞ ¼ ½y̅ T ðt  1Þ, :::, y̅ T ðt  ny Þ, f1 T ðuðt  1ÞÞ, :::, fnf T ðuðt  nu ÞÞ ∈ R1nM

and nM = nny + mnunf. The advantage is that the bilinear formulation (10) is transformed to a linear-in-the parameters one. Now the model structure design becomes easy since the linear model selection method can be used and the difficult nonlinear model selection can be avoided. In the modeling, the data size should be larger than the number of parameters, i.e., Ln g nMn. Under the fixed data size, the impact on the parameter convergence due to the increase of the parameters in (11) can be reduced with gradually removing the redundant parameters in the sparse structure selection process. Considering the L-point data set and further defining 2 3 2 3 2 3 y̅ T ð1Þ ϕð1Þ eT ð1Þ 6 7 6 7 6 7 7 6 7 6 7 Y ¼6 4 l 5, Φ ¼ 4 l 5, E ¼ 4 l 5 T T y̅ ðLÞ e ðLÞ ϕðLÞ the following equation can be given Y ¼ ΦΘ þ E

ð13Þ

Let an orthogonal decomposition of the regression matrix Φ be Φ ¼ WZ where

2

1 6 60 Z¼6 6l 4 0

ð7Þ ð8Þ

ny

y̅ ðtÞ ¼

ð6Þ

Bs ðq1 Þ ¼ Bj1 qj1 þ 3 3 3 þ Bjnus qjnus

y̅ ðtÞ ¼

Defining Djk = BjCk ∈ Rnm, then (10) can be written as

ð14Þ

z1, 2 1 3

3

3

333

333 3 33

3

0

3

3

z1, nM l

3

7 7 7 ∈ R nM nM znM  1, nM 7 5 1

and W ¼ ½w 1 3 3 3 w nM  ∈ R LnM which satisfies wiTwj = 0, if i 6¼ j. Then (13) can alternatively be expressed as Y ¼ WZΘ þ E ¼ WH þ E

ð15Þ

Knowing H and Z, Θ can be computed from ZΘ = H. For convenience, define the following expressions: Y ¼ ½y 1 , :::, y n  ∈ R Ln E ¼ ½e1 , :::, en  ∈ R Ln

nf

∑ Ai y̅ ðt  iÞ þ j∑¼ 1 k∑¼ 1 Bj Ckf kðuðt  jÞÞ þ eðtÞ i¼1

H ¼ ½h1 , :::, hn  ∈ R nM n hi ¼ ½h1i , :::, hnM i T

ð10Þ 11155

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

3.2.2. Multi-output LROLS with A-optimality Design. To better understand this structure selection algorithm, its two components are discussed first. For more details of its singleoutput version, see ref 26. A. Multi-output LROLS. The multi-output LROLS algorithm selects the significant model terms by minimizing the following regularized error criterion29 minfJ1 ¼ traceðET E þ H T ΛHÞ ¼ ¼

n

∑ ei

i¼1

T

ei þ

nM

n

n

ðei T ei þ hi T Λhi Þ ∑ i¼1

∑ ð ∑ hji ÞRjg

updating formulas:29 n

Rj new ¼

j¼1 i¼1

γj ¼

where r = [R1, ..., RnM] is the regularization parameter vector, and Λ = diag{R1, ..., RnM}. Note that the parameter estimation of (16) satisfies WTY = (WTW + Λ)H, then criterion (16) can be easily expressed as traceðET E þ H T ΛHÞ ¼ traceðY T Y  H T ðW T W þ ΛÞHÞ nM

n

n

y i T y i  ∑ ð ∑ hji 2 Þðw j T w j þ Rj Þ ∑ j¼1 i¼1 i¼1

ð17Þ

Normalizing (17) by trace(YTY) yields traceðET E þ H T ΛHÞ ¼ 1 traceðY T Y Þ

nM

∑ j¼1

ð

n

∑ hji 2 Þðwj T wj þ RjÞ i¼1 traceðY T Y Þ ð18Þ

Thus the regularized error reduction ratio (RERR) due to the regressor wl is defined as ð ½rerrl ¼

n

∑ hli2Þðwl T wl þ Rl Þ i¼1 traceðY T Y Þ

ð19Þ

On the basis of this ratio, significant regressors can be selected in a forward-regression procedure.29 At the lth step, a regressor with the largest [rerr]l among the remaining nM  l + 1 candidates is selected as the lth model term, and the selection is stopped at the nMsth step if 1

nMs

∑ ½rerrl < ξ

n

∑ hji 2 i¼1

, 1 e j e nM

ð21Þ

wj T wj Rj þ w j T w j

ð22Þ

and

T

¼

L  γold

∑ ei T ei i¼1

where

ð16Þ

2

γj

old

ð20Þ

l¼1

is satisfied, where 0 < ξ < 1 is a chosen tolerance. Thus a sparse model with nMs (e nM) significant model terms is obtained. The more detailed algorithm can be found in ref 29. To obtain a good model structure, the initial model order should be large enough, e.g., a high-order ARX model can be used. The proposed method is valid for a high-order model and in fact the condition nM e L can be removed. Though there are many regressors or parameters in the identification with a high-order model, the illconditioned problem does not happen because significant regressors are selected one by one in a forward-regression procedure. On the basis of the Bayesian evidence procedure,33 the regularization parameters can be “optimized” using the following

γ¼

nM

∑ γj j¼1

ð23Þ

Usually a few iterations can find a good r. In addition, the choice of ξ is less critical than in the original OLS algorithm because the sparsity is enforced by multiple regularizers.26 For example, if ξ is too small, those unimportant terms will have very large Rl which make their weights to zero. Nevertheless, an appropriate value for ξ is desired, which can be determined by Akaike information criterion (AIC) or other criteria. However, these criteria only penalize the model complexity but do not penalize the regressor that may cause poor model performance (e.g., too large variance of parameter estimate or ill-posedness of the regression matrix). Optimal experimental design criteria provide better solution to improve well-posedness of the regressors, parameter robustness, and model efficiency. B. A-Optimality Design. The least-squares (LS) estimate of Θ ^ = (ΦTΦ)1ΦTY, where ΦTΦ is in (13) can be obtained as Θ called the design matrix in experimental design. When model (13) represents the true system, e(t) ∼ N(0,σ2I) and ΦTΦ is ^ is unbiased and the covariance nonsingular, the estimate Θ matrix of the estimate is determined by the design matrix: ( ^Þ ¼ Θ EðΘ ð24Þ ^ Þ ¼ ðΦT ΦÞ1 σ 2 covðΘ ^ is given by The mean square error (MSE) of the parameter estimate Θ ^ ÞÞ ¼ σ2 traceððΦT ΦÞ1 Þ ¼ σ2 traceðcovðΘ

nM

∑ i ¼ 1 λi 1

ð25Þ

M are the eigenvalues of ΦTΦ. Note that for an illwhere {λi}ni=1 M can be very conditioned design matrix, the minimum of {λi}ni=1 ^ small so that the MSE of Θ will be very large. Fortunately, the illposedness of the design matrix can be controlled by the experimental design. Then the variance of parameter estimate can be reduced and the model adequacy can be improved. The optimum experimental design based on the A-optimality criterion minimizes (25) directly ( ) nM 1 2 ^ÞÞ ¼ σ ð26Þ min J2 ¼ traceðcovðΘ i ¼ 1 λi



Note that the criterion based on J2 involves intensive computation of the eigenvalues of ΦTΦ. Thus another simple and computationally fast A-optimality design criterion is often 11156

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 3. Vector model term selection.

used which minimizes the variance of the auxiliary parameter estimate H ( ) nM 1 2 ^ min J3 ¼ traceðcovðHÞÞ ¼ σ ð27Þ i ¼ 1 ki



where ki = wiTwi. Due to ZΘ = H, it can be assumed that to penalize the large variance of the auxiliary parameter H will also consequently penalize the large variance of the parameter Θ.24,26 C. Multioutput LROLS with A-optimality Design. The multioutput LROLS with A-optimality design algorithm is based on minimizing the following combined criterion J ¼ J1 þ β1 J3 ¼ þβ

nM

n

nM

n

∑ yTi yi  j∑¼ 1ði∑¼ 1 h2ji ÞðwTj wj þ RjÞ i¼1

∑ T i ¼ 1 wj wj 1

ð28Þ

where β = β1σ2 is a small positive number. In this algorithm, the updating of the model weights and regularization parameters is exactly as in the LROLS algorithm, but the model selection is according to the new regularized error reduction ratio ð ½crerrl ¼

n



i¼1

h2li Þðw l T w l þ Rl Þ  β traceðY T Y Þ

y̅ ðtÞ ¼

1 wl T wl

nys

nus

nfs

∑ Ai y̅ ðt  ir Þ þ w∑¼ 1 v∑¼ 1 Bj Ck f k ðuðt  jw ÞÞ þ eðtÞ r¼1 r

w

v

v

ð31Þ

ð29Þ

and the selection is terminated with an nMs-term model when ½crerrl e 0, for nMs þ 1 e l e nM

Algorithm 1: Structure Design Step 1: Initialization. Set the initial values of Ri (i = 1, ..., nM) to the same small positive value (e.g., 0.001) and choose a fixed β. Set iteration index k = 1. Step 2: Given the current r, select a subset model with Is terms using the forward regression based on [crerr]l. Step 3: Select a Hammerstein model structure with nMk terms using vector model term selection. Step 4: Update r using (2123) with nM = nMk. If r remains sufficiently unchanged in two successive iterations or a preset maximum iteration number is reached, stop; otherwise, set k = k + 1 and go to step 2. As a result, the model structure of linear and nonlinear parts of the Hammerstein model can be determined as in (69). This multi-output LROLS with A-optimality design offers an efficient procedure to construct a sparse and adequate MIMO Hammerstein model structure with good performance. • Compared with the LROLS, the model selection procedure is simplified and it is no longer necessary to specify the tolerance ξ, as the algorithm automatically terminates when condition (30) is reached. • Unlike the OLS with A-optimality design,24 the value of weighting β does not critically influence the performance, and β can be chosen with ease from a large range of values. This is demonstrated in the modeling examples. • Compared with the single-output LROLS with A-optimality design, this algorithm is a multi-output one which is suitable for the MIMO Hammerstein model with the help of a vector model term selection. 3.3. Parameter Estimation. Substituting (69) into (1) will give

ð30Þ

Since the tolerance ξ is no longer required and the parameter β and initial values of r can be easily selected as shown in the simulation examples, this algorithm can detect a sparse model structure in an automatic manner. 3.2.3. Vector Model Term Selection. Now define a combination of the selected term index as Is. For the Hammerstein model (11), the index sets of significant terms Iys = {i1, ..., inys}, Ius = {j1, ..., jnus}, and Ifs = {k1, ..., knfs} will be determined by Is. The selection rule is that for each i ∈ {1, ..., ny}, j ∈ {1, ..., nu}, and k ∈ {1, ..., nf}, if any element of the corresponding model term vector y(t  i) ∈ Rn and fk(u(t  j)) ∈ Rm belongs to Is, then the corresponding i, j, and k are selected as the elements of Iys, Ius, and Ifs respectively. For example, if the significant terms of yw(t  i) in Is look like those in Figure 3, then Iys = {2, 4, 6}. This rule is relatively conservative because a whole column will be selected if any element in that column is significant. However, it is used here because preserving a small set of redundant terms is often better and also more reasonable than deleting some significant terms. The iterative model selection procedure can now be summarized as follows:

Now the problem is to estimate unknown parameter matrices Air (r = 1, ..., nys), Bjw (w = 1, ..., nus), and Ckv (v = 1, ..., nfs) from the data set {u(t), y(t)}Lt=1. Define Djwkv = BjwCkv ∈ Rnm, then (31) can be rewritten as y̅ ðtÞ ¼

nys

∑ Ai y̅ ðt  ir Þ r¼1 r

þ

nus

nfs

∑ ∑ Dj k f k ðuðt  jw ÞÞ þ eðtÞ w¼1 v¼1 w v

v

ð32Þ

Obviously, (32) can be expressed as a linear regression form y̅ T ðtÞ ¼ ϕs ðtÞΘs þ eT ðtÞ

ð33Þ

where Θs ¼ ½Ai1 , :::, Ainy , Dj1 k1 , :::, Dj1 knfs , :::, Djnus k1 , :::, Djnus knfs T ∈ RnMs n

ð34Þ ϕs ðtÞ ¼ ½y̅ T ðt  i1 Þ, :::, y̅ T ðt  iny Þ, f k1 ðuðt  j1 ÞÞ, :::, f knf ðuðt  jnu ÞÞ ∈ R nMs

ð35Þ and nMs = nnys + mnusnfs. After a sparse Hammerstein model structure as in (69) is determined using Algorithm 1 in section 3.2, the parameter 11157

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

estimation in (31) can then be performed based on the RLS-SVD algorithm (See Appendix). Unlike the LS-SVD algorithm,5 the regularized least-squares is used here to improve the robustness of the parameter estimation. Algorithm 2: Parameter Estimation. ^s in (33), and then Step 1: Compute the least-squares estimate Θ ^ ^ obtain Air (r = 1, ..., nys) and Djwkv (w = 1, ..., nus, v = 1, ..., nfs) from ^s as in (34). Θ ^ using D ^ j k as in (55) and then Step 2: Construct the matrix D w v ^ as in (56) and the compute the economy-size SVD of D partition of this decomposition as in (59) (see the Appendix). ^ = V1Σ1, respectively, ^ as B ^ and C ^ = U1 and C Step 3: Compute B ^j and then obtain the estimates of the parameter matrices B w ^ k (v = 1, ..., nfs) from B ^ as in (53) and ^ and C (w = 1, ..., nus) and C v (54) (see the Appendix). Finally, the estimated Hammerstein model can be used in simulation mode for the dynamics prediction as below ^y ðtÞ ¼

nys

Figure 4. Random multilevel input signals for the training.

nfs

nus

∑ A^ i ^yðt  ir Þ þ w∑¼ 1 B^j v∑¼ 1 C^ k f k ðuðt  jw ÞÞ r¼1 r

w

v

v

4. APPLICATIONS In order to evaluate the proposed modeling method, three examples are studied. The first one is a Hammerstein system, the second is a pH neutralization process, and the last one includes an experiment on a snap curing oven thermal process. 4.1. Hammerstein System. Consider a nonlinear two-input two-output process described by a Hammerstein system as follows:34 Figure 5. Random multilevel input signals for the testing.

v ¼ f ðuÞ

ð37Þ

where "

# " # 0 0:6 0 , A2 ¼ , 1:72 0 0:738 " # " # 0:162 0:038 0:112 0:038 , B2 ¼ , B1 ¼ 0:1408 0:2408 0:1408 0:2228 " # 3 þ 8=ð1 þ e3u1 Þ  7 f ðuÞ ¼ 10½3 þ 8=ð1 þ e3u1 Þ  7½ð10u2 þ 50Þ=ð10u2 þ 51Þ  50=51 A1 ¼

performance, a new set of 1000 noise-free data is collected as the testing data with the inputs as shown in Figure 5. In the modeling, the initial orders for the linear part of the Hammerstein model are set to ny = 3 and nu = 5. The radial basis functions fk(u) = exp{ u  ck 22/2σ2} with the centers ck uniformly distributed in the (2, 2) and the width σ = 0.2 are selected as the basis functions of the nonlinear part, where k denotes the kth radial basis function (k = 1, ..., nf, nf = 20) and ck = [2 + (k  1)  4/19]  I. The initial regularization parameter r and the parameter β are set to be r = 1  103 and β = 5  103. The settings of these parameters are initialized based on a priori information or experience and, then, determined or adjusted using the cross-validation method according to the modeling performance on the validation data. Starting with this initial model, the structure design algorithm in section 3.2 leads to the following compact Hammerstein model structure

1:55 0

In the modeling, the random multilevel input signals are used to excite this process. The sampling interval is Δt = 1. The 2000 training samples for input signals with the simulation time 2000 are shown in Figure 4, where the first 1800 samples are estimation input and the remaining 200 samples are validation input. Then the noise-free output data are obtained from (3637), and the white noise with zero mean and standard deviation σ = [0.35, 0.35]T is added additively to the noise-free data to obtain a total of 2000 noisy output measurements, where the first 1800 data are used for model estimation and the last 200 data for model validation. The validation data is used to monitor the estimation process and help to determine the initial model size ny, nu, nf and the type of nonlinear basis function fi( 3 ) in (35), the initial regularization parameter r and the parameter β using the cross-validation method. To verify the model

)

ð36Þ

)

yðtÞ ¼ A1 yðt  1Þ þ A2 yðt  2Þ þ B1 vðt  1Þ þ B2 vðt  2Þ

yðtÞ ¼ ðA1 q1 þ A2 q2 ÞyðtÞ þ ðB1 q1 þ B2 q2 ÞvðtÞ ð38Þ vðtÞ ¼ f s ðuÞ¼ C1 f 1 þ C5 f 5 þ C9 f 9 þ C12 f 12 þ C16 f 16 þ C20 f 20

ð39Þ

Then the unknown parameters are estimated using the parameter estimation algorithm in section 3.3. The measured output y(t) for training the model and the predicted output ^y(t) of the Hammerstein model on the training data are presented in Figure 6 (for y1(t)) and Figure 7 (for y2(t)). 11158

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 6. Measured output for the training and predicted output of Hammerstein model on the training data: y1(t).

Figure 8. Measured output for the testing and predicted output of Hammerstein model on the testing data: y1(t).

Figure 7. Measured output for the training and predicted output of Hammerstein model on the training data: y2(t).

Figure 9. Measured output for the testing and predicted output of Hammerstein model on the testing data: y2(t).

The measured output for the testing and the predicted output on the testing data are also illustrated in Figure 8 (for y1(t)) and Figure 9 (for y2(t)). It is shown that the Hammerstein model can capture the dominant dynamics of original system. The actual and estimated nonlinear parts of the Hammerstein system are presented in Figure 10 (for f1(u)) and Figure 11 (for f2(u)), where u1 is set equal to u2 for simplicity. The estimated function can approximate the actual nonlinear function well. Here the estimated f1(u) and f2(u) are linearly scaled to the range of actual functions because the parametrization (38) and (39) is not unique, since any parameter vectors kB and k1C, for some nonzero constant k, provide the same inputoutput equation. For different parameters β and initial values of r in (28), the root of mean squared error RMSE = (∑e(t)2/∑Δt)1/2 of the model on the estimation, evaluation, and testing data are shown in Tables 1 and 2. The modeling errors are insensitive to these parameters since there are a wide range of parameters which can be chosen to obtain a good performance. To test the robustness of the modeling with respect to the measurement noise, the independent white noise with different standard deviations σ is added to the noise-free simulation data. It is found that the selected model structure is unchanged even the noise is increased up to σ = 3  [0.35, 0.35]T. The modeling performance on the estimation and validation data with this noise is shown in Figure 12 (for y1(t)) and Figure 13 (for y2(t)). In fact, as shown in Figure 14 (for y1(t)) and Figure 15 (for y1(t)), the model performance at high noise (σ = 3  [0.35, 0.35]T) is

Figure 10. Actual and estimated nonlinear function of the Hammerstein system: f1(u).

similar to the model obtained with the low noise (σ = [0.35, 0.35]T). This means that the model can still capture the dominant dynamics under the highly noisy conditions. This is benefited from the efficient and reliable structure design and parameter estimation algorithms. The theoretical analysis on the tolerance to the noise level may be needed in the future. 4.2. pH Neutralization Process. The considered pH neutralization process35 is schematically depicted in Figure 16. The process consists of an acid (HNO3) stream, a base (NaOH) 11159

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 11. Actual and estimated nonlinear function of the Hammerstein system: f2(u).

Figure 12. Measured output and predicted output of Hammerstein model obtained under high noise: y1(t).

Table 1. RMSE of Hammerstein Models: A-LROLS Structure Design with Different β

y1(t)

y2(t)

β = 101

β = 102

β = 103

β = 104

estimation

0.16102

0.16558

0.17156

0.17706

validation

0.16978

0.17850

0.17638

0.15692

testing

0.16534

0.16078

0.15758

0.16940

estimation

0.11454

0.12074

0.12650

0.11638

validation

0.09208

0.12014

0.11108

0.13022

testing

0.09076

0.10658

0.09584

0.09406

Table 2. RMSE of Hammerstein Models: A-LROLS Structure Design with Different r

y1(t)

y2(t)

r = 101

r = 102

r = 103

r = 104

estimation

0.17910

0.17590

0.17156

0.17900

validation

0.19416

0.18830

0.17638

0.18280

testing estimation

0.16684 0.12532

0.16164 0.11778

0.15758 0.12650

0.16000 0.11632

validation

0.12796

0.13434

0.11108

0.10828

testing

0.11094

0.10266

0.09584

0.08608

Figure 13. Measured output and predicted output of Hammerstein model obtained under high noise: y2(t).

" f ðxÞ ¼

#T

1=2

cv x  1 A

0

2

1 6 6A 6 1 gðxÞ ¼ 6 6 Ax1 ðwaa  x2 Þ 6 4 1 ðwab  x3 Þ Ax1

stream, and a buffer (NaHCO3) stream that are mixed in a continuously stirred tank reactor (CSTR). The inputs to the system are the acid flow rate Fa, the base flow rate Fb, and the buffer flow rate Fbf, while the output is the liquid level h and the pH. In this study, the buffer flow rate Fbf is considered as a disturbance, thus the process is a two-input twooutput system. The system is highly nonlinear due to the implicit output equation, known as titration curve (46). The simulation model, based on first principles, is given as follows:35

 pðxÞ ¼

1 A

ð42Þ

0 3 1 7 A 7 7 1 ðwba  x2 Þ 7 7 Ax1 7 5 1 ðwbb  x3 Þ Ax1

1 1 ðwbfa  x2 Þ ðwbfb  x3 Þ Ax1 Ax1

c1 ðx, yÞ ¼ y1  x1

ð43Þ

T ð44Þ ð45Þ

c2 ðx, yÞ ¼ x2  10y2 þ 10y2  14

dx ¼ f ðxÞ þ gðxÞu þ pðxÞd dt

ð40Þ

cðx, yÞ ¼ 0

ð41Þ

þ x3

where x = [h wa wb]T, u = [Fa Fb]T, d = Fbf, y = [h pH]T, c = [c1 c2]T,

1 þ 2  10y2  pk2 1 þ 10pk1  y2 þ 10y2  pk2

ð46Þ

The meanings of the parameters and their nominal values in a typical operating conditions are given in Table 3.35 In the modeling, the random multilevel input signals are used. More specifically, the exciting inputs (see Figures 17 and 18 for 11160

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 16. pH neutralization process.

yðtÞ ¼ ðA1 q1 þ A2 q2 ÞyðtÞ þ ðB1 q1 þ B2 q2 þ B3 q3 ÞvðtÞ

ð47Þ

vðtÞ ¼ f s ðuÞ ¼ C1 f 1 þ C5 f 5 þ C8 f 8 þ C11 f 11 þ C15 f 15

ð48Þ

1000 samples) are selected as u1 ðtÞ ¼ 16 þ 4randn=3, u2 ðtÞ ¼ 16 þ 4randn=3

)

)

where randn is a normally distributed random function with mean zero and standard deviation of one. Since these input signals include many different frequencies and amplitudes, they are suitable to excite nonlinear dynamics of the system.36 The disturbance is set as d(t) = 0.55 + d0randn, where d0 is a constant parameter. A noise-free data set of 1000 data is collected from (40) and (41). The sampling interval Δt is 1, and the simulation time is 1000. The white noise with zero mean and standard deviation σ = [0.01, 0.03]T is added to the noise-free data to obtain the noisy output. This set of data is used as the training data with the first 800 data as the estimation data and the next 200 data as the validation data. The validation data is used to monitor the training process and help to determine the initial model size ny, nu, nf and the type of nonlinear basis function fi( 3 ) in (35), the initial regularization parameter r and the parameter β using the cross-validation method. To test the model performance, a new set of 1000 noise-free data is collected as the testing data. In the modeling, the initial model size for the linear part of the Hammerstein model is selected as ny = 3 and nu = 4. The radial basis functions fk(u) = exp{ u  ck 22/2σ2} with the centers ck uniformly distributed in the (12, 20) and the width σ = 20 are selected as the basis functions of the nonlinear part, where k

11161

)

)

)

)

)

)

)

)

Figure 15. Predicted output of Hammerstein models obtained under low and high noise: y2(t).

)

where f1(u) = exp{ u  12  I 22/2(20)2}, f5(u) = exp{ u  14.3  I 22/2(20)2}, f8(u) = exp{ u  16  I 22/2(20)2}, f11(u) = exp{ u  17.7  I 22/2(20)2}, and f15(u) = exp{ u  20  I 22/2(20)2}. Then, the parameter estimation algorithm in section 3.3 is used to estimate the unknown parameters. Note that in the structure identification stage, ck can also be selected randomly within a range, and then, the model selection algorithm will select a set of significant radial basis functions. Because ck are optimized in the structure design stage, they do not need further change in the parameter estimation stage. First, the measurement noise is set to σ = [0.01, 0.03]T and the stochastic process disturbance is set to d0 = 0. The measured output y(t) for the model training are presented in Figure 19 (for y1(t)) and Figure 20 (for y2(t)). To test the model performance, the measured output for the testing and the predicted output on the testing data are given in Figure 21 (for y1(t)) and Figure 23 (for y2(t)). Correspondingly the prediction errors are shown in Figure 22 (for e1(t)) and Figure 24 (for e2(t)). Obviously, the dynamics of original system can be identified by the proposed Hammerstein modeling approach. The estimated nonlinear parts f(u) of the Hammerstein model are also presented in Figure 25 (for f1(u)) and Figure 26 (for f2(u)). Though so far it is very difficult to check their physical correctness due to limited knowledge, the satisfactory modeling performance means that the estimated nonlinear functions should be reasonable on the model approximation. The physical correctness has to be checked using more first-principle knowledge or special designed experiments. Because this study is to establish satisfactory model for control, a further check will be conducted in the future. To study the effect of the higher measurement noise, it is set to σ = 2  [0.01, 0.03]T. The measured output y2(t), predicted output ^y2(t), and the prediction error e2(t) on the testing data are shown in Figures 27 and 28, respectively. The modeling is still satisfactory though there is a little bit larger error. Of course, if the )

Figure 14. Predicted output of Hammerstein models obtained under low and high noise: y1(t).

denotes the kth radial basis function (k = 1, ..., nf, nf = 15) and ck = [12 + (k  1)  (20  12)/14]  I. The initial regularization parameter r and the parameter β are designed as r = 1  103 and β = 1  103. These parameters are often initialized based on a priori information or experience and, then, adjusted according to the modeling performance on the validation data using the cross-validation method. With this initial model structure, the following sparse Hammerstein model structure are obtained using the structure design algorithm in section 3.2

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Table 3. Meanings and Nominal Values of Parameters for the pH Neutralization Process parameter

nominal value

definition

A

207 cm2

cross-sectional area of a CSTR

cv

8.75 mL cm1 s1

valve coefficient

pk1

6.35

first disassociation constant of the weak acid H2CO3

pk2

10.25

second disassociation constant of the weak acid H2CO3

waa

3  103

charge invariant for the acid stream

wab

0

carbonate invariant for the acid stream

wba

3.05  103

charge invariant for the base stream

wbb wbfa

5  105 3  102

carbonate invariant for the base stream charge invariant for the buffer stream

wbfb

3  102

carbonate invariant for the buffer stream

Figure 17. Random multilevel input signal u1(t) in the modeling.

Figure 19. Measured output for the training: y1(t).

Figure 18. Random multilevel input signal u2(t) in the modeling.

Figure 20. Measured output for the training: y2(t).

measurement noise is too large, the modeling will become unacceptable. The results on y1(t) are very similar which are not shown here for simplicity. The stochastic process disturbance d(t) = 0.55 + d0randn also has some effect on the modeling performance. Now the measurement noise is fixed as σ = [0.01, 0.03]T, the stochastic disturbance is changed to different levels d0. The error index RMSE of the model on the estimation, evaluation, and testing data are shown in Table 4. The modeling errors change a little when the disturbance d0 e 0.4. The modeling performance will deteriorate rapidly if the disturbance becomes too large (e.g., d0 = 0.6).

The modeling errors RMSE with different parameters β and initial values of r in (28) are shown in Tables 5 and 6. Here the measurement noise is set to σ = [0.01, 0.03]T and the stochastic process disturbance is set to d0 = 0.2. It can be seen that the modeling errors are insensitive to these parameters, and there are a wide range of parameters for a good performance which make their selection become easy. The proposed A-optimality LROLS structure design is also compared with another structure design: the OLS.31 Figure 29 (for e1(t)) and Figure 30 (for e2(t)) display the prediction error e(t) = y(t)  ^y(t) of the corresponding two models on the testing data, where the solid line corresponds to the A-optimality LROLS and the dashed line to the OLS. Table 7 shows the 11162

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

Figure 21. Measured output for the testing and predicted output of Hammerstein model on the testing data: y1(t).

ARTICLE

Figure 24. Prediction error of Hammerstein model on the testing data: e2(t).

Figure 22. Prediction error of Hammerstein model on the testing data: e1(t).

Figure 25. Estimated nonlinear function of the Hammerstein model: f1(u).

Figure 23. Measured output for the testing and predicted output of Hammerstein model on the testing data: y2(t).

Figure 26. Estimated nonlinear function of the Hammerstein model: f2(u).

modeling error RMSE on the estimation, evaluation, and testing data. They both show that the A-optimality LROLS algorithm can obtain a more accurate model for this process because a more reliable and suitable model structure can be selected. The comparison with another Hammerstein modeling: the RLS-SVD algorithm without model structure selection is also studied. The model order is set as n y = 2, n u = 2, and nf = 3. The modeling errors on the testing data are shown in Figure 31 (for e1 (t)) and Figure 32 (for e2 (t)). It can be observed that the RLS-SVD algorithm may lead to large

model errors due to the poor model structure, while the proposed structure design method can largely improve the modeling performance. The proposed Hammerstein modeling is also compared with a 10th-order linear state space model obtained using the subspace identification method (see MATLAB System Identification Toolbox). The prediction errors on the testing data in Figure 33 (for e1(t)) and Figure 34 (for e2(t)) show that the Hammerstein model can achieve a better performance due to its nonlinear modeling capability. Simulations with higher-order linear model also show similar results. 11163

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Table 5. RMSE of Hammerstein Models: A-LROLS Structure Design with Different β

y1(t)

y2(t)

Figure 27. Measured output for the testing and predicted output of Hammerstein model on the testing data: y2(t).

β = 102

β = 103

β = 104

β = 105

estimation

0.031

0.029

0.029

0. 029

validation testing

0.031 0.016

0.030 0.015

0.035 0.014

0. 038 0. 015

estimation

0.138

0.145

0.150

0. 152

validation

0.140

0.139

0.149

0. 154

testing

0.097

0.098

0.098

0. 106

Table 6. RMSE of Hammerstein Models: A-LROLS Structure Design with Different r

y1(t)

y2(t)

r = 102

r = 103

r = 104

r = 105

estimation

0.029

0.029

0.029

0.027

validation

0.030

0.030

0.029

0.027

testing

0.013

0.015

0.016

0.012

estimation validation

0.151 0.140

0.145 0.139

0.156 0.147

0.155 0.147

testing

0.099

0.098

0.105

0.108

Figure 28. Prediction error of Hammerstein model on the testing data: e2(t).

Table 4. RMSE of Hammerstein Models with Different Stochastic Process Disturbance d0 d0 = 0.4

d0 = 0.6

estimation

0.028

0.029

0.033

0.043

validation testing

0.030 0.014

0.030 0.015

0.032 0.016

0.041 0.024

estimation

0.143

0.145

0.147

0.337

validation

0.135

0.139

0.144

0.341

testing

0.093

0.098

0.101

0.338

4.3. Snap Curing Oven. In the semiconductor back-end packaging process, the snap curing oven (Figure 35) is important equipment to provide the required curing temperature. After epoxy die attaches, the bonded leadframe to be cured is moved in and out from the inlet and outlet. It is equipped with four heaters (h1h4) and four temperature sensors (s1s4). For better control, a temperature model for the oven is required. In the experiment, the random multilevel input signals are used to excite this thermal process because their rich frequency and amplitude variation are suitable to excite nonlinear dynamics in the systems as shown in many successful applications.3741 As an example, the total 1700 samples for heaters 1 and 4 are shown in Figures 36 and 37. Correspondingly, a total of 1700 output measurements are collected with a sampling interval Δt = 10 s. A total of 1200 measurements from four sensors are used to train the model. The last 500

Figure 29. Prediction error of Hammerstein models on the testing data with A-LROLS and OLS structure design: e1(t).

measurements from sensors are chosen to test the model for untrained data. In the modeling, the initial model size for the linear part of the Hammerstein model is selected as ny = 3 and nu = 5. The radial basis functions fk(u) = exp{ u  ck 22/2σ2} (k = 1, ..., nf, nf = 10) with the centers ck uniformly distributed in the (0, 0.28) and the width σ = 0.5 are selected as the basis functions of the nonlinear part. The initial regularization parameter r and the parameter β are designed as r = 3  103 and β = 1  102. Using the structure design algorithm, the following Hammerstein model structures are obtained )

y2(t)

d0 = 0.2

)

y1(t)

d0 = 0

yðtÞ ¼ ðA1 q1 þ A2 q2 ÞyðtÞ þ ðB1 q1 þ B2 q2 þ B4 q4 ÞvðtÞ

ð49Þ

vðtÞ ¼ f s ðuÞ ¼ C1 f 1 þ C6 f 6 þ C7 f 7 þ C10 f 10

ð50Þ

After the training using the first 1200 data points, a process model can be obtained with the significant performance such as y1 in Figure 38 and y4 in Figure 39. The RMSE of the model is about 1.3. This is acceptable since the sensor noise is about (1. 11164

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 30. Prediction error of Hammerstein models on the testing data with A-LROLS and OLS structure design: e2(t).

Figure 32. Prediction error e2(t) of Hammerstein models on the testing data: with structure design (A-LROLS RLS-SVD) and without structure design (RLS-SVD).

Table 7. RMSE of Hammerstein Models: A-LROLS and OLS Structure Design A-LROLS y1(t)

y2(t)

OLS

estimation

0.029

0.047

validation

0.030

0.046

testing estimation

0.015 0.145

0.043 0.259

validation

0.139

0.269

testing

0.098

0.208

Figure 33. Prediction error e1(t) of Hammerstein model and linear model on the testing data.

Figure 31. Prediction error e1(t) of Hammerstein models on the testing data: with structure design (A-LROLS RLS-SVD) and without structure design (RLS-SVD).

The estimated nonlinear parts f(u) of the Hammerstein system are presented in Figure 40, where u1, u2, u3, and u4 are set to be equal for simplicity. The nonlinearities could come from actuators, conduction, and convention, etc. The modeling results show that they can be approximated by the input nonlinearity of Hammerstein system. It means that the system gain will change with the inputs as in Figure 40. The obtained Hammerstein model is also compared with a 15th-order linear state space model which is estimated using the subspace identification method. As shown in Figures 41 and 42, the Hammerstein model can achieve better performance due to its nonlinear modeling capability. Actually, the RMSE of this linear model is about 2, which is higher than the Hammerstein model.

Figure 34. Prediction error e2(t) of Hammerstein model and linear model on the testing data.

Compared with the previous examples, this system actually has more unmeasured stochastic disturbances, such as external environment changes, complex boundary conditions, material motion, and gas flow. The dynamics is complicated because of the distributed temperature nature and nonlinear conduction and convection. This system also has more number of inputs and outputs. Though the modeling and experiment complexity is higher than the previous ones, it still works well. 11165

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

Figure 35. Snap curing oven system.

Figure 38. Measured output for modeling and predicted output of Hammerstein model on the training and testing data: y1(t).

Figure 36. Input signal u1(t) of heater 1 in the experiment.

Figure 39. Measured output for modeling and predicted output of Hammerstein model on the training and testing data: y4(t).

Figure 37. Input signal u4(t) of heater 4 in the experiment.

Of course, this study only provides a simple and effective MIMO Hammerstein modeling approach for some industrial processes. For a more complex cases, the proposed modeling could need further check and improvement. For example, if the number of inputs and outputs is very large, the modeling will become more difficult because it will have a very large set of candidate model terms as well as parameters. The persisted exciting experiment for high-dimensional systems will also become very challenging. When the amount of data is smaller than the number of parameters, or the system has dead time and right half plane zeros, the proposed modeling approach will need some changes. Note that there are also some popular statistical analysis software (e.g., Minitab and SPSS) for general linear models; though they can not be directly used for this

Figure 40. Estimated nonlinear function of the Hammerstein model: f(u) (u1 = u2 = u3 = u4).

MIMO Hammerstein modeling, their experiment design and model selection ideas are very useful for improving the proposed modeling. The comparison with existing software for general linear models is also very interesting. Due to the limited space, these interesting problems are worth investigating in the future. 11166

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

’ APPENDIX: PARAMETER ESTIMATION A.1. Regularized Least Squares Estimation. The parameter estimation problem in (31) can be performed based on the RLSSVD algorithm. Considering the L-point data set, the linear regression problem in (33) can be given by

Y ¼ Φs Θs þ E where

ð51Þ

3 y̅ T ð1Þ 6 7 7 Y ¼6 4 l 5, Φs ¼ T y̅ ðLÞ

Figure 41. Prediction error of Hammerstein model and linear model on the training and testing data: e1(t).

2

2

3 ϕs ð1Þ 6 6 l 7 7, E ¼ 4 5 ϕs ðLÞ

2

3 eT ð1Þ 6 6 l 7 7 4 5 T e ðLÞ

^s can be obtained using the It is well known that the estimate Θ regularized least squares (RLS) method, which minimizes the following regularized error criterion: JR ¼ traceðET E þ Θs T ΛΘs Þ

ð52Þ

where Λ = diag{λ1, ..., λnMs} is the regularization parameter matrix. The sparse model structure design and the regularization technique can make sure the matrix ΦnMsTΦnMs + Λ be full rank ^ j k (w = 1, ..., ^ i (r = 1, ..., nys) and D and well-conditioned. Then A r w v ^s. Now the problem is nus, v = 1, ..., nfs) can be easily derived from Θ ^j k . to reconstruct Bjw and Ckv from D w v A.2. Singular Value Decomposition. With the following definitions for the matrices B and C,

C ¼ ½Ck1 , Ck2 , :::, Cknfs T ∈ R mnfs m

ð54Þ

it is easy to see that 2 Bj1 Ck1 6 Bj2 Ck1 6 BCT ¼ 6 6 l 4

333 333 3 33

Bjnus Ck1

333

Bj1 Cknfs Bj2 Cknfs l

3 7 7 7 ¼ D ∈ R nnus mnfs 7 5

Bjnus Cknfs ð55Þ

^ of the matrix D can then be obtained from the An estimate D ^ j k . The problem now is how to estimate the paraestimate D w v ^ . It is clear that the meter matrices B and C from the estimate D ^ are those ^ and C closest, in the Frobenius norm sense, estimates B that solve the following optimization problem ^ Þ ¼ argminf D ^  BCT 2F g ^, C ðB )

5. CONCLUSIONS A MIMO Hammerstein model structure identification approach is proposed in this study. Compared with linear regression models, the main difficulty of this problem is that the model terms are vectors and some model terms are inputs of other model terms (i.e., model term coupling). An effective algorithm is designed with multi-output locally regularized orthogonal least-squares, A-optimality design criterion, and a vector model term selection rule. To enhance the wellposedness of the regressors, estimation robustness, and model adequacy, the A-optimality criterion is integrated into the model error reduction criterion in the multi-output LROLS. To handle the vector model term coupling problem, a vector model term selection rule is synthesized into the multi-output LROLS. The proposed approach can identify a simple MIMO Hammerstein model structure with good performance from the noisy data. The algorithm is highly automatic because it only includes a few design parameters which are easy to select for the user. The effectiveness of the proposed method is verified by two simulation examples and one experimental example. Because many practical systems could have more complex nonlinear dynamics, higher stochastic disturbances, and a large number of input and outputs, further studies and improvements on more complex systems will be conducted in the future.

ð53Þ

)

Figure 42. Prediction error of Hammerstein model and linear model on the training and testing data: e4(t).

B ¼ ½Bj1 T , Bj2 T , :::, Bjnus T T ∈ R nnus m

B, C

The solution to this optimization problem is provided by the ^ .5,10 SVD of the matrix D ^ ∈ Rnnusmnfs have rank γ g m and let the Theorem 1:10 Let D ^ be given by economy-size SVD of D ^ ¼ Uγ Σγ Vγ T ¼ D

γ

σi μi υi T ∑ i¼1

ð56Þ

where the singular matrix Σγ = diag{σi} such that σ1 g 3 3 3 g σγ > 0 and where the matrices Uγ = [μ1, ..., μγ] ∈ Rnnusγ and Vγ = [υ1, ..., υγ] ∈ Rmnfsγ contain only the first γ columns of the unitary 11167

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

matrices U ∈ Rnnusnnus and V ∈ Rmnfsmnfs provided by the full ^, SVD of D ^ ¼ UΣV T D

ð57Þ )

)

^ ∈ Rmnfsm that ^ ∈ Rnnusm and C respectively. Then the matrices B ^  BCT F2, are given by10 minimize the norm D )

)

^ Þ ¼ argminf D ^  BCT 2F g ¼ ðU1 , V1 Σ1 Þ ^, C ðB

ð58Þ

B, C

where U1 ∈ Rnnusm, V1 ∈ Rmnfsm, and Σ1 = diag{σ1, ..., σm} are given by the following partition of the economy-size SVD in (56) " #" # V1 T Σ1 0 ^ ¼ ½ U1 U2  ð59Þ D 0 Σ2 V2 T and the approximation error is given by 2 F

)

)

^T ^ B ^C D

¼

γ



i¼m þ 1

σ 2i

ð60Þ

’ AUTHOR INFORMATION Corresponding Author

*E-mail: [email protected].

’ ACKNOWLEDGMENT The authors would like to thank the associate editor and the anonymous referees for their valuable comments and suggestions. The work is partially supported by a GRF grant from RGC of Hong Kong SAR (CityU: 117310), grants from NSF China (60825302, 61004047, 60804033), National Basic Research Program of China (2009CB724301), National S&T Major Projects (2009ZX04013-021, 2009ZX04002-061), and Specialized Research Fund for the Doctoral Program of Higher Education of China (20100073120045). ’ REFERENCES (1) Fruzzetti, K. P.; Palazoglu, A.; McDonald, K. A. Nonlinear model predictive control using Hammerstein models. J. Process Control 1997, 7 (1), 31. (2) Harnischmacher, G.; Marquardt, W. Nonlinear model predictive control of multivariable processes using block-structured models. Contr. Eng. Pract. 2007, 15 (10), 1238. (3) Narendra, K.; Gallman, P. An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. Autom. Control 1966, 11 (3), 546. (4) Stoica, P.; S€oderstr€om, T. Instrumental-variable methods for identification of Hammerstein systems. Int. J. Control 1982, 35 (3), 459. (5) Bai, E. W. An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear systems. Automatica 1998, 34 (3), 333. (6) Chen, H. F. Pathwise convergence of recursive identification algorithms for Hammerstein systems. IEEE Trans. Autom. Control 2004, 49 (10), 1641. (7) Zhu, Y. C. Identification of Hammerstein models for control using ASYM. Int. J. Control 2000, 73 (18), 1692. (8) Greblicki, W. Continuous-time Hammerstein system identification from sampled data. IEEE Trans. Autom. Control 2006, 51 (7), 1195. (9) V€or€os, J. Recursive identification of Hammerstein systems with discontinuous nonlinearities containing dead-zones. IEEE Trans. Autom. Control 2003, 48 (12), 2203.

(10) Gomez, J. C.; Baeyens, E. Identification of block-oriented nonlinear systems using orthonormal bases. J. Process Control 2004, 14 (6), 685. (11) Bai, E. W.; Li, D. Convergence of the iterative Hammerstein system identification algorithm. IEEE Trans. Autom. Control 2004, 49 (11), 1929. (12) Lakshminarayanan, S.; Shah, S. L.; Nandakumar, K. Identification of Hammerstein models using multivariate statistical tools. Chem. Eng. Sci. 1995, 50 (22), 3599. (13) Pawlak, M. On the series expansion approach to the identification of Hammerstein systems. IEEE Trans. Autom. Control 1991, 36 (6), 763. (14) Goethals, I.; Pelckmans, K.; Suykens, J. A. K.; De Moor, B. Subspace identification of hammerstein systems using least squares support vector machines. IEEE Trans. Autom. Control 2005, 50 (10), 1509. (15) Goethals, I.; Pelckmans, K.; Suykens, J. A. K.; De Moor, B. Identification of MIMO Hammerstein models using least squares support vector machines. Automatica 2005, 41 (7), 1263. (16) Haber, R.; Unbehauen, H. Structure identification of nonlinear dynamic systems - A survey on input/output approaches. Automatica 1990, 26 (4), 651. (17) Piroddi, L.; Spinelli, W. An identification algorithm for polynomial NARX models based on simulation error minimization. Int. J. Control 2003, 76 (17), 1767. (18) Lind, I.; Ljung, L. Regressor and structure selection in NARX models using a structured ANOVA approach. Automatica 2008, 44 (2), 383. (19) Billings, S. A.; Chen, S.; Korenberg, M. J. Identification of MIMO non-linear systems using a forward-regression orthogonal estimator. Int. J. Control 1989, 49 (6), 2157. (20) Hong, X.; Mitchell, R. J.; Chen, S.; Harris, C. J.; Li, K.; Irwin, G. W. Model selection approaches for non-linear system identification: A review. Int. J. Syst. Sci. 2008, 39 (10), 925. (21) Billings, S. A.; Korenberg, M. J.; Chen, S. Identification of output-affine systems using an orthogonal least squares algorithm. Int. J. Syst. Sci. 1988, 19 (8), 1559. (22) Chen, S.; Chng, E. S.; Alkadhimi, K. Regularized orthogonal least squares algorithm for constructing radial basis functions networks. Int. J. Control 1996, 64 (5), 829. (23) Chen, S. Local regularization assisted orthogonal least squares regression. Neurocomputing 2006, 69 (46), 559. (24) Hong, X.; Harris, C. J. Nonlinear model structure detection using optimum experimental design and orthogonal least squares. IEEE Trans. Neural Netw. 2001, 12 (2), 435. (25) Hong, X.; Harris, C. J. Nonlinear model structure design and construction using orthogonal least squares and D-optimality design. IEEE Trans. Neural Netw. 2002, 13 (5), 1245. (26) Hong, X.; Harris, C. J.; Chen, S.; Sharkey, P. M. Robust nonlinear model identification methods using forward regression. IEEE Trans. Syst. Man Cybern. Syst. Hum. 2003, 33 (4), 514. (27) Chen, S.; Hong, X.; Harris, C. J. Sparse kernel regression modeling using combined locally regularized orthogonal least squares and D-optimality experimental design. IEEE Trans. Autom. Control 2003, 48 (6), 1029. (28) Chen, S.; Billings, S. A.; Luo, W. Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control 1989, 50 (5), 1873. (29) Chen, S. Multi-output regression using a locally regularised orthogonal least-squares algorithm. IEE Proc. Vis. Image Signal Process. 2002, 149 (4), 185. (30) Chen, S.; Hong, X.; Harris, C. J. Sparse multi-output radial basis function network construction using combined locally regularised orthogonal least square and D-optimality experimental design. IEE Proc. Contr. Theor. Appl. 2003, 150 (2), 139. (31) Qi, C. K.; Li, H.-X. A time/space separation based Hammerstein modeling approach for nonlinear distributed parameter processes. Comput. Chem. Eng. 2009, 33 (7), 1247. (32) Sj€oberg, J.; Zhang, Q.; Ljung, L.; Benveniste, A.; Delyon, B.; Glorennec, P.; Hjalmarsson, H.; Juditsky, A. Nonlinear black-box 11168

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169

Industrial & Engineering Chemistry Research

ARTICLE

modeling in system identification: A unified approach. Automatica 1995, 31 (12), 1691. (33) MacKay, D. J. C. Bayesian interpolation. Neural. Comput. 1992, 4 (3), 415. (34) Chan, K. H.; Bao, J.; Whiten, W. J. Identification of MIMO Hammerstein systems using cardinal spline functions. J. Process Control 2006, 16 (7), 659. (35) Nie, J.; Loh, A. P.; Hang, C. C. Modeling pH neutralization processes using fuzzy-neural approaches. Fuzzy Set. Syst. 1996, 78 (1), 5. (36) Nowak, R. D. Nonlinear system identification. Circ. Syst. Signal Process. 2002, 21 (1), 109. (37) Barker, H. A.; Tan, A. H.; Godfrey, K. R. Design of multilevel perturbation signals with harmonic properties suitable for nonlinear system identification. IEE Proc. Contr. Theor. Appl. 2004, 151 (2), 145. (38) Barker, H. A.; Godfrey, K. R.; Tucker, A. J. Nonlinear system identification with multilevel perturbation signals. In Proceeding of the 2000 IFAC System Identification, Santa Barbara, CA, 2000; p 1175. (39) Braun, M. W.; Rivera, D. E.; Stenman, A.; Foslien, W.; Hrenya, C. Multi-level pseudo-random signal design and “model-on-demand” estimation applied to nonlinear identification of a RTP wafer reactor. In Proceedings of the 1999 American Control Conference, San Diego, CA, 1999; p 1573. (40) Haber, R.; Keviczky, L. Nonlinear System Identification: InputOutput Modeling Approach; Kluwer Academic Publishers: Boston, 1999. (41) Godfrey, K. R. Perturbation Signals for System Identification; Prentice Hall: New York, 1993.

11169

dx.doi.org/10.1021/ie102273c |Ind. Eng. Chem. Res. 2011, 50, 11153–11169