Ind. Eng. Chem. Res. 2003, 42, 5850-5860

Sequential Parameter Estimation for Large-Scale Systems with Multiple Data Sets. 1. Computational Framework

Richard Faber, Pu Li,* and Günter Wozny

Institute of Process and Plant Technology, Technical University of Berlin, KWT9, 10623 Berlin, Germany

A sequential approach to solving large-scale parameter estimation problems with multiple data sets is presented. A nested three-stage computation framework is proposed to decompose the problem. The upper stage is an NLP with only the parameters to be estimated as optimization variables. The middle stage consists of multiple sub-NLPs in which the independent variables of each data set are treated as optimization variables. In the lower stage the dependent variables and their sensitivities are computed through a simulation step. The sensitivities for the upper stage NLP and for the middle stage sub-NLPs can be easily computed by employing the Jacobians of the model equations at the lower stage. This approach is an extension of the work by Dovi and Paladino (Comput. Chem. Eng. 1989, 6, 731-735) and Kim et al. (AIChE J. 1990, 36, 985-993). The advantage of this approach lies in the fact that it needs very few mathematical manipulations and can be easily implemented with standard NLP software; in addition, an existing simulator can be connected for the function evaluation and sensitivity computation. Three practical examples are used to demonstrate the performance of this approach.

1. Introduction

Model-based optimization has been widely used to exploit the economic and environmental potential of industrial processes. The quality of the process model is crucial to the reliability of the optimization results. Even the most rigorous process models contain certain parameters which have to be estimated based on measurement data. Many of these parameters may change during process operation due to effects such as fouling, corrosion, and foaming. They may depend on the changing operating point and the current plant configuration. Especially for real-time applications such as online optimization, it is desirable to adjust these parameters on-site to obtain a reliable process model that is able to reproduce the observed process as accurately as possible.
Parameters obtained with shortcut models under laboratory conditions may not be adequate for modeling industrial processes if they are related to the process equipment.1,2 A rigorous model of an industrial steady-state process usually leads to a large-scale nonlinear equation system with thousands of variables. Moreover, multiple data sets have to be considered for the estimation, and the measured data are usually model-inconsistent. Therefore, an estimation approach that can handle large-scale combined data reconciliation and parameter estimation problems is required. In addition, it is desirable that the approach be easy to implement and to connect with existing process monitoring and simulation environments.

* To whom correspondence should be addressed. Tel.: +49-30-31423418. Fax: +49-30-31426915. E-mail: [email protected].

On the basis of the process model, either a least-squares objective function or the maximum likelihood principle can be used, together with optimization techniques, to estimate the process parameters. To obtain more reliable parameter estimates, usually a series of measured data sets from different operating points is used for the estimation. The size of the optimization

problem increases linearly with the number of data sets. Special attention has to be paid to the influence of measurement errors. If the independent variables are error-free, the optimization is carried out only in the space of the parameters when a sequential approach is used. In this case the number of data sets affects only the number of constraints of the optimization problem. On the other hand, if the independent variables (i.e., the input variables) are also subject to errors, the optimization problem becomes more complicated. In this case the individual data sets are coupled through the parameters, and the evaluation of the independent variables for each data set results in a data reconciliation problem nested in the parameter estimation problem. The independent variables apply only to the individual data sets, but since the solution of the data reconciliation step also depends on the parameters, the two optimization problems cannot be completely disconnected. The so-called errors-in-variables method (EVM) addresses both of these problems.

Parameter estimation of this kind was probably first investigated by Deming,3 who used a Lagrange method with linearization of the model about the measured data points. Similar approaches with refinements and extensions were developed in several later studies, for example, by York4 and Peneloux et al.5 One important issue is the convergence of this estimation procedure, which can be addressed by using a general NLP solver. Britt and Luecke6 presented general algorithms for the parameter estimation and showed that Deming's method may not converge to an optimal solution. They refined the method by linearizing the model at the values obtained from the previous iteration.

Two major issues in parameter estimation have been studied in the past. The issue of finding the global optimum has been investigated by several authors (e.g., Esposito and Floudas7 and Gau and Stadtherr8), since the problem is generally nonconvex.
Another important issue, which is the concern of this study, is the high dimension of the estimation problem. As errors are

10.1021/ie030296s CCC: $25.00 © 2003 American Chemical Society Published on Web 10/15/2003


allowed for all (dependent and independent) variables, the degrees of freedom as well as the number of constraints of the optimization problem become high, and they increase with the number of data sets involved. Thus, even for medium-sized systems, the optimization problem can become too large to handle with standard optimization software. Therefore, decomposition algorithms have been used in a series of previous studies. Rod and Hancil9 proposed an iterative algorithm in which the parameters and the independent variables are evaluated separately: in each single subproblem, the independent variables or the parameters, respectively, are held fixed. Reilly and Patino-Leal10 used a method in which the data reconciliation is nested within the parameter estimation step. The constraint equations are successively linearized at the measured data points. A similar approach was used by Anderson et al.11 as well as Sutton and MacGregor.12 Tjoa and Biegler13,14 proposed an approach to handle all variables of all data sets simultaneously. The problem is implicitly decomposed: it first decouples the equations of the individual data sets by introducing new variables and adding additional constraints to each data set. To further exploit the structure in each data set, a range and null space decomposition strategy is used. The whole problem is solved by a reduced Hessian SQP method. Although this approach is efficient in solving large-scale problems, as indicated in ref 14, it needs much mathematical manipulation to implement the algorithms. Kim et al.15 compared the simultaneous strategy using a two-stage nonlinear programming (NLP) approach with a nested nonlinear EVM approach. The two-stage method is similar to the iterative algorithm of Rod and Hancil9 with the difference that, for the data reconciliation step, a series of decoupled NLP problems are solved with a standard SQP algorithm and, for the parameter estimation step, a linear approximation of the constraints is used.
The nested nonlinear EVM is similar to the approach of Reilly and Patino-Leal10 with the difference that, as in the two-stage EVM, a set of decoupled NLP problems are solved for the inner data reconciliation step. In this nested approach, the total variables are decomposed into parameters in the upper NLP and the independent as well as state variables in the lower NLPs. However, if large-scale systems are concerned, the individual sub-NLP problems will become so large that they cannot be handled efficiently. Dovi and Paladino16 presented a constrained variation approach in which, to decouple the problem, the state (dependent) variables are eliminated by solving the model equations. The independent variables are eliminated by the optimality condition of the subproblems. Thus, the problem can be solved only in the space of parameters. The elimination of the state variables requires a simulation step with given values of the parameters and independent variables. Therefore, it makes it possible to treat large-scale estimation problems without limitations on equality constraints. However, second-order derivatives are required in the decoupling step, which will be prohibitively expensive, especially for large problems. In this study, we present a sequential approach to solving large-scale estimation problems with multiple data sets. It is an extension of the work by Dovi and Paladino16 and Kim et al.15 A nested three-stage computation framework is proposed. The parameters to be

estimated are treated in the upper stage by solving an NLP problem. The middle stage consists of multiple sub-NLPs nested in the upper stage, representing the data reconciliation step for each data set, where only the independent variables are considered as optimization variables. The lower stage is a simulation step for computing the state variables and their gradients with respect to the independent variables. According to the optimality condition in the middle stage, only the gradients of the dependent variables with respect to the parameters are required in the upper stage, and these can be easily computed by employing the Jacobians of the model equations with respect to the dependent variables and to the parameters, respectively. Therefore, the second-order derivatives of the model equations are avoided and the computation expense is significantly reduced. Since the degrees of freedom are limited to the number of parameters in the upper stage and the number of independent variables in the middle stage, any standard NLP software can be used to solve the problem. Moreover, an existing simulator can be directly used for the function evaluation and sensitivity computation. Furthermore, very few mathematical manipulations are required. Three practical estimation examples are used to demonstrate the effectiveness of the approach.

2. Estimation Problem and Approach Development

To estimate parameters Θ in a nonlinear implicit equation system, we usually have several measured data sets of some output (dependent) variables y and some input (independent) variables u. For instance, if we estimate the model of a distillation column, y may consist of the measured temperatures on some trays and the top and/or bottom compositions, while u may consist of the measured feed flow, reboiler duty, and reflux flow as well as the column pressure. In this study, we assume both y and u are subject to measurement errors.
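Before the formal development below, the nesting just described can be previewed on a deliberately tiny instance. The sketch is illustrative only: a hypothetical scalar model g(y, u, Θ) = y − Θu = 0 with synthetic data, and generic derivative-free solvers standing in for the SQP codes used in this work. It shows only the structure: an upper-stage NLP over Θ with one middle-stage reconciliation sub-NLP per data set nested inside, the lower "simulation" being simply y = Θu.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Hypothetical toy model (not one of the paper's examples):
# g(y, u, theta) = y - theta*u = 0, so the lower-stage "simulation"
# reduces to y = theta*u. Both y and u are measured with error.
rng = np.random.default_rng(1)
theta_true = 2.5
u_hat = np.array([1.0, 2.0, 3.0, 4.0])                 # measured independent variables
y_hat = theta_true * u_hat + rng.normal(0.0, 0.05, 4)  # measured dependent variables
sig_y, sig_u = 0.05, 0.05                              # measurement standard deviations

def middle_stage(theta):
    """Data reconciliation: one sub-NLP per data set, over u_i only (theta fixed)."""
    total = 0.0
    for yh, uh in zip(y_hat, u_hat):
        # Weighted least-squares misfit of data set i; y = theta*u is the
        # lower-stage simulation substituted into the objective.
        fi = lambda u: ((theta * u - yh) / sig_y) ** 2 + ((u - uh) / sig_u) ** 2
        total += minimize_scalar(fi).fun               # converged objective f_i
    return total

# Upper stage: an NLP in the parameter alone; the data sets couple only via theta.
res = minimize(lambda th: middle_stage(th[0]), x0=[1.5], method="Nelder-Mead")
theta_est = res.x[0]
```

In the actual framework the upper and middle stages are SQP solvers fed with analytically propagated sensitivities; the derivative-free solvers here merely keep the sketch short.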
In addition, there are many unmeasured state variables x in the model, for example, the temperatures and compositions on the trays inside the column. A general parameter estimation problem with multiple sets of data can be formulated as follows:

min  f = Σ_{i=1}^{I} f_i = Σ_{i=1}^{I} [(y_i - ŷ_i)^T W_Y^{-1} (y_i - ŷ_i) + (u_i - û_i)^T W_U^{-1} (u_i - û_i)]

s.t.  g_i(x_i, y_i, u_i, Θ) = 0,   i = 1, ..., I
      h_i(x_i, y_i, u_i, Θ) ≥ 0
      Θ^L ≤ Θ ≤ Θ^U      (1)

where

x ∈ X ⊆ R^n, y ∈ Y ⊆ R^m, u ∈ U ⊆ R^l, Θ ∈ R^p, g ∈ R^{n+m}, h ∈ R^k

W_Y and W_U in the objective function are the known covariance matrices of the measurement errors for the dependent and independent variables, respectively. ŷ_i and û_i are the measured values of the dependent and independent variables for data set i. g is the vector of


the model equations and h the vector of inequality constraints. The inequality constraints represent the process restrictions which should be considered in the parameter estimation; for instance, the physical bounds on the vapor and liquid loads in a distillation column should not be violated. In most previous studies, however, inequality constraints have not been included in the problem formulation. The model with the resulting parameters may then not correctly describe the process. Moreover, all variables (x, y, u) have to lie inside their physical region. The total number of degrees of freedom in this problem is I·(n + m + l) + p. The straightforward approach to addressing problem (1) is to include all variables of the I data sets in an NLP solver (usually an SQP algorithm) and solve for them simultaneously. In this way, for large-scale systems, the problem will be so large that it may not be efficiently solved with the existing NLP solvers. It is noted that the dependent variables y as well as the state variables x vary with each individual data set, while the parameters Θ are independent of the data sets. Moreover, in most chemical engineering processes, the dimension of the state variables x is usually much greater than that of the independent variables u. As mentioned before, Kim et al.15 proposed, within a nested two-stage framework, to decompose the total variable space into a space of Θ, which is handled in the upper NLP stage, and a space of all other variables x, y, u, which are handled in lower sub-NLPs for the individual data sets. Thus, the dimension of each sub-NLP is n + m + l. For large-scale problems, this number may still be too large for a standard NLP solver to deal with. Dovi and Paladino16 proposed a decomposition of the variables into a space of Θ, a space of u, and a space of x as well as y.
The dependent variables x and y are eliminated by solving the equation system, while the independent variables u are eliminated by employing the optimality conditions of the subproblems for each data set. The remaining degrees of freedom are then only the parameters Θ, and the problem can be solved as an unconstrained NLP. Inequality constraints were not considered in that work. However, the variables (dependent and independent) of a practical process are always constrained. While the independent (input or control) variables can be restricted based on the real process limitations, bounds on the dependent (output or state) variables are implicitly enforced by the physical model of the process. This implicit enforcement may not hold, because models are approximations of real processes. In such cases, we need to introduce inequality constraints on the dependent (state) variables to reconcile the model with the process measurements. Another shortcoming of this approach is its requirement of computing the second-order derivatives of the model equations with respect to all variables. In this work we follow and extend the ideas of Kim et al.15 and Dovi and Paladino16 and propose a sequential three-stage estimation framework, as shown in Figure 1. The upper stage solves the following problem:

min_Θ  f = Σ_{i=1}^{I} f_i = Σ_{i=1}^{I} [(y_i - ŷ_i)^T W_Y^{-1} (y_i - ŷ_i) + (u_i - û_i)^T W_U^{-1} (u_i - û_i)]

s.t.  Θ^L ≤ Θ ≤ Θ^U      (2)

where only the parameters are treated as optimization variables and the variables y and u are considered as functions of Θ. The size of this problem is p.

Figure 1. Framework of the three-stage sequential approach.

In the middle stage, the sub-NLPs for each data set are nested in problem (2). With given values of the parameters from the upper stage, the sub-NLP problem for data set i (i = 1, ..., I) has the following form:

min_{u_i}  f_i = (y_i - ŷ_i)^T W_Y^{-1} (y_i - ŷ_i) + (u_i - û_i)^T W_U^{-1} (u_i - û_i)

s.t.  h_i(x_i, y_i, u_i, Θ) ≥ 0      (3)

where only the independent variables u are optimization variables and thus the size of this problem is l. Note that the inequality constraints have to be satisfied in this stage. In the lower stage, which is nested in the sub-NLP solver, the measured dependent variables y and the unmeasured state variables x, representing the largest part of the whole variable space (n + m), will be computed by solving the model equations with the Newton algorithm:

g_i(x_i, y_i, u_i, Θ) = 0      (4)

This is in fact a simulation step with given Θ and u_i. An existing simulator can be directly used here to obtain the x and y needed for computing the objective function values of problem (3). To reduce the computation expense in the case of a large-scale process, the sparse structure of the equation system should be exploited. After the convergence of the Newton step, the total differential of the equation system will be

∂g_i/∂u_i + (∂g_i/∂x̃_i)(dx̃_i/du_i) = 0      (5)

Thus, the sensitivities of the dependent variables with respect to the independent variables required for solving problem (3) can be computed by

dx̃_i/du_i = -[∂g_i/∂x̃_i]^{-1} (∂g_i/∂u_i)      (6)

where x̃_i = [x_i, y_i]^T and ∂g_i/∂x̃_i, ∂g_i/∂u_i are the Jacobians of the equation system with respect to x̃_i and u_i, respectively. For systems with a high dimension of x̃_i, inverting the Jacobian ∂g_i/∂x̃_i (of dimension (n + m) × (n + m)) is expensive. Thus, in the numerical implementation, the gradients are computed not by matrix inversion and multiplication but by a step of Gaussian elimination.
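As a small illustration of this point, the linear-system form of eq 6 can be solved for all right-hand-side columns at once, without ever forming the inverse Jacobian. The 3×3 system below is hypothetical, standing in for the Jacobians a simulator would deliver at a converged Newton step:

```python
import numpy as np

# Hypothetical Jacobians at a converged Newton step (eq 6):
# dg/dx_tilde is (n+m) x (n+m), dg/du is (n+m) x l. Here n+m = 3, l = 2.
g_x = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])
g_u = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

# One LU factorization (Gaussian elimination) handles all l columns of
# dx_tilde/du at once; the explicit inverse of g_x is never formed.
dx_du = np.linalg.solve(g_x, -g_u)
```

The sensitivities with respect to the parameters have the same structure (same factored Jacobian, ∂g/∂Θ as the right-hand side), so the factorization can be reused; for a sparse large-scale model a sparse LU (e.g., `scipy.sparse.linalg.splu`) would be the natural choice.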


The sub-NLP problems of the middle stage are solved sequentially for each data set. Due to the reduced dimension of each sub-NLP, any standard NLP solver can be used to carry out this task. Since the sub-NLPs are independent of each other, a parallel computation scheme can be adopted. The objective function values of all data sets are summed up and supplied to the upper stage, as shown in Figure 1. In addition, for the upper stage NLP to update Θ for the next iteration, the sensitivities of both the independent variables u and the dependent variables y to the parameters Θ are required. To compute these sensitivities, the optimality condition of each sub-NLP in the middle stage can be utilized: at its convergence (i.e., u_i = u_i*, x̃_i = x̃_i*) with given Θ, we have the optimality condition of the reduced sub-NLP problem:

φ_i(u_i*, Θ) = ∂f_i/∂u_i + (∂f_i/∂y_i)(dy_i/du_i) = 0      (7)

For computing the sensitivities, Dovi and Paladino16 proposed to use the first-order Taylor expansion of the equations including both (5) and (7) with respect to both x˜ i and ui. As derived in ref 16, this leads to the requirement of computing the second-order derivatives of the model equations gi to all variables, that is,

∂²g_i/∂x̃_i², ∂²g_i/∂x̃_i∂u_i, ∂²g_i/∂u_i², ∂²g_i/∂x̃_i∂Θ, ∂²g_i/∂u_i∂Θ      (8)

The computation of these terms is very expensive, especially the first one in (8), which has the dimension (n + m)³. For systems that have a large number of state variables, this will be prohibitive. However, if we analyze the gradients of the objective function of the upper stage (problem (2)), we have

df/dΘ = Σ_{i=1}^{I} [(∂f_i/∂u_i)(du_i/dΘ) + (∂f_i/∂y_i)((∂y_i/∂u_i)(du_i/dΘ) + ∂y_i/∂Θ)]      (9)

that is,

df/dΘ = Σ_{i=1}^{I} [(∂f_i/∂u_i + (∂f_i/∂y_i)(∂y_i/∂u_i))(du_i/dΘ) + (∂f_i/∂y_i)(∂y_i/∂Θ)]      (10)

If the sub-NLPs converge, then according to the optimality condition (7) the first term in (10) vanishes, and the required gradients are simply

df/dΘ = Σ_{i=1}^{I} (∂f_i/∂y_i)(∂y_i/∂Θ)      (11)

Therefore, the values of du_i/dΘ are not required, and all second-order derivatives shown in (8) can be avoided. Through this consideration, the computation expense is significantly reduced. Since the partial derivatives of the objective function in (11) with respect to the dependent variables are easy to obtain, what we need now is only to compute ∂y_i/∂Θ. To compute these derivatives, we resort to the model equation system at the convergence of the middle stage sub-NLPs. They can be obtained from the partial derivatives of eq 4 at x̃_i*, Θ:

∂g_i/∂Θ + (∂g_i/∂x̃_i)(∂x̃_i/∂Θ) = 0      (12)

then

∂x̃_i/∂Θ = -[∂g_i/∂x̃_i]^{-1} (∂g_i/∂Θ)      (13)

In the case of inequality constraints, if they are not active at the convergence of the middle stage sub-NLPs, these constraints will have no effect on the convergence of the whole nested procedure. As analyzed before, the aim of introducing inequality constraints is to confine the model parameters to the region of model consistency. This means they should not be active at the convergence of the estimation problem (i.e., with the estimated parameters, these variables in the model should not always be at the boundary). However, these constraints may be active at some data points during some iterations of the upper stage NLP if poor guess values of the parameters are used. In this case, the first term of eq 10 will not vanish, and the gradient computation of the upper stage NLP in the proposed approach (where this term is set to zero) will not be accurate. This will lower the speed of convergence but will not lead to an incorrect search direction, because the second term in (10), which is the sensitivity of the total function value to the parameters conveyed through the dependent variables, will be dominant if the constrained states are at their boundaries at some data points.

3. Numerical Implementation

The computation framework of this three-stage nested sequential approach is outlined as follows:
• Choose a starting point Θ^0.
• Set the iteration counter for the NLP in the upper stage, k = 0.
• At each iteration k:
(A) Evaluate the function

f = Σ_{i=1}^{I} f_i

and gradients

df/dΘ = Σ_{i=1}^{I} (∂f_i/∂y_i)(∂y_i/∂Θ)

by successively calling the sub-NLPs in the middle stage:
(i) Choose a starting point u_i^0.
(ii) Set the iteration counter for sub-NLP i, j = 0.
(iii) At each iteration j with Θ^k and u_i^j:
(a) Evaluate the function f_i and gradients

df_i/du_i = ∂f_i/∂u_i + (∂f_i/∂y_i)(dy_i/du_i)

as well as

dh_i/du_i = ∂h_i/∂u_i + (∂h_i/∂x̃_i)(dx̃_i/du_i)

by calling the lower stage:


Figure 2. Flow diagram of the sequential estimation approach.

(I) Solve the model equations with the Newton method to obtain x̃_i.
(II) Compute the Jacobians ∂g_i/∂u_i and ∂g_i/∂x̃_i.
(b) Call a constrained NLP routine to update u_i^j.
(c) j ← j + 1.
(iv) At convergence of all sub-NLPs, compute the gradients ∂y_i/∂Θ.
(v) Return to the upper stage.
(B) Call an unconstrained NLP routine to update Θ^k.
(C) k ← k + 1.
• Stop at convergence of the upper stage NLP.

It is noted that the variables in the different stages have different functionalities. Both the dependent and the independent variables in the upper stage are functions of the parameters, while the dependent variables in the middle stage are functions of the independent variables. Thus, total differentials of the objective function are required for the sensitivity computation in both of these two stages. In the lower stage, the state (dependent) variables are functions of the independent variables when solving the sub-NLPs in the middle stage, and they are functions of both the independent variables and the parameters when computing sensitivities for the upper stage. As shown in Figure 2, the implementation of this computation framework is quite simple. We need a standard NLP solver for the upper NLP problem (problem (2)) as well as for the middle stage sub-NLPs (problem (3)) and a simulation step for the lower stage. If available, a simulator can be used in the lower stage for steps (I) and (II) in the above framework. Since the state variables, which represent the majority of the total variables, are eliminated through simulation, the approach has no limitation on the dimension of the estimation problem. In addition, if commercial simulators can provide Jacobians of the equation system with respect to the different types of variables, they can be directly used for the sensitivity computation. Another advantage here

is that only the parameters and the independent variables need to be initialized (it is reasonable to use the measured values as initial values); the state variables need not be initialized in the NLP. In addition, a routine for Gaussian elimination is needed for solving (6) and (13) in the sensitivity computation. The constrained NLP solver used for the sub-NLP problems in the middle stage is usually an SQP algorithm. Moreover, due to the separate treatment of the individual data sets, a parallel computation scheme can be used. A known shortcoming of sequential approaches is their low computational efficiency: iterative calculations are required for solving the model equations in the simulation stage and for solving the sub-NLPs in the middle stage. However, as the measured values are used as initial values for the independent variables, which are often in the vicinity of the solution of both the sub-NLPs and the simulation step, usually only a few iterations are needed to reach convergence in these two stages. In addition, to speed up the convergence of the sub-NLPs in each iteration of the upper stage with updated parameters Θ^k, the Hessian of the sub-NLPs from the last iteration can be used to initialize the Hessian for the new iteration. The reason is as follows. We perturb the parameters in the optimality condition (7) about u_i*, Θ^k, that is,

∂φ_i/∂Θ + (∂φ_i/∂u_i)(du_i/dΘ) = 0      (14)

Since the Hessian is the second-order total derivative of the Lagrange function at the convergence of problem (3), both the inequality and equality constraints are satisfied there. The Hessian of the sub-NLP is then the partial derivative of eq 7 with respect to u_i:

H_i(u_i*, Θ^k) = ∂φ_i/∂u_i      (15)

From (14) and (15) we have

du_i/dΘ = -[∂φ_i/∂u_i]^{-1} (∂φ_i/∂Θ) = -[H_i(u_i*, Θ^k)]^{-1} (∂φ_i/∂Θ)      (16)

This means that the sensitivity of the independent variables to a parameter change depends on the Hessian of the sub-NLP. Thus, the convergence of the sub-NLPs will be accelerated if we initialize their Hessian with H_i(u_i*, Θ^k) for iteration k + 1 of the upper NLP. However, good guess values for the parameters in the upper stage are often difficult to select. They have an important impact on the computation in the middle and lower stages; unsuitable guess values may cause these two stages to fail to converge. In fact, this is a problem not only of the proposed approach but also of other estimation methods. Extra constraints on some dependent variables that are sensitive to the parameter values can be introduced in the upper stage (problem (2)) or in the middle stage (problem (3)) to moderate this problem. This means that in the search for the optimal parameters the variable restrictions should be taken into consideration. Another method to relieve the parameter initialization problem is to adjust the initial Hessian of the sub-NLPs at the first iteration of the upper stage (i.e., k = 0). According


to (16), to reach the new optimum due to a change of the parameters, the required change of the independent variables is proportional to the inverse of the Hessian; it means

||du_i|| ∝ (adj[H_i]^T / det[H_i]) ||dΘ||      (17)

where "adj" and "det" are the adjugate matrix and the determinant of the Hessian, respectively, and ||·|| represents the norm of the related vector. It can be seen that if the guess values of the parameters are selected far away from the optimum, large variations of the independent variables will be required. This may lead to divergence of the sub-NLPs and/or the simulation step. According to (17), we can choose an initial Hessian for the sub-NLPs with a large determinant to allow only a moderate step change of the independent variables. It is noted that the initial Hessian can be defined in standard software (the default Hessian is usually a unity matrix).

4. Computational Studies

To test the proposed approach, three practical estimation problems from the literature are considered. All computations were done in double precision using the COMPAQ VISUAL FORTRAN compiler on an AMD ATHLON 1600+ personal computer. To solve the NLP problems of the upper and middle stages, a standard SQP solver from the IMSL library was used.

4.1. Vapor-Liquid Equilibrium Model. This example is taken from Kim et al.15 and has also been considered by other authors (e.g., Esposito and Floudas7 and Gau and Stadtherr17). The two parameters in the Van Laar equation for liquid-phase activity coefficients are to be estimated based on binary VLE data for the system methanol(1) and 1,2-dichloroethane(2). Measurement data for the variables pressure P (mmHg), temperature T (K), liquid mole fraction x1, and vapor mole fraction y1 were taken from Reid et al.18 The measurement values of the five data points used are given in Table 1.

Table 1. Parameter Estimation for the VLE Model

measured values^a
P [mmHg]: 483.8000, 493.2000, 499.9000, 501.4000, 469.7000
y1: 0.591, 0.602, 0.612, 0.657, 0.814
T [°C]: 50., 50., 50., 50., 50.
x1: 0.3, 0.4, 0.5, 0.7, 0.9
Θ1 = 1.94^b, Θ2 = 1.61^b

estimated values
P [mmHg]: 483.9852, 493.2762, 499.6644, 501.2884, 469.7045
y1: 0.596, 0.612, 0.624, 0.667, 0.810
T [°C]: 49.94, 49.97, 50.08, 50.04, 50.00
x1: 0.299, 0.400, 0.500, 0.700, 0.901
Θ1 = 1.9117, Θ2 = 1.6083

^a Taken from Reid et al.18  ^b Given in ref 18.

The model equations are as follows:

γ1 x1 p1⁰(T) - y1 P = 0      (18)

γ2 (1 - x1) p2⁰(T) - (1 - y1) P = 0      (19)

with the pure component vapor pressures p1⁰ and p2⁰ given by the Antoine relationships

p1⁰(T) = exp[18.5875 - 3626.55/(T - 34.29)]      (20)

p2⁰(T) = exp[16.1764 - 2927.17/(T - 50.22)]      (21)

and the activity coefficients γ1 and γ2 defined by the two-parameter Van Laar equation:

γ1 = exp[(A/(RT)) (1 + (A x1)/(B (1 - x1)))^{-2}]      (22)

γ2 = exp[(B/(RT)) (1 + (B (1 - x1))/(A x1))^{-2}]      (23)

The temperature values are scaled by a reference temperature Tr, equal to 323.15 K. The parameter vector to be estimated in the upper stage is defined as

Θ ≡ [A/(R Tr), B/(R Tr)]^T      (24)

Using the proposed framework, the pressure P and vapor mole fraction y1 are treated as dependent variables and are evaluated in the simulation layer. The independent variables are the temperature T and the liquid mole fraction x1, which are estimated in the middle stage. The measured values of the variables

ẑ_i ≡ [ŷ_i, û_i]^T = [P̂_i, ŷ_{1,i}, T̂_i, x̂_{1,i}]^T,   i = 1, ..., I      (25)

are assumed to have the standard deviations σ_Z = [0.75, 0.015, 0.1, 0.005]^T, respectively. The objective function of the upper stage is defined as

f = Σ_{i=1}^{5} f_i = Σ_{i=1}^{5} Σ_{j=1}^{4} ((z_{j,i} - ẑ_{j,i})/σ_j)²      (26)

The boundaries of the parameters were set at Θ1^L = Θ2^L = 1 and Θ1^U = Θ2^U = 2. The boundaries of the independent variables were set at û ± 0.5σ. The initial values for both parameters were chosen as 1.5, and the independent variables were initialized at their measurement values. In this case it took 22 iterations of the upper stage NLP to converge to the solution given in Table 1. The CPU time was 0.03 s.

If bad guess values for the independent variables in the middle stage are used, it is desirable to increase the search region by relaxing their boundaries in the sub-NLP problems. This may result in a convergence problem in the simulation layer. To avoid this problem, an inequality constraint for the vapor mole fractions is added to each sub-NLP to stabilize the simulation:

h_i = 1 - y_{1,i} ≥ 0      (27)

The gradients of the inequality constraints needed in the optimization procedure can simply be taken from the gradient computation in (6):

dh_i/du_i = -∂y_{1,i}/∂u_i      (28)

With the inequality constraints included, the search boundaries for the independent variables can be extended to T̂_i ± 5 K for the temperature and x1 ∈ [10^{-5}, 0.999] for the liquid mole fraction. With the extended search region, exactly the same optimization results are achieved, and only 12 iterations were


Figure 3. Scaled temperature value for data set 1 during optimization of state variables after an update of the parameter values.

required in the upper stage NLP. The overall computation time increased slightly to 0.05 s, which indicates that it needs more iterations for the individual subNLPs to find the optimal solution for the independent variables. In many parameter estimation problems, only rough guesses of the parameter values are available. As noted in section 3, this may lead to difficulties in the middle and lower stages to reach convergence. To overcome this difficulty, we can modify the initial Hessian of the individual sub-NLPs. To test the robustness of the parameter initial values, we use the following initial Hessian for each sub-NLP,

H0i ) 30*E

Biegler and Tjoa.19 It was also studied by Gau and Stadtherr.8 The model describes the heat exchanger network shown in Figure 4. Stream A is heated by streams B, C, and D through four heat exchangers. Constant heat capacity for all streams is assumed and all flow rates and temperatures are measured where indicated. The four heat exchanger rating parameters Θ ) [UA1, UA2, UA3,UA4]T are to be estimated based on measurement data given by Biegler and Tjoa.19 Twenty sets of measurement data were created by Biegler and Tjoa19 from parameter values Θ* ) [4.85, 4.00, 6.80, 5.35]T with added noise. The model includes mass and heat balances around the heat exchangers, splitting and mixing junctions as well as rating equations for the four heat exchangers. Each data set consists of 19 variables and the model consists of the following 10 equations:

(29)

where E is the unity matrix. With this initialization of the Hessian, stable convergence can be achieved for guess values of the parameters anywhere between 1 and 8, and it results in the same estimates both for parameters and state variables shown in Table 1 with the objective function value f ) 3.32196. Depending on the starting values, the number of iterations needed in the upper stage varied between 12 and 22. The overall computation time increased slightly but never exceeded 0.1 s. This is because with a larger Hessian a moderate step length will result and more iterations will be needed in solving the middle stage sub-NLP problems. It is noted that the overall computational time is insignificant for solving this estimation problem by using the proposed approach. The CPU time to solve the same problem with a global search approach (e.g., in refs 7 and 17) was much larger. In Figure 3 the scaled temperature values T/Tr in the individual iterations during optimization of the independent variables for data set 1 in the middle stage, after an update of the parameters in the upper stage, is plotted. It shows the results with different scaling of the initial Hessian of the sub-NLP. If large variations of the parameters are provided from the upper stage, as selected in this test case, the middle stage will not converge if the initial Hessian is set to unity matrix. The convergence can be reached by scaling the Hessian. It can be seen that by increasing the value of det[H0] the step size will be moderated and thus the search for the optimum is stabilized. The Hessian matrix and the variance-covariance matrix of the parameters for the upper stage NLP at convergence are given in Appendix A1. 4.2. Heat Exchanger Network Model. The second example is a heat exchanger network model taken from

FA1 - FA3 - FA6 = 0    (30)

FA1(TA2 - TA1) - FB1(TB2 - TB3) = 0    (31)

FA3(TA4 - TA2) - FB1(TB1 - TB2) = 0    (32)

FA3(TA5 - TA4) - FC1(TC1 - TC2) = 0    (33)

FA6(TA7 - TA2) - FD1(TD1 - TD2) = 0    (34)

FA1TA8 - FA3TA5 - FA6TA7 = 0    (35)

UA1 [(TB2 - TA2) - (TB3 - TA1)] / ln[(TB2 - TA2)/(TB3 - TA1)] - FB1(TB2 - TB3) = 0    (36)

UA2 [(TB1 - TA4) - (TB2 - TA2)] / ln[(TB1 - TA4)/(TB2 - TA2)] - FB1(TB1 - TB2) = 0    (37)

UA3 [(TC1 - TA5) - (TC2 - TA4)] / ln[(TC1 - TA5)/(TC2 - TA4)] - FC1(TC1 - TC2) = 0    (38)

UA4 [(TD1 - TA7) - (TD2 - TA6)] / ln[(TD1 - TA7)/(TD2 - TA6)] - FD1(TD1 - TD2) = 0    (39)
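The lower-stage simulation step solves such rating and balance equations for the dependent variables of each data set. A minimal single-exchanger sketch, pairing an energy balance with the log-mean temperature difference rating equation (the F·Cp values, inlet temperatures, and UA below are illustrative assumptions, not the paper's measurement data):

```python
import numpy as np
from scipy.optimize import fsolve

# Illustrative data for one counter-current exchanger: FA, FB are the
# products of flow rate and heat capacity [kW/K]; UA is the rating
# parameter that the upper stage would estimate.
FA, FB = 10.0, 8.0
TA_in, TB_in = 300.0, 500.0
UA = 4.0

def residuals(v):
    """Energy balance and LMTD rating equation, cf. eqs (31) and (36)."""
    TA_out, TB_out = v
    q_cold = FA * (TA_out - TA_in)       # duty picked up by cold stream A
    q_hot = FB * (TB_in - TB_out)        # duty released by hot stream B
    dt1 = TB_in - TA_out                 # hot-end temperature difference
    dt2 = TB_out - TA_in                 # cold-end temperature difference
    lmtd = (dt1 - dt2) / np.log(dt1 / dt2)
    return [q_cold - q_hot, UA * lmtd - q_hot]

# Solve for the two dependent outlet temperatures from a rough guess.
TA_out, TB_out = fsolve(residuals, x0=[350.0, 430.0])
```

The full network model stacks ten such residuals, eqs (30)-(39), and solves them simultaneously for the 10 dependent variables given the 9 independent ones.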
Thus, there are 9 independent variables. In this study, the set of independent variables is chosen as u = [FA1, FB1, FC1, FD1, FA3, TA1, TB1, TC1, TD1]T, which represents the input variables of the network. The standard deviations are set to 0.5 and 0.01 for flow rate and temperature measurements, respectively. To avoid failure in the simulation layer, the following four inequality constraints are added to each sub-NLP to ensure that flow A is always the cold stream in every heat exchanger:

TB2,i - TA1,i ≥ 0
TB1,i - TA4,i ≥ 0    (40)

TC1,i - TA5,i ≥ 0
TD1,i - TA7,i ≥ 0    (41)

Figure 4. Heat exchanger network flow sheet.

In Biegler and Tjoa,19 five versions of the parameter estimation problem were solved, with the number of data sets varying from I = 4 to I = 20. The initial values of the parameters for each version were set to Θ0 = [5.0, 4.0, 7.0, 5.0]T, and the guess values of the other variables were set to the measurement values times 1.001. Using the approach proposed in this work, the boundaries on all parameters were set at Θ ∈ [1, 10], and those on the independent variables were set at the measured value ± 1.5 times its standard deviation. The Hessian of the individual sub-NLPs was initialized with 10·E. The results of the parameter estimation for all five versions using the sequential approach are given in Table 2. The estimated parameter values correspond well with the results given by Biegler and Tjoa.19 The sums of squared residuals displayed in Table 3 also show good agreement between measured and estimated values. The Hessian matrix at convergence of the upper stage NLP, given in Appendix A.2, shows a strong curvature at the solution point. The variance-covariance matrix, also given in Appendix A.2, indicates a high reliability of the parameter estimates. As seen in Figure 5, the overall computation time increases linearly with the number of data sets involved. With 20 data sets, the total number of variables to be optimized is 384. Using the sequential approach, the upper stage NLP has only 4 parameters as optimization variables, each middle stage sub-NLP has 9 independent variables to be optimized, and in the lower stage 10 state variables are computed by solving the 10 model equations. The total CPU time is less than 1 s, which indicates the efficiency of the proposed approach.

4.3. CSTR Model. This example is taken from Kim et al.,15 Esposito and Floudas,7 and Gau and Stadtherr.17 The model represents a steady-state adiabatic CSTR with an irreversible first-order reaction:

A -- k1 --> B

Five measurements of process variables were gained from simulated data with added noise: two independent variables (inlet temperature T0 and inlet concentration A0) and three dependent variables (outlet temperature T and outlet concentrations A and B). The model consists of simple mass and energy balances around the reactor:

(1/τ)(A0 - A) - k1A = 0    (42)

(1/τ)(B0 - B) + k1A = 0    (43)

(1/τ)(T0 - T) + (-ΔHr/(ρCp))(k1A) = 0    (44)

where τ is the residence time of the reactor (100 s), ΔHr is the heat of reaction (-4180 J/mol), ρ is the density of the reaction mixture (1.0 g/L), and Cp is the heat capacity of the reaction mixture (4.18 J/(g K)). The reaction rate constant is expressed as

k1 = c1 exp(-Q1/(RT))    (45)

where c1 and Q1 are the Arrhenius constants. These parameters have to be estimated based on measurement data. Following Kim et al.,15 parameter transformations are used, resulting in the following rate equations
k1 = Θ1 exp[-Θ2(Tr/T - 1)]    (46)

with

Θ1 = c1 exp(-Q1/(RTr))    (47)

Θ2 = Q1/(RTr)    (48)

where Tr is the reference temperature (800 K). Experimental data sets were generated by adding random noise to 10 data sets simulated by Kim et al.15 around the parameter values Θ1 = 0.01717 s^-1 and Θ2 = 12.58. The measured values of the variables are obtained by

ẑi ≡ [ûi, ŷi]T = [Â0,i, T̂0,i, Âi, B̂i, T̂i]T = z̃i + rand·σz,  i = 1, ..., I    (49)

where "rand" is a random number generated on the interval [-1, 1] and z̃i are the true variable values. The measurements are assumed to have the standard deviations σz = [σ1, σ2, σ1, σ1, σ2]T. Three different noise levels have been chosen to test the robustness of the approach:

noise level 1: σ1 = 0.01, σ2 = 1.0
noise level 2: σ1 = 0.05, σ2 = 5.0    (50)
noise level 3: σ1 = 0.1, σ2 = 10.0
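A sketch of the lower-stage CSTR simulation together with the measurement generation of eq 49. The inlet state (A0 = 0.5 mol/L, T0 = 600 K) and the inlet concentration B0 = 0 are assumptions for illustration, and the raw measurement stands in for the reconciled data when the per-data-set contribution to SSRES (eq 51) is formed:

```python
import numpy as np
from scipy.optimize import fsolve

# Constants from the text; THETA holds the true parameter values around
# which Kim et al. generated their data sets.
TAU, TR = 100.0, 800.0                 # residence time [s], reference T [K]
HEAT_FACTOR = 4180.0 / (1.0 * 4.18)    # -dHr/(rho*Cp) = 1000 [K L/mol]
THETA = np.array([0.01717, 12.58])

def k1(T, theta):
    """Transformed rate constant, eq (46)."""
    return theta[0] * np.exp(-theta[1] * (TR / T - 1.0))

def simulate(u, theta):
    """Lower-stage simulation: solve the balances for the outlet [A, B, T].

    u = [A0, T0]; the inlet concentration B0 is taken as 0 (an assumption)."""
    A0, T0 = u

    def residuals(v):
        A, B, T = v
        r = k1(T, theta) * A
        return [(A0 - A) / TAU - r,
                (0.0 - B) / TAU + r,
                (T0 - T) / TAU + HEAT_FACTOR * r]

    return fsolve(residuals, x0=[0.9 * A0, 0.1 * A0, T0 + 50.0])

# One noisy data set at noise level 1, following eq (49).
rng = np.random.default_rng(1)
sigma_z = np.array([0.01, 1.0, 0.01, 0.01, 1.0])   # [A0, T0, A, B, T]
u_true = np.array([0.5, 600.0])
z_true = np.concatenate([u_true, simulate(u_true, THETA)])
z_hat = z_true + rng.uniform(-1.0, 1.0, size=5) * sigma_z
ssres_i = float(np.sum((z_hat - z_true) ** 2))
```

At the solution of the balances, the B balance and the adiabatic energy balance imply B = A0 - A and T = T0 + 1000(A0 - A), which gives a quick consistency check on the simulation step.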
The initial guess values of the parameters are set to Θ1^0 = 0.1 and Θ2^0 = 10.0, and the independent variables are initialized at their generated measurement values. Boundaries on the parameters are set to Θ1 ∈ [0.0001, 0.1]

Table 2. Results for Heat Exchanger Network Problem

no. of data sets:           4               8               12              16              20
UA1                         4.847538        4.847261        4.848286        4.848336        4.848167
UA2                         3.995819        3.997588        3.998738        3.9994349       3.999198
UA3                         6.789496        6.790113        6.795360        6.797531        6.797092
UA4                         5.353738        5.352291        5.351097        5.351325        5.356499
objective function value    1.424 × 10^-2   2.257 × 10^-2   9.456 × 10^-2   0.104946        0.108201
CPU time (s)                0.160           0.320           0.511           0.651           0.861

Table 3. Sum of Squared Residuals for HEN Data (20 Data Sets)

F(A6)  1.39 × 10^-2
T(A2)  5.46 × 10^-7
T(A4)  2.32 × 10^-7
T(A5)  7.58 × 10^-7
T(A7)  1.38 × 10^-6
T(A8)  1.18 × 10^-6
T(D1)  5.92 × 10^-8
T(C1)  1.94 × 10^-7
T(B1)  1.08 × 10^-8
T(B2)  3.97 × 10^-8
F(A1)  2.22 × 10^-4
F(A3)  5.39 × 10^-5
F(C1)  3.81 × 10^-4
F(D1)  9.51 × 10^-5
F(B1)  2.02 × 10^-5
T(A1)  1.68 × 10^-7
T(B3)  2.10 × 10^-7
T(C2)  2.74 × 10^-8
T(D2)  1.45 × 10^-7

Figure 5. CPU time by the sequential parameter estimation approach for 4, 8, 12, 16, and 20 data sets.

Table 4. Parameter Estimates for CSTR Model

                  Θ1         Θ2         objective   SSRESa      CPU time (s)
(a) Noise Level 1, Five Estimation Runs
smallest value    0.017027   12.46108    7.06166      2.6637    0.36
largest value     0.017541   12.687     12.5422       5.4658    0.48
mean value        0.01727    12.5836     9.5097       4.0993    0.416
(b) Noise Level 2, Five Estimation Runs
smallest value    0.0166     12.2975     7.254       75.365     0.41
largest value     0.018729   13.2111    12.9541     154.49      0.48
mean value        0.01759    12.7242     9.95654    103.1968    0.442
(c) Noise Level 3, Five Estimation Runs
smallest value    0.017406   12.4865     7.7529     339.71      0.43
largest value     0.01943    13.28635   12.6697     483.38      0.54
mean value        0.01818    12.7753     9.8389     397.4083    0.476

a Sum of squared residuals.

and Θ2 ∈ [5.0, 15.0]. The boundaries for the variables are set to ẑi ± 3σz. The Hessian of the individual sub-NLPs is initialized with 30·E. The proposed approach converges in all three stages without adding inequality constraints to the middle stage sub-NLPs. Since the measurement values are created by adding random noise to the true values for each optimization run, the estimated values of the independent variables in the middle stage, as well as the estimated parameters in the upper stage NLP, vary from run to run. To test the robustness of the proposed approach, five parameter estimation runs were performed for each noise level, each with different randomly generated measurement values. Table 4a-c shows the results of the parameter estimation. For all estimation runs, satisfactory parameter estimates are obtained at all noise levels, although the scatter of the parameter values increases with a higher standard deviation of the measurements. Whereas the objective function value, which is evaluated according to eq 26, is unaffected by the noise level, the sum of squared residuals (SSRES) defined below increases at higher noise levels:

SSRES = Σ(i=1..I) SSRESi = Σ(i=1..I) Σ(j=1..5) (zj,i - z̃j,i)²    (51)

which defines the sum of squared deviations between the reconciled data z and the true values of the variables z̃. The large SSRES values stem mostly from deviations in the temperature estimates. This effect can also be seen in Figure 6a,b, where the outlet temperature T and the outlet concentrations A and B are shown at different noise levels. The plots compare the true values of the dependent variables, computed from the true values of the parameters and independent variables, with the estimated values after parameter estimation and reconciliation of the independent variables. It can be seen that in both cases good agreement between estimated and true values is achieved. However, at higher noise levels, larger deviations remain, especially for the temperature estimates.

Figure 6. (a) Outlet data and measurements for noise level 2. (b) Outlet data and measurements for noise level 3.

The influence of the noise levels can also be seen by analyzing the Hessian matrix and the variance-covariance matrix of the parameters at convergence. In Appendix A.3 these two matrices are presented for two different noise levels. With a higher noise level on the measurements, the curvature at the solution is smaller and the scattering of the parameters around the estimated solution increases, indicating poorer reliability of the estimated parameters. As seen in Table 4a-c, the CPU time is almost unaffected by the noise level. This example shows that the proposed approach is robust and able to provide good parameter estimates for problems with large standard deviations of the measurement values.

5. Conclusions

Nonlinear parameter estimation with a large number of data sets is a challenging task. Because measurement errors are also assumed for the independent variables, not only the number of constraints but also the number of degrees of freedom increases with the number of data sets. For large-scale systems and many data sets involved, the problem has to be decomposed to make it accessible for standard NLP software. Known decomposition strategies use extensive mathematical manipulation to tailor the NLP solver to the problem or need second-order derivatives in the decoupling step. In extension of the work by Dovi and Paladino (Comput. Chem. Eng. 1989, 13, 731-735) and Kim et al. (AIChE J. 1990, 36, 985-993), a sequential approach to solving large-scale parameter estimation problems with multiple data sets is proposed in this paper. It is a nested three-stage computational framework. The parameters to be estimated are optimized in the upper stage NLP. The middle stage consists of multiple sub-NLPs nested in the upper stage, in which only the independent variables are considered as optimization variables. The lower stage is a simulation step for computing the state variables and their gradients with respect to the independent variables. As no second-order derivatives and no extensive mathematical manipulations are necessary, the algorithm is easy to implement. Since the total number of variables is decomposed into the number of parameters in the upper stage and the number of independent variables of a single data set in the middle stage, a standard NLP solver can be used in this estimation approach. Furthermore, a simulation environment can be used in the estimation, with only minor adjustments for computing the Jacobian of the equation system with respect to the different types of variables. Finally, the simulation step eliminates the largest part of the variables as well as the equality constraints; thus, there is no limitation on the dimension of the state variables and model equations.

The proposed approach was applied to three practical estimation problems with up to 384 variables. To achieve high robustness, inequality constraints are added to the sub-NLP problems for certain sensitive state variables. Larger values of the initial Hessian of the sub-NLPs lead to more robust convergence when poor starting values of the parameters are used, because they moderate the step change of the independent variables. The CPU time required to solve these three estimation problems is less than 1 s, which is much smaller than the times reported in the literature for solving the same problems. The overall computation time increases only linearly with the number of data sets involved.

In part 2 of this paper, we will apply this sequential approach to a model estimation problem for an industrial coke-oven-gas purification process. It consists of a combined absorption and desorption process, which leads to a rigorous model with several hundred variables. Fifteen experimental data sets will be used to estimate seven parameters of the model. In addition, reactive absorption/desorption takes place in the process, and a transport model is used to describe the column behavior, which results in a complex, strongly nonlinear equation system. Satisfactory estimation results have been obtained by using the approach proposed in this paper.

Acknowledgment

Financial support of this work from the Deutsche Forschungsgemeinschaft (DFG) under the project contract LI806/4-1 is gratefully acknowledged.

Appendix A: Hessian Matrices and Variance-Covariance Matrices of the Parameters at Convergence

To analyze the quality of the parameter estimates, the Hessian matrix and the variance-covariance matrix of the parameters have been evaluated at convergence of the upper stage NLP for all examples. The Hessian matrix H* can be obtained from the SQP solver. The variance-covariance matrix of the parameters can be computed using the first-order approximation

V*Θ = Σ(i=1..I) (∂Θ*/∂ẑi) Vz (∂Θ*/∂ẑi)T

where ẑ is the vector of the measured variables and Vz is the variance-covariance matrix of the measurement errors. The gradients ∂Θ*/∂ẑ can be computed numerically with the step length Δẑ = 0.01σ for all variables. The values of the elements in the Hessian are expected to be large, indicating a sharp local minimum, whereas the values of the elements in the variance-covariance matrix of the parameters should be small. The computed matrices for the three examples are given in the following.

A.1. Vapor-Liquid Equilibrium Model

H* = (  391.5165   746.3171
        746.3171  1647.23   )

V*Θ = (  4.1172 × 10^-3  -1.907 × 10^-3
        -1.907 × 10^-3   1.0067 × 10^-3 )

A.2. Heat Exchanger Network Model (20 Data Sets)

H* = ( 1055.4347     57.2277    -13.6732    136.3528
         57.2277  15027.5469   -238.1515   -46.7350
        -13.6732   -238.1515    391.7447   100.5692
        136.3528    -46.7350    100.5692  3372.9510 )

V*Θ = ( 1.0342 × 10^-2  5.2645 × 10^-3  7.5819 × 10^-3  1.1078 × 10^-3
        5.2645 × 10^-3  5.3308 × 10^-3  6.7432 × 10^-3  7.2835 × 10^-5
        7.5819 × 10^-3  6.7432 × 10^-3  2.3535 × 10^-2  6.7607 × 10^-4
        1.1078 × 10^-3  7.2835 × 10^-5  6.7607 × 10^-4  7.8128 × 10^-3 )

A.3. CSTR Model

Noise level 1:

H* = ( 195274.34  -431.9643
       -431.9643     1.0853 )

V*Θ = ( 1.00235 × 10^-7   3.93698 × 10^-5
        3.93698 × 10^-5   1.731218 × 10^-2 )

Noise level 2:

H* = ( 78628.25  -183.826
       -183.826    0.48669 )

V*Θ = ( 1.3254 × 10^-6  5.2212 × 10^-4
        5.2212 × 10^-4  0.2379 )
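The first-order variance propagation above can be sketched numerically. Since the real Θ*(ẑ) comes from re-solving the upper stage NLP for perturbed measurements, a toy estimator (the mean of one data set of five measurements) stands in here so that the result can be checked against the closed form σ²/n; the measurement values are illustrative:

```python
import numpy as np

# One "data set" of measurements with standard deviation sigma, and the
# corresponding variance-covariance matrix of the measurement errors.
sigma = 0.1
z_hat = np.array([1.02, 0.97, 1.05, 0.99, 1.01])
V_z = sigma ** 2 * np.eye(z_hat.size)

def theta_star(z):
    """Stand-in for the upper-stage NLP solution as a function of the data."""
    return np.mean(z)

def numerical_gradient(f, z, h):
    """Central-difference gradient of f with respect to each measurement."""
    g = np.zeros_like(z)
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        g[j] = (f(z + e) - f(z - e)) / (2.0 * h)
    return g

# Gradient with the step length 0.01*sigma, as in the appendix, then
# propagate: V_Theta = (dTheta*/dz) V_z (dTheta*/dz)^T (single data set).
grad = numerical_gradient(theta_star, z_hat, 0.01 * sigma)
V_theta = grad @ V_z @ grad
```

For the mean estimator the gradient is exactly 1/n per measurement, so V_theta reduces to σ²/n; with several data sets the per-set contributions are summed as in the appendix formula.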

Literature Cited (1) Yang, N. S.; Chuang, K. T.; Resetarits, M. A new criterion for modeling distillation column data using commercial simulators. Ind. Eng. Chem. Res. 2000, 39, 3308-3313. (2) Kister, H. Z.; Fluor, D. Can we believe the simulation results? Chem. Eng. Prog. 2002, 98 (10), 52-58. (3) Deming, W. E. Statistical adjustment of Data; Wiley: New York, 1943.

(4) York, D. Least-squares fitting of a straight line. Can. J. Phys. 1966, 44, 1079-1086. (5) Peneloux, A. R.; Deyrieux, E.; Neau, E. The maximum likelihood test and the estimation of experimental inaccuracies: Application to data reduction for vapour-liquid equilibrium. J. Chem. Phys. 1976, 73, 706-716. (6) Britt, H. I.; Luecke, R. H. The estimation of parameters in nonlinear, implicit models. Technometrics 1973, 15, 233-247. (7) Esposito, W. R.; Floudas, C. A. Global optimization in parameter estimation of nonlinear algebraic models via the errorin-variable approach. Ind. Eng. Chem. Res. 1998, 37, 1841-1858. (8) Gau, C.-Y.; Stadtherr, M. A. Deterministic global optimization for error-in-variables parameter estimation. AIChE J. 2002, 48 (6), 1192-1197. (9) Rod, V.; Hancil, V. Iterative estimation of model parameters when measurements of all variables are subject to error. Comput. Chem. Eng. 1980, 4, 33-38. (10) Reilly, P. M.; Patino-Leal, H. A Bayesian study of the errorin-variables model. Technometrics 1981, 23 (3), 221. (11) Anderson, T. P.; Abrams, D. S.; Grens, E. A. Evaluation of parameters for nonlinear thermodynamic models. AIChE J. 1978, 24, 20-28. (12) Sutton, T. L.; MacGregor, J. F. The analysis and design of binary vapour-liquid equilibrium experiments. Part I: parameter estimation and consistency tests. Can. J. Chem. Eng. 1977, 55, 602-608. (13) Tjoa, I. B.; Biegler, L. T. Simultaneous strategies for data reconciliation and gross error detection of nonlinear systems. Comput. Chem. Eng. 1991, 15 (10), 679-690. (14) Tjoa, I. B.; Biegler, L. T. Reduced successive quadratic programming strategy for errors-in-variables estimation. Comput. Chem. Eng. 1992, 16 (6), 523-533. (15) Kim, I.-W.; Liebmann, M. J.; Edgar, T. F. Robust errorin-variables estimation using nonlinear programming techniques. AIChE J. 1990, 36 (7), 985-993. (16) Dovi, V. G.; Paladino, O. Fitting of experimental data to implicit models using a constrained variation algorithm. Comput. Chem. 
Eng. 1989, 13, 731-735. (17) Gau, C.-Y.; Stadtherr, M. A. Reliable nonlinear parameter estimation using interval analysis: error-in-variable approach. Comput. Chem. Eng. 2000, 24, 631-637. (18) Reid, R. C.; Prausnitz, J. M.; Poling, B. E. The properties of gases & liquids, 4th ed.; McGraw-Hill: New York, 1987. (19) Biegler, L. T.; Tjoa, I.-B. A parallel implementation for parameter estimation with implicit models. Ann. Oper. Res. 1993, 42 (1), 1-23.

Received for review April 7, 2003
Revised manuscript received July 29, 2003
Accepted August 21, 2003

IE030296S