Simultaneous solution and optimization strategies for parameter

Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Iauw Bhieng Tjoa, and Lorenz ...
0 downloads 3 Views 1MB Size
376

Ind. Eng. Chem. Res. 1991, 30, 376-385

PROCESS ENGINEERING AND DESIGN Simultaneous Solution and Optimization Strategies for Parameter Estimation of Differential-Algebraic Equation Systems Iauw-Bhieng Tjoa and Lorenz T. Biegler* Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

This paper addresses the development of a new hybrid successive quadratic programming (SQP) method for solving nonlinear parameter estimation problems. Here, we show that the Hessian information for quadratically convergent algorithms can be constructed using only first derivatives for small residual problems. Furthermore, it has been combined with a tailored quasi-Newton update formula to deal with large residuals. Moreover, the properties of reduced spaces have been exploited to deal very efficiently with large-scale problems. These strategies have also been coupled t o a large-scale modeling system (GAMS) which allows for the easy construction of complex process models. Here, a dynamic model is used t o illustrate one of the possible applications of the method. The model is discretized by orthogonal collocation on finite elements. With this approach, both solution and optimization occur simultaneously, and constraints on the state variables can be imposed directly. 1. Introduction

Mathematical models are often used to describe chemical processes. Many of the more challenging problems involve the use of computationally intensive models, such as those described by ordinary differential equations (ODE’s). Developing an efficient method for solving parameter estimation problems for these models is an important part of the development and improvement of process models. For example, a better knowledge of kinetic rate constants in the modeling of chemical reactions will help in choosing operating conditions that favor the desired products. Also, heat of reaction and raw material conversions can be integrated more efficiently into process flowsheet optimization. A set of measurements is needed to estimate parameters in the model. These measurements often contain inherent errors. Here, maximum likelihood estimation (MLE) is often used as a method of estimation. Moreover, if the errors have a normal distribution with know covariance, MLE reduces to weighted least squares. Under these conditions, we show that efficient optimization techniques can be developed by exploiting the least-squares structure for constrained parameter estimation. Using a weighted-least-squares objective function, the parameter estimation problem involving an ODE model can be formulated as follows: min 0

s.t.

@(e)=

2b,ePTWe, r=l

x(0) = xg

(P1)

where e,, p x, - f,; e, = error or residual a t pth experi-

* Author to whom correspondence

should be addressed.

0888-5885/91/2630-0376$02.50/0

mental measurement; x = s-dimensional state variable vector; x, = x at pth experimental measurement; f , = experimental measurement of x,; Q = objective function; b, = weighting factor for each experimental measurement; W = s by s weighting matrix among the state variables, such as the inverse of the covariance matrix; B = p-dimensional parameter vector; t = independent variable (time); and x o = s-dimensional initial value of the state variable vector. Traditionally, this problem is solved by minimizing the objective function using an unconstrained optimization method in the outer loop, while the evaluation of the objective function and its gradients is done by numerical integration of the ODE’s in the inner loop. This approach is also called a feasible path approach because the estimated parameters always satisfy the model. Several methods based on this approach have been reported by Nowak and Deuflhard (1985), Caracotsios (1986), and Varga et al. (1988). Also, additional feasible path methods have been reviewed by Bard (1974) and Biegler et al. (1986). A commonly used optimization strategy in this approach is the Gauss-Newton (GN) method. This method is essentially a Newton method where the calculation of the Hessian has been simplified by assuming that the residual (error) terms are small at the solution and thub can be neglected. This simplification allows us to calculate the Hessian based only on first-order derivatives, and yet a q-quadratic rate of convergence of Newton method can be retained. However, the assumption of small residuals is not always valid. When the residuals are not small, a Gauss-Newton method may only have a linear rate of convergence. Thus several modifications of this method have been made in order to deal with unconstrained least-squares problems efficiently. Such alternative methods involve Levenberg-Marquardt, rotational discrimination, and hybrid methods, and have been developed by Dennis et al. (1981), Fariss and Law (19791, Al-Baali and Fletcher (1985), and Fletcher and Xu (1987). 0 1991 American Chemical Society

Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991 377 With the conventional approach, intermediate parameters during the optimization can make the ODES become stiff and often make the numerical integration of the ODE’s more difficult and inefficient. Furthermore, most conventional approaches cannot directly handle process constraints, other than bounds on the parameters. Here in the best case, the model may return a flag indicating an infeasibility, which must be handled in some manner by the estimation method. On the other hand, the simultaneous method incorporates additional constraints directly into the optimization problem. For the simultaneous approach in this study, the ODE model is discretized using orthogonal collocation on finite elements. The result is a set of algebraic equations that can be embedded directly into the optimization together with any other process constraints. Thus the problem becomes a constrained least-squares problem. Simpler collocation approaches for parameter estimation have been applied by Hertzberg and Asbjerrnsen (1977) and Baden and Villadsen (1982). Although this problem can be solved by use of general constrained optimization methods, such as reduced gradients or successive quadratic programming (SQP), it is desirable to use a special-purpose method that can take advantage of the least-squares structure of the objective function. Unlike unconstrained optimization methods, a special constrained optimization method for dealing with constrained least-squares problems is not generally available. Here, we develop a hybrid SQP method. The treatment of the Hessian approximation for small residuals is parallel to that of an unconstrained GN method, which can have a q-quadratic rate of convergence, while a structured BFGS (see Dennis et al., 1981) update formula is used for large residuals, which gives the method at best a superlinear rate of convergence. On the basis of the outcome of a switching rule, GN or BFGS steps are taken in the hybrid SQP method. Thus our hybrid SQP method combines the best of the Hessian approximation and is expected to perform more efficiently than that of a general SQP method. In the next section, a reformulation of the original problem into a nonlinear programming (NLP) problem using orthogonal collocation is discussed, followed by the development ofthe hybrid SQP method in section 3. In section 4, the effectiveness of the hybrid method is demonstrated on several example problems, while the results of a comparison study are discussed in section 5. The conclusion of the results is presented in section 6. Finally, additional information for the example problems is placed in the Appendix (Tables 111-XI). 2. Reformulation of ODE-Based Parameter Estimation An ODE-based optimization problem can be converted to an algebraic optimization problem by discretizing the model using orthogonal collocation on finite elements. The resulting algebraic equations can be written as equality constraints in an NLP formulation. This formulation allows process constraints to be embedded directly. Orthogonal collocation at Gaussian points has been shown equivalent to fully implicit, symmetric, algebraically stable Runge-Kutta (RK) schemes (Ascher and Bader, 1986). In establishing the equivalence between this method and a RK method, a number of important theoretical properties of RK, normally associated only with numerical integration schemes, can be applied directly. Further discussion on stability and accuracy of this method can be found in Cuthrell and Biegler (1989). Here, the discussion on collocation is parallel to that given by Cuthrell and Biegler (1987), and a more detailed discussion can be found in this

reference. We approximate the state profiles as K xiK+l(t)

=

(1)

Cx[ijl‘k[ijl(t)

j=O

+ T j ( a i + l - ai) = ai + ~ j A a i ~ =+ (K ~ +( 1)th t ) order Lagrange polynomial at

t ~ i j l=

ai

where ~ ~ ith finite element; xpl = polynomial coefficient at ith finite element and j t h co location point; ‘kIv1(t)= a polynomial of degree K; k = O j means lz = 0, ..., I - 1,j + 1, ..., K , ai = location of the ith knot or element boundary; and T,, i k = the collocation points which are the shifted roots of the orthogonal Legendre polynomial of order K. Using Lagrange polynomials (l), the ODE’s are discretized by forming residual equations at the collocation points and forcing them to satisfy the model exactly. These equations can be written as K R ( t [ i j ] ) = C X [ i k ] @ k ( T j ) - A a i F ( ~ [ i j l , B , t [ i j l= )

k=O

i = 1, 2,

O

..., N E ; j = 1, 2, ...,K

(2)

where @ = d‘k/d7; NE = the total number of finite elements; and K = the number of collocation points in each finite element. To ensure that the polynomials are continuous a t the interior knots, the polynomials between adjacent elements are forced to be continuous by imposing the following continuity constraints: K

io] =

x~i-lj*j(~=l) 1=

2, 3,

..., NE

(3)

j=O

Thus, problem P1 can be reformulated using orthogonal collocation on finite elements as follows: r

brepTWer

min @ ( x i ) = xi

p=l

s.t.

i= 1, 2, ..., NE; j = 1, 2, ..., K (P2) K

lo]

-,Fo~i-lj‘kj(~=l) X[lO]

= xo

X [ i j ] L IX [ i , ] eL

1 = 2, 3, ..., NE

Iqijl”

IB IBU

In this formulation, the location of the knots is determined based on the profile of the experimental data; “steep” regions are approximated using more elements, while “flat” regions are approximated using less elements. This is an easier approach than adaptive adjustment. Problem P2 contains n variables which consist of m [=s(NE)(K+ l)] collocation coefficients and p parameters; all of these variables need to be optimized subject to m equality constraints as the result of discretizations. However, the degrees of freedom for optimization remain as many as the number of parameters, which is generally small. This property is also exploited in the development of the hybrid SQP method. 3. Hybrid SQP Method for Constrained Least-Squares Problems Problem P2, in general, can be written as follows:

378 Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991

u"l[d;J

[IrY :TBZ ZTSY ZTSZ O

min +(z) = CbpepTWe, Z

p=l

s.t.

h(z) = 0, h: 3"

-

Bm

(8)

d, = -

ZTv@

where

R = YTVh where

On the basis of this decomposition, the reduced QP subproblem in the absence of inequalities constraints is given at iteration i as

The SQP method is known as an efficient optimization strategy for solving small to moderate-sized NLP problems; it usually takes fewer iterations to converge than other methods. For this reason, we develop a special-purpose method based on SQP method. The basic idea for SQP is that we try to solve a quadratic programming (QP) subproblem at every iteration to find a new search direction (d). This QP subproblem is formed by making a quadratic approximation of a Lagrange function subject to linearized constraints, and it can be written as follows for iteration i: min VaT(zi)d+ '/dTBd d s.t.

h(Zi) + VhT(zi)d = 0

zL Izi + d

Izu

(P4)

where B is an approximation to the Hessian of the Lagrange function. For inactive bounds, the first-order necessary (KKT) conditions for (P4) are given in matrix form as (4)

Solving the QP subproblem is the most time-consuming step, and solving large-size QP subproblems can be very expensive. The size of problem P 3 is generally large but with few degrees of freedom. Thus a decomposition strategy can be used to reduce the size of the QP so that it is formed only in the space of independent variables (i.e., the null space of the projected Hessian). Here, a range and null space decomposition (RND) strategy for large-scale problems is used to exploit the relatively small size of the reduced space (Vasantharajan and Biegler, 1988). Defining a [n X p] matrix Z(z) whose columns span the null space of VhT and a [n X m] matrix Y (2) whose columns span the range space of Vh as shown in (5), the search direction, d, can be decomposed into the movement in the null and range spaces, d, and d, respectively, in (6). Z(z) =

[IA], Y(z) [I;"3, and =

YTZ = 0

(5)

where

A = (V,h(~~)~)-'V~h(z~)~ d = Yd,

+ Zd,

(6)

Substituting for d and then premultiplying (4) by matrix Q of order (n + m),as shown in (7), the KKT conditions of (P4) can be written as (8). (7)

+

min ( V @ ( z i ) BYd,)TZd, d* s.t.

zL Izi

+ f/,dZT(ZTBZ)d,

+ Zd, + Yd,

Izu

(P5)

with

d, = -[I - A(I + ATA)-'AT](VxhT)-'h(zi)

(9)

and multipliers simplified to

u = -(VXh)-' [I - A(I + ATA)-'AT]YTV@(~i)(10) For a large and sparse optimization problem, the Hessian generally is not positive definite, but the projected Hessian in the null space is at least positive semidefinite; it is positive definite when the parameters are independent. If a quasi-Newton formula is used to approximate B or Z%, then the method can achieve, at best, a q-superlinear rate of convergence. Due to the large size of the full Hessian, the RND method updates only the projected Hessian in the null space. Thus the projected Hessian in the range space, Z-Y, is neglected. If the constraints are linear, the term ZTBY is irrelevant because, given an initial feasible point, there will be no range space movement in all subsequent iterations. However, if the constraints are nonlinear, neglecting ZTBY deteriorates the rate of convergence to two-step q-superlinear; thus the general-purpose RND method can achieve a t best a two-step q-superlinear rate of convergence. Since the objective functions considered here have a least-squares structure, it allows us to make an analysis similar to that of the Gauss-Newton method for small residuals. The result is a method that can retain a qquadratic rate of convergence for small residual problems. The treatment for small residuals will be discussed first, followed by treatment of large residual problems. Partitioning the variables into dependent and independent variables x and 6 respectively, the Hessian can be presented as follows:

where L(x,B,u) = + ( x )

+ uTh(x,6)

Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991 379 The first term ( l l d ) is the only term that does not contain the Lagrange multipliers. The multipliers given by (10) contain V@, given by r

UQ, = 2

b,,VePTWe,,

(12)

p=1

Assuming that the residuals, e,,, are small, VQ, can be neglected. Thus the Lagrange multipliers are also small and can also be neglected. Then the Hessian can be reduced to

1 On the basis of this approximation, the projected Hessian in both range and null spaces can be calculated analytically using only first-order derivatives. The coefficients of the objective function of the reduced QP subproblem (P5) are given by (V@

+ BYdy)TZ= -2 2b,,(e,,+ V,e,dy)TWV,e,A

(14)

p=l

r

G

ZTBZ = AT[2C b,,V,e,,TWV,e,]A

(15)

p=l

Here the reduced Hessian, G , is at least positive semidefinite. If the neglected terms of the Hessian converge to zero as the solution is approached, and if the Hessian is positive definite, then the use of (14) and (15) will lead to a qquadratic rate of convergence. However, when the residuals are not small, the GN approximation to the Hessian may not be adequate because the norm of the neglected terms can be significant compared to the BGNterm. When this is the case, this method may only achieve a linear rate of convergence. Since we also know that quasi-Newton update formulas, such as the BFGS update, can achieve superlinear rates of convergence, it is desirable to use this formula for calculating the Hessian for large residual problems. By decomposing the actual Hessian into two parts, the Hessian in (11) can be written as follows:

or B = BGN+ S (16b) Since BGNcan be calculated exactly, it is desirable to incorporate this structure for the development of BFGS update formula. This leads to the following tailored BFGS update formula for calculating the projected Hessian in the null space:

a more accurate approximation to the Hessian than the general BFGS update formula. Equation 17 is a constrained analogue of the unconstrained quasi-Newton update formulas developed by Dennis et al. (1981) and Al-Baali and Fletcher (1985). Moreover, a constrained extension of unconstrained quasi-Newton switching rules developed by Fletcher and Xu (1987) is described next. So far we have developed two ways for calculating the Hessian based on the magnitude of the residuals a t the optimum. Unfortunately, the magnitude of the residuals is unknown before solving the problem. Thus we combine both methods in the hybrid SQP method and use a switching rule to decide whether the Hessian should be calculated on the basis of a GN approximation or a BFGS update formula. Assuming that the parameters are independent, then the reduced Hessian generated by GN approximation is always positive definite. If GN approximation is used to generate the Hessian a t the first iteration, then the Hessian generated by using BFGS updates will always be positive definite. Thus the hybrid SQP method is a descent method. To make this method globally convergent, a linesearch method developed by Biegler and Cuthrell (1985) is used. This line search uses an augmented Lagrange function as the merit function for improvement:

L * ( z , ~ ,= Y )Q,(z) + uTh(z) + '/~llh(z)11~

where y is a penalty parameter. Here, the progress of the merit function is used to monitor the accuracy of the Hessian approximation. For small residuals, GN approximation to the Hessian is adequate and the method will make a good progress. However, the GN approximation may not be adequate for large residuals. Thus, switching to BFGS approximation is made in the next step to give a more accurate approximation. Here, we use the following switching rule: L*i - L*.1+1 (19) (= L*i Now suppose zi converges to z* with SQP, then the equality constraints converge to zero at a quadratic rate (O(lld11)2). Therefore, this condition also implies that ai - Q,i+l 0. Asymptotically, for nonzero residuals, the following limit holds:

-

However, for zero residuals (@(z*) = 0), the order of convergence is at least superlinear. This implies that the value of L*i+lfor i m is much smaller than and the following limit holds: L*. - L*.l + l lim ( = =1 i-m L*i

-

On the basis of these asymptotic behaviors, the following switching rule is used to decide if GN or BFGS steps should be taken. Bi+1 =

where si = vdr

yi = Zi+lTIVL(zi+l,ui+l)- VL(zi,ui+1)]= zi+ lGNZi+lSi- zi+lTVhiUi+l and 7 is a step size in the line search. This update formula is tailored to the least-squares objective function and gives

(18)

otherwise

(22)

where t E (0,l) is set as a parameter. From our experience, t = 0.1 is recommended (Fletcher and Xu recommended 0.2 for the unconstrained case). This algorithm has also been coupled to a large-scale modeling system (GAMS) which allows for the easy con-

380 Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991

struction of complex process models. Furthermore, GAMS also provides analytical gradients; thus it allows the routine to perform more efficiently. The algorithm is also implemented such that the user can choose either the BFGS, the GN, or the hybrid method by specifying the option in the GAMS files. To demonstrate the effectiveness of the simultaneous strategies, several kinetic modeling problems with various complexities are solved in the next section. These problems have different magnitudes of residuals; thus the behavior of the hybrid SQP method can be studied.

4. Example Problems Eight examples of chemical kinetics modeling, with various degrees of complexity, and one nearly singular problem are used for a case study. The first three examples have zero residuals, the next two examples have small residuals, and the last three examples have large residuals. The type of objective function used for all examples is ordinary least squares, as has been suggested by literature sources. The description of the problems will be given briefly here; additional information can be found in the Appendix (Tables 111-XI). All problems are solved on a VAXstation 3200. Finally, a comparison study of the hybrid method is performed with a number of SQP-based methods, as well as with MINOS and the GREG package. The results from the hybrid method are discussed here, while the comparison study is discussed in the next section. Problem 1: First-Order Irreversible Chain Reaction. A simple first-order irreversible isothermal chain reaction starting from liquid reactant A to liquid product B and then to liquid product C in a batch reactor can be modeled by using the following equations:

The data, generated from the model, are fitted using third-order Lagrange polynomials on five finite elements. The method never switches to BFGS approximation; this shows that GN approximation is adequate for zero residuals with linear constraints. The objective function at convergence is 6.269 X lo4, and the optimal parameters for k, and k2 are 5.002 and 1.OO0, respectively. The method recovers accurately the actual parameters. Problem 2: First-Order Reversible Chain Reaction. If the reaction in problem 1 is extended to reversible reaction, the model can be represented as

I

1.09

0.6

Q

4

0.4

9

0.0

0.0

0.2

0.4

0.6

0.8

1.0

t

Figure 1. Comparison of the data and the fitted values for problem 3.

Hessian at convergence, which vary from about to lo3, compared to lo3 to lo4 for problem 1. Problem 3: Catalytic Cracking of Gas Oil. A - Q

kt

This reaction describes an overall reaction of catalytic cracking of gas oil (A) to gasoline (Q) and other products (S). This model, proposed by Froment and Bischoff (1979), is described by the following equations:

The data, also generated from the model, are fitted using fourth-order polynomials on three finite elements. Although the state variables appear nonlinearly in the model, this model is easier to solve than the previous problem because the reduced Hessian is well conditioned. The eigenvalues of the reduced Hessian at convergence vary from about lo1 to lo2. The method also never switches to BFGS approximation during the iterations; this shows that GN is also adequate for zero residuals with nonlinear constraints. The objective function at convergence is 8.221 X and the optimal parameters for k,, k,, and k3 are 11.948,7.993, and 2.024, respectively. The estimated parameters are very close to the actual parameters. The measurements and the fitted values are plotted in Figure 1. Problem 4: A Single-Equation Least-Squares Problem. The model, which describes a simple first-order nonisothermal reaction of a reactant A to a product B, is represented by the following equation (Bard, 1974):

_ dY dt - -ol The data, generated from the model, are fitted by using fourth-order polynomials on six finite elements. As expected, the method never switches to BFGS approximation during the iterations. The objective function converges to 4.125 X and the optimal parameters for kl, k,, k3, and k, are 3.997, 1.998, 40.538, and 20.264, respectively. The contours of the objective function near the local optimum are flat; thus k3 and k, can have slightly different values without significantly changing the objective function. This problem is harder to solve than the previous one because the reduced Hessian is not well conditioned. This can be seen from the eigenvalues of the reduced

YA-estimate YQ-estimate

0.2

3).

(25) (26)

YA-data

+ YQ-data

Yi

ex.(

Parameter 8, is the frequency constant, and parameter 4, which appears nonlinearly, is the activation energy. It should be noted, however, that the form of parametrization (where T , is an average temperature)

e’, exp(

2 $) -

(30)

does lead to better parameter estimates. Unlike the previous examples, the actual measurements for this problem are available. Bard solved this problem with the algebraic model, using an unconstrained optimization strategy to minimize a least-squares objective

Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991 381 function. Here, we use the ODE model directly. Fourth-order polynomials on three finite elements are used to fit the data. After the first iteration, the BFGS update is used for the next two iterations due to slow progress. It switches back to GN approximation for one iteration before finally using BFGS update until it converges. This behavior is expected when the residuals are not zero. This problem is more difficult to solve than problem 3 because parameter O2 appears exponentially; the order of magnitude of the eigenvalues of reduced Hessian at convergence varies from to lo2. Problem 5: The DOW Chemical Problem. Unlike the previous examples, this example, formulated by the Dow Chemical Company (Blau et al., 1981), illustrates parameter estimation problems in the industrial environment. A case study of various methods for solving this problem can be found in Biegler et al. (1986). The original formulation consists of six differential and four algebraic equations. This formulation can be reduced to three differential equations by making further assumptions (Villadsen et a1.-see Biegler et al.): -2% + x4 + 2Yl - Y2 + Y3 k g l Y z ~ l + P1(-X3 + x4 + YI - YZ) + P o3

-dY1 =-

dt

(31)

% = - klyz(xz+ 2x3 - x4 - 2y1 + y2 - y3) dt -2% + x4 + 2Yl - Y2 + Y3 k2Y1Y2Yl

+ P1(-X3 + x4 + Y1 - Y2) + P o 3

+ k-101(-x3 +

-2% + x4 + 2Yl - Y z + Y3

x4

+ "

- '"y1

+ @1(-x3 + x4 + y1 - yz) + Po3 (32)

dY3

- = k,(x, - ~1 - Y ~ ) ( X Z+ 2x3 - 3 ~ 4- 2 ~ +1 ~2 - ~ 3 +) dt -2% + x4 + 2Yl - Y2 + Y3 kzYIYzyl+ P1(-x3 + x 4 + y1 - Y2) + P o3 -2% + x4 + 2Y, - Y2 + Y3 -1 (33) 2k-1Pfi3Yl + @1(-x3+ x4 + y1 - y2) + b o 3 where

Here, y i and x i represent state variables and constants, respectively. The parameters K,, K,, and K3 in this model cannot be estimated independently; instead the ratio of Kl/Kz and K3/K2will be estimated. Thus the model has five parameters (kl, k 1 , k2, &, and P2). The data from the measurements of species y1 and y3 at temperature 67 O C are used in the estimation. We use fourth-order polynomials on 15 finite elements to fit the data. During the course of attempting to solve this problem, the dependencies among the parameters are discovered, which cause singularity of the reduced Hessian. After fixing kl,the other parameters can be estimated successfully. The objective function a t convergence is 0.0328, and the optimal parameters for k,, k2, pl, and P2 are 1.726, 2.312,0.006, and 0.004, respectively. The measurements and the fitted values are plotted in Figure 2. This problem is difficult to solve because the model is highly nonlinear. Furthermore, the reduced Hessian is somewhat ill conditioned, the eigenvalues of the reduced Hessian a t the convergence vary from 10' to lo9. Problem 6: Bellman's Problem. This problem, taken from Varah (1982), describes a reversible homogeneous gas-phase reaction of nitrogen oxide. The forward reaction is third order and the reverse reaction is second order. The

t 0

Y1 (HA)-dala Y3 (HABM)-data Y4 (AB)-data + Yl-estimate +- Y3-estimate 0 Y4-estimate Q

+

*

r

0

- - - - -- . -200 -

100

-

1

300

time (hr)

Figure 2. Comparison of the data and the fitted values for problem 5.

model, after various normalizations, is described by the following equation (Bellman et al., 1967):

- y)' - P o 2 (34) dt Here, we use third-order polynomials on three finite elements to fit the data. The objective function at convergence is 23.00, and the optimal parameters for p1 and p2 are 4.604 X and 2.847 X respectively. The objective function at the optimum is fairly large due to the magnitude of the data which represent the observed pressure. However, the predicted values fit the measurements fairly well. Although the eigenvalues of the reduced Hessian a t convergence are on the order of 1010-1016,the method converges without difficulties. Problem 7: Barnes' Problem. This problem, also taken from Varah (1982), describes a chemical reaction. This is also known as Lotka/Volterra predator-prey model in ecology. The model is represented by the following equations: dy = pl(126.2 - y)(91.9

dY1

dt = P1Y1-

PzYlY2

(35)

dt = Poly2 - PLY2

(36)

dY2

The solutions to this system are oscillatory in nature and out of phase with each other; moreover the data are only accurate to about 10%. Thus we expect to have large residuals a t convergence; for this reason we also consider the initial conditions of the model as parameters. This problem is solved using fourth-order polynomials on four finite elements with the same location of the knots as suggested by Varah. The objective function at convergence is 0.102, and the optimal parameters for y,(O), yz(0),pl, p2, and p3 are 0.993, 0.217, 0.819, 2.299, and 2.008, respectively. Although this problem is relatively easy to solve because the model is linear and the Hessian is well conditioned (the order of the eigenvalues varies from 10' to lo4), further improvement cannot be obtained by using more finite elements due to large variance of the data. Problem 8: @-PineneProblem.

041th YS

This reaction describes the thermal isomerization of a-pinene (YJ to dipentene (Y2) and alloocimene (Y3), which in turn yields a- and 0-pyronene (Y4) and a dimer

382 Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991 100 9

t

30

I

Yl-data -C Y2-data -8- Y3-data 9 Y4-dala + Y5-data Q Yl-estimate -& Y2-estimate Y3-estimate + Y4-estimate + Y5-estimate Q

60

Yi 40

* 20 0

0 0

20000

10000

30000

40000

Time (min)

1 0

,

,

2

.

I

4

r

,

6

.

, 8

. , . I 10

12

t

Figure 3. Comparison of the data and the fitted values for problem 8.

Figure 4. Comparison of the data and the fitted values for problem 9.

(YJ. This homogeneous chemical reaction is studied by Fuguitt and Hawkins (1947). Assuming that all reactions are first order, Box et al. (1973) used the following model:

Here, we reformulate the problem as a constrained least-squares problem. This problem has large residuals and highly nonlinear constraints and shows when GN approximation is not adequate to represent the Hessian. In this study, our GN SQP approximation fails to converge after 100 iterations, while the hybrid method converges successfully. The objective function a t convergence is 124.361, and the optimal parameters for both B1 and B2 are 0.258. The measurements and the fitted values are plotted in Figure 4. This problem illustrates the robustness of our method in dealing with nearly singular problems.

-dY2- - 4Y, dt

dY4 _ -003 dt

This problem has been solved by Box et al. (1973). They showed that there are dependencies in the data and thus only three independent linear combinations of the five responses are used in the estimation. Here we use the entire data set and the least-squares objective function to gauge the behavior of these methods with dependencies in the data. The data are fitted using fourth-order polynomials on 10 finite elements. The objective function at convergence is 19.869, and the optimal parameters for B1, B2, B,, B,, and B, are 5.926 X lo-,, 2.963 x 2.047 X lo-,, 2.744 X and 3.997 X respectively. These parameters are the same as the values reported by Box et al. using the same least-squares objective function. The measurements and the fitted values are plotted in Figure 3. These results show an excellent fit to the data. Here, we show the robustness of our approach of estimation. However, since Box et al. have shown the dependencies in the data, the estimated parameters using the entire data may not give meaningful results, in view of the broad confidence intervals calculated from these data. These results also show the important of analyzing the data before performing parameter estimation. Problem 9: Jennrich’s Problem. Jennrich and Sampson (1968) studied a nearly singular problem by fitting an exponential equation (42) to a data set which =

+ eW

(42) follows a straight line. To avoid failure in the optimization, Jennrich used stepwise regression, which is a modification to the unconstrained Gauss-Newton method. The purpose of including this problem in our case study is to show the case when our GN approximation fails to converge. It can be shown that, at the solution, parameter B1* should be equal to parameter 02*; thus the Hessian is nearly singular in the neighborhood of the solution. The model illustrates overparametrized problems. f

5. Discussion and Comparison of Results To show the effectiveness of the hybrid SQP method, we have solved the above example problems using various SQP-based methods where the Hessian is calculated using different approximations, such as BFGS update formula and GN approximation. We also used the MINOS 5.2 solver which is also available through GAMS. MINOS is an optimization software package developed by Murtagh and Saunders (1983) and based on a reduced gradient method. The basic idea of the reduced gradient method is to solve a sequence of subproblems with linearized constraints. The nonlinearities of the constraints are then adjoined to the objective function with Lagrange multipliers. A penalty parameter is also included in the objective function to enhance convergence. The optimization is only done in the reduced space and is very efficient for solving large problems with mostly linear constraints. Here, a major iteration refers to a nonlinear optimization with linearized constraints. Similar to SQP, MINOS can also be used as a simultaneous approach. The number of iterations reported in this study refers to the major iterations. Also, a comparison of the simultaneous strategies is done with the GREG package (Caracotsios, 1986) which is a very efficient implementation of a conventional parameter estimation strategy. The optimization is done a t each iteration by minimizing a quadratic approximation to the objective function. The algorithm is basically a modified GN method that has been extended to handle simple parameter bounds. Furthermore, a trust region strategy is also used to handle singular Hessians, which occur when there are parameter dependencies. This method is one of the most efficient current optimization strategies for parameter estimation. Here, the differential equations and the sensitivity equations are integrated by use of the DDASAC package, which can calculate the sensitivity coefficients very efficiently. The form of the objective function can be selected as either an ordinary least-squares or multiresponse Bayesian estimation. Finally, the function evaluations for GREG are not the same as for MINOS and SQP, since GREG repeatedly solves the model. The simultaneous methods solve the model

Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991 383 Table I. Number of Iterations (Objective Function Evaluations) for the Case Study SQP-based methods problem GREG MINOS BFGS GN hybrid 8 (41) 8 (10) 8 (72) 14 (17) 8 (55) 30 (31) 8 (58) 13 (18) 15 (236) fail 7 (72) 13 (21) 10 (150) 24 (30) 21 (271) 37 (61) 33 (320) 19 (24)

Table VI. Additional Information for Problem 4 specifications description

Table 11. ComDarison of CPU Times for the Case Study SQP-based methods Droblem GREG MINOS BFGS GN hybrid 1.9 3.8 2.4 1.1 1.0 9.0 2.7 2.7 10.0 6.0 12.1 1.7 1.7 3.4 5.7 4.0 2.8 3.5 3.0 5.0 18.1 149.0 fail 50.8 51.1 2.1 0.9 0.9 3.2 2.9 7.7 2.8 3.3 7.6 9.4 90.8 23.5 23.6 5.4 64.8 0.9 5.6 1.7 fail 1.6

Table VII. Additional Information for Problem 5 description specifications

Table 111. Additional Information for Problem 1 description specifications

"Based on polynomial interpolation of data. bSee Blau et al. (1981).

total no. of variables total no. of equality constraints init conditions for Y A and yB init values for k, and k z init values for collocation coefP location of the knots data generated using k, = 5, kz = 1 a t t

Table VIII. Additional Information for Problem 6 description specifications

" Based on polynomial

52 50 1 3

" Based

1

750

1200

0.10 0.05 0.02

0.30 0.15 0.06

See Bard (1974).

on polynomial interpolation of data.

total no. of variables total no. of equality constraints parameter k-, fixed a t init conditions for y l , y2, and y 3 init values for k,, k,, &, and & init values for collocation coeff" location of the knots

230 226 240.5 1.6497 8.2262 0 2.821 2.821 0.1 0.4 50.0 230.0

1.5 80.0 250.0

0 4

interpolation of data.

total no. of variables total no. of equality constraints init conditions for y init values for p1 and p2 init values for collocation coeff" location of the knots datab

2.5 3.5 120.0 180.0 300.0

26 24 0 1.0 x 10*

1.0 x 10*

5

20

" Based on polynomial interpolation of data.

Table IV. Additional Information for Problem 2 specifications descrbtion

See Varah (1982).

Table IX. Additional Information for Problem 7 description specifications

0 10

30

30

0.28 0.10 0.35 0.60 0.85

0.42 0.15 0.40 0.65 0.90

0.6 0.20 0.45 0.70 0.95

total no. of variables total no. of equality constraints init values for y1 and y z init values for p l , p , , and p 3 init values for collocation coeff location of the knots datab

0.8 0.25 0.50 0.75 1.00

59 54 1.0 2

0.3 4

4

0.9

2.1

3.6

"Based on polynomial interpolation of data. bSee Varah (1982).

Based on polynomial interpolation of data.

Table V. Additional Information for Problem 3 description specifications

Table X. Additional Information for Problem 8 description specifications

total no. of variables total no. of equality constraints init conditions for and yq init values for k,, k,, and k, init values for collocation coefP location of the knots data generated using k l = 12, k , = 8, k, = 2 a t t

total no. of 245 variables total no. of equality 240 constraints init conditions for 100 Y1, Y2r Y3. Y.43 and y5 init values for 8,, e,, 6'3, Oqr and B5

" Based

67 64 1 6

0 41

0.1

0.3

0.025 0.150 0.300 0.550

0.050 0.175 0.350 0.650

0.1

datab

0.2 0.4 0.6 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

total no. of variables 92 total no. of equality constraints 88 init conditions for Y A and 1 init values for k,, k,, k,, and k, 10 init values for collocation coefP 0.14 location of the knots data generated using k , = 4, k 2 = 2, 0.05 k, = 40, k4 = 20 a t t 0.30 0.55 0.80

53 51

total no. of variables total no. of equality constraints init conditions for y init values for 0, and 0, init values for collocation coeff" location of the knots a t T = 100 K location of the knots a t T = 200 K location of the knots a t T = 300 K datab

0.075 0.200 0.400 0.750

0.100 0.225 0.450 0.850

0.125 0.250 0.500 0.950

on polynomial interpolation of data.

only once. Unfortunately, the CPU times obtained from VAXstation 3200 have about 10% uncertainty depending on the load of the machine when we run the program.

init values for collocation coefP location of the knots

1230 8000 20000

0

0

0

0

2.65 x 10-5 2.777 x 10-4

5.84 x 10-5 1.63 x 10-5 4.61 x 10-5 3000

11000 25000

4500 15000 30000

datab (I

Based on polynomial interpolation of data. *See Box et al. (1973).

384 Ind. Eng. Chem. Res., Vol. 30, No. 2, 1991 Table XI. Additional Information for Problem 9 description specifications total no. of variables total no. of equality constraints init values for O1 and 8, init values for residual variables" datab

12 10 0.3

0.4

Based on the model using initial parameters. *See Jennrich and Sampson (1968).

Thus the CPU times also should not be interpreted strictly. The number of iterations and function evaluations are reported in Table I, while the CPU times are reported in Table 11. In the first three problems, where the residuals are zero, the hybrid method never switches to a BFGS approximation because GN approximation accurately represents the actual Hessian; thus the steps are exactly the same as the GN method. Although the hybrid method takes about the same number of iterations as the GN method for the next five problems, the hybrid method uses both GN and BFGS approximations because the residuals are not zero. In problem 4,after GN approximation is used in the first iteration, the BFGS update formula is used for the next two iterations due to slow progress. It switches to GN approximation for one iteration before finally using BFGS update until it converges. In problem 5, the method uses GN approximation for the first four iterations and BFGS approximation for the next three iterations, and then it switches back to GN approximation for one iteration and switches to BFGS approximation again until convergence. This behavior shows that the method is capable of switching back and forth between GN and BFGS approximations depending on the improvement of the merit function. In general, the hybrid method will use GN approximation for the first few iterations and switch to BFGS approximation when there is slow progress in the merit function. Finally, for Jennrich's problem, which not only has large residuals but also highly nonlinear constraints, our GN approximation is not adequate, and it fails to converge after 100 iterations; the hybrid method converges successfully. Again, we show the need to have the hybrid method which is more robust for handling a variety of problems. As expected, the performance of the hybrid method is better than the general SQP method where the reduced Hessian is approximated with the use of the BFGS update formula. Depending on the problems, the improvements are of a factor slightly more than 1 up to 10. The reason is that the general BFGS approximation usually starts with an identity matrix. As the iterations proceed, the updated Hessian eventually resembles the actual Hessian, while the hybrid method starts with a better approximation for least-squares problems. Thus the Hessian in the hybrid method resembles the actual Hessian at the earlier iterations and converges faster. The hybrid method is not only faster but also more robust than the BFGS approximation. This is true especially for highly nonlinear problems with an ill-conditioned Hessian, such as problem 5 where the BFGS approximation fails to converge after 100 iterations. Since MINOS and SQP methods solve different subproblems, the number of iterations should not be compared directly. Instead, the CPU times can be used to illustrate the performance of the hybrid method compared to that of MINOS. The CPU times for MINOS are roughly 2-3 times that of the hybrid method. The reason is that MINOS is not designed specifically for solving nonlinear least-squares problems; thus the Hessian is approximated with the use

of quasi-Newton update formulas. The purpose of solving the problems using the GREG package is to show that simultaneous strategies without a special-purpose optimization algorithm may not be faster than a conventional strategy which does use special-purpose optimization methods. These can be seen by comparing the CPU times for MINOS and general SQP to that of GREG. Here, we show that the performance of the simultaneous strategies with the hybrid method is competitive to that of GREG. 6. Conclusions Parameter estimation involving differential equation methods can be formulated as an NLP problem using orthogonal collocation on finite elements. This formulation allows other process constraints to be embedded directly. Also, solution and optimization can be obtained simultaneously. The practicality of the simultaneous strategies has been demonstrated on chemical kinetic modeling with various degrees of complexities. Although the formulation can be solved with general NLP methods, the results of the case study show that a special-purpose method, such as the hybrid SQP method, performs significantly better than any of the general methods used in the case study. Furthermore, the comparison study with the conventional approach also shows that, without a special purpose method, the simultaneous strategies can be slower than the conventional approach. The main contribution of this work is the development of the hybrid SQP method which exploits the structure of the least-squares objective function and optimizes the parameters only in the reduced space. The result is a robust and efficient nonlinear programming method for solving constrained least-squares problems.

Acknowledgment We acknowledge financial support from the Engineering Design Research Center, an NSF Engineering Research Center at Carnegie Mellon University. The authors are also grateful to Prof. W. E. Stewart for his GREG package. Appendix Tables 111-XI list the additional information for problems 1-9. Literature Cited Al-Baali, M.; Fletcher, R. Variational Methods for Nonlinear Least Squares. J . Oper. Res. SOC.1985, 36, 405-421. Ascher, U.; Bader, G. Stability of Collocation at Gaussian Points. SIAM J . Numer. Anal. 1986,23 (2), 412-422. Baden, N.; Villadsen, J. A Family of Collocation Based Methods for Parameter Estimation in Differential Equations. Chem. Eng. J. 1982,23, 1-13. Bard, Y. Nonlinear Parameter Estimation; Academic Press: New York, 1974. Bellman, R.; Jacquez, J.; Kalaba, R.; Schwimmer, S. Quasilinearization and the Estimation of Chemical Rate Constants from Raw Kinetic Data. Math. Eiosci. 1967, I , 71-76. Biegler, L. T.; Cuthrell, J. E. Improved Infeasible Path Optimization for Sequential Modular Simulators-11: The Optimization Algorithm. Comput. Chem. Eng. 1985,9 (31, 257-267. Biegler, L. T.; Damiano, J. J.; Blau, G . E. Nonlinear Parameter Estimation: A Case Study Comparison. AIChE J. 1986,32 (I), 29-45. Blau, G. L.; Kirkby; Marks, M. "An Industrial Kinetics Problem for Testing Nonlinear Parameter Estimation Algorithms"; Process Math Modelling Department, The Dow Chemical Company, 1981. Box, G. E. P.; Hunter, W. G.; MacGregor, J. F.; Erjavec, J. Some Problems Associated with the Analysis of Multiresponse Data. Technometrics 1973, 15, 33-51. Caracotsios, M. Model Parameter Sensitivity Analysis and Nonlinear Parameter Estimation. Theory and Application. Ph.D. Disser-

385

Ind. Eng. Chem. Res. 1991,30, 385-395 tation, University of Wisconsin-Madison, 1986. Cuthrell, J. E.; Biegler, L. T. On the Optimization of Differentiation-Algebraic Process Systems. AIChE J. 1987, 33, 1257-1270. Cuthrell, J. E.; Biegler, L. T. Simultaneous Optimization and Solution Methods for Batch Reactor Control Profiles. Comput. Chem. Eng. 1989, 13 (1/2), 49. Dennis, J. E.; Gay, D. M.; Welsch, R. E. An Adaptive Nonlinear Least-Squares Algorithm. ACM Trans. Math. Softw. 1981, 7 (3), 348-368. Fariss, R. H.; Law, V. H. An Efficient Computational Technique for Generalized Application of Maximum Likelihood to Improve Correlation of Experimental Data. Comput. Chem. Eng. 1979,3, 95-104. Fletcher, R.; Xu, C. Hybrid Methods for Nonlinear Least Squares. IMA J. Numer. Anal. 1979, 7, 371-389. Froment, G. F.; Bischoff, K. B. Chemical Reactor Analysis and Design; Wiley: New York, 1979. Fuguitt, R. E.; Hawkins, J. E. Rate of Thermal Isomerization of tu-Pinene in the Liquid Phase. J . Am. Chem. Soc. 1947,69,319. Hertzberg, T.; A s b j m " , 0. A. Parameter Estimation in Nonlinear Differential Equations. Proceedings of the Symposium CHEMDATA.77, 1977, Helsinki.

Jennrich, R. I.; Sampson, P. F. Application of Stepwise Regression to Non-Linear Estimation. Technometrics 1968, 10, 63-72. Murtagh, B. A,; Saunders, M. A. MINOS 5.0 User's Guide; Report SOL 83-20, Department of Operation Research, Stanford University: Stanford, CA, 1983. Nowak, U.; Deuflhard, P. Numerical Identification of Selected Rate Constants in Large Chemical Reaction Systems. Appl. Numer. Math. 1985, I , 59-75. Varah, J. M. A Spline Least Squares Method for Numerical Parameter Estimation in Differential Equations. SIAM J . Sci. Stat. Comput. 1982, 3, 28-46. Varga, K.; Marsi, I.; Tasi, G.; Kiricsi, I.; Fejes, P. Facilitation of Computer Estimation of Kinetics Parameters of a Heterogeneous Catalytic Reaction via the Use of Adsorption Data. Comput. Chem. Eng. 1988, 12 (2/3), 127-133. Vasantharajan, S.; Biegler, L. T. Large-Scale Decomposition for Successive Quadratic Programming. Comput. Chem. Eng. 1988, 12 (ll),1087-1101.

Received for review April 27, 1990 Revised manuscript received August 9, 1990 Accepted August 27, 1990

SEPARATIONS Multicomponent Dye Adsorption onto Carbon Using a Solid Diffusion Mass-Transfer Model Gordon McKay* and Bushra A1 Duri Department of Chemical Engineering, T h e Queen's University of Belfast, Belfast B T 7 l N N , Northern Ireland

The adsorption of basic dyes onto activated carbon has been studied for single and multicomponent systems. Three single-component systems, one binary dye adsorption system, and one ternary dye adsorption system have been investigated. Equilibria and kinetic studies have been performed and the effects of varying initial dye concentration and carbon mass have been investigated during the kinetic experiments. Prediction of experimental equilibrium data has been performed by use of the ideal adsorbed solute theory, the extended Redlich-Peterson isotherm, and a modified extended Redlich-Peterson isotherm. The predictions from the modified extended Redlich-Peterson isotherm, incorporating an interaction factor, correlated experimental data well and also provided a reasonably simple format that could be incorporated into a kinetic model. The kinetic model developed for agitated batch adsorbers is based on the film solid diffusion model coupled with the modified extended Redlich-Peterson isotherm. Kinetic data for the binary and ternary systems are correlated with experimental data for up to 3 h over a wide range of dye concentrations and carbon masses.

Introduction Adsorption from multicomponent solutions plays an important role in a great number of natural and industrial systems such as fundamental biological studies, separation and purification processes, recovery of chemical compounds, catalysis, and waste treatment operations. Adsorption processes using activated carbon have proven efficiency and are economically feasible for the treatment of wastewater (Borowko and Jaroniec, 1983). Ideally the prediction of full scale adsorber performance should be based on experiments carried out rapidly using simple equipment. Unfortunately multicomponent adsorption is frequently complicated by interactions and competition between sorbates and sorbent in the system. However, an efficient, accurate and cost-effective design should account for the effects associated with multicomponent adsorption (Weber and Smith, 1990). A t present, adsorber design is 0888-5885/91/2630-0385$02.50/0

mainly based on extensive pilot plant scale experiments (Yen and Singer, 1984) which supply design parameters for the set of variables studied but are not applicable to different system conditions. Generally, to develop a mathematical model that describes or predicts adsorption dynamics, the following are required: (i) a full description of equilibrium behavior, that is, the maximum level of adsorption attained in a sorbatelsorbent system as a function of the sorbate liquidphase concentration; (ii) a mathematical characterization of the associated rate of adsorption, which is controlled by the resistances within the sorbent particles; (iii) a material balance for each component within the system. Steps i, ii, and iii are combined in a complete model that consists of a system of partial differential equations (PDE's) that describe the continuity of each component in each phase together with equations expressing system 0 1991 American Chemical Society