
Ind. Eng. Chem. Res. 2005, 44, 1804-1811

PROCESS DESIGN AND CONTROL

Large-Scale Dynamic Optimization Using the Directional Second-Order Adjoint Method

Derya B. Özyurt and Paul I. Barton*

Department of Chemical Engineering, Massachusetts Institute of Technology, 66-0464, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139

* To whom correspondence should be addressed. Tel.: 617-253-6526. Fax: 617-258-5042. E-mail: [email protected].

Truncated-Newton methods provide an effective way to solve large-scale optimization problems by achieving savings in computation and storage. For dynamic optimization, the Hessian-vector products required by these methods can be evaluated accurately, at a computational cost which is usually insensitive to the number of optimization variables, using a novel directional second-order adjoint (dSOA) method. The case studies presented in this paper demonstrate that a “dSOA-powered” truncated-Newton method is a promising candidate for the solution of large-scale dynamic optimization problems.

1. Introduction

Dynamic optimization problems consist of an objective function(al) dependent on a set of state variables that are determined in general by differential-algebraic equations (DAEs). The efficient numerical solution of these problems is important in many applications where optimization of time-dependent performance is required, such as optimal control, parameter estimation, and dynamic system synthesis and analysis. Solution procedures for dynamic optimization problems often have to tackle large numbers of state variables, optimization parameters, and constraints, whether imposed on the overall dynamic system or generated by the solution approach.

An approach which decouples the optimization problem from the solution of the embedded dynamic system can enable exploitation of the full advantages of state-of-the-art integration and large-scale nonlinear programming tools. There are two well-established methods that have proven to be effective for the solution of optimal control problems via this decoupling. The first, the single shooting approach, uses a numerical technique known as “control vector parameterization”1 that reduces the original infinite dimensional problem to a finite dimensional nonlinear program (NLP) with DAEs embedded by discretizing the control variables only. The second, called the multiple shooting method, divides the overall time horizon into subintervals over which the embedded dynamic system is integrated separately. In the optimization problem, continuity conditions are added as constraints to achieve state continuity between the subintervals. Control vector parameterization can be applied in each subinterval to improve the control discretization. The multiple shooting method is considered to be more robust than the single shooting approach, but its computational cost is higher, since at each optimization iteration many more sensitivity calculations are necessary.2-4

Therefore, both approaches will benefit from improvements in the efficient computation of sensitivities or, more importantly, of specific derivative information.

For a given instance of the parameter values characterizing the control discretization, the embedded DAE system is solved to provide the objective function value and its derivative values, which are utilized by a gradient-based NLP solution procedure. Derivative information can be extracted from the first-order sensitivities, and there are several efficient methods to calculate these sensitivities.5,6 It has also been shown that, for DAE embedded functionals, the first-order adjoint method can be more attractive when the number of parameters is large and there is one functional or a small number of functionals.7,8 This is because, except in pathological situations,9 the cost of the adjoint calculations relative to the cost of a simulation is insensitive to the number of parameters involved, whereas for sensitivities this relative cost scales linearly with the number of parameters.

For large-scale problems, if second-order information is employed by the optimization procedure, directional second-order derivatives are often estimated using directional finite differences based on a first-order adjoint code. Specifically, the Hessian-vector product of the objective function

J(p) = g(x(tf,p),p) + ∫_{t0}^{tf} h(t,x(t,p),p) dt   (1)

at p* in the direction u is estimated by

∇²J(p*)u ≈ [∇J(p* + εu) − ∇J(p*)]/ε   (2)

which requires two state and adjoint integrations, one at p* and one at p* + εu. The cost of computing this estimate relative to the cost of a simulation is insensitive to the number of parameters, by virtue of the properties of the adjoint integration.
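For illustration, a minimal sketch of this estimator (eq 2); grad is assumed to wrap a first-order adjoint code, and the function name and default step size are ours, not the paper's:

```python
import numpy as np

def hvp_fd(grad, p, u, eps=1e-5):
    # Directional finite difference (eq 2): each product costs one extra
    # gradient, i.e. one extra state + adjoint integration, per direction u.
    return (grad(p + eps * u) - grad(p)) / eps
```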



Truncated-Newton methods are a class of optimization methods which employ directional second-order information in updating the current estimate of the solution by approximately solving the Newton equations

∇²J(p(k))u = −∇J(p(k))   (3)

using an iterative procedure.10,11 Each inner iteration of the truncated-Newton method requires evaluation of the Hessian-vector product, ∇²J(p(k))u, in a given direction u. Incorporation of “exact” second-order information within truncated-Newton methods has been undertaken previously for data assimilation problems.12,13 This rather specific case of the general directional second-order adjoint method presented here is implemented for nonstiff ordinary differential equation (ODE) embedded systems using a “direct” automatic differentiation (AD)14 approach. Instead of applying AD to a code that integrates the embedded system, the directional second-order adjoint (dSOA) method introduced in this paper constructs differentiated subroutines to be integrated by an implicit integration scheme. This “targeted” AD approach reduces the computational cost, and accurate Hessian-vector products are calculated in less time than two gradient evaluations.9 In addition, except in pathological situations,9 the cost of the dSOA method relative to the cost of a simulation is insensitive to the number of parameters involved. Another approach to compute second-order information for truncated-Newton methods is to calculate the Hessian-vector products from directional second-order forward sensitivities.15 However, the cost of calculating the directional second-order sensitivities, relative to the cost of a simulation, scales linearly with the number of parameters, and for large-scale dynamic optimization problems, these sensitivity calculations can become very expensive.

In this study, we assume that the dynamic optimization problem is given as

min_p J(p) = g(x(tf,p),p) + ∫_{t0}^{tf} h(t,x(t,p),p) dt   (4)

ẋ + F(t,x,p) = 0, x(t0) = x0(p)   (5)

subject to the following bounds on the optimization parameters

pL ≤ p ≤ pU   (6)

This formulation can encapsulate both the single and multiple shooting approaches for discretizing optimal control problems, although for the multiple shooting method the equality constraints must be incorporated within the objective function, for instance via an augmented Lagrangian approach. In general, parameters can appear in the definition of the dynamic model and its initial and/or boundary conditions. For an optimal control problem, the control variables can be approximated by a suitable discretization, such as a piecewise linear profile or Chebyshev polynomials. Although the following presentation only considers stiff ODE systems, with careful consideration of the initial conditions and the stability of the adjoint systems, the theory and application can be extended to DAE embedded systems.16

In the following section, a short introduction to the proposed adjoint method to calculate directional second-order derivatives is given. This is followed by the

description of a general method and its implementation using Nash's truncated-Newton method11 with dSOA as the directional second-order derivative evaluator. Several examples then demonstrate the promise of the proposed approach. We conclude with some final remarks and a description of work in progress.

2. Obtaining Directional Second-Order Derivatives via an Adjoint Method

Analogous to the first-order adjoint method for obtaining gradient information for problems with a large number of parameters, the second-order adjoint method can be devised by constructing the second-order adjoint system and corresponding quadrature equations for point-form, ∂²g/∂p², and/or integral-form, ∂²G/∂p² [≡ ∂²(∫_{t0}^{tf} h(t,x(t,p),p) dt)/∂p²], objective functionals. The first-order adjoints, λ, of the sensitivity equations are obtained from

λ̇ᵀ − λᵀFx = −hx, λᵀ(tf) = 0   (7)

where Fx ≡ ∂F/∂x and similarly hx ≡ ∂h/∂x. Subsequently, the first-order derivatives are calculated using

∂G/∂p = ∫_{t0}^{tf} (hp − λᵀFp) dt + (λᵀxp)|_{t=t0}   (8)

Directional second-order derivatives can be calculated by a directional version of this adjoint system,9 that is

μ̇ − Fxᵀμ = (λᵀ ⊗ Inx)(Fxpu + Fxx(xpu)) − hxx(xpu) − hxpu, μ(tf) = 0   (9)

where μ ≡ λpu, u is the direction for the directional derivative, and λp are the second-order adjoints. [⊗ denotes a Kronecker product. If A is an r × s matrix and B is a t × u matrix, then their Kronecker product, an rt × su matrix, is formed by replacing each aij in A by aijB.17 Inx is an nx × nx identity matrix. Fxp ≡ ∂²F/∂p∂x.] Hence, we can calculate the directional second-order derivative of G(p) by

(∂²G/∂p²)u = ∫_{t0}^{tf} {hppu + hpx(xpu) − [Fpᵀμ + (λᵀ ⊗ Inp)(Fppu + Fpx(xpu))]} dt + [(λᵀ ⊗ Inp)xppu + xpᵀμ]|_{t=t0}   (10)

Here xpu is obtained by defining s ≡ xpu and solving the directional first-order sensitivity equations

ṡ + Fxs + Fpu = 0, s(t0) = (∂x0/∂p)u   (11)

The second-order sensitivity of a point functional g(x(tf,p),p) at the final time point, that is, ∂²g/∂p²|tf,p, can be obtained from the above result using the identity

∂²g/∂p²|tf,p = (d/dtf)[∂²(∫_{t0}^{tf} g(t,x(t,p),p) dt)/∂p²]|p

Moreover, by considering the integrand, h(t,x(t,p),p), as


an additional quadrature variable in the numerical integration, the objective function (eq 4) becomes a sum of two point-form functionals. For a system of ODEs defined as ẋ + F(t,x) = 0, x(t0) = x0(p), the directional second-order derivative reduces to

(∂²G/∂p²)u = ∫_{t0}^{tf} [hppu + hpxs] dt + [(λᵀ ⊗ Inp)xppu + xpᵀμ]|_{t=t0}   (12)

A more specific case can be constructed when ∂hp/∂p ≡ hpp + hpxxp = 0 and

xpᵀ|_{t=t0} = Inx   (13)

Now the exact Hessian-vector product can be evaluated by calculating the second-order adjoint at the initial time. This specific case is presented in refs 13 and 18.

The evaluation of directional second-order derivatives by the dSOA method requires solution of the state equations

ẋ + F(t,x,p) = 0, x(t0) = x0(p)

and directional first-order sensitivities

ṡ + Fxs + Fpu = 0, s(t0) = (∂x0/∂p)u

forward in time. At the final time point of the forward integration (tf), if the objective function consists of point-form functionals, the initial values of the adjoint variables must be calculated. Then the first-order adjoint

λ̇ᵀ − λᵀFx = −hx, λᵀ(tf) = 0

and directional second-order adjoint equations

μ̇ − Fxᵀμ = (λᵀ ⊗ Inx)(Fxpu + Fxxs) − hxxs − hxpu, μ(tf) = 0

must be integrated backward in time. The size of these four systems is independent of np, the number of parameters. On the other hand, the formulation requires evaluation of several vector-matrix, matrix-vector, and vector-matrix-vector products. Therefore, a successful implementation of the dSOA method depends on calculating these terms accurately and efficiently using AD.14

The numerical procedure implemented involves a forward integration using the staggered corrector method5 to obtain the states and the directional first-order sensitivities in one direction or a small number of directions. These values are stored, although a checkpointing approach could also be employed to reduce the storage required. Subsequently, a backward integration of the first-order adjoint and directional second-order adjoint equations is performed, in one direction or a small number of directions, once again with the staggered corrector method. During this backward integration, (∂²J/∂p²)u is evaluated at time t0 from a quadrature calculation. In both the forward and backward integrations, the staggered corrector method enables the directional first-order sensitivities and the directional second-order adjoints, respectively, to be calculated at a small increment over the cost of computing the states and adjoints, respectively, by exploiting the similarity of the matrices appearing in the corrector iteration.
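As a concrete illustration, the following self-contained sketch applies eqs 7-11 to a scalar toy problem of our own choosing (not one of the paper's case studies): ẋ + p1x + p2x² = 0 with x(0) = 1 and G(p) = ∫₀ᵀ x² dt. Because x(0) is independent of p, the boundary terms in eqs 8 and 10 vanish. The Hessian-vector product delivered by the backward pass is checked against a directional finite difference of the adjoint gradient (eq 2):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Scalar test problem in the paper's convention x' + F(t,x,p) = 0:
#   F(t,x,p) = p1*x + p2*x**2,  x(0) = 1 (independent of p),
#   G(p) = int_0^T h dt with h = x**2.
# Hence h_x = 2x, h_xx = 2, F_x = p1 + 2*p2*x, F_xx = 2*p2,
# F_p = (x, x**2), F_xp = (1, 2x), and all h_p and F_pp terms vanish.
T, x0 = 2.0, 1.0

def forward(p, u):
    # State and directional sensitivity s = x_p u (eq 11), forward in time.
    def rhs(t, z):
        x, s = z
        Fx = p[0] + 2.0 * p[1] * x
        return [-(p[0] * x + p[1] * x**2),           # x' = -F
                -(Fx * s + u[0] * x + u[1] * x**2)]  # s' = -(F_x s + F_p u)
    return solve_ivp(rhs, (0.0, T), [x0, 0.0], dense_output=True,
                     rtol=1e-10, atol=1e-12)

def backward(p, u, fwd):
    # First-order adjoint (eq 7), directional second-order adjoint (eq 9),
    # and the quadratures of eqs 8 and 10, integrated from T back to 0.
    def rhs(t, z):
        lam, mu = z[0], z[1]
        x, s = fwd.sol(t)
        Fx = p[0] + 2.0 * p[1] * x
        dlam = Fx * lam - 2.0 * x                                   # eq 7
        dmu = (Fx * mu + lam * (u[0] + 2.0 * u[1] * x + 2.0 * p[1] * s)
               - 2.0 * s)                                           # eq 9
        dgrad = [-lam * x, -lam * x**2]                             # eq 8
        dhvp = [-(mu * x + lam * s),                                # eq 10
                -(mu * x**2 + 2.0 * lam * x * s)]
        return [dlam, dmu] + dgrad + dhvp
    z = solve_ivp(rhs, (T, 0.0), [0.0] * 6, rtol=1e-10, atol=1e-12).y[:, -1]
    # Integrating over a decreasing time span accumulates -int_0^T: negate.
    return -z[2:4], -z[4:6]  # gradient dG/dp, Hessian-vector product

p, u = np.array([0.5, 0.3]), np.array([1.0, -2.0])
grad, hvp = backward(p, u, forward(p, u))

eps = 1e-6  # finite difference check of the Hessian-vector product (eq 2)
grad_eps, _ = backward(p + eps * u, u, forward(p + eps * u, u))
print(hvp, (grad_eps - grad) / eps)
```

The two printed vectors should agree closely, with the residual difference attributable to the finite difference step rather than to the adjoint calculation.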

This briefly described dSOA method9 is implemented within a modified version of DASPKADJOINT,7,19 an adjoint sensitivity solver. Specifically, the aforementioned code is modified to accommodate sensitivity calculations during the backward integration, to save and retrieve forward sensitivities, to calculate the second-order derivatives at the final step, and to perform these calculations for several directions. The input information necessary to set up a given problem is similar to that of DASPKADJOINT. However, in addition to the state and first-order adjoint equations, the directional first-order sensitivity and directional second-order adjoint equations, along with the integrands of the quadratures, are needed. Efficient implementations of these subroutines can be constructed automatically via AD.

3. “dSOA-Powered” Truncated-Newton Method

Solution procedures for sufficiently smooth nonlinear optimization problems are more stable and converge faster if second-order derivative information can be exploited by the iterative procedure. Ideal quadratic convergence can be achieved by obtaining the exact Newton direction at each iteration, but this incurs a high computational and storage cost. On the other hand, superlinear convergence can be achieved when the search direction approaches the Newton direction in the limit as the solution is approached.20 Therefore, by calculating an approximate Newton direction, a compromise between fast convergence and computational cost per iteration can be attained.

Two basic approaches are to approximate the second-order derivatives and solve the Newton equation exactly (quasi-Newton methods) or to solve the Newton equation inexactly with “exact” or approximate second-order derivatives (truncated-Newton methods). Both approaches extend the capability of second-order methods to tackle large problems. Limited memory versions of the quasi-Newton methods have been proposed, but they do not perform well for objective functions with ill-conditioned Hessians.21 Since dynamic optimization problems can often have ill-conditioned Hessians, and truncated-Newton methods allow incorporation of “exact” directional second-order information, the latter may be promising NLP solvers for large-scale dynamic optimization problems.

Truncated-Newton methods consist of an outer iteration for the nonlinear optimization problem and an inner iteration for the approximate solution of the Newton equations. These methods provide an effective solution procedure for large-scale optimization problems if an efficient but approximate solution of the Newton equations can produce a “good” direction which can be used in the outer iteration with appropriate globalization strategies.10 Techniques to calculate the requisite second-order information efficiently and accurately can improve the overall method by expanding its applicability to large-scale dynamic optimization problems. The directional second-order adjoint method introduced above, along with its first-order counterpart, computes the directional second-order derivatives and gradients of the objective function efficiently and accurately, improving both the outer and inner iterations of the truncated-Newton methods.


The truncated-Newton algorithm adapted to solve large-scale dynamic optimization problems can be stated as follows:

1. Set the iteration number k = 0 and specify an initial approximation p(0) to the optimal solution p*.
2. If the approximation p(k) is a local minimizer of J within a given tolerance, stop.
3. Solve eq 3 approximately by a modified-Lanczos algorithm11 using preconditioning22 to obtain a search direction u. At each iteration of this inner loop, the required Hessian-vector products are calculated by the dSOA method.
4. Apply a line search to find α > 0 such that J(p(k) + αu) < J(p(k)). Gradient evaluations are performed by a first-order adjoint method.7
5. Set p(k+1) = p(k) + αu and k = k + 1, and go to step 2.

This algorithm is implemented using Nash's truncated-Newton code for optimization problems with bounds on variables.11 The original code is modified to include the dSOA method in step 3. Estimating the Hessian-vector product using directional finite differences (eq 2) requires two gradient evaluations at the first iteration of the first inner loop (one at p(0) and one at p(0) + εu). However, at the second and subsequent iterations of each inner loop, only a single gradient evaluation is required, at p(k) + εu, because p(k) remains fixed and only the direction u changes. Our numerical experience shows that evaluation of the directional second-order derivatives using dSOA will be roughly 10-30% more expensive than a single gradient evaluation using the first-order adjoint method.9 This means that the cost of the inner iterations for the “dSOA-powered” truncated-Newton method will be only marginally different from that of the directional finite difference estimate. The benefit of our proposed method, therefore, will be seen empirically in the overall time for optimization because of the higher accuracy of the second-order information, resulting in faster and/or more robust convergence. Moreover, successfully approximating the second-order derivatives via gradients requires a proper ε value for the finite difference method. Eliminating this potentially failure-prone or sometimes impossible selection is another advantage of the “dSOA-powered” truncated-Newton method. A sketch of this outer/inner structure follows.
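The sketch below is our simplified rendering of steps 1-5 for the unconstrained case, not Nash's code: the inner solve uses plain conjugate gradients with a standard truncation test in place of the modified-Lanczos recursion with preconditioning, and the line search is simple backtracking. `hvp(p, u)` stands for a dSOA evaluation and `grad(p)` for a first-order adjoint evaluation:

```python
import numpy as np

def truncated_newton(J, grad, hvp, p0, tol=1e-5, max_outer=200, max_inner=50):
    p = p0.copy()
    for _ in range(max_outer):                       # outer iteration
        g = grad(p)
        if np.linalg.norm(g) < tol:                  # step 2: convergence test
            break
        # Step 3: inner loop -- approximately solve eq 3, H u = -g, by
        # conjugate gradients; each pass needs one Hessian-vector product.
        u, r = np.zeros_like(p), -g.copy()
        d, rr = r.copy(), r @ r
        for _ in range(max_inner):
            Hd = hvp(p, d)
            dHd = d @ Hd
            if dHd <= 0.0:                           # negative curvature: stop
                if not u.any():
                    u = -g                           # fall back to steepest descent
                break
            alpha = rr / dHd
            u, r = u + alpha * d, r - alpha * Hd
            rr_new = r @ r
            gnorm = np.linalg.norm(g)
            if np.sqrt(rr_new) <= min(0.5, np.sqrt(gnorm)) * gnorm:
                break                                # truncation test
            d, rr = r + (rr_new / rr) * d, rr_new
        # Step 4: backtracking line search along the inexact Newton direction.
        step, J0 = 1.0, J(p)
        while J(p + step * u) >= J0 + 1e-4 * step * (g @ u) and step > 1e-12:
            step *= 0.5
        p = p + step * u                             # step 5
    return p
```

With `hvp` supplied by dSOA, each inner pass costs roughly one adjoint gradient evaluation, as discussed above.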

4. Numerical Results

Three case studies are described to showcase the effectiveness and efficiency of the directional second-order adjoint method in conjunction with a well-known truncated-Newton method. Our aim is to present the behavior of a “dSOA-powered” TN method in three different situations. The first case is when the number of state variables is very small compared to the number of parameters. In this worst case, because of a “weak” np-dependence,9 the cost of dSOA increases more than anticipated. The second case is when the optimization parameters are obtained from control vector parameterization. Here, the computational cost increases proportionally only because of the need to restart the integration at each parameterization grid point, and substantial improvements in computational cost can be realized. The final case is when the initial conditions are also added as optimization parameters. In this case, the advantage of employing dSOA as a directional second-order derivative provider can be seen immediately.

Table 1. Computational Results for the van der Pol Oscillator Example Using dSOA^a

np      NIT    NF     CG     J          CPU (s)   TPI (s)
25      7      8      30     2.87631    0.30      0.01
50      7      10     25     2.86937    0.55      0.02
100     12     15     48     2.86778    2.04      0.03
200     21     22     69     2.86739    7.33      0.08
500     48     49     152    2.86728    62.5      0.31
1000    90     91     360    2.86726    409       0.91
2000    180    180    701    2.86726    2470      2.80
2500    222    223    886    2.86726    4778      4.31

^a np, number of parameters; NIT, number of iterations; NF, number of outer iterations; CG, number of inner iterations; J, objective function value; CPU, overall CPU time; TPI, average CPU time per iteration.

Our main aim is to show that, for large-scale dynamic optimization problems, obtaining the second-order information via dSOA is advantageous to the overall optimization procedure over approximating it by directional finite differences. The performance of directional second-order forward sensitivities as an alternative means of evaluating the second-order derivatives is also presented for the first two examples. This method is implemented as a sensitivity problem in DASPK23 using the staggered corrector method.5 The first-order and directional second-order sensitivity equations for each parameter are augmented to the integration at the parameterization grid point where they become active. These equations are constructed using AD. Gradients of the objective function are evaluated from first-order sensitivities.

Numerical experiments are performed on a Pentium IV/3.20 GHz Shuttle X machine with 1 GB memory running Linux kernel 2.4. g77 from the GNU Compiler Collection (version 3.2.3) with the -O2 optimization flag is used as the FORTRAN 77 compiler. All automatic differentiation tasks are performed by the AD tool TAMC.24

4.1. van der Pol Oscillator. This example is taken from the optimal control literature.15 The objective function is in point form,

J(p) = x3(tf,p)   (14)

The embedded ODE system consists of the following three equations

ẋ1 − (1 − x2²)x1 + x2 − v(t,p) = 0
ẋ2 − x1 = 0
ẋ3 − x1² − x2² − v(t,p)² = 0

where the control variable is parameterized using a piecewise constant function on a uniform partition of the time interval. The initial condition for the states is x(0) = (0, 1, 0), with 0 ≤ t ≤ tf, where tf = 5.0. For the initial guess v(t,p) = 0.7, several optimization runs are performed with an increasing number of control parameters. The integration tolerance is 10⁻⁷, and the optimization accuracy (oa) value is set to 10⁻⁵. The outer iterations reported in Table 1, column 3, correspond to line search iterations, each consisting of a gradient evaluation, whereas each inner iteration consists of a Hessian-vector product evaluation. It has been shown that, especially for large numbers of parameters, a Hessian-vector product evaluation with dSOA has a cost comparable to a gradient evaluation using a first-order adjoint method.9
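A single-shooting sketch of this example is given below, using scipy's bound-constrained truncated-Newton code (TNC) with finite-difference gradients for brevity; the paper instead supplies adjoint gradients and dSOA Hessian-vector products to Nash's code:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

tf, n_p = 5.0, 25                        # horizon and number of control parameters

def rhs(t, x, v):
    # Embedded ODE system, solved for the state derivatives
    return [(1.0 - x[1]**2) * x[0] - x[1] + v,
            x[0],
            x[0]**2 + x[1]**2 + v**2]

def J(p):
    # Point-form objective J(p) = x3(tf, p) for a piecewise-constant control
    x, grid = [0.0, 1.0, 0.0], np.linspace(0.0, tf, n_p + 1)
    for k in range(n_p):                 # restart integration at each grid point
        sol = solve_ivp(rhs, (grid[k], grid[k + 1]), x, args=(p[k],),
                        rtol=1e-7, atol=1e-9)
        x = sol.y[:, -1]
    return x[2]

res = minimize(J, 0.7 * np.ones(n_p), method="TNC", tol=1e-5)
print(res.fun)                           # should approach ~2.876 (cf. Table 1)
```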


Table 2. Computational Results for the van der Pol Oscillator Example Using the Finite Difference Approximation of the Hessian-Vector Product

np      NIT    NF     CG     J          CPU (s)   TPI (s)
25      8      15     34     2.87632    0.32      0.01
50      10     11     38     2.86937    0.59      0.01
100     12     13     52     2.86779    1.77      0.03
200     22     23     71     2.86739    6.39      0.07
500     48     49     192    2.86728    66.4      0.28
1000    90     91     295    2.86726    314       0.81
2000    179    180    718    2.86726    2174      2.42
2500    222    223    886    2.86726    4083      3.68

Table 3. Computational Results for the van der Pol Oscillator Example Using Directional Second-Order Forward Sensitivities

np      NIT    NF     CG     J          CPU (s)   TPI (s)
25      7      8      32     2.87631    0.91      0.02
50      7      10     25     2.86937    2.40      0.08
100     13     16     50     2.86778    14.5      0.25
200     21     22     71     2.86739    72.9      0.89
500     46     47     184    2.86728    1022      4.92

Table 4. Data for the Canned Food Sterilization Example

CN0      1 × 10⁸
CM0      1.0
Fs,D     1200.0
T0       71.11 °C
TM,ref   121.11 °C
ZM,ref   10.0 °C
DM,ref   240.0 s
TN,ref   121.11 °C
ZN,ref   25.56 °C
DN,ref   10716.0 s
R        0.04375 m
L        0.058 m
α        1.5443 × 10⁻⁷ m² s⁻¹

Therefore, the average time for a single outer or inner iteration can be estimated by dividing column 6 by the sum of columns 3 and 4 in Table 1. As can be seen in Table 1, column 7, the average time spent per gradient (NF) or Hessian-vector product (CG) evaluation, using the first-order and directional second-order adjoint methods, respectively, more than doubles with each 2-fold increase in the number of parameters, indicating that the quadrature calculations dominate the integration.

For this example, using a finite difference approximation to the directional second-order derivatives via a first-order adjoint method results in slightly better computational performance (the ε value is automatically set to (oa)^(1/2)(1.0 + ||p||) in the described truncated-Newton method implementation). As can be seen in Table 2, the same objective function value is attained in a similar number of iterations for slightly less CPU time. Applying a directional second-order forward sensitivities approach to the same problem shows that the same objective function values can be calculated, but at a higher computational cost (Table 3). Here, the average CPU time per iteration is calculated considering that the Hessian-vector product evaluations take twice the time of gradient evaluations (using first-order forward sensitivities).

4.2. Canned Food Sterilization. The problem of industrial sterilization15,25 of canned food is considered as the second example. This involves modeling heating and thermal degradation processes. Heating of the canned foods by conduction is modeled by employing

Fourier’s second law on cylindrical coordinates

Tt = α(Trr + (1/r)Tr + Tzz)

with the following boundary conditions

T(R,z,t) = v(t,p)
T(r,L,t) = v(t,p)
Tr(0,z,t) = 0.0
Tz(r,0,t) = 0.0

and the initial conditions

T(r,z,0) = T0

In addition, the thermal degradations of the microorganisms and nutrients are assumed to obey pseudo-first-order kinetics:

CMt = −(ln 10/DM,ref)CM exp[(T(r,z,t) − TM,ref) ln 10/ZM,ref]

CNt = −(ln 10/DN,ref)CN exp[(T(r,z,t) − TN,ref) ln 10/ZN,ref]

with initial conditions CN(0) = CN0 and CM(0) = CM0, respectively. We consider the final retention of a nutrient within the total volume of the container (VT) as our functional; that is,

J(p) = (1/VT) ∫_0^{VT} exp[(−ln 10/DN,ref) ∫_0^{tf} exp((T(r,z,t) − TN,ref) ln 10/ZN,ref) dt] dV

where v(t,p) = Tretort(t,p) describes the parameterized control variable Tretort. Two end point constraints (at tf = 7830.0) are imposed

T(0,0,tf) ≤ T0, Fs,D ≤ Fs(tf)

where

Fs(tf) = −log[(1/VT) ∫_0^{VT} exp[(−ln 10/DM,ref) ∫_0^{tf} exp((T(r,z,t) − TM,ref) ln 10/ZM,ref) dt] dV]
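Once a temperature field is available, both functionals reduce to nested quadratures. A minimal sketch follows, assuming hypothetical inputs T (nodal temperature histories) and w (nodal volume weights); the reference data come from Table 4, and the log in Fs is taken as base 10, consistent with the ln 10 factors in the kinetics:

```python
import numpy as np
from scipy.integrate import trapezoid

def retention_and_Fs(T, w, t):
    # T: (n_nodes, n_times) nodal temperature histories (deg C);
    # w: nodal volume weights summing to VT; t: time grid on [0, tf].
    TN_ref, ZN_ref, DN_ref = 121.11, 25.56, 10716.0   # nutrient (Table 4)
    TM_ref, ZM_ref, DM_ref = 121.11, 10.0, 240.0      # microorganism (Table 4)
    ln10 = np.log(10.0)
    # Inner time quadrature of each lethality integrand at every node
    LN = trapezoid(np.exp((T - TN_ref) * ln10 / ZN_ref), t, axis=1)
    LM = trapezoid(np.exp((T - TM_ref) * ln10 / ZM_ref), t, axis=1)
    VT = w.sum()
    J = (w * np.exp(-ln10 / DN_ref * LN)).sum() / VT   # nutrient retention
    Fs = -np.log10((w * np.exp(-ln10 / DM_ref * LM)).sum() / VT)
    return J, Fs
```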

Table 4 lists all of the data values for this example. Similar to the procedure described in ref 15, an approximate solution is obtained for the above stated problem by assuming that both inequality constraints are active at the optimal solution and incorporating them as quadratic penalty terms within the objective function. As in the previous example, the integration tolerance is 10⁻⁷ and the optimization accuracy (oa) value is set to 10⁻⁵. In this example, which is much larger (302 equations) than the van der Pol example, only the restarting of the integration at the control parameter grid points contributes to the increase of the computational cost per iteration. For an

Table 5. Computational Results for the Canned Food Sterilization Example^a

np     NIT   NF    CG    J        CPU (s)   TPI (s)
7      17    20    41    0.4678   172.4     2.8
14     25    28    60    0.4711   387.6     4.4
28     45    54    133   0.4721   1382.4    7.4
56     46    59    112   0.4722   2385.1    14.0
112    97    110   248   0.4717   9073.1    25.3

^a Parameters are defined in footnote a of Table 1.

Table 6. Last Successful Iteration Counts for the Canned Food Sterilization Example Using the Finite Difference Approximation of the Hessian-Vector Product [ε = (oa)^(1/2)(1.0 + ||p||)]

np     NIT   NF    CG
7      2     4     3
14     3     6     3
28     5     15    7
56     8     17    11
112    17    36    27

Table 7. Computational Results for the Canned Food Sterilization Example Using Directional Second-Order Forward Sensitivities

np     NIT   NF    CG    J        CPU (s)   TPI (s)
7      17    20    40    0.4679   342.9     6.9
14     22    25    48    0.4702   976.0     16.1
28     37    51    105   0.4712   5698.5    43.7

initial guess of Tretort = 100.0 (except for np = 56 and 112, where the last 5 and 12 constant profile value guesses are set to 20.0) and an increasing number of control parameterization grid points (np), computational results are listed in Table 5. Computing directional second-order derivatives with dSOA is advantageous considering that its average time per iteration increases by less than a factor of 2 when the number of parameters is doubled. Another reason for employing dSOA for the evaluation of second-order derivative information is that the finite difference approximation for the same second-order information often fails to give sufficiently accurate directions for the optimization, resulting in termination of the optimization procedure without finding a solution. Table 6 lists the last successful iteration counts before the finite difference method encountered a numerical difficulty. In all cases, the gradient evaluation at the perturbed optimization parameter values resulted in physically unreasonable state values (e.g., negative temperatures) and therefore erroneous gradients, consequently failing the outer iterations. It was not possible to solve this problem using directional finite differences. Similar to the first example, the directional second-order sensitivity-based approach can solve the canned food sterilization problem, but at a higher cost (Table 7).

4.3. Temperature Profile Matching. A two-dimensional boundary control heating problem adopted from ref 4 is considered as our third case study. A rectangular domain (see Figure 1) is heated by controlling the temperature on its boundaries so that a prespecified temperature-time trajectory τ(t) is approximately followed within a specified interior subdomain (Ωc):

min_v J(v) = ∫_{t0}^{tf} ∫_{yc}^{ymax} ∫_{xc}^{xmax} ω(x,y,t)[T(x,y,t) − τ(t)]² dx dy dt

where ω(x,y,t) = 0 for t ∈ [0, 0.2] and ω(x,y,t) = 1 for t ∈ [0.2, 2] are chosen. The nonlinear parabolic partial differential equation (PDE) described as

R(T)[Txx + Tyy] + S(T) = Tt, (x, y, t) ∈ Ω × [t0, tf]
T(x,0,t) − λTy = v1(x,t), x ∈ ∂Ω1
T(0,y,t) − λTx = v2(y,t), y ∈ ∂Ω2
Tx(xmax,y,t) = 0
Ty(x,ymax,t) = 0
0 ≤ v1, v2 ≤ vmax

is reduced to an ODE system using the numerical method of lines4 with R(T) = 1.0 (a sketch of such a semidiscretization is given below). The internal heat generation is represented as

S(T) = Smax exp(−β1/(β2 + T))

where β1 = 0.2, β2 = 0.05, and Smax = 0.0, unless specified otherwise. For the instance considered, t0 = 0, tf = 2.0, xmax = 0.8, ymax = 1.6, xc = 0.6, yc = 1.2, and vmax = 1.1. The control functions on the boundaries ∂Ω1 = {(x,y) | y = 0} and ∂Ω2 = {(x,y) | x = 0} are defined as

v1(x,t) = v(t) for 0 ≤ x ≤ 0.2 and v1(x,t) = [1 − (x − 0.2)/1.2]v(t) for 0.2 ≤ x ≤ xmax

v2(y,t) = v(t) for 0 ≤ y ≤ 0.4 and v2(y,t) = [1 − (y − 0.4)/2.4]v(t) for 0.4 ≤ y ≤ ymax

The spatial integrations in the objective function are approximated using the rectangular integration rule. For an integration tolerance of 10⁻¹⁰ and an optimization accuracy value of 10⁻⁸, several cases with a gradually increasing number of parameters were tested. The number of parameters is increased both by finer control parameterization and by the addition of initial conditions as degrees of freedom to the optimization. These latter additions are noted by a + sign in column 2 of Table 8; that is, 10 + 450 denotes 10 control parameters and 450 initial conditions considered as optimization parameters. Nonlinearity is introduced by setting Smax = 0.5 and is noted by nl in column 2. From the last column of Table 8 we conclude that the computational cost per iteration of the “dSOA-powered” TN method stays constant when more parameters are added to the optimization problem. The only increase is caused by integration restarts at each control vector parameterization time point. Obviously, the total cost of optimization increases because the number of iterations increases.

Figure 1. Two-dimensional domain for the temperature profile matching example.
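A minimal method-of-lines sketch of this kind of semidiscretization follows. The grid sizes, the Robin coefficient λ, the constant control v(t), the zero initial field, and the second-order ghost-node treatment of the boundary conditions are all illustrative assumptions rather than the paper's choices:

```python
import numpy as np
from scipy.integrate import solve_ivp

nx, ny = 9, 17                      # assumed grid (the paper's sizes differ)
xmax, ymax, vmax = 0.8, 1.6, 1.1
lam = 0.1                           # assumed Robin coefficient lambda
beta1, beta2, Smax = 0.2, 0.05, 0.5
dx, dy = xmax / (nx - 1), ymax / (ny - 1)
xg, yg = np.linspace(0, xmax, nx), np.linspace(0, ymax, ny)

def v(t):
    return 0.5 * vmax               # placeholder constant boundary control

def rhs(t, Tflat):
    T = Tflat.reshape(nx, ny)
    vt = v(t)
    v1 = np.where(xg <= 0.2, vt, (1 - (xg - 0.2) / 1.2) * vt)  # ramp on y = 0
    v2 = np.where(yg <= 0.4, vt, (1 - (yg - 0.4) / 2.4) * vt)  # ramp on x = 0
    Tp = np.zeros((nx + 2, ny + 2))
    Tp[1:-1, 1:-1] = T
    # Robin conditions T - lam*T_n = v on x = 0 and y = 0 via ghost values
    Tp[0, 1:-1] = T[1, :] - 2 * dx * (T[0, :] - v2) / lam
    Tp[1:-1, 0] = T[:, 1] - 2 * dy * (T[:, 0] - v1) / lam
    # Zero-gradient conditions on x = xmax and y = ymax
    Tp[-1, 1:-1] = T[-2, :]
    Tp[1:-1, -1] = T[:, -2]
    lap = ((Tp[2:, 1:-1] - 2 * T + Tp[:-2, 1:-1]) / dx**2 +
           (Tp[1:-1, 2:] - 2 * T + Tp[1:-1, :-2]) / dy**2)
    S = Smax * np.exp(-beta1 / (beta2 + T))
    return (lap + S).ravel()        # T_t = R(T)*(Txx + Tyy) + S(T), R = 1.0

sol = solve_ivp(rhs, (0.0, 2.0), np.zeros(nx * ny), method="BDF", rtol=1e-6)
```

The resulting ODE system (nx equations, up to 5151 in Tables 8 and 10) is what the integration and dSOA machinery then operates on.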


Table 8. Computational Results for the Temperature Matching Example^a

nx     np            NIT   NF    CG    J (×10⁵)   CPU (s)   TPI (s)
45     40            33    34    105   4.295      69.3      0.5
45     40 + 5        30    30    100   4.325      65.4      0.5
45     40 + 20       34    35    101   4.582      67.4      0.5
45     160           78    79    206   4.812      464.1     1.6
45     320           128   129   269   7.779      1108.8    2.8
45     320 (nl)      115   120   229   75.99      983.4     2.8
861    10 + 5        12    13    36    6.922      1049.6    21.4
861    10 + 450      32    33    183   6.258      5079.1    23.5
861    20            22    23    80    4.736      4019.5    39.0
861    50            48    49    171   4.193      17660     80.3

^a nx, number of equations. All other parameters are defined in footnote a of Table 1.

Table 9. Computational Results for the Temperature Matching Example Using the Finite Difference Approximation of the Hessian-Vector Product^a

nx     np            NIT   NF    CG    J (×10⁵)   CPU (s)   TPI (s)
45     40            33    33    121   4.330      61.1      0.4
45     40 + 5        30    31    94    4.358      49.0      0.4
45     40 + 20       24    25    54    11.95*     31.5      0.4
45     160           83    84    222   4.600      412.8     1.4
45     320           112   113   224   10.80*     883.4     2.6
45     320 (nl)      110   114   219   155.55*    899.7     2.7
861    10 + 5        13    14    39    6.922      988.1     18.6
861    10 + 450      7     8     16    16.86*     444.9     18.5
861    20            24    25    80    4.735      3506.1    33.4
861    50            27    28    83    4.606*     8116.8    73.1

^a Asterisks mark the runs where the optimization procedure terminated because no significant improvement in the objective function value was achieved.

Table 10. Computational Results for the Temperature Matching Example Using dSOA-TN^a

nx      np              NIT   NF    CG    J (×10⁵)   CPU (min)
5151    10 + 500 (nl)   15    16    41    64.10      1030
5151    10 + 1000 (nl)  14    15    34    64.10      867
5151    10 + 2000 (nl)  14    15    37    64.11      865
5151    10 + 4000 (nl)  15    16    39    64.11      979

^a Parameters are defined in footnote a of Table 1.

Finite difference approximation of the directional second-order derivatives does not result in robust performance (Table 9). Especially for the cases with a larger number of parameters or with nonlinearity, the dSOA-based method outperforms this approach in terms of the attained objective function value at a comparable overall computational cost.

5. Conclusions

An efficient method to calculate accurate directional second-order derivatives has been incorporated into the truncated-Newton method to solve large-scale dynamic optimization problems. Since dSOA has only a “weak” np-dependence, the relative cost of derivative evaluations for large dynamic systems with many parameters does not grow as the number of parameters is increased, staying constant if no integration restarts are necessary. Although the computational costs of evaluating a gradient and a directional second-order derivative with dSOA are comparable for a large number of parameters, obtaining the latter accurately improves the computation time by reducing the total number of iterations. The directional finite difference method using gradient evaluations can give similar optimization results with comparable computational cost but decreased accuracy, provided that no numerical difficulties arise in the procedure. Except for the first example, using finite difference approximations for the Hessian-vector products obtained via a first-order


Table 11. Computational Results for the Temperature Matching Example Using the First-Order Adjoint Method and SQP (SNOPT 6.1-1(4) Dated June 2001)^a

nx      np              maj   FC    J (×10⁵)   CPU (min)
5151    10 + 500 (nl)   30    33    64.10      580
5151    10 + 1000 (nl)  33    36    64.10      632
5151    10 + 2000 (nl)  30    33    64.10      580
5151    10 + 4000 (nl)  31    34    64.10      598

^a maj, number of major iterations; FC, number of function calls; all other parameters are as defined previously.

adjoint method did not demonstrate robust computational performance. In general, the truncated-Newton method gives similar computational results for different initial guesses; however, when the initial guess yields a Hessian that is not positive definite, the truncated-Newton method inherently takes more Cauchy-like steps, which may slow the progress toward a local optimum. Overall, the sensitivity of the truncated-Newton method to the initial guess for the control profile does not depend on the approach used to calculate the second-order derivative information; that is, dSOA, directional second-order forward sensitivities, and finite difference approaches based on either first-order adjoints or forward sensitivities show similar behavior for the same initial guess for the control profile.

A comparison of different integration technologies is beyond the scope of the present paper. However, a brief study of the integration times for the examples with the backward differentiation formula (BDF) based integrators LSODE,26 DASPK, and its earlier version DASSL27 reveals that, for the relatively small integration task of the van der Pol example, LSODE's method takes fewer time steps overall, resulting in a smaller CPU time. For larger ODE systems, the CPU times for integration are comparable for all three integration tools.

The existing implementation of the “dSOA-powered” TN method can be improved in several ways. First, one dSOA call per overall iteration can be eliminated, since gradient and directional second-order derivative information can be obtained simultaneously at the final iteration of the previous line search. This can result in a potential one-third reduction in the overall CPU time of the method. Therefore, especially for larger problems, the presented method can complement other well-established methods and their ever improving implementations, for example, sequential quadratic programming (SQP) or interior point methods. As an example, the first-order adjoint method combined with an implementation of SQP improved with a “loose tolerance” strategy for the solution of the quadratic programs (in the manner of the truncated-Newton method) and limited memory BFGS updates28 can only achieve less than a one-half reduction in the overall CPU time for the temperature matching example with Smax = 0.8 (compare Tables 10 and 11). Nevertheless, note that SQP is not recommended for large numbers of degrees of freedom (2000 or more),28

Table 12. Computational Results for the Canned Food Sterilization Example Using First-Order Sensitivities and SQP (SNOPT 6.1-1(4) Dated June 2001)^a

maj

FC

J

CPU (s)

7 14 28

56 137 183

68 151 196

0.4679 0.4714 0.4721

273 1395 4580

a

Parameters are as defined previously.

whereas there is no such limitation for the truncated-Newton method. In any case, the first-order sensitivity-based approach is not competitive with an adjoint-based approach and results in higher overall CPU times because of its np-dependence (compare Table 12 with Table 5).

Besides the aforementioned improvement, better globalization techniques can also advance the presented “dSOA-powered” truncated-Newton method. Moreover, since an additional direction can be obtained even more cheaply,9 linear biconjugate methods can be used in the inner iterations to solve the Newton equations (eq 3). For problems with (in)equality constraints, penalty methods can be implemented using TN as the unconstrained (bound constrained) optimization solver. This extension to the existing truncated-Newton solver can potentially enhance implementations of the multiple shooting approach for optimal control problems, which generates a large number of equality constraints and requires directional second-order information with respect to a large number of initial conditions. The dSOA method to calculate directional second-order derivatives of ODE embedded functionals can be naturally extended to differential-algebraic equations. Moreover, for larger systems constructed by PDE semidiscretizations, iterative linear solvers can be utilized to reduce the cost of integration.

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. OCE0205590. We would like to thank Eva Balsa Canto for her help in constructing the first two case studies.

Literature Cited

(1) Teo, K. L.; Goh, C.-J.; Wong, K.-H. A unified computational approach to optimal control problems; Longman Scientific & Technical: Essex, England, 1991.
(2) Leineweber, D. B.; Bauer, I.; Bock, H. G.; Schlöder, J. P. An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization. Part 1: Theoretical aspects. Comput. Chem. Eng. 2003, 27, 157-166.
(3) Leineweber, D. B.; Schäfer, A.; Bock, H. G.; Schlöder, J. P. An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization. Part II: Software aspects and applications. Comput. Chem. Eng. 2003, 27, 167-174.
(4) Petzold, L.; Rosen, J. B.; Gill, P. E.; Jay, L. O.; Park, K. Numerical Optimal Control of Parabolic PDEs using DASOPT. In Large Scale Optimization with Applications: Part II; Biegler, L. T., Coleman, T. F., Conn, A. R., Santosa, F. N., Eds.; Springer-Verlag: New York, 1997.
(5) Feehery, W. F.; Tolsma, J. E.; Barton, P. I. Efficient sensitivity analysis of large-scale differential-algebraic systems. Appl. Numer. Math. 1997, 25, 41-54.
(6) Maly, T.; Petzold, L. Numerical Methods and Software for Sensitivity Analysis of Differential Algebraic Equations. Appl. Numer. Math. 1996, 20, 57-79.

(7) Cao, Y.; Li, S.; Petzold, L.; Serban, R. Adjoint Sensitivity Analysis for Differential-Algebraic Equations: The Adjoint DAE System and Its Numerical Solution. SIAM J. Sci. Comput. 2003, 24, 1076-1089.
(8) Bloss, K. F.; Biegler, L. T.; Schiesser, W. E. Dynamic Process Optimization through Adjoint Formulations and Constraint Aggregation. Ind. Eng. Chem. Res. 1999, 38, 421-432.
(9) Özyurt, D. B.; Barton, P. I. Cheap second-order directional derivatives of stiff ODE embedded functionals. SIAM J. Sci. Comput., in press.
(10) Nash, S. G. A Survey of Truncated-Newton methods. J. Comput. Appl. Math. 2000, 124, 45-59.
(11) Nash, S. G. Newton-type minimization via the Lanczos method. SIAM J. Numer. Anal. 1984, 21, 770-788.
(12) Le Dimet, F.-X.; Navon, I. M.; Daescu, D. N. Second-Order Information for Data Assimilation. Mon. Weather Rev. 2002, 130, 629-648.
(13) Wang, Z.; Navon, I. M.; Zou, X.; Le Dimet, F. X. A Truncated Newton Optimization Algorithm in Meteorology Applications with Analytic Hessian/Vector Products. Comput. Optimization Appl. 1995, 4, 241-262.
(14) Griewank, A. Evaluating derivatives: Principles and techniques of algorithmic differentiation; SIAM: Philadelphia, PA, 2000.
(15) Canto, E. B.; Banga, J. R.; Alonso, A. A.; Vassiliadis, V. S. Restricted second-order information for the solution of optimal control problems using control vector parameterization. J. Process Control 2002, 12, 243-255.
(16) Özyurt, D. B.; Barton, P. I. Directional Second-Order Adjoint Method for Differential-Algebraic Equation Embedded Functionals. In preparation.
(17) Marlow, W. H. Mathematics for Operations Research; John Wiley & Sons: New York, 1978.
(18) Wang, Z.; Navon, I. M.; Le Dimet, F. X.; Zou, X. The second-order adjoint analysis: Theory and Application. Meteorol. Atmos. Phys. 1992, 50, 3-20.
(19) Li, S.; Petzold, L. Description of DASPKADJOINT: An Adjoint Sensitivity Solver for Differential-Algebraic Equations; Technical Report; University of California, Department of Computer Science and Engineering: Santa Barbara, CA 93106, 2001.
(20) Nash, S. G.; Sofer, A. A barrier method for large-scale constrained optimization. ORSA J. Comput. 1993, 5, 40-53.
(21) Nash, S. G.; Nocedal, J. A Numerical Study of the Limited Memory BFGS Method and the Truncated Newton Method for Large Scale Optimization. SIAM J. Optim. 1991, 1, 358-372.
(22) Nash, S. G. Preconditioning of Truncated-Newton methods. SIAM J. Sci. Stat. Comput. 1985, 6, 559-616.
(23) Li, S.; Petzold, L. Design of New DASPK for Sensitivity Analysis; Technical Report; University of California, Department of Computer Science and Engineering: Santa Barbara, CA 93106, 1999.
(24) Giering, R. Tangent linear and Adjoint Model Compiler: Users Manual 1.4; 1999, http://www.autodiff.com/tamc.
(25) Banga, J. R.; Perez-Martin, R. I.; Gallardo, J. M.; Casares, J. J. Optimization of the Thermal Processing of Conduction-Heated Canned Foods: Study of Several Objective Functions. J. Food Eng. 1991, 14, 25-51.
(26) Hindmarsh, A. C. ODEPACK, A Systematized Collection of ODE Solvers. In Scientific Computing; Stepleman, R. S., et al., Eds.; North-Holland Publishing Co.: Amsterdam, 1983.
(27) Brenan, K. E.; Campbell, S. L.; Petzold, L. R. Numerical Solutions of Initial-Value Problems in Differential-Algebraic Equations; SIAM: Philadelphia, PA, 1996.
(28) Gill, P. E.; Murray, W.; Saunders, M. A. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM J. Optim. 2002, 12, 979-1006.

Received for review July 6, 2004
Revised manuscript received January 10, 2005
Accepted January 11, 2005

IE0494061