An Inexact Trust-Region Algorithm for the ... - ACS Publications

Aug 17, 2010 - Globalization of the algorithm is ensured by using a composite step trust-region approach with the steps calculated by the approach of ...
1 downloads 0 Views 441KB Size
12004

Ind. Eng. Chem. Res. 2010, 49, 12004–12013

An Inexact Trust-Region Algorithm for the Optimization of Periodic Adsorption Processes Sree Rama Raju Vetukuri,†,‡ Lorenz T. Biegler,*,†,‡ and Andrea Walther¶ Collaboratory for Process and Dynamic Systems Research, National Energy Technology Laboratory, P.O. Box 880, Morgantown, West Virginia 26507-0880, Department of Chemical Engineering, Carnegie Mellon UniVersity, Pittsburgh, PennsylVania 15213, and Institut fu¨r Mathematik, UniVersita¨t Paderborn, Paderborn, Deutschland

Periodic adsorption processes have gained increasing commercial importance as an energy-efficient separation technique over the past two decades. Based on fluid-solid interactions, these systems never reach steady state. Instead they operate at cyclic steady state, where the bed conditions at the beginning of the cycle match with those at the end of the cycle. Nevertheless, optimization of these processes remains particularly challenging, because cyclic operation leads to dense Jacobians, whose computation dominates the overall cost of the optimization strategy. To efficiently handle these Jacobians during optimization and reduce the computation time, this work presents a new composite step trust-region algorithm for the solution of minimization problems with both nonlinear equality and inequality constraints, and combines two approaches developed in Walther1 and Arora and Biegler.2 Instead of forming and factoring the dense constraint Jacobian, this algorithm approximates the Jacobian of equality constraints with a specialized quasi-Newton method. Hence it is well suited to solve optimization problems related to periodic adsorption processes. In addition to allowing inexactness of the Jacobian and its null-space representation, the algorithm also provides exact second-order information in the form of Hessian-vector products to improve the convergence rate. The resulting approach3 also combines automatic differentiation and more sophisticated integration algorithms to evaluate the direct sensitivity and adjoint sensitivity equations. A 5-fold reduction in computation is demonstrated with this approach for two periodic adsorption optimization problems: a simulated moving bed system and a nonisothermal vacuum swing adsorption O2 bulk gas separation. 1. Introduction Periodic adsorption processes (PAPs) have gained increasing commercial acceptance as an efficient separation technique for a wide range of applications.4 Industrial applications of PAPs include vacuum swing adsorption to separate oxygen from air, pressure swing adsorption to separate hydrogen from hydrocarbons and simulated moving bed chromatography to separate enantiomers in the liquid phase. These processes consist of vessels or beds packed with a solid adsorbent. The adsorbent is brought in contact with a multicomponent feed mixture and the separation of the components is based on the difference in the affinity toward the adsorbent particles. Unlike most separation processes, e.g. distillation, which operate at steady state conditions, PAPs operate under periodic transient conditions with each bed repeatedly undergoing a sequence of steps. Because of the periodic operation, these processes reach a cyclic steady state (CSS), where the concentration profiles change dynamically and the profiles at the beginning of each cycle match with those at the end of the cycle. With increasing industrial applications, there is significant interest for an efficient modeling, simulation, and optimization strategy for PAPs. However, despite vast growth in the practical application of PAPs, design and optimization of PAPs still largely remain an experimental effort for several reasons. First, most practical PAPs are complex systems coupled by nonlinear partial differential and algebraic equations (PDAEs) in time and * To whom correspondence should be addressed. E-mail: biegler@ cmu.edu. † National Energy Technology Laboratory. ‡ Carnegie Mellon University. ¶ University of Paderborn.

space domains, with different initial and boundary conditions defining the steps of the process. For each step, conservation equations for mass, energy, and momentum are defined along with models for the equation of state, equilibrium, thermodynamic, and transport properties. The computational effort required to solve these systems is usually quite extensive and hence time-consuming. Second, multiadsorbent layers, nonisothermal effects and stringent product specifications lead to challenging nonlinearities and ill-conditioned matrices which lead to the failure of numerical solvers. Finally, the CSS operation yields dense constraint Jacobians, where the time required for the computation of the Jacobian and its factorization dominates the overall optimization process. For example, previous works on the optimization of simulated moving bed5 and pressure swing adsorption6 systems report that more than 90% of the CPU time was spent by the integrator in evaluating the dense constraint Jacobians. Hence, the efficient handling of constraint Jacobians, ill-conditioned matrices, and nonlinearities is essential to evaluate proposed designs and/or optimize a new design for existing installations, which is the main objective of this work. The dynamic optimization problem for PAPs can be expressed as miny,p f (y, p) F(z(t), dzd /dt, p, t) ) 0, zd(0) ) y s.t. w(y, p) e 0, css(y, p) ≡ y - zd(tcycle) ) 0 ymin e y,

L

p epep

(1)

U

The differential-algebraic equations (DAEs) in eq 1 are obtained by discretizing the PDAEs for the PAP in the spatial domain using the method of lines; z(t) are the DAE state

10.1021/ie100706c  2010 American Chemical Society Published on Web 08/17/2010

Ind. Eng. Chem. Res., Vol. 49, No. 23, 2010

Figure 1. Schematic of the sequential approach to dynamic optimization.

variables, that include concentration, loading, temperature, etc.; y ∈ R ny, are the initial conditions for the differential state variables (zd(t)). These initial conditions are decision variables in the optimization problem along with the design variables, p ∈ R np, that include flow rates, valve constants, bed dimensions, step times, etc.; the vector function w corresponds to the design constraints that can include purity or pressure specifications; css are the CSS conditions which enforce that the bed conditions at the beginning of the cycle match with those at the end of the cycle, tcycle; f is the objective function which can be minimizing power consumption or maximizing recovery/profit, etc. We apply the so-called direct methods7 to transform the infinite dimensional dynamic optimization problem (1) into a finite dimensional nonlinear program (NLP). Here we use a sequential strategy8 that falls under the class of direct methods to solve (1) by decomposing it into three components seen in Figure 1: (a) an initial value problem (IVP) which requires numerical solution of the discretized dynamic model to evaluate the state profiles as well as objective and constraint functions from a given set of inputs y and p, (b) a sensitivity component which evaluates first- and second-order derivatives for the constraint and objective function, and (c) an NLP solver, which accepts function and derivative information from the first two components and updates the values of the input variables y and p. In a gradient-based approach, function and derivative evaluations are obtained by combined solution of the IVP and related sensitivity components. In this way, the sequential approach decouples the solution of the optimization problem from solution of the embedded dynamic system, thus exploiting full advantage of state-of-the-art integration and NLP tools. However, the main bottleneck to using this approach for optimization of PAPs is the evaluation of direct sensitivities to obtain the dense constraint Jacobian and Hessian required by the NLP subproblem. For example, the evaluation of an exact Jacobian requires integration of O(ny(ny + np)) sensitivities. Moreover for accurate Hessian information that leads to rapidly converging iterative procedures,9 O(ny(ny + np)2) sensitivity equations have to be solved. Hence, the computational cost per NLP iteration is very high.10 An alternative to this approach is provided by simultaneous collocation methods and related quasi-sequential algorithms.11 In related studies12,13 we have applied simultaneous approaches to small-scale PAP applications. However, these problems have steep moving fronts and large differences in time scales, which require too many finite elements and make a simultaneous approach intractable with our current tools. Instead, for this work we consider a novel nonlinear programming algorithm for the sequential approach which overcomes the drawback of large computational times for Jacobian and Hessian evaluations. Here, the algorithm works with exact Jacobian-vector and Hessian-vector products rather than forming and factorizing the dense Jacobian and Hessian matrices

12005

at every NLP iteration. These matrix-vector products can be evaluated accurately and efficiently by exploiting the direct/ adjoint sensitivity equations and Automatic Differentiation. Furthermore, the Jacobian matrix is approximated using a quasiNewton update, namely the two-sided rank one (TR1) update. Globalization of the algorithm is ensured by using a composite step trust-region approach with the steps calculated by the approach of Byrd and Omojukun.14 An exhaustive survey on composite step trust-region methods that employ exact derivative information can be found in Conn et al.15 Effects of inexact derivative information are studied in Byrd et al.,16 and Heinkenschloss and Vicente17 where the focus of inexactness is due to iterative linear system solves. However, the inexactness in our work arises due to the Jacobian approximation using a quasi-Newton update. The convergence analysis and performance results that focus on equality constrained optimization problems are presented in Walther,1 and Walther and Biegler.18 In this work, we extend the algorithm to solve more general nonlinear programming problems with both equality and inequality constraints. The paper has the following outline. Section 2 describes the optimization formulation and strategy related to PAPs. Subsequently, we discuss the nonlinear programming algorithm in section 3. The performance of the overall optimization strategy is illustrated on dynamic optimization problems for simulated moving bed (SMB) and vacuum swing adsorption (VSA) processes in section 4. Finally, section 5 contains concluding remarks and directions for future work. 2. Optimization Strategy In this paper we discuss a trust region based sequential quadratic programming (SQP) algorithm that forms the basis of this work. This algorithm is formulated for the solution of NLP problems of the form min

{x1,x2}

s.t.

f (x1, x2) c(x1, x2) ) 0 xL1

e x1 e

(2)

xU1

where the objective function f : RnfR and the nonlinear equality constraints c: RnfRm with m e n are assumed to be sufficiently smooth and at least twice differentiable functions in xT ) [xT1 , xT2 ], with x1 ∈ Rn-m. Note that only bounds on x1 are considered in this work, and, as seen in the next section, this allows us to take advantage of robust and efficient trust-region-based optimization methods. 2.1. Handling General Inequality Constraints and Bounds. We now consider the reformulation of eq 1 to eq 2 and carefully choose the inequality constraints and bounds on variables y and p. First, we note that without bounds on the state variables, y g ymin the DAE model may not be solvable over the course of the optimization (e.g., if the states are negative). However, since these bounds are unlikely to be active at the solution, we do not enforce them explicitly as variable bounds, but treat them with a simple constraint aggregation approach. Our constraint aggregation method uses the KS function19 to lump all the bound constraints into a simple composite function given by KS(y, F) ) ymin -

1 ln( F

∑ exp(-F(y

i

- ymin))) g 0

i

(3) where ymin is the lower bound of yi which keeps exp(-F(yi ymin)) e 1 and the KS function well-behaved. For active bounds,

12006

Ind. Eng. Chem. Res., Vol. 49, No. 23, 2010

yi becomes active, yi f ymin, exp(-F(yi - ymin) f 1 and these terms dominate in the KS function. Otherwise as F approaches infinity, we have exp(-F(yi - ymin) ≈ 0 for y i . ymin. With the KS function added to the inequality constraints, the NLP from eq 1, with DAEs solved implicitly for z(t), is now given by min {p,y}

s.t.

f (p, y)

min d

h(p, y) ) 0 g(p, y) e 0 p L e p e pU

(4)

which we convert to eq 2 by adding slack variables to the inequalities (e.g., purity and recovery constraints, KS function), that is, g(p, y) + sˆ ) 0

(5)

with non-negative bounds on the slack variables, sˆ. The augmented set of equality constraints now includes the original set of equality constraints, h, and eq 5. Finally, special care has to be taken on the bounds of the slack variables sˆ since only (n - m) decision variables can be bounded in eq 2. Here we choose a subset of decision variables, pj, that would normally be used to solve the equations of a design specification problem, which arises frequently in the modeling of PAPs. For example, a valve constant can be chosen for a pressure specification, or adjustment of a flow can be used to meet a purity specification. This design problem is posed as h(pj, y) ) 0 g(pj, y) + sˆ ) 0

(6)

and it is expected that the Jacobian of these equations is always nonsingular with respect to y and pj. In this way, we can bound the rest of the free variables, pˆ, and the slack variables, sˆ. The variables and constraints in problem eq 4 are now reassigned to conform to eq 2 with x1 :)

[]

[]

Jacobian matrix. To circumvent these dense calculations, we present a trust-region-based SQP algorithm that does not require the exact evaluation of the constraint Jacobian. Equation 2 is solved by a successive quadratic programming (SQP) trust-region algorithm that repeatedly solves the following quadratic program for the kth iteration:

[ ]

pˆ y h , x2 :) and c :) sˆ pj g + sˆ

(7)

2.2. Solution Strategy. Efficient solution of eq 2 can be obtained by current SQP algorithms for applications with wellstructured and sparse derivative matrices. However, the drawback of using these SQP methods for PAP optimization is due to time-consuming computations with dense and unstructured matrices. As an alternative, one may use iterative system solves up to a certain accuracy, for example, Krylov subspace16 or multigrid methods, for efficient step computation in each iteration. However, these iterative methods assume that the Jacobian can be obtained at low cost and focus only on the efficient solution of the linear system. On the other hand, in the case of PAPs, the formation of a Jacobian dominates the overall computation time. For example, optimization of a 5-bed 11-step PSA process with ny ) 1000 state variables6 using a reduced Hessian SQP approach20 took almost 200 h to converge and the formation of each Jacobian matrix at every iteration took 2 h on a 2.4 GHz processor. This matrix was calculated from the direct sensitivity approach applied to the PDAE system, and reduced Hessians were approximated using quasi-Newton updates. In short, the current NLP algorithms are prohibitively time-consuming and memory intensive for the optimization of PAP problems with only a few hundred variables. Therefore, the bottleneck to efficient calculations is the evaluation of direct sensitivities to obtain the dense

s.t.

1 ∇x f (xk)Τd + dΤB(xk)d 2 c(xk) + A(xk)d ) 0 xL1 e xk1 + d1 e xU1

(8)

||d|| e ∆k in order to compute a new step d k )[d1T, d2T]T for a given iterate xk, trust region radius ∆k, and Lagrange multipliers, λk. Here the exact matrix of the dense constraint gradients at xk is given by A(xk) ) (∇c1(xk), ..., ∇cm(xk))T ∈ Rm×n and B(xk) is the exact Hessian of the Lagrange function defined by L(x, λ) ) f (x) + λTc(x)

(9)

Furthermore, ||.|| denotes the Euclidean norm ||.||2. Since eq 8 may not have a feasible solution, we follow a composite step approach proposed by Byrd and Omojokun.14 For efficient step computation of dynamic optimization problems, the proposed algorithm exploits direct sensitivity equations and/or adjoint sensitivity equations to evaluate the products of exact Jacobian A(x) and a given vector V, A(x)V, and A(x)TV and the product of exact Hessian B(x) and a given vector V, B(x)V. Efficient computation of these matrix-vector products can also be computed by applying automatic differentiation (AD). However, the repeated computation of the full matrix, A(x), which requires many Jacobian-vector products, may be prohibitively timeconsuming. The nonlinear programming algorithm presented in this work uses exact Jacobian-vector and vector-Jacobian products and avoids forming the full Jacobian matrix. Instead of forming the full Jacobian matrix, A(xk) is approximated using a specialized quasi-Newton method. Also, in order to improve the convergence rate the algorithm uses exact Hessian-vector products. 3. SQP Algorithm with Inexact Jacobian We now consider an algorithm for solving the nonlinear programming problem of eq 2 which does not require construction of the exact Jacobian A(x). 3.1. Equality Constrained Optimization. The proposed SQP algorithm uses a trust-region globalization method to force convergence from distant starting points. Trust-region SQP methods have several attractive properties because they control the quality of steps even in the presence of ill-conditioned Hessians and Jacobians, and do not require that the reduced Hessian matrix be positive definite. The use of the trust regionconstraint in eq 8: ||d|| e ∆k guarantees boundedness of the computed step d k. A globally convergent SQP algorithm usually defines a merit function, φ(x; ν) which determines how accurately the model problem approximates the original problem at the new point xk + d k, and allows a decision to accept or reject the computed step d k to be made. The proposed algorithm uses the nondifferentiable l2 merit function

Ind. Eng. Chem. Res., Vol. 49, No. 23, 2010

φ(x;ν) ) f (x) + ν||c(x)||

(10)

with the penalty parameter ν > 0. The merit function is approximated at xk + d k using the same approximation that determines the model problem from the original problem. In our case, a model of φ around the current iterate xk is given by 1 mk(d) ) f(xk) + ∇f(xk)Td + dTB(xk)d + νk |c(xk) + A(xk)d| 2 (11)

This subproblem has a spherical trust region in R with a quadratic objective function. Efficient solution of this subproblem can be performed by the dogleg algorithm (see Nocedal and Wright21). 3.3. The Tangential Subproblem. Given the normal step, the tangential step is used to reduce a suitable model within the trust region by maintaining linearized infeasibility. The tangential step, tk ) Z(xk)pZ is obtained by solving the following subproblem:

Then with aredk(d) defined by aredk(d) ) φ(xk ;νk) - φ(xk + d;νk)

12007

n

min

pZ∈Rn-m

(12)

as the actual reduction in the merit function caused by a step d k, and predk(d) defined as the predicted reduction:

s.t.

(∇f (xk) + B(xk)nk)TZ(xk)pZ + 1 T kT k p Z(x ) B(x )Z(xk)pZ 2 Z |Z(xk)pZ | e ∆2

(16)

This subproblem is a small NLP of size (n - m) with a quadratic objective function and ellipsoidal trust-region constraint. The exact null space basis Z(xk) can be constructed from the exact Jacobian A(xk) such that A(xk)Z(xk) ) 0. This can be done by performing QR factorization on A(xk), which may however be expensive for large-scale problems. The composite trust-region step from eq 15 and eq 16 is for the model of the merit function caused by a step d k, the model is said to be accurate enough if k

dk ) nk + Z(xk)pZ and we set

k

ared (d ) g η, where 0 < η e 1 predk(d k)

xk+1 ) xk + dk

(14)

Note that predk(d) is positive for a nonzero step d k, and the sufficient decrease in merit funcion is obtained if condition 14 is satisfied. To describe the approach we first focus on equality constrained optimization problems, that is, we ignore bounds in eq 8. Computation of the step d k requires the solution of an equality-constrained subproblem (ECSP). One potential difficulty that can arise when solving ECSP is that the linearized constraint and the trust-region constraint may not intersect if ∆k is too small. Consequently, there will be no feasible region that satisfies both constraints and ECSP will not have a solution in the trust region. To overcome this difficulty, we follow the approach of Byrd and Omojokun14 and decompose the ECSP into two trust-region subproblems. The two underbraced terms in eq 13 correspond to these trust-region subproblems. The first term is minimized in the tangential subproblem while the second term is minimized in the normal subproblem. This allows us to compute the composite step,

if we obtain a sufficient reduction eq 14 in the merit function 10. Otherwise, the trust region is reduced and a new trial step is computed. Implementation of the Byrd-Omojokun approach for equality constrained NLPs is described in Lalee et al.22 and reviewed extensively in Conn et al.15 and Wright.21 The implementations and analysis in these works are based on an exact Jacobian evaluation and its null space. Since the optimization problems related to PAPs result in dense constraint Jacobians, evaluating an exact Jacobian at every iteration dominates the overall computational time. To avoid forming the dense constraint Jacobian, we adopt the inexact trust-region algorithm presented in Walther.1 This algorithm approximates the Jacobian of equality constraints with a specialized quasi-Newton method, namely, the two-sided rank one (TR1) update as given in Griewank and Walther.23 The TR1 update is defined by Ak+1 ) Ak +

d k ) nk + t k a combination of the normal step nk and the tangential step tk. The normal subproblem aims to attain feasibility of the constraints while the tangential subproblem aims to reduce the quadratic objective function in ECSP by maintaining linearized feasibility. By splitting the step taken by ECSP, we write the resulting subproblems as follows: 3.2. The Normal Subproblem. At each step of the algorithm we first solve the normal step nk which reduces the infeasibility of the linearized constraints by solving the following normal subproblem at the current iterate xk min n∈Rn

s.t.

|c(xk) + A(xk)n| 2, |n| e ∆1

(15)

(17)

(qk - Akδk)(µTk - σTk Ak) (µTk - σTk Ak)δk

(18)

where qk ≡ c(xk + δk) - c(xk),

µkT ≡ ∇xLT(xk + δk, λk + σk) ∇xLT(xk + δk, λk)

σk ≡ λk+1 - λk ∈ Rm

δk ) xk+1 - xk

One important feature of this update is that it is possible to reconstruct the exact Jacobian, A(xk) with at most m TR1 updates for a fixed iterate xk.18 The matrix vector product, Akδk and the vector matrix product, σTk Ak in the rank-one term of eq 18 can be computed efficiently and accurately using sensitivities and automatic differentiation. Also, the update uses the information of the objective function at the current iteration, xk unlike other

12008

Ind. Eng. Chem. Res., Vol. 49, No. 23, 2010

quasi-Newton Jacobian updates. Instead of factorizing Ak at every optimization step, we maintain a factorized null space representation Zk 24,25 of the approximated derivative information during the whole optimization procedure. The inexact algorithm represents a Byrd-Omojokun trustregion approach that takes the inexactness of the Jacobian and its null-space representation into account. The global convergence of the algorithm to first-order critical points is ensured by the theoretical results presented in Walther.1 In the inexact setting, the normal step 15 and the tangential step 16 are calculated based on the approximate Jacobian Ak and its null space Zk (with (Ak)TZk ) 0). Furthermore, the quality of the approximated constraint Jacobian can be adjusted by bounding the influence of the error in Zk on the predicted reduction of the function mk given by 1 k k T k k k T k k k predk(dk) ) -∇f (x ) (n + t ) - (n + t ) B(x )(n + t ) + 2 νk(|c(xk)| - |c(xk) + A(xk)(nk + tk)|) ) tpred (t ) + νknpredk(nk) + χk + errk(dk, νk) k k

where 1 χk ) -∇f (xk)Tnk - (nk)TB(xk)nk 2 errk(d k, νk) ) νk(|c(xk) + A(xk)nk | - |c(xk) + A(xk)d k |) To ensure that the trust-region algorithm is globally convergent, it has been shown1 that if -errk(d k, νk)