Ind. Eng. Chem. Res. 2007, 46, 7729-7738
7729
PROCESS DESIGN AND CONTROL Convergence Depth Control for Process System Optimization Kexin Wang, Zhijiang Shao,* Zhengjiang Zhang, Zhiqiang Chen, Xueyi Fang, Zhou Zhou, Xi Chen, and Jixin Qian State Key Lab. of Industrial Control Technology, Institute of Industrial Control, Zhejiang UniVersity, Hangzhou 310027, People’s Republic of China
Convergence and solution time are important considerations in process system optimization. Another nontrivial task is the definition of termination criteria. However, setting the convergence tolerance is difficult and bewildering for users. Observed behaviors of algorithms when solving many optimization problems include tardiness in deciding convergence or failure of the optimization, and incapability of giving approximate solutions as they fail to converge. Here, we propose convergence depth control (CDC) for process system optimization. It is designed to take advantage of the achievement estimation of the optimization process to discover the proper time to terminate the optimization algorithm. Criteria based on CDC prefer to provide an approximate solution with acceptable optimality. Achievability and rationality of the criteria have been analyzed. To demonstrate the effectiveness of this method, we apply the Reduced-Hessian Successive Quadratic Programming (RSQP) algorithm with convergence depth control and with traditional convergence criteria, respectively, to problems from the CUTE test set, the distillation sequence in ethylene production, and catalyst mixing problem in COPS collection. Numerical results of the comparison show significant advantages of convergence depth control. 1. Introduction and Motivation Optimization has an important role in process systems engineering. Process design, synthesis, and operations often involve the solution of large, complex optimization models.1 Convergence and solution time of the problems are important, especially for real-time operation optimization, which is intended to address disturbances and other external and internal changes of industrial processes to maintain the operation near the economic optimum.2 It is desired that optimization finishes within a reasonable time frame. Although the fast development of mathematics and computer technology enables one to address optimization problems of a larger scale and those involving more-complicated models, the consequent heavy computational burden makes it necessary to balance the requirements on real time and accuracy, and to manage to improve convergence of the problems as well. However, it is difficult to define an efficient termination criterion for optimization algorithms. A less-stringent criterion may substantially reduce the solution accuracy, while overly stringent criteria make the optimization too lengthy, in regard to time, with little improvement in the solution. The following questions then must be answered: (1) How does one determine whether/when the optimization can be stopped? (2) What is the proper approach to give up an optimization process that seems to be never-ending? * To whom correspondence should be addressed. Fax: +86-57187951068. E-mail address:
[email protected].
Process systems optimization usually requires the solution of nonlinear programming (NLP) problems of the form
min f(x)
(1)
x∈Rn
s.t. c(x) ) 0 x L e x e xU where f(x):Rn f R is a scalar objective function, c(x):Rn f Rm are equality constraints, and xL and xU are lower and upper bounds of variables. For simplicity, rewrite problem (1) into the more-general form
min f(x)
(2)
x∈Rn
s.t. ci(x) ) 0
(for i ∈ E)
ci(x) g 0
(for i ∈ I)
where E and I are finite sets for equality and inequality constraints, respectively. Hereafter, we make direct use of form 2 when necessary. Generally, the Lagrangian for problem 2 is defined as
L(x,λ) ) f(x) - λTc(x) The traditional criteria test convergence of problem 2, according to the first-order optimality conditions (KKT conditions),3 is given as
10.1021/ie070073s CCC: $37.00 © 2007 American Chemical Society Published on Web 10/04/2007
7730
Ind. Eng. Chem. Res., Vol. 46, No. 23, 2007
∇xL(x*,λ*) ) 0 ci(x*) ) 0
(for i ∈ E)
ci(x*) g 0
(for i ∈ I)
λ*i g 0
(for i ∈ I)
λ*ici(x*) ) 0
(for i ∈ E ∪ I)
(3)
where x* is a local solution and λ* is the corresponding vector of the Lagrange multipliers. Traditional criteria are widely used, and, when they are satisfied, the underlying theoretical foundation suggests that the iterate is an (near) optimal solution and is acceptable. However, the disadvantages of traditional criteria are also obvious. Let us review some examples to reveal the observed behaviors of algorithms with traditional criteria when solving many optimization problems. In these examples, the algorithm used is the Reduced-Hessian Successive Quadratic Programming (RSQP) algorithm, and the selected problems are maratos, orthrds2, and aVion2 from the CUTE test set. The problems selected have equality and possibly bound constraints. Figure 1 shows the illustrations of objective and feasibility of the three examples. ObserVation 1. Problem maratos converged by 104 iterations. However, actually, the algorithm required k0, such that
θKprog ) S(max {δKfeasChg,δKobjChg},var) g 1
(19)
Proof. Using assumptions 4.1(I) and 4.1(II) in their reported work, Biegler et al. have established that the penalty parameter µ in eq 10 will settle down for all k > k0.11 Because the merit function φµ(xk) is forced to decrease at each iteration, and it is bounded below for all xk ∈ D, φµ(xk) f φµ(xk-1). The definitions defined by eqs 10, 11, and 15 then imply that, for any arbitrarily small var > 0, there exists K > k0, such that
δKfeasChg ) |δKfeasErr - δK-1 feasErr| e var δKobjChg ) |fK - fK-1| e var
(20)
Therefore, the monotonic decreasing property of the sigmoid function described by eq 18 with the argument δk indicates that
θKprog ) S(max{δKfeasChg,δKobjChg},var) g S(var,var) ) 1
(21)
9 We now prove that the convergence depth θkconvg is an indicator of the degree of xk converging to a Kuhn-Tucker point. Assume
δk ) S-1(θkconvg,var)
(22)
where S-1 represents the inverse of the sigmoid function (eq 18) of the argument δk. The following proves this by establishing the relationship between δk and the Kuhn-Tucker error at xk. Theorem 3.2. If assumptions 4.1 by Biegler et al.11 on the convex set D that contained the sequence {xk} generated by the RSQP algorithm hold, then δk can be regarded as a measure of the Kuhn-Tucker error at xk. Proof. By applying Taylor’s theorem to the merit function (eq 10), we obtain
φµ(xk) - φµ(xk+1) e -RkDφµ(xk;dk) e -Dφµ(xk;dk)
(23)
7734
Ind. Eng. Chem. Res., Vol. 46, No. 23, 2007
Figure 5. Comparison of the Reduced-Hessian Successive Quadratic Programming (RSQP) algorithm with CDC and traditional criteria on the CUTE test set: (a) problem group (i) (deconVc, dtoc1nd, optcdeg3, and palmer7e); (b) problem group (ii) (aljazzaf, djtl, lakes, and orthregb); and (c) problem group (iii) (bdqrtic, nonmsqrt, palmer5e) and the exception (mancino)).
where Rk ∈ (0,1] is the step length in iteration k, and Dφµ(xk;dk) e 0 is the directional derivative of the merit function
in direction dk. For the l1 merit function (eq 10), Byrd and Nocedal12 established that
Ind. Eng. Chem. Res., Vol. 46, No. 23, 2007 7735
Dφµ(xk;dk) ) ∇fTk dk - µ|V(xk)|1
(24)
Hence,
φµ(xk) - φµ(xk+1) e -Dφµ(xk;dk) ) |∇fTk dk - µ|V(xk)|1| e |∇fTk dk| + µ|V(xk)|1 e δkobjErr + µ(m + 2n)δkfeasErr
(25)
Figure 6. Schematic depiction of the depropanizer and debutanizer distillation sequence.
According to the definitions given by eqs 16 and 22,
δk g δkobjErr
runs with traditional criteria, the specified thresholds on objective and feasibility are obj ) feas ) 10-6, which are used to get more tolerances such as those for the gradient of the Lagrange function and complementary slackness. The criteria are derived from the KKT conditions:
and
δk g δkfeasErr
max{|∇xL(xk,λk)|∞; |λk,i‚ci(xk)| (for i ∈ I)} e obj
Thus,
φµ(xk) - φµ(xk+1) e δkobjErr + µ(m + 2n)δkfeasErr e (1 + µ(m + 2n))δk (26) If assumptions 4.1 from their reported work hold, Biegler et al. have shown that, for all sufficiently large k, there is a positive constant γµ,11 such that
φµ(xk) - φµ(xk+1) g γµ[|ZTk ∇fk|22 + | V(xk)|1]
(27)
Therefore, it is easy to obtain, from eqs 26 and 27, that
δk g γ1[|ZTk ∇fk|22 + |V(xk)|1]
(28)
where the positive constant γ1 is given as γ1 ) γµ/(1 + µ(m + 2n)). Equation 28 indicates that δk is a measure of the reduced Kuhn-Tucker error at xk. In addition, if xk is a satisfactory approximationsthat is, θkconvg g θ0sthe monotonic decreasing property of the sigmoid function (eq 18), with the argument δk, then imply that there exists γ2 > 0, such that
γ1[|ZTk ∇fk|22 + |V(xk)|1] e δk e γ2var
(29)
where γ2 < 1 (or γ2 > 1) if θ0 > 1 (or θ0 < 1), and γ2 ) 1 if θ0 ) 1. 9 4. Numerical Experiments We present three numerical experiments in this section to compare the RSQP algorithm with CDC and with traditional convergence criteria for their efficiency and practicability. The first experiment includes examples from the CUTE test set, which is a collection of benchmark problems for optimization. The second experiment concerns optimization under changing load conditions of the depropanizer and debutanizer distillation sequence in the ethylene production. The last is the catalyst mixing problem from the COPS collection, which is a package of difficult constrained optimization problems. For the runs with CDC, the specified threshold on variables is var ) 10-6, and the acceptable convergence depth is θ0 ) 0.9 in the first experiment and θ0 ) 1 in the other experiments, in an attempt to attain the specified tolerance further. In addition, CDC starts from the beginning of optimization. For the other
(30a)
max{|ci(xk)| (for i ∈ E); -ci(xk) (for i ∈ I)} e feas (30b) All numerical results are obtained on the machine that is running Windows2000 and with an ∼2.8 GHz CPU and 1 GB of RAM. MATLAB7.1 is the computation environment and TOMLAB5.1 supplies the TOMLAB/AMPL interface for the problems formulated in AMPL in the first and last experiments. 4.1. Numerical Experiment on the CUTE Test Set. The examples selected from the CUTE collection have equality and/ or bound constraints. Table 1 gives detailed description of some representative examples, grouped by the converged runs of the RSQP algorithm. The data in each column respectively represent the number of variables (#var), the degrees of freedom (#free), the number of bounds on variables (#bnd), the number of linear equality constraints (#lin_eq), and the number of nonlinear equality constraints (#non_eq). Table 2 compares the results of these problems obtained by the RSQP algorithm with CDC and traditional criteria, respectively. The column “θconvg” presents the convergence depth of each problem under CDC, where the negative values come from the positive order of magnitude of the measure of errors k (max{δfeasErr ,δkobjErr}) and, thereby, indicate that the errors are far from being consistent with the specified tolerance. The columns marked “iter”, “CPU”, “obj”, and “feas” respectively list the iteration numbers, the time required to solve the problems, the objective values, and the constraint violations of the solutions. Problems converged by the runs with CDC or traditional criteria are noted by “CDC” or “Trad”, respectively, in the last column. The following summarizes the comparisons between the runs with CDC and traditional criteria of the RSQP algorithm on problems in each group: Problem group (i): For the problems converged in both runs, the run with CDC terminated earlier. The quality of the approximate solutions is similar to that of the solutions under traditional criteria, in terms of the feasibility and the objective value. Problem group (ii): For the problems converged in neither run, the run with CDC terminated quite earlier, because it has detected the futile effort to converge in the early term of optimization. Whereas the run with traditional criteria continued until beyond the maximum number of iterations.
7736
Ind. Eng. Chem. Res., Vol. 46, No. 23, 2007
Figure 7. Optimization of the multicolumn system under changing load conditions: graphs shows the amount of load change versus (a) the number of iterations, (b) the CPU time, (c) the objective value, and (d) the constraint violation.
Problem group (iii): Problems in this group converged only under CDC. In these cases, the run with CDC also terminated much earlier. The approximate solutions are satisfactory, in terms of the predefined acceptable convergence depth. Whereas the run with traditional criteria failed because the maximum number of iterations was exceeded. Exception: The problem mancino is an exception, for which only the run with traditional criteria converged. This is because, during the intermediate iterations, the optimization process experienced little improvement (k ) 98):
θkprog ) S(max {δkfeasChg,δkobjChg},var) ) S(max{0,1.3099 × 10-7},10-6) ) 1.0362 > 1 However, the process was far from being converged:
θkconvg ) S(max{δkfeasErr,δkobjErr},var) ) S(max{0,0.0376},10-6) ) 0.3778 < 0.9 This led to termination of the run with CDC. Figure 5a-c makes the comparison more intuitively, where each column corresponds to one problem, and the termination points of the runs with CDC or traditional criteria are marked by a star (f) or a solid circle (b), respectively. The plot at the bottom of each column shows the changing convergence depth of the iterates. 4.2. Numerical Experiment on Distillation Columns. The depropanizer and debutanizer distillation sequence is a very important part in the ethylene production. Figure 6 shows a flow
chart of the sequence. The first column (E-DA-404) is the depropanizer, which separates the light component of C3 from the heavy components of C4 and above. Feed S502 of E-DA404 is the stream from the bottom of the condensate stripper and the redistillation column, and feed S538 is the stream from the bottom of de-ethanizer. The top product of E-DA-404 is condensed: one part of the condensate is returned as reflux, and the other part is the product (S511) of the depropanizer. Stream S503, which consists of C4 and heavier components, discharges from the bottom of E-DA-404 to the debutanizer (E-DA-405). E-DA-405 separates the C4 component from the other components. The product that contains the C4 component exits at port S592 as the discharge product, and the other heavy components are discharged through the tower bottom (S581). We have established a rigorous model of the distillation sequence in the MATLAB environment. The profit from the multicolumn system is mainly determined by the turnouts and prices of the liquid-phase products that contain the C3 component of the depropanizer and the C4 component of the debutanizer. We define the objective to be optimized as13
max F )
(278.88 × f_dp) + (227.752 × f_db) 10000
(31)
where the units of F are 10 000 (∼1300 dollars/h); f_dp and f_db are the flow rates of the top products of depropanizer and debutanizer, respectively (both are given in units of kmol/h),
Ind. Eng. Chem. Res., Vol. 46, No. 23, 2007 7737
and the variables are weighted by their prices. The C3H6 content is assumed to be the purity index of the product that contains the C3 component, and the content of the C5 component is considered to be the impurity index of the product of the debutanizer. The optimization problem for this system can be described as follows:13
min(-F) s.t. mesh equations of the distillation columns connection equations of the distillation columns
(32)
[C3H6] g 0.93 [C5] e 0.0095 This problem has 4190 variables, including the components, the pressure and temperature of the stream, the stage temperature, the stage efficiency, the operating conditions of the columns, and so on. However, there are only two degrees of freedom, which are the reflux flow of the depropanizer and the debutanizer. In a single-column system, the time required to solve the problem increases with the increasing change of its load from the optimal amount.14 Similarity could be found in the multicolumn system, as previously noted. The RSQP algorithm with CDC and traditional criteria are respectively applied to optimize the system, with the load S502 being increasingly deviated from its optimal value. Figures 7a, 7b, 7c, and 7d show the results of the RSQP algorithm with CDC and traditional criteria; these figures illustrates the number of iterations, the time required to optimize the system, the objective values, and the feasibility of the solutions, respectively, corresponding to the changes of the load in the negative and positive directions. Comparison with the runs under traditional criteria shows that the runs with CDC have reduced the total number of iterations and the total time required to solve the system by ∼81.2% and ∼85.9%, respectively. Under CDC, the greatest decrease in the objective value (using eq 31) is 4.491 × 10-10, which means the profits will be a few cents less, even if the corresponding operating condition remains unchanged for 1 year. In addition, the greatest difference in feasibility between the runs with CDC and traditional criteria is 9.102 × 10-7. The experimental results demonstrate that, under CDC, the number of iterations and the time required to optimize the system do not change much with the increasing change of the load, and the optimality of the solutions is sufficiently satisfactory. Therefore, CDC has saved much computational cost by terminating the algorithm properly in this experiment. 4.3. Numerical Experiment on Catalyst Mixing Problem. The catalyst mixing problem determines the optimal mixing policy of two catalysts along the length of a tubular plug flow reactor involving the reactions15
S1 h S2 f S3
[
]
The nonlinear model that describes the reaction is given as15
x˘ 1(t) ) u(t)(10x2(t) - x1(t)) x˘ 2(t) ) u(t)(x1(t) - 10x2(t)) - (1 - u(t))x2(t) x1(0) ) 1 x2(0) ) 0
(33)
where x1 and x2 are the quantities of S1 and S2, respectively.
Table 3. Comparison of the RSQP Algorithm with CDC and Traditional Criteria, with Regard to the Catalyst Mixing Problem parameter
value
θconvg number of iterations, iter CDC Trad computational time required, CPU(s) CDC Trad objective value, obj CDC Trad constraint violation, feas CDC Trad
1.01 104 873 953.937 s 5189.875 s -4.619 × 10-2 -4.637 × 10-2 6.100 × 10-7 5.110 × 10-12
The control variable u represents the mixing ratio of the catalysts and must satisfy the relation
0 e u(t) e 1
(34)
The variable u should be determined to maximize the production of species S3, for example, to minimize
J(u) ) -1 + x1(tf) + x2(tf)
(35)
for a fixed final time tf ) 1. We solve the model of the problem that was provided by the COPS test set,16 where the problem is formulated with a threestage collocation method, and a uniform partition of the interval [0,1] with 100 intervals. Table 3 presents results of the both runs of the RSQP algorithm, labeled as “CDC” and “Trad”, respectively. The description of this table is the same as that for Table 2. The difference in the objective values shows that, with CDC, the production of species S3 decreased by ∼0.4%, but the time required to solve the problem was reduced by 81.6%. Furthermore, the feasibility of the solution under CDC is satisfactory. It is obvious that CDC has greatly reduced the computational efforts without losing much optimality of the solution. 5. Conclusions Optimization algorithms often make trivial improvements after significant convergence achievements or when the problem is not going to converge. It is important to improve the convergence and solution time of process system optimization problems. Convergence depth control (CDC) is proposed in this paper to estimate convergence achievement of the optimization process, to stop both the successful and unsuccessful problems much earlier, and to try to provide near-optimal solutions with acceptable optimality. In this paper, the quality of each iterate has been quantified to generate its convergence depth. Under proper assumptions, it has been proven that the convergence depth reflects the degree of the iterate converging to a Kuhn-Tucker point, and that criteria, based on CDC, can be finally achieved. The performance of the Reduced-Hessian Successive Quadratic Programming (RSQP) algorithm with CDC and traditional criteria is compared using three experiments: optimization of the problems from the CUTE test set, optimization under changing load conditions of the distillation sequence in ethylene production, and solution of the catalyst mixing problem in COPS collection. In all these experiments, CDC detected convergence failure much earlier than did tra-
7738
Ind. Eng. Chem. Res., Vol. 46, No. 23, 2007
ditional criteria, and it reduced computational costs significantly by providing an approximate solution with acceptable optimality. As demonstrated by the numerical results, the advantages of CDC are most obvious when the problems to be solved will not converge or the progress of optimization becomes less apparent as the solution is approached. However, the exception when solving the CUTE test set indicates that tardiness of the optimization process during intermediate iterations can lead to termination of the algorithm with CDC. Further study on the performance of CDC, combined with other process system optimization algorithms and techniques, is the subject of future work. Acknowledgment This work was supported by the 973 Program of China (No. 2002CB312200) and the 863 Program of China (No. 2007A04Z192). The authors would like to thank Prof. Arthur W. Westerberg of Carnegie Mellon University for his helpful suggestions and encouragements. Literature Cited (1) Biegler, L. T.; Grossmann, I. E. Retrospective on optimization. Comput. Chem. Eng. 2004, 28, 1169. (2) Tosukhowong, T.; Lee, J. H. Real-time economic optimization for an integrated plant via a dynamic optimization scheme. Proc. Am. Control Conf. 2004, 1, 233. (3) Nocedal, J.; Wright, S. Numerical Optimization; Springer: New York, 1999. (4) Onnen, C.; Babuska, R.; Kaymak, U.; Sousa, J. M.; Verbruggen, H. B.; Isermann, R. Genetic Algorithm for Optimization in Predictive Control. Control Eng. Pract. 1997, 5, 1363. (5) Sotiropoulos, D. G.; Stavropoulos, E. C.; Vrahatis, M. N. A new hybrid genetic algorithm for global optimization. Nonlinear Anal. Theory Methods Appl. 1997, 30, 4529.
(6) Schmid, C. Reduced Hessian Successive Quadratic Programming for Large-Scale Process Optimization. Ph.D. Dissertation, Carnegie Mellon University, Pittsburgh, PA, 1994. (7) Ternet, D. J.; Biegler, L. T. Recent Improvements to a MultiplierFree Reduced Hessian Successive Quadratic Programming Algorithm. Comput. Chem. Eng. 1998, 22, 963. (8) Itle, G. C.; Salinger, A. G.; Pawlowski, R. P.; Shadid, J. N.; Biegler, L. T. A Tailored Optimization Strategy for PDE-Based Design: Application to a CVD Reactor. Comput. Chem. Eng. 2004, 28, 291. (9) Arora, N.; Biegler, L. T. A Trust Region SQP Algorithm for Equality Constrained Parameter Estimation with Simple Parameter Bounds. Comput. Optim. Appl. 2004, 28, 51. (10) Alkaya, D.; Vasantharajan, S.; Biegler, L. T. Generalization of a Tailored Approach for Process Optimization. Ind. Eng. Chem. Res. 2000, 39, 1731. (11) Biegler, L. T.; Nocedal, J.; Schmid, C. A Reduced Hessian Method for Large-Scale Constrained Optimization. SIAM J. Optim. 1995, 5, 314. (12) Byrd, R. H.; Nocedal, J. An Analysis of Reduced Hessian Methods for Constrained Optimization. Math. Program. 1991, 49, 285. (13) Jiang, A.; Shao, Z.; Chen, X.; Fang, X.; Geng, D; Zheng, X; Qian, J. Simulation and Optimization of Distillation Column Sequence in LargeScale Ethylene Production. J. Chem. Ind. Eng. China (China) 2006, 57, 2128. (14) Zhang, Z.; Shao, Z.; Chen, X.; Zheng, X.; Geng, D. Unified Approach to Debutanizer’s Simulation and Optimization Based on Equation Oriented Method. J. UniV. Sci. Technol. China 2005, 35 (Suppl.), 272. (15) von Stryk, O. User’s Guide for DIRCOL (Version 2.1): A Direct Collocation Method for the Numerical Solution of Optimal Control Problems. Technical Report, Technische Universitat Munchen, Germany, 1999. (16) Elizabeth, D. D.; More, J. J.; Munson, T. S. Benchmarking Optimization Software with COPS 3.0. Technical Report, Argonne National Laboratory, Argonne, IL, 2004.
ReceiVed for reView January 12, 2007 ReVised manuscript receiVed July 19, 2007 Accepted July 27, 2007 IE070073S