2 A Review of Optimization Methods for Nonlinear Problems
Downloaded by UNIV OF CALIFORNIA SAN DIEGO on November 21, 2015 | http://pubs.acs.org Publication Date: May 30, 1980 | doi: 10.1021/bk-1980-0124.ch002
R. W. H. SARGENT, Department of Chemical Engineering and Chemical Technology, Imperial College of Science and Technology, London SW7, England
The field of optimization is vast and all-embracing. Relevant papers are published at a rate of more than 200 per month, spread over more than 30 journals, without counting the numerous volumes of conference proceedings and special collections of papers. The Tenth International Symposium on Mathematical Programming, held in August this year, has alone added 450 papers to the list. Applications are equally varied and widespread. This review therefore cannot hope to be comprehensive, and its scope is firmly restricted to general methods for dealing with nonlinear problems, both with and without constraints, since these are the most common in chemical engineering applications. Integer programming methods are not reviewed, since most of the mathematical developments are concerned with mixed integer-linear problems, which are of limited interest to chemical engineers. Branch-and-bound techniques are still the basic tools for nonlinear integer problems, and since heuristics play such an important role, these techniques can only be considered in relation to specific applications. Many specialized techniques exploiting particular problem structures are ignored, and fields which involve considerations outside the optimization techniques themselves are also excluded. Thus, for example, the whole field of function approximation and model parameter fitting has been left out. Although there have been significant theoretical advances in recent years, particularly in connection with stability, sensitivity and convergence analysis, these too are largely ignored. The emphasis is on algorithmic developments, because to the user theoretical advances are of no account until they are embodied in implementable algorithms.
0-8412-0549-3/80/47-124-037$05.00/0 © 1980 American Chemical Society In Computer Applications to Chemical Engineering; Squires, R., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1980.
COMPUTER APPLICATIONS TO CHEMICAL ENGINEERING
Unconstrained Minimization.
The quasi-Newton or variable-metric methods introduced by Davidon {1} have now become the standard methods for finding an unconstrained minimum of a differentiable function f(x), and an excellent review of the basic theory has been given by Dennis and Moré {2}. These are iterative methods of the form

$$x_{k+1} = x_k - \alpha_k S_k g_k, \qquad S_{k+1} = \mathcal{S}(S_k, p_k, q_k), \qquad (2.1)$$
where $x$ is an $n$-vector, $\{x_k\}$, $k = 0, 1, 2, \ldots$, is a sequence of iterates with an arbitrary starting point $x_0$, $g_k$ is the gradient of the function f(x) at $x_k$, $p_k = x_{k+1} - x_k$, $q_k = g_{k+1} - g_k$, and $S_k$ is a local approximation to the inverse of the Hessian matrix of f(x). Classically, the scalar $\alpha_k$ is chosen to minimize the function $f(x_k - \alpha S_k g_k)$ with respect to $\alpha$. The methods differ in the formula used to generate the sequence $S_k$, $k = 0, 1, 2, \ldots$, and after Fletcher and Powell's {3} analysis of Davidon's method a whole spate of formulae were invented in the sixties. Broyden {4} introduced some rationalization by identifying a one-parameter family, and recommended a particular member, now commonly referred to as the BFGS (Broyden-Fletcher-Goldfarb-Shanno) formula. Huang {5} widened the family, but by the end of the sixties numerical experience was producing a consensus that the BFGS formula was the most robust of the formulae available. The formula is
1
s
k i +
=
\
{ p
k i "
kVi
s
+
)
p
k i +
+
Pk i Vi k rVW +
(
p
+
},
(2.2) where S^S^,
^
+
1
=Ρ*
+
Λ
+
1
,
^
= Pk iq +
k +
l k l k k l . / q
+
S
q
+
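As an illustration only, the sketch below applies iteration (2.1) with the BFGS update (2.2) in Python with NumPy. Several choices are assumptions of this sketch rather than part of the text: the starting matrix $S_0 = I$, a simple backtracking (Armijo-type) rule with sufficient-decrease constant $10^{-4}$ in place of exact minimization of $f(x_k - \alpha S_k g_k)$, and skipping the update whenever $p_k^T q_k$ is not safely positive, so that $S_k$ stays positive definite. The function names `bfgs_inverse_update` and `quasi_newton` are hypothetical.

```python
import numpy as np


def bfgs_inverse_update(S, p, q):
    """BFGS update (2.2) of the inverse-Hessian approximation S, given the
    step p = x_{k+1} - x_k and gradient change q = g_{k+1} - g_k."""
    pq = p @ q                       # p^T q, assumed positive
    Sq = S @ q                       # S q (S symmetric, so q^T S = (S q)^T)
    return (S
            + (1.0 + (q @ Sq) / pq) * np.outer(p, p) / pq
            - (np.outer(p, Sq) + np.outer(Sq, p)) / pq)


def quasi_newton(f, grad, x0, tol=1e-8, max_iter=200):
    """Iteration (2.1): x_{k+1} = x_k - alpha_k S_k g_k with BFGS updates."""
    x = np.asarray(x0, dtype=float)
    S = np.eye(x.size)               # S_0 = I (an assumption of this sketch)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -S @ g                   # a descent direction while S is positive definite
        fx, slope = f(x), g @ d
        alpha = 1.0
        # Backtracking: halve alpha until the sufficient-decrease test holds
        # (capped at 60 halvings to guard against rounding near the minimum).
        for _ in range(60):
            if f(x + alpha * d) <= fx + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g
        # Update only if p^T q is safely positive, which keeps S positive definite.
        if p @ q > 1e-10 * np.linalg.norm(p) * np.linalg.norm(q):
            S = bfgs_inverse_update(S, p, q)
        x, g = x_new, g_new
    return x


def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def rosenbrock_grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])


x_min = quasi_newton(rosenbrock, rosenbrock_grad, [-1.2, 1.0])
print(x_min)   # expected to be close to the minimum at (1, 1)
```

A quick property check: for any $p_k, q_k$ with $p_k^T q_k > 0$, the matrix returned by `bfgs_inverse_update` is symmetric and satisfies $S_{k+1} q_k = p_k$.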
A turning point came with a theorem of Dixon {6}, who showed that all quasi-Newton formulae (those for which $S_{k+1} q_k = p_k$) in Huang's family generate identical steps even for general functions, provided the line searches are exact, and this directed attention to a choice based on numerical stability rather than on theoretical properties, such as maintenance of positive-definiteness of the $S_k$ {7}. In fact Broyden {4}, Fletcher {8} and Shanno {9} all arrived at the choice of the BFGS formula from consideration of the conditioning of the resulting matrices. Shanno and Kettler {10} specifically considered a quantitative criterion for optimal conditioning, while Fletcher {8} was the first to suggest varying the update formula from step to step in the light of such a criterion. The idea was further developed by Davidon {11} and by Oren and
2. SARGENT
Optimization Methods
Spedicato {12}, but later Spedicato {13} noted that the criteria used by these authors were identical. Since $S_k$ approximates the inverse Hessian, it is clearly related to the function f(x), and in particular it must be scaled in inverse proportion to any scaling of f(x). This led Oren and Luenberger {14} to investigate symmetric members of Huang's family that include a scalar factor $\rho_k$, chosen at each iteration to adjust the scaling of $S_{k+1}$. This "self-scaling" idea was further developed by Spedicato {15}, who considered formulae which were invariant to a scalar nonlinear transformation of f(x); this also generalizes other attempts to approximate f(x) using more general classes than quadratic functions {16,17,18,19}. Numerical comparisons of the optimal-conditioning and self-scaling ideas with the classical formulae have been published by Spedicato {15,20}, Brodlie {21}, Shanno and Phua {22}, Zang {23} and Schnabel {24}. The evidence is not conclusive, but it seems that the classical BFGS formula is hard to beat. Optimal conditioning involves more arithmetic at each iteration, which pays off only on seriously ill-conditioned problems. There seem to be special types of functions for which self-scaling gives a marked improvement, but in general its performance is inferior, and the same seems to be true of the methods based on nonlinear transformations. The early analysis of Fletcher and Powell {3} interpreted Davidon's method as one which generates conjugate directions, which naturally gives rise to the idea of minimization along these directions. However, it was soon realized that minimization to high precision is an unnecessary expense, and indeed is not implied if the formulae are interpreted as secant approximations to the inverse of the Hessian matrix.
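The conjugate-direction and secant interpretations can be checked on a strictly convex quadratic $f(x) = \tfrac{1}{2}x^T A x - b^T x$, where the exact line search has the closed form $\alpha = -g^T d / d^T A d$. The sketch below (Python with NumPy; the matrix `A`, vector `b` and function names are hypothetical) runs $n$ BFGS steps from $S_0 = I$ with exact line searches. The steps $p_k$ should come out mutually $A$-conjugate, the final iterate should be the minimizer, and $S_n$ should equal $A^{-1}$, the inverse Hessian.

```python
import numpy as np


def bfgs_inverse_update(S, p, q):
    # BFGS formula (2.2) for the inverse-Hessian approximation S.
    pq = p @ q
    Sq = S @ q
    return (S + (1.0 + (q @ Sq) / pq) * np.outer(p, p) / pq
              - (np.outer(p, Sq) + np.outer(Sq, p)) / pq)


def bfgs_on_quadratic(A, b, x0):
    """Run n BFGS steps with exact line searches on f(x) = 0.5 x^T A x - b^T x."""
    x = np.asarray(x0, dtype=float)
    S = np.eye(len(b))                  # S_0 = I (assumption)
    g = A @ x - b                       # gradient of the quadratic
    steps = []
    for _ in range(len(b)):
        d = -S @ g
        alpha = -(g @ d) / (d @ A @ d)  # exact minimizer of f(x + alpha d)
        p = alpha * d
        x = x + p
        g_new = A @ x - b
        S = bfgs_inverse_update(S, p, g_new - g)   # here q = A p
        g = g_new
        steps.append(p)
    return x, S, steps


A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # hypothetical positive definite Hessian
b = np.array([1.0, 2.0, 3.0])         # hypothetical right-hand side
x, S, steps = bfgs_on_quadratic(A, b, np.zeros(3))

print(np.allclose(x, np.linalg.solve(A, b)))   # minimizer reached in n = 3 steps
print(np.allclose(S, np.linalg.inv(A)))        # S_n matches the inverse Hessian
print(abs(steps[0] @ A @ steps[1]))            # conjugacy: should be near zero
```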
In fact true minimization must be abandoned in favour of a "descent test" to guarantee convergence in a practical algorithm {25}, and various step-length rules are given by Sargent and Sebastian {7}, who showed how algorithms can be designed to ensure global convergence to a stationary point. Numerical experience also shows that the simple Armijo rule {26,25} coupled with a descent test is more efficient than minimization, provided that step-length expansion is also used if the test is satisfied immediately. For years everyone has been content with algorithms which produce a descent path to a stationary point, which can of course be a saddle-point rather than the desired local minimum. However, McCormick {27} has put forward an idea, later developed by Moré and Sorensen {28}, for the use of directions of negative curvature coupled with descent directions to ensure convergence to a local minimum. The goal of achieving the global minimum rather than just a local minimum still has its attractions. Various approaches are given in the collections of papers edited by Dixon and Szegö {29},
while the recent "tunnelling" algorithm of Levy and Montalvo {30} seems to be an effective version of the function-modification approach to the problem. An excellent discussion of the issues and the different approaches is given by Griewank {31}.

As computers become more powerful the problems tackled become ever larger, and inevitably storage problems arise. This has revived interest in the conjugate-gradient methods, which require storage of only a few $n$-vectors rather than an $n \times n$ matrix. Powell {32} gives an interesting analysis yielding new insight into the working of these methods. He extends the work of Beale {33} and Calvert {34}, giving evidence for favouring a particular conjugate-gradient formula and providing an automatic test for restarting. Even so, conjugate-gradient methods remain less efficient and less robust than quasi-Newton methods, providing an incentive to apply sparse-matrix techniques to the latter. Now if the Hessian matrix is sparse its inverse is likely to be dense, so instead of (2.1) we use
$$x_{k+1} = x_k + \alpha_k s_k, \qquad H_k s_k = -g_k, \qquad H_{k+1} = \mathcal{H}(H_k, p_k, q_k), \qquad (2.3)$$
where $H_k$ is an approximation to the Hessian matrix itself, and in order to solve for $s_k$ we store and update the triangular factors of $H_k$. The techniques for updating sparse triangular factors are given by Toint {35}. There has been little recent work on methods for differentiable functions which avoid explicit evaluation of derivatives. Powell's conjugate direction method {36} is still used, but the generally accepted approach is now to use standard quasi-Newton methods with finite-difference approximations to the derivatives. On the other hand there has been considerable interest in methods for nondifferentiable functions, as shown by the collection of papers edited by Balinski and Wolfe {37}, in which the technique described by Lemaréchal is of particular interest. Other contributions in this difficult field are due to Shor {38}, Goldstein {39}, Clarke {40}, Mifflin {41,42}, Auslender {43} and Watson {44}. In general these problems are much more difficult to solve than those involving differentiable functions, but they are becoming increasingly relevant to optimum design problems involving tolerances {45,46}.

Nonlinear Programming.

The general nonlinear programming problem is

$$\begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & \phi(x) \ge 0, \\ & \psi(x) = 0, \end{aligned} \qquad (3.1)$$
where f(x) is a scalar function of the $n$-vector $x$, $\phi(x)$ is an $m$-vector and $\psi(x)$ is a $q$-vector. The state of the art in 1974 in dealing with such problems is admirably summarized in the collection of papers edited by Gill and Murray {47}. At that time the middle ground was held by feasible-point projection or reduced-gradient methods, with a strong challenge from augmented Lagrangian methods. Fletcher himself was disenchanted with his "exact penalty-function" method and tended to favour the augmented Lagrangian approach, and there were still strong protagonists for the original penalty-function approach. The classical penalty-function methods have now finally become part of history, the early promise of the augmented Lagrangian approach has faded, and there has been a coalescence of the approach used in the projection methods with the exact penalty-function approach.

The classical penalty-function idea was to convert the original constrained problem into an unconstrained one by increasing the objective function artificially if the constraints were violated, adding a penalty term reflecting the magnitude of the constraint violations. The method originated with Frisch {48} and Carroll {49} but was mainly developed by Fiacco and McCormick {50}. Good reviews are given by Lootsma {51} and Ryan {47, pp. 175-190}. The difficulty with the approach is that it is by definition approximate, and to obtain good approximations the constraint violations must be heavily weighted in relation to the objective function, yielding an ill-conditioned unconstrained problem. The practical solution was to solve a sequence of unconstrained problems with steadily increasing weight on the constraint violations, and methods were devised for extrapolating the sequence to infinite weight.
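To make the ill-conditioning concrete, here is a small worked example on a hypothetical problem (not from the text): minimize $x_1^2 + x_2^2$ subject to $\psi(x) = x_1 + x_2 - 1 = 0$, whose solution is $x = (0.5, 0.5)$. A quadratic penalty term $\tfrac{1}{2}c\,\psi(x)^2$ is assumed here as one common choice. Because the penalized function is quadratic, its minimizer solves the linear system $(2I + c\,ee^T)x = c\,e$ with $e = (1, 1)^T$, so $x_1 = x_2 = c/(2 + 2c)$, the constraint residual is $-1/(1 + c)$, and the condition number of the Hessian is $1 + c$. The sketch below (Python with NumPy; `penalty_minimizer` is a hypothetical name) confirms this numerically.

```python
import numpy as np


def penalty_minimizer(c):
    """Minimize x1^2 + x2^2 + (c/2)(x1 + x2 - 1)^2 (hypothetical example)
    by solving grad = 0, i.e. (2I + c e e^T) x = c e."""
    e = np.ones(2)
    H = 2.0 * np.eye(2) + c * np.outer(e, e)   # Hessian of the penalty function
    x = np.linalg.solve(H, c * e)
    residual = x.sum() - 1.0                  # psi(x): constraint violation
    return x, residual, np.linalg.cond(H)


for c in [1.0, 10.0, 100.0, 1000.0]:
    x, residual, kappa = penalty_minimizer(c)
    print(f"c = {c:6.0f}   x = {x}   psi(x) = {residual:+.1e}   cond(H) = {kappa:.0f}")
```

Each tenfold increase in $c$ cuts the constraint violation roughly tenfold but raises the condition number by the same factor, which is the trade-off behind the sequence of unconstrained problems and the extrapolation to infinite weight.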
In 1968, Powell {52} likened the process to shooting at a target in a strong wind and suggested it was better to "aim off" rather than wheel up heavier and heavier guns; he therefore introduced a shifting parameter for each constraint, adjusted so that the minimum of the penalty function actually satisfied the constraint. A sequence of minimizations is still necessary to adjust the shifting parameters, but these subproblems are much easier to solve.

The "exact penalty-function" idea was to devise a penalty function which has an unconstrained local minimum exactly coinciding with the constrained minimum of the original problem (3.1). This goal seems to have been consciously sought independently by Fletcher {53} and Pietrzykowski {54}, but the idea was already implicit in the work of Arrow and Solow {55} and Zangwill {56}. The Zangwill-Pietrzykowski penalty function for problem (3.1) is

$$P(c,x) = f(x) + c\left\{\sum_{j=1}^{q} \left|\psi^j(x)\right| + \sum_{j=1}^{m} \max\left(0, -\phi^j(x)\right)\right\}. \qquad (3.2)$$
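Exactness above a finite threshold can be seen on a small hypothetical problem: minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$ (so $q = 1$, $m = 0$), whose solution $(0.5, 0.5)$ has Lagrange multiplier $\mu = 1$. For this problem (3.2) becomes $f(x) + c\,|x_1 + x_2 - 1|$. It is strictly convex and symmetric in $x_1, x_2$, so its unique minimizer lies on the line $x_1 = x_2 = t$, and the sketch below (plain Python; `exact_penalty` and `ternary_search` are hypothetical names) assumes this reduction and uses a one-dimensional ternary search. For $c < 1$ the minimizer is the infeasible point $t = c/2$, while for every $c \ge 1$ it is the constrained solution $t = 0.5$.

```python
def exact_penalty(c, t):
    """Penalty (3.2) for the hypothetical example, evaluated at x = (t, t)."""
    x1 = x2 = t
    return x1 ** 2 + x2 ** 2 + c * abs(x1 + x2 - 1.0)


def ternary_search(phi, lo, hi, iters=200):
    """Minimize a convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2          # a minimizer lies in [lo, m2]
        else:
            lo = m1          # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)


for c in [0.5, 0.9, 2.0, 10.0]:
    t = ternary_search(lambda s: exact_penalty(c, s), -2.0, 2.0)
    print(f"c = {c:4.1f}   x1 = x2 = {t:.6f}   psi(x) = {2 * t - 1:+.2e}")
```

Unlike the quadratic penalty, $c$ never has to grow without bound here; the price is the kink at $\psi(x) = 0$, which is exactly where the minimizer sits once $c \ge 1$.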
This function is indeed an exact penalty function for all values of the scalar c above a certain finite threshold value. However, it is nondifferentiable, and hence its minimization presents even more severe difficulties than that of the classical penalty functions. The general methods for nondifferentiable functions referred to in Section 2 could be used, but specific methods for (3.2) have been proposed by Conn and his coworkers {57,58,59}, Bertsekas {60} and Chung {61}. More recently Charalambous {62,63} has proposed the use of the more general $l_p$-norm for the penalty term instead of the $l_1$-norm used in (3.2), and points out some advantages for a choice $1 < p < \infty$, but the penalty function is still nondifferentiable. It is well known that when (3.1) contains no inequality constraints the Lagrangian function
$$L(x,\mu) = f(x) - \sum_{j=1}^{q} \mu^j \psi^j(x) \qquad (3.3)$$

has an unconstrained stationary point with respect to $x$ and $\mu$ at the constrained minimum. Unfortunately, however, if the functions $\psi^j(x)$ are nonlinear there is no guarantee that this stationary point is a local minimum with respect to $x$: it could be a saddle-point or even a maximum. Hence Arrow and Solow {55} suggested "convexifying" $L(x,\mu)$ in the neighbourhood of the stationary point to make it a local minimum by adding a quadratic penalty term:

$$L(c,x,\mu) = f(x) - \mu^T\psi(x) + \tfrac{1}{2}c\,\psi(x)^T Q\,\psi(x), \qquad (3.4)$$

where c is a scalar and Q a positive definite matrix. For a given Q this function has a local minimum for all values of c above a certain threshold, and hence is a differentiable exact penalty function. Moreover, since c is finite, the unconstrained problem is not usually ill-conditioned. In fact Arrow and Solow considered only Q = I, and they proposed a continuous descent method for the minimization; they also showed that inequality constraints could be dealt with by the use of slack variables. Independently of this work, Fletcher {53} started with (3.4) and sought to make $\mu$ and Q continuous functions of x which would converge to the required values at the stationary point. Later {64} he generalized the approach to deal with inequality constraints, and showed that the Lagrangian function for (3.1):
$$L(x,\lambda,\mu) = f(x) - \lambda^T\phi(x) - \mu^T\psi(x) \qquad (3.5)$$
is itself an exact penalty function if the multipliers $\lambda$, $\mu$ are obtained at each iteration by solving the quadratic programme:
Minimize f (x).