A REVIEW OF
Optimization Theory
FIGURE 1. A hypothetical plant containing a tubular reactor and associated equipment
Ideally, operating decisions for a chemical plant result from a three-step process: quantitative analysis of the physical system involved; development of a numerical measure of system performance; and determination of the combination of system variables that makes the measure as good as possible. The first two steps have always been within the province of the engineer and occupy the bulk of the engineering literature. In this review the author reflects the increasing interest in the third step, i.e., in finding the optimum combination of system variables, by unifying a number of approaches to optimization theory. The utility of the unified approach is illustrated with the aid of a hypothetical chemical plant.
This article combines a survey of recent advances in optimization theory, including some presented here for the first time, with descriptive derivations of the classical foundations of the newer theory based on the concept of a constrained derivative. We shall discuss only unconstrained and constrained optimization, including not only such well known techniques as Lagrange multipliers, optimum-seeking methods, and nonlinear programming, but also the elegant method of geometric programming, new to the chemical engineering literature, and the novel constrained derivative approach which helps unify these seemingly diverse concepts. All demonstrations are descriptive for clarity and simplicity; the reader is referred to primary sources for formal proofs. Where possible, concepts are illustrated by worked numerical examples involving a hypothetical chemical plant. Important parts of optimization theory are not discussed, however. The problems of optimizing large systems by such techniques as dynamic programming, decomposition principles, the maximum principle, and functional diagrams are not covered, nor are any problems in optimal control. These topics will be discussed in a book now in preparation by the author and C. S. Beightler, in which the material of this article will also be elaborated.

A Hypothetical Chemical Plant
Consider the problem of choosing the pressure, x₁, and recycle ratio, x₂, for the hypothetical chemical plant shown schematically in Figure 1. The required production rate is 10⁷ pounds per year, and we must minimize the total annual cost, including both direct operating expenses and amortized capital expenditure. Raw materials are brought up to pressure in the compressor at an annual cost of $1000x₁, mixed with a recycle stream, and fed to a reactor operated at an annual cost of $4 × 10⁹/x₁x₂. Product is removed in a separator at a
cost of $10⁵x₂ per year, and the unreacted material is recycled in a recirculating compressor which consumes $1.5 × 10⁵x₂ annually. We desire, then, to minimize the total annual cost, defined by the objective function

y = 1000 x_1 + 4 \times 10^9 x_1^{-1} x_2^{-1} + 2.5 \times 10^5 x_2    (1)
This function has no finite minimum because it can be made arbitrarily negative by selecting negative values of x₁ or x₂. Such a choice would, however, be physically meaningless, and it is implicit that the variables must be positive.

The Classic Approach
In general, let y(x) be any differentiable function of n variables xᵢ (i = 1, 2, ..., n), and let x* be a particular vector of values of the xᵢ such that for every point x in the neighborhood of x*

y(x) - y(x*) > 0    (2)

Then, because every point near x* has a value of y greater than that at x*, the point x* is called a local or relative minimum. Euler's (24) classic way to find such a point is to consider the Taylor expansion of y about x*

y(x) - y(x*) = \sum_i (\partial y / \partial x_i)(x_i - x_i*) + O(x^2)    (3)

where the summations are understood to range from 1 to n, and the derivatives are evaluated at the point x*. In the hypothetical plant of Equation 1 this expansion about the as yet unknown minimum would be

y(x) - y(x*) = [1000 - 4 \times 10^9 (x_1*)^{-2} (x_2*)^{-1}](x_1 - x_1*) + [2.5 \times 10^5 - 4 \times 10^9 (x_1*)^{-1} (x_2*)^{-2}](x_2 - x_2*) + O(x^2)    (4)

where O(x²) represents terms of degree 2 and higher in the expansion. Now, if x* is a relative minimum, the coefficients ∂y/∂xᵢ must vanish, for otherwise one could easily find points x at which the first-order terms would be negative as well as close enough to x* that they would dominate the higher order terms. Such a situation would contradict the definition of a relative minimum given in Equation 2. Thus, when the xᵢ can be chosen at will, a point cannot be a minimum unless all the first partial derivatives vanish there:

\partial y / \partial x_i = 0;  i = 1, 2, ..., n    (5)

In the example problem this condition can be invoked to give two nonlinear algebraic equations in the unknown minimizing values x₁* and x₂*. In this case, these can be solved to give the unique conditions x₁* = 1000 p.s.i.a. and x₂* = 4.

Stationary Point

A point where the first derivatives vanish, called a stationary point, is not necessarily a minimum satisfying Equation 2, for the derivatives must also vanish at a local maximum, inflection, ridge, or saddle. Distinguishing between these types of stationary points requires examination of the higher order terms of the Taylor expansion. A fairly simple and practical approach to this problem has been developed (2) but, although this technique is based on work of Lagrange, it has been shown by Genocchi and Peano (33) to be fallible in the exceptional case when all the second partial derivatives vanish (4). The simple Lagrange technique (35) is to examine the Taylor expansion at the stationary point, noting that the first-order terms vanish there, leaving the second-order terms to dominate the rest. In the example, this expansion would be

y(x) - y(x*) = (x_1 - x_1*)^2 + 250(x_1 - x_1*)(x_2 - x_2*) + 62,500(x_2 - x_2*)^2 + O(x^3)    (6)

Neglecting the remainder and completing the square, one obtains

y(x) - y(x*) = [(x_1 - x_1*) + 125(x_2 - x_2*)]^2 + 46,875(x_2 - x_2*)^2 + O(x^3)

Because the right member is a sum of perfect squares, it must be positive for any and all values of x₁ and x₂ near x₁* and x₂*, which satisfies the definition of a local minimum given in Equation 2. In this case the Lagrange theory is adequate because the second-order terms do not vanish; because there are no other stationary points (for positive x₁ and x₂), one can conclude that the local minimum in question is also the global minimum, i.e., Equation 2 holds not only in the immediate vicinity of (1000, 4), but for all positive pressures and recycle rates.
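As a quick check of this classical analysis, the stationary conditions of Equation 5 and the second-order test can be verified numerically. The short sketch below (in Python; it is not part of the original article and is offered only as an illustration, with arbitrary variable names) confirms that the gradient of Equation 1 vanishes at x₁* = 1000, x₂* = 4 and that the Hessian there is positive definite.

```python
# Sketch: verify the stationary point of Equation 1 and test its character.
# Cost model y = 1000*x1 + 4e9/(x1*x2) + 2.5e5*x2 (Equation 1).

def cost(x1, x2):
    return 1000.0 * x1 + 4e9 / (x1 * x2) + 2.5e5 * x2

def gradient(x1, x2):
    # Partial derivatives of Equation 1 (the coefficients in Equation 4).
    dy_dx1 = 1000.0 - 4e9 / (x1**2 * x2)
    dy_dx2 = 2.5e5 - 4e9 / (x1 * x2**2)
    return dy_dx1, dy_dx2

# Equation 5 requires x1**2 * x2 = 4e6 and x1 * x2**2 = 1.6e4,
# whose simultaneous solution is x1* = 1000 p.s.i.a., x2* = 4.
x1s, x2s = 1000.0, 4.0
print(gradient(x1s, x2s))        # (0.0, 0.0): a stationary point
print(cost(x1s, x2s))            # 3.0e6 dollars per year

# Second-order (Lagrange) test: Hessian elements at the stationary point.
h11 = 8e9 / (x1s**3 * x2s)       # = 2
h12 = 4e9 / (x1s**2 * x2s**2)    # = 250
h22 = 8e9 / (x1s * x2s**3)       # = 125,000
# (1/2)[h11 d1^2 + 2*h12 d1 d2 + h22 d2^2] reproduces Equation 6.
print(h11 > 0 and h11 * h22 - h12**2 > 0)   # True: positive definite, a minimum
```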
Geometric Programming

The problem at hand can also be solved quickly and elegantly by a novel approach proposed by Zener (71-73) and Duffin (21, 22) called geometric programming. Instead of seeking the optimal values of the independent variables first, geometric programming finds the optimal way to distribute the total cost among the various terms of the objective function. Once these optimal allocations are obtained, often by inspection of simple linear equations, the optimal cost can be found by routine calculation. If this cost is suitably attractive, one can then find the policy that will attain it. Although these last computations may involve solving some nonlinear equations, they are rarely as complicated as those encountered in the classic approach of Euler. Fermat proposed a related approach (25A). Duffin's development of this technique relies heavily on a certain inequality relating the arithmetic mean of a set of positive numbers to its geometric mean, which is why Duffin and Zener call the method geometric programming. The derivation here uses a different approach, for few engineers are acquainted with the geometric mean inequality. The present derivation establishes only the necessity of the conditions, while the Duffin-Zener proof shows their sufficiency as well.

Consider then any objective function which is a sum of T terms cₜpₜ(x), where the cₜ are positive constants and the pₜ(x) are products of powers of the xᵢ:

y = \sum_{t=1}^{T} c_t p_t(x)    (7)

Each pₜ(x) is a product of powers of the variables, pₜ(x) = Πᵢ xᵢ^(aₜᵢ) (Equation 8), and the fraction of the minimum cost carried by the tth term, wₜ = cₜpₜ(x*)/y*, is called the weight of that term. The weights must satisfy the normality condition (Equation 11), Σₜ wₜ = 1, and one orthogonality condition (Equation 12) for each variable, Σₜ aₜᵢwₜ = 0. In the hypothetical example of Equation 1 these conditions are

w_1 - w_2 = 0;  -w_2 + w_3 = 0;  w_1 + w_2 + w_3 = 1

Simple inspection gives the solution as

w_1 = w_2 = w_3 = 1/3    (13)

Thus, before knowing the optimal pressure and recycle ratio, we can assert that the optimal policy must equalize the compressor cost, the reactor cost, and the combined separator and recirculation cost. More remarkable, the distribution is totally unaffected by changes in the cost coefficients, cₜ, which do not appear in the right members of Equation 12. This separation of technological effects, as reflected by the exponents aₜᵢ, from the economic effects, as measured by the coefficients cₜ, is one of the attractive features of this approach.

We can now find the minimum cost, still without knowing the optimal policy. Because the weights sum to unity,

y* = \prod_{t=1}^{T} [c_t p_t(x*) / w_t]^{w_t}    (14)

But by Equations 8 and 12, the right-hand product of the pₜ(x*) is unity:

\prod_t [p_t(x*)]^{w_t} = \prod_t \prod_i (x_i*)^{a_{ti} w_t} = \prod_i (x_i*)^{\sum_t a_{ti} w_t} = \prod_i (x_i*)^0 = 1

Hence, in the example,

y* = \prod_t (c_t / w_t)^{w_t} = [3(1000)]^{1/3} [3(4 \times 10^9)]^{1/3} [3(2.5 \times 10^5)]^{1/3} = (27 \times 10^{18})^{1/3} = \$3 \times 10^6 per year

When the number of terms T exceeds n + 1, further steps must be taken to find the optimum weights from among all those satisfying Equations 11 and 12. Further chemical engineering applications are given by Sherwood (58A).
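The orthogonality and normality conditions for this example form a small linear system, and the dual computation of the minimum cost takes only a few lines. The sketch below (Python with NumPy; not part of the original article, with illustrative names) sets up the exponent matrix of Equation 1 and recovers the weights and the $3 × 10⁶ cost.

```python
import numpy as np

# Exponent matrix a[t][i] of the three terms of Equation 1 in (x1, x2):
#   1000*x1            -> exponents ( 1,  0)
#   4e9 * x1^-1*x2^-1  -> exponents (-1, -1)
#   2.5e5*x2           -> exponents ( 0,  1)
a = np.array([[ 1.0,  0.0],
              [-1.0, -1.0],
              [ 0.0,  1.0]])
c = np.array([1000.0, 4e9, 2.5e5])   # cost coefficients

# Orthogonality (one row per variable) plus normality:
#   sum_t a[t,i]*w[t] = 0 for each i,   sum_t w[t] = 1
A = np.vstack([a.T, np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
w = np.linalg.solve(A, b)
print(w)                              # [1/3, 1/3, 1/3]

# Dual expression for the minimum cost: y* = prod_t (c_t / w_t)^w_t
y_star = np.prod((c / w) ** w)
print(y_star)                         # about 3.0e6 dollars per year
```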
Optimum-Seeking Methods
Many practical problems do not have the objective function formulated neatly as an algebraic equation, but rather as a complicated computer program which can calculate one value for each set of parameter values chosen. Locating an optimum in this case is not a matter of differentiation, because there is no function to be differentiated, nor is geometric programming effective. All we can do is test a few parameter combinations and then use the results to decide where new calculations may lead to improvement. Such self-directed techniques are called optimum-seeking methods, which will be described here almost entirely in geometric terms. Their application to numerical problems involves translation of these geometric ideas into algebraic terms, a straightforward task whose details are described fully elsewhere (65).
Fibonacci Search
When the objective function has only one optimum and depends on a single independent variable, the highly effective Fibonacci search technique can be used. A special variant of it, useful as part of later multivariable methods, will be described here and demonstrated on the problem of finding the minimum cost for the chemical plant when the recycle ratio is fixed at unity (x₂ = 1). Let the range of pressure be from 0 to 3000 p.s.i.a., and suppose that six experiments are to be expended. Enter the first column of Table I at the row marked 6 and read the number 21 in the second column. This says to divide the interval into 21 equal parts. For convenience assume that the upper bound on the pressure x₁ is 3150 p.s.i.a., because this is a multiple of 21 and makes each division equal to 150 p.s.i.a. The problem now is to find the best point among the 20 which separate the divisions, using only 6 measurements. The search is confined to the points 150, 300, ..., 2850, 3000. The second column of Table I shows that the first experiment should be located at the eighth point, namely, at 8(150) = 1200 p.s.i.a. The second is located symmetrically with respect to the first, 1200 below 3150 at 1950 p.s.i.a. Figure 2 shows these experiments at the two points, marked, respectively, 1 and 2, whose outcomes are y₁ = 4.783 × 10⁶ and y₂ = 4.251 × 10⁶. The assumption that the dependent variable is unimodal permits the elimination of points 1 through 8 as possible minima. The next experiment is placed symmetrically among the 12 points remaining with respect to the existing measurement at 1950 which, being the fifth from the left, implies that the new one should be placed fifth from the right at point 16 (2400 p.s.i.a.). The outcome is 4.317 × 10⁶, which eliminates it and all points to its right as possible minima. The symmetry rule, applied to the set remaining, locates the fourth experiment at point 11 (1650 p.s.i.a.), where y₄ = 4.324 × 10⁶. The fifth is, therefore, at point 14 (x₁ = 2100), and because y₅ = 4.255 × 10⁶, the final experiment is placed at point 12 (x₁ = 1800), where y₆ = 4.272 × 10⁶. Thus, the best point among the 20 original candidates is 13, where x₁ = 1950 p.s.i.a. and the cost is $4.251 × 10⁶. The true optimum, obtainable in this example by direct differentiation, is at x₁ = 2000, where the cost is $4.250 × 10⁶, only about $1000 less.

Figure 2. A six-experiment Fibonacci search (x₂ = 1) for optimum operation of the hypothetical plant

Table I indicates the procedure for a finer subdivision, up to 1 part in 10⁶. The method, discovered and proved optimal by Kiefer, is discussed along with variations of its application to different situations by Wilde (65).
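The bookkeeping of this lattice search is easy to mechanize. The sketch below (Python; an illustration, not part of the original article) repeats the six-experiment search over the 20 interior grid points and reproduces the sequence 1200, 1950, 2400, 1650, 2100, 1800 p.s.i.a.

```python
def fibonacci_lattice_search(f, n_points, n_evals):
    """Minimize a unimodal function defined on grid indices 1..n_points
    using n_evals evaluations (n_points + 1 must be a Fibonacci number)."""
    fib = [1, 1]
    while fib[-1] < n_points + 1:
        fib.append(fib[-1] + fib[-2])
    assert fib[-1] == n_points + 1, "n_points + 1 must be a Fibonacci number"

    lo, hi = 0, n_points + 1          # open bracket of surviving indices
    x1 = lo + fib[-3]                 # first probe (index 8 for 21 divisions)
    f1 = f(x1)
    for _ in range(n_evals - 1):
        x2 = lo + hi - x1             # point symmetric to the survivor
        f2 = f(x2)
        if x2 < x1:                   # keep the probes ordered
            x1, x2, f1, f2 = x2, x1, f2, f1
        if f1 <= f2:                  # minimum lies left of x2
            hi = x2
            x1, f1 = x1, f1
        else:                         # minimum lies right of x1
            lo = x1
            x1, f1 = x2, f2
    return x1, f1

# Equation 1 with the recycle ratio fixed at x2 = 1; grid spacing 150 p.s.i.a.
cost = lambda i: 1000 * (150 * i) + 4e9 / (150 * i) + 2.5e5
best_index, best_cost = fibonacci_lattice_search(cost, n_points=20, n_evals=6)
print(150 * best_index, best_cost)    # 1950 p.s.i.a., about 4.251e6 dollars
```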
Multivariable Techniques

Although the Fibonacci technique does not extend directly to cases with more than one independent variable, most multivariable techniques break the search down into a sequence of one-dimensional problems to which Fibonacci search is applicable. Such schemes terminate when no improvement is found along the search lines permitted. To guard against such a point's being a saddle or ridge rather than an optimum, every search should be finished by fitting a quadratic approximation and checking to see if it is definite (65). Users of search routines programmed by others should check to see if this final test has been incorporated, for many programs now available have overlooked it.

The sectioning method (30) involves holding all independent variables but one constant and finding the best point along the line generated. At this new point, a new variable is altered, generating a new line upon which the best point is found. The technique is repeated until no further improvement is experienced. Sectioning is illustrated in Figure 3, which shows contours of the objective function y for the hypothetical chemical plant. Dot-dash lines show the loci of high points along lines of search in each direction. The best points found on each line of search are connected by arrows to show the zig-zag progress of sectioning from one dot-dash line to the other. This tendency to oscillate with steadily decreasing progress toward the peak is a disadvantage which often outweighs the method's simplicity. It is possible for the loci of maxima to approach each other or even to touch far from the true optimum, and in such cases sectioning will stop short of the goal.

Cauchy's gradient or steepest descent method (13), popularized by Box and Wilson (10), requires the measurement of all first partial derivatives at the starting point. Let changes in the ith independent variable be denoted by Δxᵢ and let λ be a positive parameter. The negative gradient direction at the starting point is such that

\Delta x_i = -\lambda (\partial y / \partial x_i);  i = 1, ..., n
where the derivatives are evaluated at the starting point. As λ varies from 0 to infinity, a straight line is generated, and Fibonacci search can be used to find the value of λ for which y is minimum on this line. At this low point a new set of derivatives is determined and the process repeated on the new gradient line. The result is often a zig-zag path as for the sectioning method (Figure 3), although oblique directions are permitted.

Convergence of both methods may be speeded by incorporating acceleration into the search plan. For sectioning this would involve letting all the independent variables adjust once and then finding the optimum along the line joining the first optimum and the last one, as shown in Figure 4, which also illustrates a similar acceleration scheme for the gradient method investigated by Forsythe and Motzkin (29). A generalization of this approach to more than two independent variables is the steepest descent partan (an abbreviation of parallel tangents) technique of Shah, Buehler, and Kempthorne (58, 65). The pattern or direct search method of Hooke and Jeeves (39, 65, 70) involves acceleration, but it does not estimate derivatives or seek optima on lines of search. Consequently, it is easy to program and has proved amazingly effective in practice, perhaps because it tends to follow the valleys leading to a minimum. Another valley-following scheme which is simple to program is the simplified technique of Himsworth (38). A more sophisticated valley- (or ridge-) following scheme is that of Rosenbrock (57, 65). Acceleration techniques have also been proposed by Gelfand and Tsetlin (32).

Figure 3. Sectioning and gradient methods. Contours are for the objective function y defined by Equation 1

Figure 4. Accelerating the convergence of sectioning or gradient search methods

When exact expressions for derivatives are available, faster convergence can sometimes be gained. The Gauss method (36), which resembles the method of least squares, can be used when the objective function is the sum of squares of functions. Booth (9) and Levenberg (49) have also proposed schemes with quadratic convergence for such situations. Near an optimum, where the function may be approximated by a quadratic function, it is possible to achieve second-order convergence with only first-order information by Hestenes and Stiefel's conjugate gradient method (37), and its extensions by Davidon (19) and Fletcher and Powell (28).
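The sectioning procedure described above lends itself to a very short program. The sketch below (Python; the search bounds, tolerance, and starting point x₁ = 2000, x₂ = 7 are illustrative assumptions, and a golden-section search, a close relative of the Fibonacci search, stands in for the one-dimensional minimizer) alternates line searches in x₁ and x₂; the same one-dimensional routine could equally well be applied along successive gradient lines, as in Cauchy's method.

```python
import math

def cost(x1, x2):
    # Equation 1
    return 1000.0 * x1 + 4e9 / (x1 * x2) + 2.5e5 * x2

def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal one-variable function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol * (abs(a) + abs(b)):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Sectioning: minimize along one coordinate at a time.
x1, x2 = 2000.0, 7.0                     # arbitrary feasible starting point
for sweep in range(6):
    x1 = golden_section(lambda t: cost(t, x2), 1.0, 5000.0)
    x2 = golden_section(lambda t: cost(x1, t), 0.1, 50.0)
    print(sweep, round(x1, 1), round(x2, 2), round(cost(x1, x2)))
# The iterates zig-zag toward the unconstrained minimum near (1000, 4),
# where the annual cost is about $3,000,000 (compare Figure 3).
```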
Equality Constraints-Direct Substitution

Suppose in the hypothetical example that instead of allowing any positive pressure and recycle ratio, we must for technical reasons consider only those satisfying the following equality constraint:

x_1 x_2 = 9000 p.s.i.a.    (18)

The points satisfying this equation lie on the dashed line of Figure 3. The direct way to handle this is to use Equation 18 to eliminate one variable, say x₂, from the objective function in Equation 1:

y = 1000 x_1 + 4 \times 10^9 / 9000 + 2.5 \times 10^5 (9000 / x_1) = 1000 x_1 + 2.25 \times 10^9 x_1^{-1} + 4.44 \times 10^5    (19)

The minimum of this function of x₁ alone is found by the classical technique described already. The result is x₁′ = 1500 p.s.i.a., which, when substituted into Equation 19, gives the constrained minimum cost as y′ = $3.44 × 10⁶. Substitution into the constraint gives the optimal recycle ratio as x₂′ = 6. This constrained optimum is shown in Figure 3 on the dashed line. The minimum cost has gone up because the unconstrained minimum has been forbidden by the constraints. Direct substitution, when it can be used, is effective because it reduces the number of independent variables and, consequently, the number of derivative equations which must be solved. It does require, however, the initial solution of the constraint equations in closed form, often an impossible task.
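The substituted one-variable problem of Equation 19 is easy to check numerically. In the sketch below (Python with SciPy; the bracketing interval and use of a library minimizer are assumptions made for illustration, not the article's procedure) the reduced cost function is minimized directly, confirming x₁′ = 1500 p.s.i.a., x₂′ = 6, and y′ of about $3.44 × 10⁶.

```python
from scipy.optimize import minimize_scalar

# Equation 19: Equation 1 with x2 eliminated through x1*x2 = 9000 (Equation 18).
def reduced_cost(x1):
    return 1000.0 * x1 + 4e9 / 9000.0 + 2.5e5 * (9000.0 / x1)

res = minimize_scalar(reduced_cost, bounds=(100.0, 5000.0), method="bounded")
x1_opt = res.x
x2_opt = 9000.0 / x1_opt          # recover the recycle ratio from the constraint
print(round(x1_opt), round(x2_opt, 2), round(res.fun))
# about 1500 p.s.i.a., recycle ratio 6, constrained cost near 3.44e6 dollars/year
```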
Equality Constraints-Jacobian Solution

An alternate way to solve constrained optimization problems is to consider the linear terms of the Taylor expansions, not only of the objective function but also of each constraint. The derivation here is heuristic; for a rigorous demonstration see (67). Let there be m (< n) such differentiable constraint functions fₖ(x), each required to vanish at an acceptable point, and let x̄ be a trial point satisfying them. The objective and each constraint may then be expanded linearly about x̄ (Equations 20 and 21), where the derivatives are evaluated at x̄. In the example the expansion of the constraint reduces to

\bar{x}_2 (x_1 - \bar{x}_1) + \bar{x}_1 (x_2 - \bar{x}_2) = 0    (22)

If n of the m + n variables fₖ and xᵢ are assigned values in the neighborhood of x̄, the other m can be calculated from Equation 21, provided the equations are linearly independent. The former set may be considered as n independent variables; the latter, as m dependent variables, and no generality is sacrificed in treating the first m as dependent. Equations 3 and 21 become, upon rearrangement, expressions for the constrained derivatives of y with respect to the independent variables, expressible as ratios of Jacobian determinants of the fₖ.

Equality Constraints-Geometric Programming

When the number of terms exceeds n + 1, the orthogonality and normality conditions do not furnish enough equations to give a unique solution, and not all the solutions solve the original optimization problem. Yet when the optimal weights are all non-negative, we can find the constrained minimum even when the solutions to Equations 46 and 47 are not unique. This fact, of limited usefulness when the constraints are equalities, proves to be valuable for handling inequality constraints. Let wₜ [t = 1, ..., h(m)] be any solutions to the following analogs to Equations 46 and 47:

\sum_{t=1}^{h(m)} a_{ti} w_t = 0;  i = 1, ..., n    (50)

In the constrained example these conditions leave one degree of freedom, which may be taken as v₄, the weight of the constraint term; substituting into the auxiliary function gives a function g of v₄ alone. Next, the value of v₄ for which this function is stationary must be found by setting the first derivative with respect to v₄ equal to zero, a situation which arises even in unconstrained problems in which the number of terms exceeds the number of variables by more than one. Although the function is complicated, it involves only one variable, and computations are facilitated by working with the logarithm of g. The solution is v₄ = 19/62, v₁ = v₃ = 27/62, v₂ = 4/31, and z₀′ = $3.44 × 10⁶.
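The weights quoted for the constrained problem can be checked directly from the constrained optimum found earlier: at x₁′ = 1500, x₂′ = 6 the three cost terms of Equation 1 should carry exactly the fractions 27/62, 4/31, and 27/62 of the total. The sketch below (Python; an illustrative verification, not part of the original article) performs that check.

```python
from fractions import Fraction

# Cost terms of Equation 1 at the constrained optimum x1' = 1500, x2' = 6.
x1, x2 = 1500.0, 6.0
terms = [1000.0 * x1, 4e9 / (x1 * x2), 2.5e5 * x2]
total = sum(terms)
print(total)                                   # about 3.44e6 dollars per year

# Optimal weights are the fractional cost shares; compare with 27/62, 4/31, 27/62.
shares = [t / total for t in terms]
print([round(s, 4) for s in shares])
print([float(Fraction(27, 62)), float(Fraction(4, 31)), float(Fraction(27, 62))])
```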
The constraint has forced a decrease in the weight of the reactor term and a corresponding increase in the other weights.

Inequality Constraints-Kuhn-Tucker Conditions
Consider now the more general situation where the side conditions are inequalities rather than strict equalities. Let there be p such differentiable inequalities arranged as follows:

f_k(x) \geq 0;  k = 1, ..., p    (55)

Assume further that negative values of the variables xᵢ are forbidden:

x_i \geq 0;  i = 1, ..., n    (56)

The inequality constrained optimization problem is to find values of the xᵢ that satisfy Equations 55 and 56 and for which the objective function y(x) is optimum. When all the functions are given in algebraic form the problem is called nonlinear programming; if all functions are linear, it is called linear programming. As an example, let the equality constraint Equation 18 for the hypothetical chemical plant be replaced by the inequality x₁x₂ ≥ 9000. To get this into the form required by Equation 55, let

f_1(x_1, x_2) \equiv x_1 x_2 - 9000 \geq 0    (57)

The points satisfying this inequality lie above and to the right of the dashed line of Figure 3, the line itself being composed of points where equality holds.

Consider a point x′ where exactly m (< n) of the constraints are satisfied as equalities:

f_k(x') = 0;  k = 1, ..., m    (58)

It is possible to derive conditions which must hold at a constrained relative minimum x′ by considering the Taylor expansion of the objective function in the neighborhood of x′ (Equations 20, 31, 32, and 33):

y(x) - y(x') = \sum_{j=1}^{m} (\partial y / \partial f_j)' f_j(x) + \sum_{h=m+1}^{n} (\partial y / \partial x_h)' (x_h - x_h') + O(f_j^2, x_h^2)    (59)

At a relative minimum the left member must be non-negative. Consider points x close enough to x′ that all the dependent x₁, ..., xₘ remain feasible (non-negative) and that the higher order terms O(fⱼ², xₕ²) are negligible in comparison with the linear terms. Only points x where the independent f₁(x), ..., fₘ(x) and xₘ₊₁, ..., xₙ are non-negative are allowed. Hence, each constrained derivative (∂y/∂fⱼ)′, j = 1, ..., m, must be non-negative at a relative minimum:

(\partial y / \partial f_j)' \geq 0;  j = 1, ..., m    (60)

Moreover, if any of the xₕ′ are zero (h = m + 1, ..., n), then only non-negative fluctuations xₕ are permitted, which implies that the corresponding constrained derivative cannot be negative:

(\partial y / \partial x_h)' \geq 0;  h = m + 1, ..., n    (61)

If any of the xₕ′ are positive, the difference xₕ − xₕ′ may be of arbitrary sign, in which case the constrained derivative must vanish. This relation between xₕ′ and (∂y/∂xₕ)′ may be written

(\partial y / \partial x_h)' x_h' = 0;  h = m + 1, ..., n    (62)

Because both xₕ′ and (∂y/∂xₕ)′ must be non-negative, Equation 62 implies that if either is positive, then the other must be zero. The corresponding condition for the constraint functions,

(\partial y / \partial f_j)' f_j(x') = 0;  j = 1, ..., m    (63)

is trivially true, because f₁ = ... = fₘ = 0.

Equations 62 and 63 together are called the "complementary slackness principle." Originally derived by Kuhn and Tucker (46), their expression here in terms of constrained derivatives has been described formally (8, 64). Equations 60 to 63, a modified form of what are called the Kuhn-Tucker conditions, can be used to test a point for relative optimality. If the conditions hold, the point is a relative minimum; if not, then by using the reasoning employed above to derive them we can always find a point where the objective function is improved. This testing procedure is at the bottom of most algorithms now available for finding constrained optima.

Consider their use in finding the minimum of Equation 1 subject to Equation 57 in the hypothetical chemical plant. As a first guess, choose any point which satisfies Equation 57; for example, x₁ = 2000, x₂ = 7. At this point the constraint is not binding, and so any unconstrained optimum-seeking method is applicable.
Suppose sectioning is employed, so that x₁ is allowed to find its minimum for x₂ held at 7. As shown in Figure 2, the unconstrained minimum on this line is at x₁ = 380, but this point violates the constraint. Hence, we must take x₁ = 9000/7 = 1283 p.s.i.a., at which point the constraint is satisfied as an equality. Here m = 1 (one constraint vanishes) and one of the original variables (say x₁) must be treated as a dependent variable. The constrained derivatives of y with respect to the remaining independent variables, f₁ and x₂, can now be evaluated. The latter derivative turns out to be positive rather than zero; since x₂ is positive, Equation 62 is violated and the point cannot be a relative minimum. Improvement can be obtained by following the constraint f₁ = 0 in such a way as to decrease x₂. This problem was solved earlier, when the low point along the constraint boundary was found at x₁ = 1500, x₂ = 6. At this point (∂y/∂x₂)′ vanishes, because it is the lowest point on the constraint. The other constrained derivative is

(\partial y / \partial f_1)' = 1000 x_2^{-1} - 4 \times 10^9 x_1^{-2} x_2^{-2} = 117.2 > 0

Because f₁ = 0, the complementary slackness conditions of Equations 62 and 63, as well as the non-negativity conditions of Equations 60 and 61, are all satisfied. Hence the point is a local minimum. Certain other derivatives useful in the search for constrained minima are given below (67), with i = 1, ..., m and h = m + 1, ..., n.

Equations 64 and 65 follow immediately from Equations 32 and 33 by formal replacement of y(xᵢ) by fⱼ(xᵢ), because the latter functions may be considered variables which, like y, depend on the xᵢ. Equations 66 and 67 are obtained by solving Equation 24 for the dependent xᵢ − xᵢ′ (i = 1, ..., m) by Cramer's Rule and differentiating the result partially with respect to xₕ (h = m + 1, ..., n).

The Kuhn-Tucker conditions can be applied even to constrained optimum-seeking problems where the unconstrained derivatives must be estimated by evaluating the effects of perturbations on the objective and constraint functions. These estimates can be combined by the Jacobian formulas to give local values of the constrained derivatives needed to make the Kuhn-Tucker test and find better solutions (67). Although this suggestion is new and untried on optimum-seeking problems, the Kuhn-Tucker conditions form the basis of many algorithms applicable to problems where the unconstrained derivatives can be computed exactly. Such situations will be surveyed next: first linear programming, then nonlinear programming.
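The Kuhn-Tucker test at the candidate point x₁′ = 1500, x₂′ = 6 can also be carried out numerically, by estimating the two constrained derivatives from small perturbations that move along, or off, the constraint boundary. The sketch below (Python; the finite-difference scheme and step size are illustrative choices, not the article's) finds (∂y/∂x₂)′ essentially zero and (∂y/∂f₁)′ near 117, so Equations 60 to 63 are satisfied.

```python
def y(x1, x2):
    # Equation 1
    return 1000.0 * x1 + 4e9 / (x1 * x2) + 2.5e5 * x2

def x1_of(f1, x2):
    # Treat x1 as the dependent variable: f1 = x1*x2 - 9000  =>  x1 = (9000 + f1)/x2
    return (9000.0 + f1) / x2

def y_independent(f1, x2):
    # Objective expressed in the independent variables (f1, x2)
    return y(x1_of(f1, x2), x2)

x2p, f1p = 6.0, 0.0            # the candidate point x' = (1500, 6) lies on f1 = 0
h = 1e-3

# Constrained derivative with respect to x2 (moving along the boundary f1 = 0)
dy_dx2 = (y_independent(f1p, x2p + h) - y_independent(f1p, x2p - h)) / (2 * h)

# Constrained derivative with respect to f1 (moving off the boundary, x2 fixed)
dy_df1 = (y_independent(f1p + h, x2p) - y_independent(f1p - h, x2p)) / (2 * h)

print(round(dy_dx2, 2), round(dy_df1, 1))
# approximately 0 and 117.3, in agreement with the text's value of 117.2
```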
Linear Programming
A linear programming problem arises when the objective and all the constraints are linear functions of non-negative variables:

y = \sum_{i=1}^{n} c_i x_i    (68)

f_j(x) = \sum_{i=1}^{n} a_{ji} x_i \geq 0;  j = 1, ..., p    (69)

x_i \geq 0;  i = 1, ..., n    (70)

where the cᵢ and aⱼᵢ are constants. The linearity makes all of the unconstrained partial derivatives constant, which means that for each choice of independent variables and tight constraints, the constrained derivatives are also constant and independent of the point chosen. This makes it easy to follow the constraints, a fact exploited by G. B. Dantzig (16, 17) in his simplex algorithm. He proved, moreover, that at the optimum all of the independent nonbasic variables must be zero. This means that only a finite number of points need be considered and that at each such point, called a basic solution or a vertex, the complementary slackness conditions are automatically satisfied. The algorithm starts with any basic feasible solution and computes the constrained derivatives (called dual variables in linear programming terminology). In principle, this could be done by evaluating the Jacobians
of Equations 26, 27, 64, 65, and 67, but the linearity permits the use of matrix inversion and multiplication, which accomplish the same task more efficiently and systematically. We then inspect the signs of the constrained derivatives. If they are all non-negative, then the Kuhn-Tucker conditions are completely satisfied, which means the point is the global minimum. If not, then one of the independent variables with a negative constrained derivative is increased in value until some dependent variable goes to zero. The old independent variable is then considered dependent (basic), while the formerly dependent variable is regarded as independent (nonbasic). This procedure must decrease the objective function, so the new basic solution is necessarily an improvement over the old one. At the new point the process is carried out again. Iteration of the procedure eventually generates the minimum sought, usually after examination of about twice as many points as there are constraint inequalities. The best linear programming computer codes at present can handle problems with as many as 5000 variables and 1000 constraints. Numerical examples of the simplex algorithm in a chemical engineering context are available (5, 60, 61).
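A small numerical illustration of the formulation in Equations 68 to 70 is given below (Python with SciPy; the two-variable problem itself is invented purely for illustration and is not from the article). Library routines of this kind carry out the simplex logic, or modern variants of it, so only the coefficient arrays need be supplied.

```python
from scipy.optimize import linprog

# Minimize y = 3*x1 + 2*x2 subject to x1 + x2 >= 10, x1 - x2 >= -4, x >= 0.
# linprog expects "<=" rows, so the ">=" constraints are negated.
c = [3.0, 2.0]
A_ub = [[-1.0, -1.0],
        [-1.0,  1.0]]
b_ub = [-10.0, 4.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)        # optimum at a vertex: x = (3, 7), y = 23
```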
Nonlinear Programming

Space does not permit more than a cursory description of the various algorithms that have been proposed to handle nonlinearities in constrained optimization. Wolfe noted that, when the objective function is quadratic and the constraints linear, the original quadratic programming problem can be expressed as a linear programming problem with more constraints than the original quadratic one (69). This is accomplished by treating the constrained derivatives as non-negative unknowns to be determined along with the optimal values of the original variables. Although it expands the size of the problem, Wolfe's method permits the use of the simplex algorithm to solve the problem. Methods for solving quadratic programming problems without expanding the number of constraints appreciably have also been proposed (4, 6, 11, 15). These techniques, which would seem most effective when the quadratic part of the objective does not involve many of the problem variables, use a logic like that of the simplex method, with special provisions for calculating the constrained derivatives from the linear derivatives of the quadratic objective. Dorn has reviewed quadratic programming (20).

Rosen's gradient projection method (53, 54) can deal with nonlinear constraints as well as objectives. It uses all of the constrained derivatives calculable by the Jacobian equations, but instead of adjusting only one variable at a time as in the simplex method, it advances along the constrained gradient. Zoutendijk's method of feasible directions (74) linearizes the problem in a region containing each trial point and then advances to a better point by solving the resulting linear programming problem. Kelley's cutting plane method (45) successively approximates each critical nonlinear constraint by a number of linear ones so as to converge to the correct solution using the simplex method. An unpublished algorithm (Solver) by Wilson (68) recognizes that in nonlinear problems the task of following the curved constraints is difficult, and so it concentrates on handling the solution of the resulting nonlinear equations efficiently. Hadley's text on nonlinear programming (34) merits the attention of anyone interested in the subject.

Carroll (12) suggested approximating a constrained optimization problem by an unconstrained one which could be solved by optimum-seeking methods. As modified by Fiacco and McCormick, who added rigor to the theory and developed many extensions (26, 27), the technique is to optimize the function

z = y + r_0 \sum_{k} [f_k(x)]^{-1}
where r₀ is a positive constant chosen arbitrarily, y is the objective function, and the fₖ are the constraint functions of Equation 45. Let z be minimized with respect to the xᵢ, and let the minimum occur at x*(r₀). This point cannot be where any of the fₖ vanish, because fₖ⁻¹ would be infinite there, so x*(r₀) will be a feasible point which cannot be optimal for the original problem if any of its constraints are binding, i.e., satisfied as equalities. To obtain a point better for the original problem, a constant r₁ (< r₀) is chosen and the minimization repeated. As the positive constants r₀ > r₁ > r₂ > ... approach zero, the corresponding fictitious minima x*(r₀), x*(r₁), x*(r₂), ... approach the true constrained minimum.
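The penalty idea is easy to try on the hypothetical plant with the inequality x₁x₂ ≥ 9000 of Equation 57. The sketch below (Python with SciPy; the starting point, the sequence of r values, and the use of a general-purpose derivative-free minimizer are illustrative choices, not the article's) minimizes z = y + r/f₁ for decreasing r and watches the iterates approach the constrained minimum at (1500, 6).

```python
import numpy as np
from scipy.optimize import minimize

def y(x):
    x1, x2 = x
    return 1000.0 * x1 + 4e9 / (x1 * x2) + 2.5e5 * x2      # Equation 1

def f1(x):
    return x[0] * x[1] - 9000.0                              # Equation 57

def z(x, r):
    # Carroll-type interior penalty: infinite on or beyond the boundary
    if f1(x) <= 0 or x[0] <= 0 or x[1] <= 0:
        return np.inf
    return y(x) + r / f1(x)

x = np.array([2000.0, 7.0])            # feasible starting point (f1 > 0)
for r in [1e9, 1e7, 1e5, 1e3]:
    res = minimize(lambda v: z(v, r), x, method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-3, "maxiter": 5000})
    x = res.x
    print(r, np.round(x, 2), round(y(x)))
# The fictitious minima x*(r) drift toward x1 = 1500, x2 = 6, where y = 3.44e6.
```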
Inequality Constraints-Geometric Programming

The elegant techniques of geometric programming can be applied to nonlinear programming problems of a restricted but often useful form (23). Let m + 1 Duffin-Zener generalized polynomials zₖ (k = 0, 1, ..., m) be defined by Equations 39 and 40, and suppose we wish to find positive xᵢ (i = 1, ..., n) that minimize z₀ subject to the inequality constraints

f_k = 1 - z_k \geq 0;  k = 1, ..., m
Because the groundwork for this problem has already been laid in the development of Lagrange multipliers, constrained derivatives, and the Kuhn-Tucker conditions, the geometric programming formulation can be written with few preliminaries. By Equation 60, at the minimum the constrained derivative (∂z₀/∂fₖ), which has already been shown to be the Lagrange multiplier λₖ whenever the kth constraint is satisfied as an equality, must be non-negative. Hence the weights of geometric programming, which have the same sign as their Lagrange multiplier when their constraint is an equality, must be non-negative. Moreover, the complementary slackness condition, Equation 63, guarantees that if a constraint is not satisfied as an equality, then the constrained derivative, and hence the weights for that constraint, will be zero. Hence the equivalent problem formulated by geometric programming is to find the non-negative weights wₜ which satisfy the orthogonality and normality conditions

\sum_{t=1}^{h(m)} a_{ti} w_t = 0;  i = 1, ..., n    (73)

\sum_{t=1}^{h(0)} w_t = 1    (74)

and which also maximize the auxiliary function (Equation 75) or its logarithm. This is itself a nonlinear programming problem, but now all the nonlinearity is in the objective function rather than the constraints. If the linear equality constraints are used to eliminate variables in the auxiliary function, then there are no constraints left except those requiring non-negativity of the weights.

Dynamic and Large System Optimization

Space does not permit proper discussion of the problems of optimizing large structured systems or those which vary with time. The basic idea of macrosystem techniques is to partition a macrosystem into smaller subsystems and then optimize each subsystem separately, taking proper account of how the subsystems interact. Bellman's dynamic programming (1, 7) is useful when the subsystems are in series, either in time or in space, and it has been applied to chemical reactor design (2), process design (8), maintenance (67), and catalyst replacement (52). Serial problems can also be handled on occasion by the classical calculus of variations (37) and its extensions (40, 43), developed in the study of reactor design, which were related to the maximum principle of Swinnerton-Dyer (59) and Pontryagin (57). This has proved valuable to the theory of optimal automatic control. The version known as the discrete maximum principle, while not entirely correct in the form given in the original work (43), is valid in certain simple practical cases (25). Before using it, however, a beginner should examine the counterexamples of Horn and Jackson (41). The serial techniques have been extended so that they can be applied to nonserial systems with loops and branches (3, 42, 50, 63). Macrosystems organized as companies with branch plants lend themselves to the decomposition principle when the optimization problem can be formulated in terms of linear programming (18, 62). Nonlinear situations can sometimes be treated by partition programming (55) and the multilevel approaches of Lasdon (48). The logical techniques of Steward will be discussed in a forthcoming book on process design by Rudd and Watson.

AUTHOR  D. J. Wilde is Associate Professor of Chemical Engineering at Stanford University. The material contained in this article will constitute the substance of several chapters in his forthcoming book on optimization.

REFERENCES
(1) Aris, R., "Discrete Dynamic Programming," Blaisdell, New York, 1964.
(2) Aris, R., "The Optimal Design of Chemical Reactors," Academic Press, New York, 1961.
(3) Aris, R., Nemhauser, G. L., Wilde, D. J., Am. Inst. Chem. Engrs. J. 10 (6), 913 (1964).
(4) Beale, E. M. L., Nav. Res. Log. Quart. 6, 227 (1959).
(5) Beightler, C. S., Crawford, J. D., Wilde, D. J., Univ. Texas Rept., February 1963.
(6) Beightler, C. S., Wilde, D. J., ..., 111 (1965).
(7) Bellman, R., "Dynamic Programming," Princeton Univ. Press, Princeton, N. J., 1957.
(8) Bernholtz, E., Operations Res. 12 (2), 295 (1964).
(9) Booth, A. D., "Numerical Methods," Butterworths, London, 1957.
(10) Box, G. E. P., Wilson, K. B., J. Roy. Stat. Soc. B13, 1 (1951).
(11) Candler, W., Townsley, R. J., Management Sci. 10 (3), 515 (1964).
(12) Carroll, C. W., Operations Res. 9 (2), 169 (1961).
(13) Cauchy, A., Compt. Rend. Acad. Sci. (Paris) 25, 536 (1847).
(14) Courant, R., "Differential and Integral Calculus," Vol. II, p. 142, Interscience, New York, 1947.
(15) Crawford, J. D., M.S. thesis, University of Texas, January 1963.
(16) Dantzig, G. B., in "Activity Analysis of Production and Allocation," T. C. Koopmans, ed., pp. 339-47, Wiley, New York, 1951.
(17) Dantzig, G. B., "Linear Programming and Extensions," Princeton Univ. Press, Princeton, N. J., 1963.
(18) Dantzig, G. B., Wolfe, P., Operations Res. 8, 101 (1960).
(19) Davidon, W. C., A.E.C. Res. Develop. Rept. ANL-5990 (December 1959).
(20) Dorn, W. S., Management Sci. 9 (2), 171 (1963).
(21) Duffin, R. J., J. Soc. Ind. Appl. Math. 10, 119 (1962).
(26) Fiacco, A. V., McCormick, G. P., Management Sci. 10 (2), 360 (1964).
(27) Ibid., 10 (4), 601 (1964).
(28) Fletcher, R., Powell, M. J. D., Computer J. 7, 163 (1964).
(29) Forsythe, G. E., Motzkin, T. S., Bull. Am. Math. Soc. 57, 304 (1951).
(30) Friedman, M., Savage, L. J., in "Selected Techniques of Statistical Analysis," McGraw-Hill, New York, 1947.
(35) Hancock, H., "Theory of Maxima and Minima," Chaps. III, IV, and V, 1917.
(36) Huber, A., ..., 50 (60), 35 (1964).
(37) Hestenes, M. R., Stiefel, E., J. Res. Natl. Bur. Std. 49, 409 (1952).
(38) Himsworth, F. R., Trans. Inst. Chem. Engrs., 345 (1962).
(39) Hooke, R., Jeeves, T. A., J. Assoc. Comp. Mach. 8 (2), 212 (1961).
(40) Horn, F., Chem. Eng. Sci. 15, 176 (1961).
(41) Horn, F., Jackson, R., Ind. Eng. Chem. Fundamentals 4 (1), 110 (1965).
(42) Jackson, R., Chem. Eng. Sci. 19, 19 (1964).
(43) Katz, S., Ann. N. Y. Acad. Sci., 441 (1960).
(44) Katz, S., Ind. Eng. Chem. Fundamentals 1 (4), 226 (1962).
(45) Kelley, J. E., J. Soc. Ind. Appl. Math. 8 (4), 703 (1960).
(46) Kuhn, H. W., Tucker, A. W., Proc. 2nd Berkeley Symp. Math. Stat. Prob., pp. 481-92, J. Neyman, ed., Univ. of California Press, Berkeley, 1951.
(54) Rosen, J. B., J. Soc. Ind. Appl. Math. 9, 514 (1961).