Publication Date: February 1968. ACS Legacy Archive. Ind. Eng. Chem. Fundam. 1968, 7 (1), 142-151.
CONJUGATE GRADIENT METHOD FOR NONLINEAR PROGRAMMING PROBLEMS WITH LINEAR CONSTRAINTS

DONALD GOLDFARB¹ AND LEON LAPIDUS

Department of Chemical Engineering, Princeton University, Princeton, N. J. 08540

A new method is presented for solving nonlinear programming problems in which the constraints appear linearly. Starting from an initial feasible point, it locates new feasible points in an iterative manner by moving in a direction conjugate to previous directions of search. As such, it may be looked upon as an extension of the work of Davidon, Fletcher, and Powell to problems involving linear inequality constraints. As the method proceeds, information about the local curvature of the objective function is incorporated into a matrix which determines the direction in which to move and acts as a variable metric for each maximization problem. Numerical examples illustrate its versatility as compared to other procedures.

One of the major uses of nonlinear programming techniques has been in obtaining optimal solutions to economic problems. The chemical engineer, besides being concerned with the technical feasibility of a process or combination of processes, must also be aware of the economic aspects. For example, in the design of a refinery the profit expected is a function of the distribution of products, which is itself dependent upon the sizes of various process units incorporated into the over-all design. The functionality involved is, in general, nonlinear, while linear constraints may be present as a result of the existence of natural restrictions on certain physical parameters and the necessity of fulfilling certain customer requirements. Other types of economic problems solved by nonlinear programming techniques in the chemical and petroleum industries range from determining the optimal research policy for a company to finding the best solution to a gasoline blending problem to scheduling problems in underground oil production. In addition, chemical equilibrium in complex mixtures (White et al., 1958), nonlinear parameter estimation (Box, 1965), and optimal control of dynamic systems are examples of areas where nonlinear programming has proved to be a fruitful tool.

In this paper a new method is described for solving the subclass of nonlinear programming problems characterized by linear constraints. In general, the nonlinear programming problem is one of maximizing (minimizing) an arbitrary nonlinear objective function subject to a set of inequality and equality constraints. This method, the conjugate gradient method, is based upon Davidon's variable metric method (1959) for unconstrained minimization. It is novel among nonlinear programming techniques in its ability to use the local quadratic properties of the general nonlinear objective function effectively and continually without requiring calculation of second partial derivatives.

1 Present address, Courant Institute of Mathematical Sciences, New York University, New York, N. Y.

142

I&EC FUNDAMENTALS

Starting from an initial feasible point, the conjugate gradient method locates new feasible points with ever-increasing values of the objective function. At each iteration the algorithm moves in a direction obtained by operating on the vector gradient, g(x), with a matrix, Hq, which acts as a variable metric for each maximization problem. Also, for successive steps taken without a change in the constraint basis, this matrix is modified after each iteration so as to produce mutually conjugate directions of search; that is, for a quadratic objective function whose matrix of second partial derivatives is denoted by G, the directions of search determined by Hqg(x) are G-orthogonal so long as no constraints are added to or removed from the constraint basis. Necessary and sufficient conditions for a global maximum may be derived for the case of a concave objective function, and it can be shown that a modified version of the algorithm locates the solution of a quadratic programming problem in a finite number of steps (Goldfarb, 1968). Following the terminology of Fletcher and Powell (1963), this property is referred to as quadratic convergence. This paper concentrates on presenting a general description of the conjugate gradient method along with the results of some numerical experimentation. A detailed exposition of the theory surrounding the method is given by Goldfarb (1968).

Previous Work

Interest in the theory of mathematical programming and in the solution of programming problems was kindled in 1947 by Dantzig's discovery of a practical procedure for solving the linear programming problem (1951). This was the simplex method, which in its revised form remains probably the most successfully used technique for solving the completely linear case. In 1951 Kuhn and Tucker (1951), expanding on some earlier work of John (1948), related the nonlinear programming problem to an equivalent saddle-point problem and formulated necessary and sufficient conditions for a constrained maximum. This paper spurred the development of methods for solving the general problem as well as its subclasses. Extensive bibliographies and general descriptions of these methods have been published (Arrow et al., 1958; Dorn, 1963; Hadley, 1964; Riley and Gass, 1958; Wolfe, 1963). For comparison to the present work, methods that may be categorized as large step gradient methods and those that are specifically designed to treat linear constraints are of interest. In the second category fall all quadratic programming algorithms, the most popular of which seem to be those of Wolfe (1959) and Beale (1959). Of special interest is a quadratic programming procedure developed by Zoutendijk (1960, section 10.2) because of its relationship to the conjugate gradient method detailed here. Methods that fit both of the above categories include Rosen's well known and successfully applied gradient projection method (1960), the techniques of Frisch (1958) and Lemke (1961), and the quadratic programming algorithm of Frank and Wolfe (1956). A number of constrained gradient schemes belonging to the first category have been developed by Zoutendijk (1960). A theoretical comparison of these methods with the conjugate gradient method is given by Goldfarb (1966, 1968). However, since the conjugate gradient method is in many ways analogous to Rosen's method, the two techniques are compared here in some detail. Unfortunately, numerical and computational evaluation of existing techniques has been limited. The most recent attempt is found in a numerical survey by Zoutendijk (1966). Some computational results have been reported by Rosen (1960) and Fiacco and McCormick (1963). Witzgall has conducted tests to compare Rosen's and Frisch's methods with the simplex algorithm for the linear programming problem (1963).

Motivation

In developing techniques for solving constrained problems, much can be gained from a knowledge of the established methods for unconstrained maximization (minimization) (Spang, 1962). Along these lines two possible approaches can be followed. One is to transform the constrained problem into an unconstrained maximization or series of such problems, followed by the use of known techniques to solve each maximization. This is the fundamental idea behind Carroll's created-response-surface technique (1961), which was subsequently given a theoretical basis by Fiacco and McCormick (1963, 1964a, 1964b). This "penalty-function" type of approach for handling constraints is also illustrated in the procedure of Rosenbrock (1960). The other, and theoretically more satisfying, approach is to modify and extend methods of unconstrained maximization so that they can handle inequality constraints directly. Probably the best example of this approach is Rosen's gradient projection method. This extension of the well known steepest ascent technique to maximization problems involving constraints has, however, the same deficiencies as its precursor, i.e., slow rate of convergence near the maximum and oscillation along steep ridges. Usually some anti-zigzag scheme must be employed to overcome these difficulties. Since near an unconstrained maximum (minimum) the second-order terms in the Taylor series expansion of the function dominate, only those methods that are quadratically convergent, that is, techniques that guarantee finding the maximum of a general quadratic form in a finite number of steps, will show fast convergence for a general nonlinear function. One such maximization (minimization) technique is the variable metric method developed by Davidon (1959) and later simplified by Fletcher and Powell (1963). The property of quadratic convergence is equally important to nonlinear programming procedures. Therefore, modification and extension of the work of Davidon and of Fletcher and Powell to the case of maximization under linear inequality constraints appeared to be a promising avenue of attack. Moreover, computational experience has indicated that this technique "is probably the most powerful general procedure for finding a local (unconstrained) minimum which is known at the present time" (Fletcher and Powell, 1963). A recently published numerical study by Box (1965), comparing several maximization algorithms, including Davidon's method, is typical of such experience.

Problem Statement and Geometric Interpretation

The algorithm described in this paper is designed to solve the general problem of finding a local maximum (or minimum) of a function

f(x) = f(x1, x2, . . ., xm)    (1)

of m variables xi, i = 1, 2, . . ., m, subject to linear inequalities and equalities of the form

Σ_{j=1}^{m} nijxj − bi ≥ 0    i = 1, . . ., k    (2)

Σ_{j=1}^{m} nijxj − bi = 0    i = k + 1, . . ., p    (3)

where the nij have been normalized by

Σ_{j=1}^{m} nij² = 1    i = 1, . . ., p    (4)

This form also includes as a special case the nonnegativity restrictions which so often appear in programming problems. In terms of finite dimensional geometry, xi, i = 1, . . ., m, are the coordinates of a point in the m-dimensional Euclidean space, Em. Point x can also be represented by the column vector

x = {x1, x2, . . ., xm}

A row vector will be written as x′, where the superscript ′ denotes the transpose. Whenever the construction xy′ appears, it denotes a matrix operator with elements xiyj. Subscripts on vectors usually indicate components while superscripts in general differentiate between points. Where this is not true, the meaning should be obvious, as in the following. Let ni represent the vector ni = {ni1, . . ., nim}. Then Equations 2 through 4 can be written as

ni′x − bi ≥ 0    i = 1, . . ., k    (5)

ni′x − bi = 0    i = k + 1, . . ., p    (6)

where

ni′ni = 1    i = 1, . . ., p
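The normalization of Equation 4 is easy to carry out in practice. The following sketch (with illustrative data, not the paper's) divides each raw constraint row and its bound by the Euclidean norm of the coefficients, so that each ni becomes a unit normal while the feasible region is unchanged:

```python
import numpy as np

# Normalization of Equation 4: each constraint row n_i' and bound b_i is
# scaled by the Euclidean norm of the row, so that n_i'n_i = 1 and n_i is
# the unit normal of its defining hyperplane. A and b are illustrative.
A = np.array([[3.0, 4.0], [1.0, -1.0]])   # rows are the raw n_i'
b = np.array([10.0, 2.0])

norms = np.linalg.norm(A, axis=1)
N = A / norms[:, None]                     # normalized rows
b_n = b / norms                            # bounds scaled consistently

print((N ** 2).sum(axis=1))                # each row sums to 1
```

Scaling both the row and the bound by the same positive factor leaves the half-space itself unchanged; only the representation is standardized.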

In geometric terms these inequality and equality constraints represent k closed half-spaces and (p − k) hyperplanes. The hyperplane corresponding to a strict equality for a particular i in Equation 5 is called the "defining hyperplane" for the associated half-space. The convex polyhedral region R in Em formed by the intersection of the closed half-spaces and hyperplanes corresponding to Equations 5 and 6 is assumed to be bounded. For this to be true, it is necessary that p ≥ m + 1. Region R will be called feasible and any point x such that x ∈ R will be called a feasible solution. The boundary B of R is itself a bounded region and is formed by the intersection with R of the union of the k supporting hyperplanes given by Equation 5 written as an equality. The term "constraint basis" refers to the set of linearly independent defining hyperplanes (constraints written as equalities) in whose intersection, Mq, movement is restricted. At times reference is made to the linear manifold, Mq, to emphasize the geometric aspects. A set of hyperplanes is a linearly independent set if the members of the corresponding set of unit normals to these hyperplanes (the ni in Equations 5 and 6) are linearly independent.

If f(x) is differentiable at all points x in the m-dimensional Euclidean space, Em, each point x has associated with it an m-dimensional vector gradient, g(x) = {∂f/∂xi}. The components of the gradient g(x) can be considered as coordinates of a point in a second m-dimensional Euclidean space, the relationship between the two spaces (∂f/∂xi: x → g) being in general a many-to-one transformation. Furthermore, if f(x) is twice differentiable, then in any small neighborhood about a point x in Em the Hessian matrix of second partial derivatives, G(x) = {∂²f/∂xi∂xj}, defines a linear mapping of changes in position, dx, onto changes in the gradient, dg, as given by:

dg = G(x)dx    (7)

The vectors dx and dg will point in the same direction if and only if dx is an eigenvector of G(x). They may, in fact, point in widely differing directions, depending upon the ratios of the eigenvalues of this matrix. Consider the problem of finding the point at which a strictly concave quadratic function

f(x) = c + b′x + ½x′Gx    (8)

is maximized. In this case the matrix of second partial derivatives is equal to the constant negative-definite matrix G over the entire space Em and one can write

g(xi+1) − g(xi) = G(xi+1 − xi)    (9)

In the absence of constraints, that is, where the feasible region R is all of Em, a necessary and sufficient condition for a global maximum is that g = 0. The point at which this maximum occurs can then be found in one step by computing

x̂ = x − G⁻¹g(x)

where G⁻¹ is the inverse of matrix G. This, of course, is just Newton's method (sometimes referred to as the second-order gradient method) for solving the set of m equations in m unknowns, gj(x1, . . ., xm) = 0, j = 1, 2, . . ., m, that result from the necessary and sufficient conditions for a global maximum of a strictly concave function. Here, the Jacobian of g(x) is just the Hessian matrix of the original function which is being maximized. It is obvious from the discussion following Equation 7 that the method of steepest ascent, i.e., moving in

the gradient direction at each iteration, and Newton's method will, in general, follow very different paths to a maximum. In the preceding discussion it was assumed that f(x) was a quadratic function of x. If this assumption were true or nearly exact, then Newton's method might be the optimal maximization scheme to use. However, in most actual problems f(x) is not quadratic and this assumption is reasonably accurate only in a small neighborhood of a point. Therefore, evaluation of the Hessian matrix and inversion of it at each step might not be the best policy to follow, since computationally these procedures can be time-consuming. In fact, if the partial derivatives of f(x) are approximated by finite differences, ½(m + 1)(m + 2) function evaluations are required just to compute the Hessian matrix at each iteration. A more serious drawback of second-order gradient methods is that for nonconcave functions, convergence cannot be guaranteed for starting points not close to the maximum. Davidon's variable metric method (1959) and its reformulation by Fletcher and Powell (1963) take a more satisfactory approach. In this method an approximation to the inverse of the Hessian matrix of second partial derivatives is initially chosen. This approximation is then improved with each subsequent iteration according to the actual relationship between the changes in g and in x, resulting in a technique that combines the advantages of the method of steepest ascent with those of Newton's method while avoiding their major disadvantages. As in the former method, Davidon's technique generates directions of search that are always "uphill" and does not require the calculation of second partial derivatives, while as in the latter method, it exhibits quadratic terminal convergence.
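The variable metric behavior just described can be sketched numerically. The following is a minimal illustration (not the authors' code) of a Davidon-Fletcher-Powell-style update written in the maximization convention of this paper, where H approximates −G⁻¹; all data are illustrative. With exact line searches on a concave quadratic, the successive steps come out mutually G-orthogonal and the unconstrained maximizer is reached within m steps:

```python
import numpy as np

# Sketch of a variable-metric (DFP-style) update for maximizing the concave
# quadratic f(x) = b'x + x'Gx/2, G negative definite; here H approximates
# -G^{-1}. Illustrative data only.
rng = np.random.default_rng(0)
m = 5
M = rng.standard_normal((m, m))
G = -(M @ M.T + m * np.eye(m))            # negative definite Hessian
b = rng.standard_normal(m)

x, H = np.zeros(m), np.eye(m)
steps = []
for _ in range(m):
    g = b + G @ x                         # gradient of f
    if np.linalg.norm(g) < 1e-12:
        break                             # already at the maximum
    s = H @ g                             # uphill direction of search
    gamma = -(s @ g) / (s @ G @ s)        # exact maximizing step length
    sigma = gamma * s
    x = x + sigma
    y = G @ sigma                         # change in the gradient
    H = H - np.outer(sigma, sigma) / (sigma @ y) \
          - np.outer(H @ y, H @ y) / (y @ H @ y)
    steps.append(sigma)

S = np.array(steps)
C = S @ G @ S.T                           # sigma_i' G sigma_j
off_diag = C - np.diag(np.diag(C))        # should vanish: conjugacy
x_star = -np.linalg.solve(G, b)           # unconstrained maximizer
print(np.max(np.abs(off_diag)), np.allclose(x, x_star))
```

The two rank-one corrections play the roles the text assigns them: one incorporates the curvature just observed along sigma, the other removes the stale estimate along the direction of the gradient change.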
Consider now the problem of locating the point at which the quadratic function f(x) given by Equation 8 assumes its maximum value in a linear manifold Mq formed by the intersection of q linearly independent hyperplanes, starting from a point xi in this subspace. As before, Equation 9 is valid. However, now a necessary and sufficient condition for f(xi+1) to be the global maximum of f(x) over the linear manifold Mq is that the gradient at xi+1 be orthogonal to Mq, that is,

gi+1 = Nqa

where Nq = {n1, . . ., nq} is an (m × q) matrix, the columns of which are the q unit normals to the hyperplanes whose intersection is Mq, and where a is a q-dimensional vector. Substituting gi+1 = Nqa into Equation 9 and premultiplying both sides by G⁻¹ yields

xi+1 − xi = G⁻¹(Nqa − gi)    (10)

The vector (xi+1 − xi) lies in the linear manifold, Mq. Therefore, the left side of the above equation premultiplied by Nq′ is zero:

Nq′(xi+1 − xi) = 0 = Nq′G⁻¹Nqa − Nq′G⁻¹gi

Since the rank of Nq is q and that of G⁻¹ is m ≥ q, the inverse of Nq′G⁻¹Nq exists (Courant and Hilbert, 1953) and a is given by

a = (Nq′G⁻¹Nq)⁻¹Nq′G⁻¹gi

Hence Equation 10 can then be written as

xi+1 = xi + [G⁻¹Nq(Nq′G⁻¹Nq)⁻¹Nq′G⁻¹ − G⁻¹]gi

or more simply as

xi+1 = xi − P̄qG⁻¹gi    (11)

where

P̄q = I − G⁻¹Nq(Nq′G⁻¹Nq)⁻¹Nq′    (12)

is a projection operator that projects Em onto Mq according to the non-Euclidean metric, G⁻¹. An interpretation of this result and a comparison with the unconstrained case are instructive. First, consider the analog of the method of steepest ascent when constraints are imposed. If motion is restricted to the linear manifold, Mq, the direction of steepest ascent in this manifold is Pqgi, where

Pq = I − Nq(Nq′Nq)⁻¹Nq′

is the orthogonal projection operator that projects all vectors in Em onto Mq. This type of approach leads to Rosen's gradient projection method for solving nonlinear programming problems subject to linear constraints. Rosen's method includes the necessary logic for determining when to add to or drop a constraint from the constraint basis as it proceeds toward a maximum and depends heavily upon recursion formulas for computing (Nq+1′Nq+1)⁻¹ and (Nq−1′Nq−1)⁻¹ from (Nq′Nq)⁻¹, thus facilitating the computation of the associated orthogonal projection operators, Pq+1 and Pq−1. The iterative scheme of Equation 11 is the analog of Newton's method when linear equalities are present as side conditions. Recursion relations for computing (Nq+1′G⁻¹Nq+1)⁻¹ and (Nq−1′G⁻¹Nq−1)⁻¹ from (Nq′G⁻¹Nq)⁻¹, when a constraint is either added to or removed from the constraint basis, are easily derived and a "second-order gradient projection method" could be developed that parallels Rosen's method. However, such a technique would have the same drawbacks as the unconstrained Newton's method, namely, the required computation of a matrix of second partial derivatives, the inversion of this matrix at each step, and the lack of a guarantee of convergence except for starting points close to the maximum. This method is ideally suited to quadratic programming problems and some algorithms based upon it have been developed (Goldfarb, 1966). As in the unconstrained case, a better method for the nonlinear problem is one based upon Davidon's original approach. One such method is the algorithm described in this paper. In it a trial value for the operator −P̄qG⁻¹ at a point in Mq is initially specified. This is the positive semidefinite matrix, Hq. As hyperplanes are either added to or removed from the constraint basis, Hq is changed so that it continues to map all of Em onto the appropriate linear manifold.
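The one-step manifold maximization of Equations 10 through 12 can be checked numerically. The sketch below (illustrative data, not the paper's code) takes the step x1 = x0 − P̄qG⁻¹g0 on a concave quadratic, starting from a point in Mq, and verifies that the step stays in Mq and that the new gradient is orthogonal to Mq:

```python
import numpy as np

# One step of Equation 11 on a concave quadratic f(x) = b'x + x'Gx/2 with
# Pbar_q = I - G^{-1} N (N' G^{-1} N)^{-1} N' (Equation 12). Illustrative
# data; the manifold M_q is taken through the origin (b_i = 0) so that
# x0 = 0 lies in it.
rng = np.random.default_rng(1)
m, q = 4, 2
M = rng.standard_normal((m, m))
G = -(M @ M.T + m * np.eye(m))            # negative definite Hessian
b = rng.standard_normal(m)
N = np.linalg.qr(rng.standard_normal((m, q)))[0]   # q independent unit normals

Ginv = np.linalg.inv(G)
Pbar = np.eye(m) - Ginv @ N @ np.linalg.inv(N.T @ Ginv @ N) @ N.T

x0 = np.zeros(m)                          # feasible point in M_q
g0 = b + G @ x0
x1 = x0 - Pbar @ Ginv @ g0                # Equation 11
g1 = b + G @ x1

P = np.eye(m) - N @ np.linalg.inv(N.T @ N) @ N.T   # orthogonal projector
print(np.allclose(N.T @ (x1 - x0), 0), np.allclose(P @ g1, 0))
```

The first check shows the step lies in the manifold (Nq′ annihilates it); the second shows the new gradient is a combination of the normals, i.e., gi+1 = Nqa, which is the optimality condition used in the derivation.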
Furthermore, information gained at each step is used to update Hq according to the formulas of Fletcher and Powell (1963). Consequently, it can be proved that if G is constant and if m − q successive steps are taken in Mq, where m − q is the dimension of Mq, the approximating matrix Hq will, in fact, be equal to −P̄qG⁻¹ within these m − q iterations. It is also easy to see that Hq acts as an approximate and variable metric for each problem, since for the quadratic case Hqdg = −dx for changes dx lying in Mq.

Conjugate Gradient Algorithm

The algorithm described here is essentially an extension of Fletcher and Powell's version of Davidon's variable metric method to maximization under linear inequality and equality constraints. In its over-all strategy the method closely follows Rosen's gradient projection method, with the principal difference that the directions of search are determined by premultiplying the vector gradient by the matrix Hq rather than by the orthogonal projection matrix, Pq, as in Rosen's method. The heart of the algorithm, however, lies in the three formulas for updating the matrix, Hq. The first two are used to modify Hq when a hyperplane is either added to or dropped from the constraint basis, while the third formula, which is due to Fletcher and Powell, is used to improve the approximation of Hq to −P̄qG⁻¹. As in the gradient projection method, the matrix (Nq′Nq)⁻¹ is required at each step. Recursion formulas that allow this matrix to be computed with a minimum number of calculations were developed by Rosen for use with his method and are given in the Appendix. The conjugate gradient method also requires some technique for locating the approximate maximum of f(x) along the direction of search under the restriction that the new point lie in R. A cubic interpolation procedure, based upon one used by Davidon, was adopted and found satisfactory. It is briefly described in the Appendix.

Let the current point be xi with a function value of f(xi) and a gradient gi = g(xi). It is assumed that xi is in R and lies in the (affine) subspace Mq determined by the intersection of q linearly independent hyperplanes. The matrix operator H is therefore Hqi. In order to simplify the presentation of the algorithm, it is also assumed that there are no equality constraints of the form of Equation 3. The way in which they are handled is described below. The basic conjugate gradient algorithm then proceeds as follows:

STEP 1. Compute Hqigi and a = (Nq′Nq)⁻¹Nq′gi. If Hqigi = 0 and a ≤ 0, xi is a (constrained) stationary point.

STEP 2. If this is not the case, either ‖Hqigi‖ > max {0, ½aqbqq^(−1/2)} or ‖Hqigi‖ ≤ ½aqbqq^(−1/2), where aqbqq^(−1/2) ≥ aibii^(−1/2), i = 1, . . ., q − 1, and where bii is the ith diagonal element of (Nq′Nq)⁻¹. (By Equation 25 in the Appendix, bii > 0.) If the former holds, proceed to Step 3. If the latter applies, drop the qth hyperplane from the constraint basis and obtain Hq−1i from

Hq−1i = Hqi + (Pq−1nqnq′Pq−1)/(nq′Pq−1nq)    (13)

where Pq−1 = I − Nq−1(Nq−1′Nq−1)⁻¹Nq−1′. Let q = q − 1 and return to Step 1.

STEP 3. Let si = Hqigi and compute

λj = −(nj′xi − bj)/nj′si    (j = q + 1, . . ., k)

λi = min {λj > 0}

Obtain the γi, 0 < γi ≤ λi, which maximizes f(xi + γisi). Set σi = γisi and xi+1 = xi + σi and calculate gi+1 = g(xi+1).

STEP 4. If γi = λi and si′gi+1 ≥ 0, add to the constraint basis the hyperplane corresponding to the min {λj} in Step 3. Then compute Hq+1i+1 as follows:

Hq+1i+1 = Hqi − (Hqinq+1nq+1′Hqi)/(nq+1′Hqinq+1)    (14)

Set q = q + 1 and i = i + 1 and return to Step 1.

STEP 5. Otherwise set yi = gi+1 − gi and update Hq as follows:

Hqi+1 = Hqi + Ai + Bi    (15)

where

Ai = −σiσi′/σi′yi

and

Bi = −(Hqiyiyi′Hqi)/(yi′Hqiyi)

The constraint basis, and therefore Mq, remains unchanged. Set i = i + 1 and return to Step 1.
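The five steps above can be sketched in a compact form. The following is an illustrative toy implementation, not the authors' Fortran code: the bookkeeping is simplified in that (Nq′Nq)⁻¹ is formed directly rather than by the recursion formulas of the Appendix, Rosen's scaled dropping test of Step 2 is reduced to "drop when some multiplier is positive," and the exact line maximum is available in closed form because the objective is quadratic with G = −I. The problem data (maximize f(x) = −‖x − c‖²/2 subject to x1 ≤ 1 and x2 ≤ 1, written as nj′x − bj ≥ 0) are invented for the example:

```python
import numpy as np

# Toy active-set sketch of Steps 1-5. Maximize f(x) = -||x - c||^2 / 2
# subject to x1 <= 1, x2 <= 1, written as n_j'x - b_j >= 0 with unit
# normals n_j. Illustrative data and simplified bookkeeping throughout.
c = np.array([0.5, 2.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0]])     # rows are the n_j'
b = np.array([-1.0, -1.0])

x = np.array([1.0, 0.0])                     # feasible start on hyperplane 0
basis = [0]                                  # active constraint basis
n0 = A[0]
H = np.eye(2) - np.outer(n0, n0)             # H_q^0 via Eq. 14 applied once

for _ in range(50):
    g = c - x                                # gradient of f
    s = H @ g                                # direction of search
    if np.linalg.norm(s) < 1e-10:            # Steps 1-2
        if not basis:
            break                            # unconstrained stationary point
        N = A[basis].T
        a = np.linalg.solve(N.T @ N, N.T @ g)
        if np.all(a <= 1e-10):
            break                            # constrained maximum found
        q = basis.pop(int(np.argmax(a)))     # drop a hyperplane ...
        Nr = A[basis].T
        P = np.eye(2) - (Nr @ np.linalg.solve(Nr.T @ Nr, Nr.T)
                         if basis else np.zeros((2, 2)))
        nq = A[q]
        H = H + np.outer(P @ nq, P @ nq) / (nq @ P @ nq)   # ... via Eq. 13
        continue
    lam, jmin = np.inf, None                 # Step 3: nearest hyperplane
    for j in range(len(b)):
        if j not in basis and A[j] @ s < -1e-12:
            t = -(A[j] @ x - b[j]) / (A[j] @ s)
            if 0 < t < lam:
                lam, jmin = t, j
    gamma = min((s @ g) / (s @ s), lam)      # exact line maximum (G = -I)
    sigma = gamma * s
    x = x + sigma
    g1 = c - x
    if gamma == lam and s @ g1 >= 0:         # Step 4: add hyperplane, Eq. 14
        nq = A[jmin]
        H = H - np.outer(H @ nq, H @ nq) / (nq @ H @ nq)
        basis.append(nq is A[jmin] and jmin)
        basis[-1] = jmin
    else:                                    # Step 5: update H via Eq. 15
        y = g1 - g
        H = H - np.outer(sigma, sigma) / (sigma @ y) \
              - np.outer(H @ y, H @ y) / (y @ H @ y)

print(x, basis)                              # constrained maximum
```

On this data the iteration adds the second hyperplane at the corner (1, 1), finds a positive multiplier for the first, drops it by the Equation 13 update, and finishes at (0.5, 1) with only the second hyperplane in the basis, exercising every branch of the five steps.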

In the actual computer code an initial feasible point is obtained using a procedure developed by Rosen for use with his gradient projection method (1960). If the initial feasible point is an interior point of the allowable convex region R and if there are no equality constraints, H0⁰ is initially chosen to be any positive definite symmetric matrix, usually I, the identity matrix. However, if the initial point lies on exactly q linearly independent hyperplanes, Hq⁰ is computed from H0⁰ by using the formula of Equation 14 q times, resulting in some appropriate weighted projection matrix. When there are subsidiary equality constraints, a largest linearly independent subset of them is included in the initial constraint basis and Hq⁰ is computed as just described. The algorithm then proceeds as in the case of no equalities, except that the components of a that correspond to the equality constraints are not included in any of the inequalities in Steps 1 and 2 that involve a.

Goldfarb (1968) shows that if Hq is initially chosen so that:

A. Hq is positive semidefinite and

B. x′Hqx = 0 if, and only if, x = Σ_{i=1}^{q} aini, i.e., x ⊥ Mq

then the formulas for updating Hq, given by Equations 13, 14, and 15, preserve these properties. Moreover, it is easy to demonstrate that A and B together imply that

C. Hqx = 0 if, and only if, x = Σ_{i=1}^{q} aini

It is then easy to show that Hqg(x) = 0 and a ≤ 0 are necessary and sufficient conditions for a constrained global maximum for f(x) concave, and that these conditions are equivalent, respectively, to the obvious geometrical conditions that the gradient at x be orthogonal to Mq and lie in the cone generated by all negative linear combinations of the unit normals to the q hyperplanes whose intersection is Mq (Goldfarb, 1968). From the above three properties of matrix Hq, it is evident that the algorithm always moves in an uphill direction and generates a path that is always confined to some appropriate subspace, Mq. If the maximum along the direction si = Hqigi is located before a new constraining hyperplane is reached, information about the curvature of the objective function f(x) in this direction is incorporated into the matrix operator Hq by the equations of Step 5. However, if this is not the case, it is necessary to remove from Hq any curvature information which is orthogonal to this new hyperplane and to force the new Hq+1 to rotate any vector into the intersection of this hyperplane with the subspace Mq. This is accomplished by Equation 14 of Step 4. In addition, it is easily demonstrated (Goldfarb, 1966) that

P̄q+1G⁻¹ = P̄qG⁻¹ − (P̄qG⁻¹nq+1nq+1′P̄qG⁻¹)/(nq+1′P̄qG⁻¹nq+1)    (16)

Hence, if Hq were equal to −P̄qG⁻¹, then Hq+1, as given by Equation 14, would, by the above recursion formula, be equal to −P̄q+1G⁻¹. While restricted to a particular subspace, only curvature information pertaining to it is incorporated into Hq. Therefore, when ‖Hqg‖ satisfies the test of Step 2 for dropping the qth hyperplane, Equation 13 effects the modification of Hq in an unbiased way so as to remove the restriction of remaining on it. If q is set equal to q − 1 and G⁻¹ is set equal to I, the recursion relation of Equation 16 can, after some rearrangement of terms, be written as

Pq−1 = Pq + (Pq−1nqnq′Pq−1)/(nq′Pq−1nq)    (17)

Therefore, if Hq were identically equal to Pq, Equation 13 of Step 2 for dropping a hyperplane from the constraint basis would give rise to a matrix Hq−1 equal to Pq−1. In the general case where Hq is not necessarily equal to Pq, the second term on the right-hand sides of Equations 13 and 17 is an orthogonal projection operator which projects any vector in Em into the line in Mq−1 which is orthogonal to Mq. This line is given by βPq−1nq, where β is an arbitrary scalar. Equation 15 of Step 5 modifies Hq so that were f(x) quadratic, Hq would equal −P̄qG⁻¹ after m − q steps and mutually conjugate directions of search would be followed in Mq. The term Ai achieves the first objective while the remaining terms achieve the second. This is shown by Goldfarb (1968), who proves that for a quadratic objective function the directions of search generated in Mq are mutually conjugate, and that the iterative scheme of Equation 15 for improving Hq produces conjugate directions of search whether or not the term Ai is included. Finally it can be proved that a variation of the conjugate gradient algorithm is quadratically convergent.

Thus far the possibility that a point could lie in a subspace formed by the intersection of a linearly dependent set of hyperplanes has not been allowed. However, it can be shown that this limitation need not be imposed upon the method. If degeneracy occurs, several iterations may be required in which hyperplanes are added to and removed from the constraint set while point x remains fixed.

Computational Results

A computer program of the conjugate gradient algorithm was written in Fortran and tested computationally against a similarly written code of Rosen's gradient projection method. Both computer programs included Rosen's technique for finding an initial feasible point if one was not given and handled the testing for linearly dependent constraints, the computation of (Nq′Nq)⁻¹ and Pq, and the reinversion of the matrix (Nq′Nq) in the same way. To improve convergence a three-point extrapolation scheme, originally suggested by Forsythe and Motzkin (1951) and used in a computer code of Rosen's method programmed under his supervision (Merrill, 1962), was incorporated into the gradient projection code and used whenever three successive steps were taken within the same linear manifold. In all problems tested, H0⁰ was set equal to the identity matrix. Then, if the constraint basis contained q hyperplanes at the initial feasible point, the conjugate gradient method was programmed so that Hq⁰ = Pq was automatically computed. Although both programs could readily accommodate problems with up to 60 variables and 150 constraints on computers with storage capacities of 32K, the test problems described here were relatively small in size. Other comparable

results have, however, also been obtained for much larger variable systems. The first of these was the following minimization of a convex cubic function of five variables subject to 15 linear inequality constraints, of which five were nonnegativity restrictions. Minimize

f(x) = Σ_{j=1}^{5} ejxj + Σ_{i=1}^{5} Σ_{j=1}^{5} cijxixj + Σ_{j=1}^{5} djxj³

subject to

Σ_{j=1}^{5} aijxj ≤ bi    i = 1, 2, . . ., 10

and

xj ≥ 0    j = 1, 2, . . ., 5

where the required coefficients ej, cij, dj, aij, and bi are given in the Appendix. Starting from an initial feasible point (0, 0, 0, 0, 1), the conjugate gradient program took nine steps and a total of 13 functional evaluations to obtain a constrained minimum of −32.34868 for this cubic function. To reach the same minimum, the gradient projection program required 11 steps and 29 functional evaluations. (In this context a step refers to a move in a different direction from the preceding move.) A total of eight basis changes were made using the conjugate gradient method, compared with nine basis changes for the gradient projection technique. The resulting constraint sets at the minimum contained the same four hyperplanes in both cases. The execution times on a CDC 6600 were 0.7910 second using the conjugate gradient code, compared with 1.0040 seconds for the other. The step by step progress of the two methods for this problem is presented in Table I.

For many applications the number of functional evaluations is the most critical criterion of effectiveness, especially in problems of optimal control and design where each functional evaluation may require long integration schemes and involve many complicated relationships. Frequently the gradient cannot be obtained analytically, and thus the evaluation of the function and gradient at a point requires m + 1 functional computations. As exhibited by this problem, the conjugate gradient method requires many fewer functional evaluations than does Rosen's method. This comparison is not completely equitable, since the cubic interpolation technique used in the former method probably requires fewer interpolations (and, hence, fewer functional evaluations) per step to locate the maximum (minimum) of f(x) along direction si than does the quadratic scheme employed in the latter. However, some of this efficiency is also a result of having an a priori estimate in the conjugate gradient method of how far to move in direction si to reach the maximum of f(x) along that direction. This is given by Equation 21 in the Appendix. No such a priori estimate is available for Rosen's or any other first-order gradient method. Therefore, some weighted average of the ratio of the number of steps and the ratio of the number of functional evaluations taken by the two methods is a better indicator of their relative effectiveness. A comparison of the number of steps taken gives a conservative measure of the superiority of the conjugate gradient method. Even based upon this criterion, the conjugate gradient method fared better than Rosen's method. In more highly nonlinear problems this relative superiority should be greater, as is borne out by the results of the next two numerical experiments. One surprising result was that even with a simple objective function and analytically obtained derivatives as in this case, the conjugate gradient method took less computing time in spite of requiring matrix manipulations in addition to those required by the gradient projection algorithm.

The second problem experimented with was the determination of the equilibrium composition of a complex mixture of ideal gases at constant temperature and pressure. A solution can be obtained by determining the nonnegative set of mole numbers that minimizes the total Gibbs free energy of the system (White et al., 1958). If there are m species considered present involving k elements, the problem can be formulated as: Minimize the total Gibbs free energy

F(x) = Σ_{i=1}^{m} xi[ci + log (xi/x̄)]

subject to

Σ_{i=1}^{m} aijxi = bj    j = 1, . . ., k

and

xi ≥ 0    i = 1, . . ., m

where

x̄ = Σ_{i=1}^{m} xi
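The free-energy objective and its gradient are inexpensive to evaluate, which is one reason this problem is a common test case. The sketch below (with illustrative ci values, not the paper's data) evaluates F(x) and its analytic gradient, ∂F/∂xi = ci + log(xi/x̄), and checks the gradient against central differences:

```python
import numpy as np

# Sketch of the free-energy objective above, after White et al. (1958):
# F(x) = sum_i x_i [c_i + log(x_i / xbar)], xbar = sum_i x_i. The c_i
# values are illustrative only.
c = np.array([-10.0, -5.0, -12.0])

def F(x):
    return float(np.sum(x * (c + np.log(x / x.sum()))))

def gradF(x):
    # dF/dx_i = c_i + log(x_i / xbar); the cross terms in xbar cancel.
    return c + np.log(x / x.sum())

x = np.array([0.2, 0.3, 0.5])
h = 1e-6
num = np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = h
    num[i] = (F(x + e) - F(x - e)) / (2 * h)   # central difference

print(np.allclose(num, gradF(x), atol=1e-5))
```

The cancellation noted in the comment is what makes the analytic gradient so cheap: differentiating the log(xi/x̄) terms with respect to xi produces +1 and −1 contributions that sum to zero.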

Table I. Minimization of Cubic Function of Five Variables Subject to Linear Constraints

(For each method, gradient projection and conjugate gradient, the table lists at each step: −f(x), ‖Pqg‖, the total number of functional evaluations, and the number of constraints in the basis.)