Transformational Discrimination for Unconstrained Optimization

Ind. Eng. Chem. Fundam., 1972, 11 (2), pp 154-161
V. J. Law, Tulane University, New Orleans, La. 70118

R. H. Fariss, Monsanto Company, Springfield, Mass.

A method for minimizing a general multivariable function is developed. The major advantages of the method are its quadratic convergence properties and its ability to solve very difficult problems without modifications. Numerical results are presented which compare the present method with other techniques.

The appearance of another algorithm for optimization requires some justification in light of the large number of techniques which have been proposed in the past. Existing methods seem to fall into one of two categories: (1) methods which converge rapidly when they do indeed converge; (2) methods which are notoriously slow, but which usually proceed to a solution if given enough time. Among the methods which fall into the first category are those of Newton (see Rosenbrock and Storey, 1966), Davidon (1959), Fletcher and Powell (1963), and Powell (1964). Methods which fall into the second class are those of steepest descent (Booth, 1956) and other gradient methods (Rosenbrock and Storey, 1966). The present method possesses some of the desirable properties of both classes of methods mentioned above. Namely, it often converges rapidly and it seldom fails to converge. The basis of the technique was inspired by the method of rotational discrimination for nonlinear regression as described by Fariss and Law (1968). A method of somewhat similar nature has been presented by Greenstadt (1967). There are some important differences between the two methods, however, which will be pointed out in the sequel.

The proposed method for numerical optimization belongs to a class of iterative methods which repeatedly apply the following two steps: (1) evaluation of a direction in which to move away from a base point in order to improve the objective function; and (2) execution of a one-dimensional search along the direction of movement in order to locate, at least approximately, the best point on that path. This leads to a new base point at which step 1 may be repeated. Denoting the base point as x₀ and a direction of movement or search vector as Δx, the one-dimensional search is made over the scalar parameter α in the equation

x = x₀ + αΔx    (1)

The ultimate success or failure of iterative procedures such as the proposed one depends on a property which shall be called truncation convergence. For convenience it will be assumed (without loss of generality) that function minimization is desired. Truncation convergence is achieved if the search vector is such that the quantity gᵀΔx, the directional derivative of q along Δx at the base point, is negative. That is, as one contracts the search vector, the existence at some point of a value of q smaller than the value at the base point is guaranteed. It should be noted that truncation convergence is neither necessary nor sufficient for convergence to a solution; it is, however, a property of practical significance.
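A minimal sketch of this test, in Python and with illustrative names (not taken from the paper), is:

```python
import numpy as np

def has_truncation_convergence(g, dx):
    """True when the directional derivative g.T @ dx at the base point is
    negative, so a sufficiently contracted step along dx must reduce q."""
    return float(g @ dx) < 0.0

# example on q(x) = x1**2 + 4*x2**2 at the base point (1, 1)
g = np.array([2.0, 8.0])   # gradient of q at the base point
dx = -g                    # steepest descent direction
print(has_truncation_convergence(g, dx))   # True
```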

Development of the Optimization Method

It has often been suggested that a reasonable way to make good use of the favorable aspects of the method of steepest descent and Newton's method is to use the former when far from the solution and the latter when the vicinity of the solution has been approached. The first reason why this has been found to be only a partially satisfactory approach is that there is no well-founded logic which indicates when the switch from steepest descent to Newton's method should be made. Second, it may well be that some of the variables are near to their values at the solution while others are not. This would indicate that steepest descent logic should be applied to those variables which are far from their final values and Newton's method to those which are close. It is not possible to do this because of variable interaction. That is, Newton's method is not amenable to the consideration of each variable separately. All variables must be considered simultaneously. Therefore, while the method can usually be made to work, it is often inefficient (because of variable interaction) and requires man-machine interaction to determine when to switch modes of computation of the search vector.

The present method does, in fact, combine Newton's method and the method of steepest descent in a unique fashion. The critical aspect of this combination is that new variables defined by a linear transformation may be computed by either steepest descent logic or by Newton logic. This latter step involves a logical discrimination between variables. It is obvious from this how the name transformational discrimination came to be applied to the method.

In order to indicate the development of the present method clearly, it is necessary to review Newton's method and the method of steepest descent. As a step in the development of relating these two methods, a technique called weighted steepest descent will be described.

Steepest Descent. The steepest descent search vector is given by

Δx = -g    (2)

That is, the function decreases locally most rapidly along the negative gradient of the objective function. If the optimal length of the search vector is chosen via a one-dimensional search, then the method usually exhibits a "zig-zag" or "hemstitching" tendency if the objective function contours are not nearly circular. The method of parallel tangents (Shah, et al., 1964) was devised as a cure for this but has not been found very effective for problems with more than two variables. Perhaps the most desirable feature of the steepest descent method is that the property of truncation convergence is always guaranteed.

Newton's Method. The basis of this method is the approximation of the objective function as a quadratic in the neighborhood of the base point. The equation for determining the search vector may be shown to be

HΔx = -g    (3)

If the Hessian matrix of eq 3 is positive definite, then the resulting Δx has the truncation convergence property. When this is true, this method exhibits quadratic convergence and is highly effective for producing precise answers from base points near the solution. There are two situations in which Newton's method becomes ineffective and dangerous. When the Hessian is semidefinite (singular), or nearly so, then eq 3 has no solution or is ill-conditioned. A further difficulty is that the matrix might be indefinite or negative definite, in which case the predicted stationary point is either a saddle point or a maximum (rather than the desired minimum). In the latter cases, the truncation convergence property is not guaranteed.

Weighted Steepest Descent. Weighted steepest descent (WSD) is a modification of the method of steepest descent. It arises from modifying the elements of the search vector by nonequal positive multiplicative factors chosen so as to produce a more effective vector. Specifically, these factors may be selected so that, under favorable circumstances (i.e., when the Hessian is strongly positive definite), the search vector will coincide with the one produced by Newton's method, provided the method is applied in a coordinate system where, in relation to the quantity being minimized, there is no local interaction of variables. The necessity of using such a coordinate system in achieving general coincidence between the Newtonian and the steepest descent vector has been shown by Law (1967). The coordinate system required may be created by transforming the Hessian matrix into diagonal form. Let T be a nonsingular transformation matrix such that

TᵀHT = D    (4)

where D is diagonal. Methods for computing such transformation matrices are discussed in Appendix IV. If this transformation is applied to eq 3, there results

TᵀHT T⁻¹Δx = -Tᵀg    (5)

By defining the new variables (coordinates)

y = T⁻¹Δx    (6)

then the Newtonian equation (3) for the search vector y in terms of transformed coordinates may be written as

Dy = -p    (7)

On the same basis, the steepest descent equation (2) becomes

y = -p    (8)

In order for coincidence to exist between Newtonian and steepest descent vectors, eq 8 must be modified to

y = -kD⁻¹p    (9)

in which kD⁻¹ supplies the necessary weighting. The scalar factor k is indeterminate (but positive), since steepest descent is intended to define only the direction of the search vector. The general form of the WSD equations, on a transformed coordinate basis, may then be written as

y = -Wp

in which W is a diagonal matrix of positive elements. Consequently, coincidence between Newtonian and steepest descent vectors can be achieved only if all diagonal elements d_ii are nonzero and positive, that is, if the Hessian is positive definite. Furthermore, experience has shown that, if reasonable external scaling has been applied, weighting factors which are too large to yield an effective search vector will result from eq 9 when one or more d_ii are orders of magnitude smaller than the others. (Note: A discussion of external scaling is given in a later section.) It may be concluded from this that the favorable circumstances under which it is feasible and reasonable to weight steepest descent so as to force coincidence with the Newtonian search vector are confined to cases where all d_ii are positive and of the same order of magnitude.

An important feature of the WSD method is that it can be adapted to deviate from Newton's method when circumstances do not warrant their coincidence. This can be achieved by adapting the calculation of the weight factors. The suggested method for doing this will now be described. First, assume that the diagonal elements, d_ii, and the corresponding columns of the transformation matrix, T, are rearranged to achieve descending algebraic order. That is, d_11 ≥ d_22 ≥ ... ≥ d_nn. If k in eq 9 is chosen as d_11, then

w_11 = 1

and ideal weighting (i.e., that which causes coincidence with Newton's method) is given by

w_ii = d_11/d_ii;    i = 1, 2, ..., n

An adaptation of the ideal case which has been used successfully is given, for i = 1, 2, ..., n, by eq 11, in which β is the maximum weight factor allowed. […] (5) When the Newton sequence is terminated for reason b or c above (b: d_ii effectively zero or d_ii ≤ 0; c: y_i > δ), at the kth y variable, then a switch is made to WSD logic treating y_k as the first WSD variable. That is, y_k is assigned a weighting factor of 1 and y_k, y_{k+1}, ..., y_n are computed by the WSD method as described above. (6) The resulting y vector is converted back into a Δx vector by

Δx = Ty    (19)

and a one-dimensional search is performed along the x vector as given by eq 16. (7) The distance factor, γ, is updated as described in Appendix II. (8) Go to step 2 and repeat until convergence is achieved (see Appendix III). It can be shown that the TD search vector has the property of truncation convergence. A discussion of parameter selection (i.e., β, γ, δ, ε) is given in Appendix V. A sketch of one such iteration, in simplified form, is given below.
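The following Python sketch outlines one such search-vector computation. It is a simplified rendering, not the authors' program: it diagonalizes the Hessian with an orthogonal eigenvector transformation (the RD variant discussed in the next section) rather than the cheaper nonorthogonal congruence, and it only approximates the discrimination logic of eq 11 and the intermediate steps leading to eq 19, which are not reproduced above. The parameter values follow the Appendix V recommendations.

```python
import numpy as np

def td_like_direction(g, H, beta=1.0e4, delta=0.2, eps=1.0e-8):
    """One search-vector computation in the spirit of transformational
    discrimination: Newton logic for well-behaved transformed variables,
    weighted steepest descent (with capped weights) for the rest."""
    d, T = np.linalg.eigh(H)              # T.T @ H @ T = diag(d)
    order = np.argsort(d)[::-1]           # descending algebraic order
    d, T = d[order], T[:, order]
    p = T.T @ g                           # transformed gradient
    y = np.zeros_like(p)
    newton_mode = True
    for i in range(len(p)):
        if newton_mode and d[i] > eps:
            y[i] = -p[i] / d[i]           # Newton logic (eq 7)
            if abs(y[i]) > delta:         # step too long: switch to WSD
                newton_mode = False
        else:
            newton_mode = False
        if not newton_mode:
            # capped weight: d[0]/d[i] when that ratio is positive and
            # no larger than beta, otherwise the maximum weight beta
            w = d[0] / d[i] if d[i] > d[0] / beta else beta
            y[i] = -w * p[i]              # weighted steepest descent
    return T @ y                          # back to x coordinates (eq 19)
```

A one-dimensional search along the returned direction, followed by the γ update of Appendix II, would complete the iteration.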

A Comparison with Greenstadt's Method

Greenstadt (1967) has recently proposed a method for unconstrained optimization which computes a search direction in a space generated by the eigenvectors of the Hessian. This is comparable to taking T in eq 4 to be the matrix of normalized eigenvectors of H. He suggests using Newton's method in the rotated space with the eigenvalues of the Hessian replaced by their absolute values. That is

y_i = -p_i/|d_ii|    (20)

This procedure does have truncation convergence provided that no eigenvalues are zero; if some eigenvalue is zero (i.e., the objective function is locally linear in that y_i), eq 20 cannot be used. More practically, it is quite possible to compute extremely large step sizes if any of the eigenvalues are small in magnitude. For nonquadratic objective functions, the significant advantages of Newton's method are lost when large steps are taken. This is because Newton's method uses a quadratic approximation of the objective function which has real utility only in the neighborhood of the base point. The problem of eigenvalues of small magnitude in Greenstadt's method, while serious, can be partially overcome by setting a threshold below which eigenvalues are considered zero. These may then be replaced by some arbitrary positive number (perhaps the threshold itself). It is likely, however, that even with such a modification, the method may be inefficient when small eigenvalues are encountered, since the long search vector which is produced will likely have to be severely truncated by the one-dimensional search.

Another difficulty associated with Greenstadt's method is at the opposite pole. If a large negative eigenvalue is encountered, this is an indication of a severe saddle point situation. The y_i computed by eq 20 in this case would produce only a very short move. It would probably take several iterations to progress away from such a saddle point region. The one-dimensional search is not likely to extend the search vector greatly since any larger moves produced by positive eigenvalues would predominate the objective function behavior along the search vector. To reiterate, small eigenvalues produce long moves which may tend to exceed the range of the quadratic approximation of Newton's method, and large negative eigenvalues tend to produce very short moves which do not allow rapid exit from a severe saddle point region. Thus, Greenstadt's method can encounter difficulty when negative eigenvalues of any magnitude are encountered. It is felt that by assigning maximum weight factors to those transformed parameters with negative d_ii, the transformational discrimination algorithm will on the average be significantly more efficient than Greenstadt's method.

As a final note on Greenstadt's method, there is no apparent reason why a rotational (eigenvector) transformation must be used. The authors have previously used such a transformation in the TD method. For problems with many variables this can be a very time-consuming transformation to compute. No significant difference has been noticed in the efficiency of the TD method since converting to nonorthogonal transformations. Only in cases where there is an apparent need for using the Hessian as the metric of both the original and transformed coordinate systems must an orthogonal transformation be used. It is difficult to imagine such a situation for purposes of numerically solving an optimization problem. In the example problems given in the sequel, TD with orthogonal transformation (called rotational discrimination (RD)) is included as a method for comparison. It will be noted that RD and TD performed about equally well on all test problems in terms of number of function evaluations.
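The contrast can be made concrete with a small numerical illustration; the eigenvalues and gradient components below are hypothetical, and the capped weight is a simplified stand-in for the TD logic.

```python
import numpy as np

# Step lengths produced by eq 20 (Greenstadt) when the transformed gradient
# is moderate but one eigenvalue is tiny and another is strongly negative.
eigenvalues = np.array([50.0, 1.0e-6, -40.0])
p = np.array([1.0, 1.0, 1.0])              # transformed gradient components

greenstadt_y = -p / np.abs(eigenvalues)    # eq 20: eigenvalues replaced by |.|
print(greenstadt_y)                        # [-0.02, -1.0e6, -0.025]

# A TD-style capped weight (beta = 1e4, relative to the largest eigenvalue)
# keeps the components of comparable size instead.
beta = 1.0e4
w = np.minimum(np.where(eigenvalues > 0, eigenvalues[0] / eigenvalues, beta), beta)
td_y = -w * p
print(td_y)                                # [-1.0, -1.0e4, -1.0e4]
```

The tiny eigenvalue inflates the Greenstadt step by six orders of magnitude, while the capped weights keep all components within the allowed ratio β.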

Computational Aspects

Some of the important computational aspects of applying the proposed algorithm appear in the Appendices. Three particularly pertinent points are discussed in this section; namely, these are the problems of external scaling of variables, numerical differentiation, and noise or random error in the objective function calculations.

External Scaling. External scaling of variables is desirable primarily because of the problem of heterogeneous physical dimensions. Variables which have different physical dimensions will naturally have different ranges of variation and, because of round-off considerations, vastly different effects on the objective function; this leads to very elongated contours of the objective function and, in turn, to large differences in the diagonal elements of the transformed Hessian. It was previously stated that the logic of discriminating between variables by examining the relations between these diagonal elements depends upon reasonable scaling having been performed. Some aspects of scaling are discussed cogently by Bard (1968).

The most natural way to scale variables is by way of dimensional analysis. That is, some variables may be made dimensionless by choosing a natural scale factor. An example of this for temperature as the variable would be to set

t' = (t - t_L)/(t_U - t_L)    (21)

In addition to having no physical units, the dimensionless temperature would also, conveniently, vary between 0 and 1. However, it is not always possible or practical to scale the problem as suggested on the basis of dimensional analysis. Another way of arriving at scaled variables which has proven to be effective and satisfactory is to choose an increment for numerical differentiation which will cause a change in the objective function in the last three or four digits (as available on the computer to be used), and then to choose an external scale factor which will cause the scaled increment for differentiation to be the same for all variables. That is

e_i = δy_i/c_d    (22)

This technique can usually be carried out easily if the problem solver has some basic knowledge of the problem and if sound engineering judgment is applied. Note that the scaled variables are obtained by dividing the original ones by the scale factor e_i.

Numerical Differentiation. It is recommended that the numerical derivatives be computed on the basis of the noninteracting coordinate system from the previous iteration. That is, the elements of the y vector from the previous iteration should be perturbed rather than those of the x vector. Furthermore, the individual δy_i should be scaled on the basis of the diagonalized Hessian of the previous iteration. Experience has shown that this can lead to significantly more accurate derivatives. Suppose that δy_1 is chosen to be equal to c_d. Internal scale factors for the y variables based on the diagonalized Hessian of the previous iteration can then be used to determine the remaining δy_i which will more appropriately produce proper impact on q during the perturbations. That is

δy_i = √(w_ii) δy_1;    i = 2, 3, ..., n    (23)

where the w_ii scale factors are determined using eq 11 with d_11/d_ii replaced by the absolute value of that quantity. It is also recommended that first derivatives be obtained by central differences.
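A minimal Python sketch of such a central-difference gradient along the previous iteration's noninteracting coordinates is given below; the function and argument names are illustrative, and the increments dy are assumed to have been scaled beforehand as in eqs 22 and 23.

```python
import numpy as np

def gradient_along_y(q, x0, T_prev, dy):
    """Central-difference gradient of q with respect to the previous
    iteration's noninteracting coordinates y (a point is x = x0 + T_prev @ y).
    dy[i] is the perturbation for coordinate i; the x-space gradient would
    follow from g_x = T_prev^{-T} @ g_y."""
    n = len(x0)
    g_y = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = dy[i]
        g_y[i] = (q(x0 + T_prev @ e) - q(x0 - T_prev @ e)) / (2.0 * dy[i])
    return g_y

# usage with a hypothetical objective and an identity transformation
q = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
x0 = np.array([-1.2, 1.0])
print(gradient_along_y(q, x0, np.eye(2), dy=np.array([1.0e-3, 1.0e-3])))
```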

Second derivatives should be computed by quadratic expansion, using increment sizes one order of magnitude larger than those used for first derivatives. This recommendation requires some explanation. Suppose that the first derivatives have already been obtained. A quadratic expansion of the objective function in the neighborhood of the base point, on the basis of y variables from the previous iteration, leads to

q(y⁰ + δ'y) = q(y⁰) + Σ_i (∂q/∂y_i) δ'y_i + ½ Σ_i Σ_j (∂²q/∂y_i∂y_j) δ'y_i δ'y_j    (24)

Here the differentiation increments should be related to those for first derivatives as given by eq 23 as

δ'y_i = 10 δy_i    (25)

The kth diagonal element of the Hessian may be obtained by setting all other δ'y_i to zero. Then, solving eq 24 for the requisite second derivative gives

∂²q/∂y_k² = 2[q(y⁰ + δ'_k y) - q(y⁰) - (∂q/∂y_k) δ'y_k] / (δ'y_k)²    (26)

The (k, l)th element of the Hessian may then be found by simultaneously perturbing y_k and y_l while keeping all other δ'y_i zero. Thus

∂²q/∂y_k∂y_l = [q(y⁰ + δ'_{k,l} y) - q(y⁰) - (∂q/∂y_k) δ'y_k - (∂q/∂y_l) δ'y_l - ½(∂²q/∂y_k²)(δ'y_k)² - ½(∂²q/∂y_l²)(δ'y_l)²] / (δ'y_k δ'y_l)    (27)

The Hessian on an x basis can then be obtained from

H(x⁰) = T⁻ᵀ H(y⁰) T⁻¹    (28)

The reason for recommending larger differentiation increments for second derivatives is that the resulting Hessian is not subjected to serious round-off problems. Experience has shown that the expected accuracy of the resulting Hessian is drastically greater than when equal increments are used for first and second derivatives. There is another important reason for determining the Hessian on the basis of y coordinates from the previous iteration: procedures for diagonalizing the resulting Hessian produce more accurate information because the Hessian on such a basis will itself tend to be dense around the diagonal.

Noise. In cases where the objective function calculation is subjected to random noise, the TD algorithm has shown significant tolerance to the noise. The recommended method for numerical differentiation is the primary reason for this tolerance. One of the example problems given below supports this claim.

Table I
i | u_i | z_i
1 | 0.1957 | 4
2 | 0.1947 | 2
3 | 0.1735 | 1
4 | 0.1600 | 0.5
5 | 0.0844 | 0.25
6 | 0.0627 | 0.167
7 | 0.0456 | 0.125
8 | 0.0342 | 0.100
9 | 0.0323 | 0.0833
10 | 0.0235 | 0.0714
11 | 0.0246 | 0.0625

Table II. Rosenbrock Problem
Method | x1 | x2 | q | Function calls
Davidon | 0.6886 | 0.4580 | 1.22 × 10⁻¹ | 50
Davidon | 0.9992 | 0.9984 | 6.27 × 10⁻⁷ | 80
RD | 0.9992 | 0.9984 | 2.45 × 10⁻⁶ | 50
RD | 1.0000 | 1.0000 | 0.00 | 54
TD | 1.0000 | 1.0000 | 2.23 × 10⁻… | 50
TD | 1.0000 | 1.0000 | 5.55 × 10⁻¹⁵ | 53
Greenstadt | 0.9637 | 0.9251 | 2.59 × 10⁻³ | 50
Greenstadt | 1.0000 | 1.0000 | 1.35 × 10⁻… | 68
Starting point x₀ = (-1.2, 1.0). RD and TD were both significantly better than the other two methods. Greenstadt's method appeared to experience difficulty due to the large difference in magnitude of the eigenvalues.

Table III. Cubic Problem
Method | x1 | x2 | q | Function calls
Davidon | 0.9421 | 0.8334 | 4.16 × 10⁻³ | 50
Davidon | 1.0000 | 1.0000 | 0.00 | 75
RD | 0.5477 | 0.1591 | 2.07 × 10⁻¹ | 50
RD | 1.0000 | 0.9999 | 1.90 × 10⁻⁸ | 75
TD | 0.4156 | 0.0578 | 3.61 × 10⁻¹ | 50
TD | 0.9771 | 0.9326 | 5.33 × 10⁻⁴ | 75
TD | 1.0000 | 1.0000 | 3.12 × 10⁻⁹ | 81
Greenstadt | 0.7957 | 0.4896 | 6.19 × 10⁻² | 50
Greenstadt | 1.0000 | 1.0000 | 6.02 × 10⁻¹¹ | 75
Starting point x₀ = (-1.2, 1.0). All methods performed equally well.

Table IV. Beale Problem
Method | x1 | x2 | q | Function calls
Davidon | 2.9999 | 0.5000 | 2.18 × 10⁻… | 20
RD | 2.9987 | 0.4997 | 2.94 × 10⁻… | 20
TD | 3.0003 | 0.5001 | 8.25 × 10⁻… | 18
Greenstadt | 3.2700 | 0.5620 | 1.09 × 10⁻² | 20
Greenstadt | 3.0000 | 0.5000 | 2.22 × 10⁻¹⁶ | 70
Starting point x₀ = (0.1, 0.1). Greenstadt's method was very near the solution after 30 evaluations, but required another 40 evaluations to meet the convergence criteria.

Objections

There are two objections to the TD method which will no doubt be raised in the minds of many readers. They are as follows. (1) It is necessary to compute second derivatives. (2) The logic associated with computing a search vector is not as straightforward as with presently popular methods. Greenstadt (1967) has estimated the efficiency of his method relative to gradient methods (not including the accelerated methods such as parallel tangents or Davidon's method). His results show that the efficiency is proportional to the ratio of the largest to the smallest eigenvalues. If, for example, this ratio is 10⁴, it would take about 10⁴ times as much computation effort per iteration of Greenstadt's method for gradient methods to be faster. Since TD does not require an eigenproblem computation but only a diagonalization of the Hessian, the effort of obtaining second derivatives for problems of even moderate difficulty would seem worthwhile. Any objection to complex logic hardly seems worthy of comment. However, the authors have heard this complaint in the past. The only argument in defense of this is that TD has been found to be more reliable on a variety of practical problems than any other method with which the authors are familiar. If reliability has any merit then the programming of the more complex logic would certainly seem justifiable.

Example Problems

The first six example problems presented here are the same as those presented for comparison purposes by Kowalik and Osborne (1968). The last one is presented to show the effectiveness of the numerical differentiation procedure presented here in the presence of noise. In particular, the problems are as follows.

(i) Rosenbrock Problem

q = 100(x₂ - x₁²)² + (1 - x₁)²

q has a minimum of 0 at x = (1, 1).

(ii) Cubic Problem

q = 100(x₂ - x₁³)² + (1 - x₁)²

This problem has the same solution as the Rosenbrock problem.

(iii) Beale Problem

q = Σ_{i=1}^{3} [c_i - x₁(1 - x₂^i)]²

where c₁ = 1.5, c₂ = 2.25, c₃ = 2.625. The function has a minimum of 0 at x = (3, 0.5).

(iv) Box Problem

q = Σ_{i=1}^{10} [(e^(-x₁z_i) - e^(-x₂z_i)) - x₃(e^(-z_i) - e^(-10z_i))]²

where z_i = 0.1i. The minimum of q is 0 at x = (1, 10, 1).

(v) Enzyme Problem

q = Σ_{i=1}^{11} [u_i - x₁(z_i² + x₂z_i)/(z_i² + x₃z_i + x₄)]²

where the values of z_i and u_i are given in Table I. The minimum is q = 3.10 × 10⁻⁴ at x = (0.1928, 0.1916, 0.1234, 0.1362).

(vi) Watson Problem

[…]

where z_i = (i - 1)/29. The minimum is q = 2.288 × 10⁻³ at x = (-0.016, 1.012, -0.233, 1.260, -1.513, 0.993).

(vii) Noise Test Problem

The Box function was used here again. Gaussian noise with a mean of zero and several levels of variance was superimposed upon the value of q at each function evaluation. The purpose of this example is to illustrate the superiority of numerically differentiating along noninteracting coordinates (NIC), particularly in the presence of noisy function evaluations.

Table V. Box Problem
Method | x1 | x2 | x3 | q | Function calls
Davidon | 1.000 | 10.000 | 1.000 | … | 30
RD | 1.000 | 10.000 | 1.000 | … | 24
TD | 1.000 | 9.999 | 1.000 | 10⁻¹⁰ | 30
Greenstadt | 0.615 | 10.563 | 1.319 | … | 30
Greenstadt | 0.614 | 10.178 | 1.320 | … | 300
Starting point x₀ = (0, 10, 20). Greenstadt's method failed completely because of the very small (positive and negative) eigenvalues associated with one of the rotated coordinates.

Table VI. Enzyme Problem (starting point 1)
Method | x1 | x2 | x3 | x4 | q | Function calls
Davidon | 0.185 | 0.092 | 0.009 | 0.100 | 10⁻⁴ | 50
Davidon | 0.193 | 0.192 | 0.123 | 0.136 | … | 110
RD | 0.193 | 0.193 | 0.124 | 0.137 | 10⁻⁴ | 25
RD | 0.193 | 0.194 | 0.125 | 0.137 | … | 27
TD | 0.193 | 0.194 | 0.125 | 0.137 | … | 28
TD | 0.193 | 0.194 | 0.125 | 0.137 | 10⁻⁴ | 30
Greenstadt | 0.193 | 0.195 | 0.125 | 0.137 | 10⁻⁴ | 41
Greenstadt | 0.193 | 0.194 | 0.125 | 0.137 | … | 66
Starting point 1, x₀ = (0, 0, 0, 0). In this example, TD and RD were clearly superior to the other two methods. Note that RD and TD only required about one-third the number of function evaluations as Davidon's method and about half the number required by Greenstadt's method.

Table VII. Enzyme Problem (starting point 2)
Method | x1 | x2 | x3 | x4 | q | Function calls
Davidon | 0.186 | 0.300 | 0.121 | 0.182 | 10⁻⁴ | 30
Davidon | 0.193 | 0.191 | 0.123 | 0.136 | … | 60
RD | 0.193 | 0.194 | 0.125 | 0.137 | … | 12
TD | 0.193 | 0.194 | 0.125 | 0.137 | … | 30
Greenstadt | 0.193 | 0.193 | 0.125 | 0.137 | … | 24
Greenstadt | 0.193 | 0.194 | 0.125 | 0.137 | 10⁻… | 36
Starting point 2, x₀ = (0.25, 0.39, 0.415, 0.39). Approximately the same results were obtained as with starting point 1. All methods fared better because of the better starting point.
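For readers who wish to reproduce the comparisons, two of the test functions defined above are sketched in Python (the array convention is illustrative):

```python
import numpy as np

def rosenbrock(x):
    # problem (i): minimum 0 at (1, 1)
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def box(x):
    # problem (iv): z_i = 0.1 i, i = 1..10; minimum 0 at (1, 10, 1)
    z = 0.1 * np.arange(1, 11)
    r = (np.exp(-x[0] * z) - np.exp(-x[1] * z)) - x[2] * (np.exp(-z) - np.exp(-10.0 * z))
    return float(np.sum(r ** 2))

print(rosenbrock(np.array([1.0, 1.0])))            # 0.0
print(round(box(np.array([1.0, 10.0, 1.0])), 12))  # 0.0
```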

Example Problems, Results and Discussion

The results of the example problems are given for four methods, namely, Davidon's method (Davidon, 1959), TD, RD, and Greenstadt's method. For comparative purposes, the same tactic of estimating the work required as used by Kowalik and Osborne (1968) is followed. In particular, the number of function calls exclusive of those for derivatives is used as a comparative guide. Of the four methods compared, Davidon's requires only first derivatives whereas the others also use second derivatives.

Computational results are summarized in Tables II-IX. The comments included with each table point out the more important aspects of each example. The important conclusions of the examples are as follows. (1) There is no appreciable difference in the performances of TD and RD. Therefore, the computationally simpler TD is preferred. (2) Considering the extra function evaluations required to obtain second derivatives, TD and Davidon's method performed about equally well, though TD typically required fewer iterations. (3) TD is clearly superior to Greenstadt's method. (4) Differentiation along noninteracting coordinates vastly improves the tolerance to a noisy objective function evaluation.

Appendix I

(One-Dimensional Search). There are many one-dimensional search procedures which may be used. Perhaps the most popular method is the Golden Section Search (Wilde, 1964). A more sophisticated method is described by Fariss and Law (1968). For many problems, it is sufficient to find an α in eq 16 which produces a value of q which is smaller (in a numerically significant sense) than the value at the base point. Such a crude search is not recommended as a general rule, however.
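A minimal golden-section search of the kind cited (Wilde, 1964) is sketched below; the bracketing interval and tolerance are illustrative assumptions.

```python
import math

def golden_section(f, a, b, tol=1.0e-6):
    """Minimize a unimodal f on [a, b] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # ~0.618
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# locate alpha_0 for a hypothetical one-dimensional profile q(alpha)
q = lambda alpha: (alpha - 0.3) ** 2 + 1.0
print(round(golden_section(q, 0.0, 1.0), 4))   # ~0.3
```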

Appendix II

(Updating γ). In order to choose a value of γ, the steepest descent distance factor, it is expedient to use the experience of the one-dimensional search. One means of accomplishing this effectively is to use an empirical update formula somewhat like the one shown in eq 29.

Table VIII. Watson Problem
Method | x1 | x2 | x3 | x4 | x5 | x6 | q | Function calls
Davidon | -0.016 | 1.012 | -0.232 | 1.262 | -1.51 | 0.992 | 10⁻³ | 65
RD | -0.016 | 1.012 | -0.233 | 1.260 | -1.51 | 0.993 | 10⁻³ | 31
TD | -0.016 | 1.012 | -0.233 | 1.260 | -1.51 | 0.993 | 10⁻³ | 25
Greenstadt | -0.016 | 1.012 | -0.233 | 1.260 | -1.51 | 0.993 | 10⁻³ | 58
Starting point x₀ = (0, 0, 0, 0, 0, 0). Results are similar to those with the enzyme problem. TD and RD reached the solution in about one-half the number of function evaluations required by the other methods.

Table IX. Noise Problem
Std dev of noise | Minimum determined, standard procedure | Minimum determined, NIC procedure
0.00000 | 10⁻¹⁰ | 10⁻¹⁰
0.00020 | 10⁻⁸ | 10⁻⁹
0.00040 | 10¹ | 10⁻⁹
0.00060 | 10¹ | 10⁻⁹
0.00100 | 10⁻¹ | 10⁻¹⁰
0.00120 | 10¹ | 10¹
0.00140 | 10¹ | 10¹
0.00160 | 10¹ | 10¹
0.00180 | 10¹ | 10¹
When the noise level reached 0.001, the magnitude of the noise was of the same order of magnitude as the perturbations. Therefore, above this noise level, no significant results could be expected except for random successes. Note, however, the effectiveness of the NIC method in reaching the minimum until the noise became intolerable. This is in contrast to the standard procedure, which was overwhelmed by a noise level as low as 0.0004. That is, the NIC method was able to handle noise levels of more than twice the magnitude of that handled by the standard method.

γ⁽ⁱ⁺¹⁾ = γ⁽ⁱ⁾ r_U    if α₀ ≥ r_U
γ⁽ⁱ⁺¹⁾ = γ⁽ⁱ⁾ α₀     if r_L ≤ α₀ ≤ r_U    (29)
γ⁽ⁱ⁺¹⁾ = γ⁽ⁱ⁾ r_L    if α₀ ≤ r_L

This formula is purely empirical and merely attempts to produce a γ to be used for the new iteration based on past experience while not allowing large changes from one iteration to the next. The previous γ is multiplied by a factor between r_L and r_U. Typical practical values for these parameters would be r_L = 1/4, r_U = 4. For well-scaled problems an initial value for γ of 0.2 is recommended.
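In Python, eq 29 amounts to multiplying the previous γ by α₀ clipped to [r_L, r_U]; the sketch below uses the typical parameter values quoted above.

```python
def update_gamma(gamma, alpha0, r_l=0.25, r_u=4.0):
    """Empirical distance-factor update of eq 29: multiply the previous
    gamma by alpha0 clipped to the interval [r_L, r_U]."""
    return gamma * min(max(alpha0, r_l), r_u)

gamma = 0.2                       # recommended initial value
for alpha0 in (6.0, 0.9, 0.05):   # optimal step lengths from successive searches
    gamma = update_gamma(gamma, alpha0)
    print(round(gamma, 4))        # 0.8, then 0.72, then 0.18
```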

Appendix III

(Convergence Criteria). There are several possible criteria for convergence which may be used with the proposed algorithm. The most precise criterion is that the magnitude of the gradient must become quite small. Two others which should, perhaps, be included also are the following. (1) The change in q between iterations becomes quite small for several iterations. (2) Both Δx and α₀Δx contain very small elements. These latter two criteria should be considered as "frustration" finishes in that problems with well-defined optima should show convergence of the more precise variety.
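A sketch of these convergence tests in Python follows; the tolerances are illustrative and not taken from the paper.

```python
import numpy as np

def converged(g, dq_history, dx, alpha0,
              g_tol=1.0e-6, q_tol=1.0e-10, x_tol=1.0e-8):
    """Convergence tests in the spirit of Appendix III: a small gradient norm
    (the precise criterion), or the 'frustration' finishes: q changing very
    little over several iterations, or both dx and alpha0*dx nearly zero."""
    small_gradient = np.linalg.norm(g) < g_tol
    stalled_q = len(dq_history) >= 3 and all(abs(d) < q_tol for d in dq_history[-3:])
    tiny_step = np.all(np.abs(dx) < x_tol) and np.all(np.abs(alpha0 * dx) < x_tol)
    return small_gradient or stalled_q or tiny_step

print(converged(np.array([1e-8, 0.0]), [1e-12, 1e-12, 1e-12],
                np.array([1e-9, 1e-9]), 0.5))   # True
```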

Appendix IV

(Diagonalization of the Hessian). Since the Hessian is symmetric, it can always be diagonalized by a sequence of elementary row and column operations. Perlis (1966) gives an informative discussion of the mathematical aspects of such operations. The sequence of elementary operations which diagonalize H then forms the composite transformation matrix T. It is recommended that when performing such operations care be taken to use the largest possible elements as pivots (i.e., those elements used to annihilate a row and column). This recommendation is related to preserving numerical accuracy while simultaneously producing a y coordinate system which is not severely "sheared." That is, y will be nearly an orthogonal coordinate system relative to the x coordinate system. Space does not permit a complete explanation of the substance of this recommendation.

Appendix V

(Parameter Selection). (a) Selection of β. The choice of β, the maximum allowable weight factor in eq 11, is not critical. That is, the sensitivity of WSD to β is relatively slight. A value of about 10⁴ has been used satisfactorily in almost all applications of TD. (b) Selection of γ. If the variables have been scaled in any reasonable way their expected value should be about unity, at least within a few orders of magnitude. A change of 20-50% would therefore be considered relatively large. Therefore, if γ is chosen to be 0.2, the length of the SD search vector will limit each individual variable increment to be less than 20% of unity. The actual initial choice of γ is not too critical since it is updated at each iteration. (c) Selection of δ. The steepest descent threshold factor, δ, should be chosen with the philosophy that if some y_i as calculated by Newton's method is greater than δ, then a switch should be made to WSD logic. Again, if reasonable external scaling has been applied, a value of about 0.2 for δ has been found to be satisfactory. (d) Selection of ε. The parameter ε is used to distinguish between small d_ii and those which should be treated as zero. The choice of ε depends upon the number of significant digits carried in the arithmetic. For digital computers which carry d digits, it is recommended that ε be chosen as ε = […].

Nomenclature

c_d = smallest scaled increment for differentiation (0.001 is recommended)
D = diagonalized Hessian matrix
d = number of digits carried in arithmetic operations
d_ii = elements of D
e_i = external scale factor for variable i
g = gradient of the objective function
g_i = elements of g
H = Hessian or second derivative matrix of q
k = scalar coefficient
n = number of variables
p = transformed gradient = Tᵀg
p_i = elements of p
q = objective function to be minimized
r_L = lower limit in eq 29
r_U = upper limit in eq 29
T = transformation matrix such that TᵀHT = D
t = temperature
t_L = lower temperature limit
t_U = upper temperature limit
t' = scaled, dimensionless temperature
W = diagonal matrix of weighting factors
w_ii = elements of W
x = n-dimensional vector of variables
x_i = elements of x
x₀ = x at a base point
x* = candidate optimum x
x̄ = scaled x vector
Δx = search vector
Δx_i = elements of Δx
y = transformed variables
y_i = elements of y
y⁰ = y vector at the end of the previous iteration
y_i⁰ = elements of y⁰

GREEK LETTERS

α = scalar parameter in one-dimensional search
α₀ = optimal value of α along the search vector
β = maximum steepest descent weight factor relative to unity
γ = steepest descent distance factor
δ = steepest descent threshold factor
δy_i = perturbation in variable i for first derivative calculations
δ'y_i = perturbation in variable i for second derivative calculations
δ'_k y = n-vector with the kth element containing δ'y_k and all others zero
δ'_{k,l} y = n-vector with the kth element = δ'y_k, the lth element = δ'y_l, and all other elements = zero
ε = discrimination factor for distinguishing small eigenvalues

SUPERSCRIPTS

T = vector or matrix transposition operator
(i) = refers to iteration i

Literature Cited

Bard, Y., Math. Comp. 22, 665 (1968).
Booth, A. D., "Numerical Methods," Butterworths, London, 1956.
Davidon, W. C., Atomic Energy Commission R. and D. Report, ANL-5990 (Rev.) (1959).
Fariss, R. H., Law, V. J., Prepared Lecture Notes, Chemical Engineering Department, Tulane University, New Orleans, La., 1968.
Fletcher, R., Powell, M. J. D., Computer J. 6, 163 (1963).
Greenstadt, J., Math. Computation 21, 360 (1967).
Kowalik, J., Osborne, M. R., "Methods for Unconstrained Optimization Problems," Elsevier, New York, N. Y., 1968.
Law, V. J., Prepared Lecture Notes, Chemical Engineering Department, Tulane University, New Orleans, La., 1967.
Perlis, S., "Introduction to Algebra," Blaisdell, Waltham, Mass., 1966.
Powell, M. J. D., Computer J. 7, 155 (1964).
Rosenbrock, H. H., Storey, C., "Computational Techniques for Chemical Engineers," Pergamon Press, New York, N. Y., 1966.
Shah, B. V., Buehler, R. J., Kempthorne, O., J. SIAM 12, 74 (1964).
Wilde, D. J., "Optimum Seeking Methods," Prentice-Hall, Englewood Cliffs, N. J., 1964.

RECEIVED for review June 20, 1968
RESUBMITTED July 15, 1971
ACCEPTED November 27, 1971
