Simplex techniques for nonlinear optimization - ACS Publications

made with simplex-based techniques currently In use In analytical chemistry. Suggestions for direct comparisons of nonlinear optimization techniques o...
0 downloads 0 Views 1MB Size
1480

Anal. Chem. 1980, 52, 1460-1467

Simplex Techniques for Nonlinear Optimization P. Barry Ryan,"' Richard L. Barr,* and H. David Todd" Hall-Atwater Laboratories of Chemistry, Wesleyan University, Middleto wn, Connecticut 06457

A new, nonllnear optlmizatlon lechnlque based on the modifled slmplex method of Nekler and Mead has been developed which shows promise as a method for accurately flndlng the region of an optlmum. Statlstlcal comparlsons of two forms of the new algorithm were made wlth slmplex-based techniques currently in use in analytical chemlstry. Suggestlons for dlrecl comparlsons of nonlinear optimlratlon technlques on a quantitative level are made and Implemented. The utlllty of the new technique Is demonstratedfor several experimentally relevant functions.

The process of optimizing an unknown function is common to many areas of chemistry (1-9, 13-22). In experimental chemistry, typically several preliminary experiments are done to determine which variables or factors are important in a procedure. After the important factors are determined, optimization is often effected by varying factors one-at-a-time, while keeping the others fixed, until an optimum is reached. This procedure can be very slowly convergent and, in fact, need not converge at all (10) since the variables are usually coupled (not fully independent). Simultaneous optimization of the variables avoids this problem. In this work, we discuss recent developments in simplex methods for optimization of multivariable functions in which the variables are simultaneously optimized.

T H E SIMPLEX METHOD The simplex method, in its present form, was first suggested by Spendley, Hext, and Himsworth (11) in 1962. This method, referred to as Basic Simplex Method (BSM),is well described by Spendley et al. and will not be detailed here. Nelder and Mead (12) modified BSM to allow the procedure to adapt to the functional or response surface much more readily than the earlier method. This method will be referred to as Modified Simplex Method (MSM). MSM has been described in detail elsewhere (12--15),but the description will be repeated here as the initial step in the development of our algorithms. A simplex is a geometric figure in n dimensions consisting of n + 1vertices not all in the same n - 1 dimensions. Each dimension corresponds to a variable or factor in the optimization procedure. Thus, a two-dimensional simplex is seen to be a triangle, a three-dimensional simplex is seen to be a tetrahedron, etc. The optimization procedure begins by the choice of these n + 1 points and evaluation of the response a t each. After evaluation of the response at each point, the vertex which gives the least desirable response, W, is discarded and a new vertex is chosen according to some algorithm. In BSM, the point chosen for the new vertex is the reflection of W through the centroid of the opposite hyperface. Nelder and Mead's modification allows the simplex to expand and contract to conform to the topography of the response surface. In Figure 1,suppose that W gives the worst response, B gives the best response and M gives an intermediate response. C is the centroid of the hyperface opposite W in this two-diPresent address: Atmospheric and Environmental Research Inc., 872 Massachusetts Avenue, Cambridge, Mass. 02139. *Present address: Department of English, University of Virginia, Charlottesville, Va. 22903. 0003-2700/80/0352-1460$01 OO/O

mensional example. In Nelder and Mead's modification, the new point, N, is given by

N

= (1

+ a) C-CUW

Initially the point is chosen with a = 1.0. If the response at N is more favorable than a t any previous point, the search continues by choosing another point, NE, with a greater than 1.0 (usually 2.0) in Equation 1. This is called expansion. When the response a t N E is still better than the points previously determined, the point is kept to form a new simplex and the entire procedure is repeated. If the response at NE is worse than the responses at B and M the point is discarded. In this case, the initial reflection point N is used to form the new simplex. This situation is called a failed expansion. On the other hand, when the initial reflection produces a point whose response is less desirable than B or M, a contraction is in order. A contraction may be either positive or negative, the designation stemming from the sign of a in Equation 1. A positive contraction, 0 Ia I1 is done when the reflection point, N gives a more favorable response than W. A negative contraction, -1 5 a 5 0 is done when the response a t N is less favorable than that at W. Usually the contractions are done with a = *1/2. In MSM, contractions seldom fail to give a more favorable response. When these "failed contractions" occur, Nelder and Mead suggest shrinking the simplex about B by producing a new simplex containing B and new vertices halfway between B and each vertex of the old simplex. In summary, for MSM the possible positions of new points for the two-dimensional simplex given in Figure 1 are: (1)N, for normal reflection; (2) NE, for reflection followed by expansion; (3) PC, for reflection followed by positive contraction; (4) NC, for reflection followed by negative contraction. A number of procedures have been suggested for convergence criteria (12-15). The technique found most satisfactory to the present authors is due to Dewar and Student (16). The first criterion is that the variance in the response among the n + 1vertices be below some preset value. After this condition is met, the response a t the centroid of the entire simplex is calculated and the variance from this point is used as the final convergence criterion. In practice, this method has been found to produce accurate minima with an insignificant increase in the number of functional evaluations. The method of Nelder and Mead has seen a number of applications in analytical chemistry ( 1 7 ) . As early as 1968, Ernst (18) used MSM to optimize gradient and curvature settings to control magnetic field inhomogeneity in a nuclear magnetic resonance experiment. Deming and Morgan (13) applied this technique to a nonlinear least squares fitting procedure using a least-squares response surface from an equation of the type:

where A , and k are parameters to be determined in the least-squares sense from known values of A and t. Morgan and Deming (14) also report a more detailed analysis of MSM as applied to colorimetric determination of cholesterol in blood serum as the multifactor system to be optimized. The response was taken to be a combination of color intensity and stability of the sample. Their variable factors were solvent, C 1980 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 52, NO. 9, AUGUST 1980

Flgure 1. Possible new points for MSM

Figure 2. Comparison of search directions for MSM and WCM

dehydrating agent, color reagent, and color development time. After 26 experiments, the optimum region was reached which gave 96% of the total desired response, which was an improvement of about 17% over the average of the responses from the initial simplex. Their work showed that MSM could provide a reliable method for improvement in experimental conditions. Recently, a number of workers have applied MSM to problems in pattern recognition (1S21). This application can require the optimization of several hundred variables, which makes other techniques unworkable. The success of MSM in these procedures indicates the power and simplicity of the procedure. Routh, Swartz, and Denton (15)have recently developed a further modification of the simplex method which they call the Super-Modified Simplex Method, (SMS). The choice of the new point is determined in the following manner: after the point giving the worst response, W, is determined, the response is determined a t the point N which is the reflection of W through the opposite hyperface. The response a t the centroid of the opposite hyperface, C, is also determined. A parabola is then passed through these three points and the optimum value of the reflection coefficient a , is determined. T h e new point for the simplex is determined by using the optimum a in Equation 1. Routh, Swartz, and Denton report significant increases in efficiency, utility, and reliability of SMS vs. MSM. Wilkins and Kaberline (22) are currently applying the SMS algorithm to problems in pattern recognition.

T H E WEIGHTED CENTROID METHOD Description of t h e Method. BSM and MSM are examples of directed search methods (10) which use past observations to predict the most promising direction for continuing the search. Simplex procedures search the direction opposite the point giving the least favorable response, thereby crudely approximating steepest descent procedures ( I 1, 23, 2 4 ) . Simplex methods attempt to define and search approximate gradient directions in the region of the simplex. If one could improve the simplex method approximation to the gradient, while retaining its relative simplicity, one would have a useful tool for optimizing functions for which explicit derivatives are difficult to obtain or are unknown. The Weighted Centroid Method, (WCM), attempts to fulfill the above criteria. Consider Figure 2. The points are labeled as Figure 1. If the function is a t least fairly well-behaved, the true gradient direction must lie closer to the direction defined by W-B than to that defined by W-M. One may wish to “weight” this more favorable direction. We define the weighted centroid C, of the hyperface opposite W by:

E (R(Pi) - R(W)JPi l#U

n+l

Et

‘t

Figure 3. Progress of MSM on a two-dimensional paraboloid

included in either summation. WCM uses the point C, in place of C in Equation 1. Inspection of Equation 3 shows that points giving a larger change in response from that at W will be weighted more heavily in determining the weighted centroid. The summation in Equation 3 is normalized so that the weighted centroid point may never lie outside the closed hypersurface determined by the n best points. This restriction forces the search to continue in a direction which is bounded by points with known responses. Relaxing this restriction would allow searches in directions not passing through the hyperface opposite the point giving the least favorable response. This would permit reflection into a region in which insufficient information has been obtained to predict a new search direction and would imply an unwarranted assumption of good functional behavior in unknown regions of the response surface. WCM now proceeds in a manner entirely analogous to MSM. The simplex expands and contracts as warranted by the topography of the response surface until convergence criteria are met as described above. Figures 3 and 4 show the progress of MSM and WCM respectively on the two-dimensional paraboloid: R(X,Y)= YL (4) using the same starting simplex. Comparison of Figures 3 and 4 reveals that WCM follows the gradient direction (which points toward the origin) more closely than does MSM. Furthermore, MSM produces six new simplices at an average of two functional evaluations per simplex while WCM produces only four new simplices to reach the same distance from

x2 +

n+l

CW =

1461

(3)

E M P , ) - R(W)I 1fU

where R(PJ represents the response at the point Pi and R(W) represents the response at W. Notice that the point W is not

1462

ANALYTICAL CHEMISTRY, VOL. 52, NO. 9, AUGUST 1980

Development of Methods Which Use t h e WCM Algorithm. WCM successfully predicts gradient directions but tends to become degenerate in following them. We present here two solutions to the degeneracy problem. These are denoted controlled weighted centroid method (CWC) and orthogonal jump weighted centroid method (OJWC). Both names are suggestive of the nature of the algorithm used. One possible solution to the degeneracy problem suggests favoring, but not closely following, the approximate gradient direction as determined by the simplex. This algorithm proceeds by determining a parameter, y, which is a measure of how strongly the weighted centroid direction differs from the normal reflection direction as implemented in MSM. We define: IIC, - CII

(5)

Y =

5

+

3 c

Figure 4.

Progress of WCM on a two-dimensional paraboloid

Figure 5. Limitation of search direction for a two-dimensionalsimplex approaching degeneracy

the true minimum a t (O.,O.). The slight increase in computation time needed to calculate the weighted centroid is more than compensated for by the reduction in number of functional evaluations, which we assume to be the most timeconsuming aspect of the optimization. However, note that the WCM simplex becomes much more distorted as the optimization continues. This distortion leads to a number of problems. Problems Associated w i t h WCM. The fact that WCM more closely approximates the gradient direction is apparent from Figure 4. However, the distortion which is also apparent leads to a significant drawback in the method. As the distortion of the simplex increases, the ability to search directions perpendicular to the previous search direction is diminished. For a two-dimensional case (see Figure 5), the angle CBWC can become quite small and limits the search to the area enclosed by the dotted lines. If the gradient changes direction, the simplex cannot easily respond and may become “stranded” or unable to continue. In this case, an n-dimensional simplex will,in general, lose the ability to search a particular direction and become what we will call a “degenerate simplex” which will be contained in an n - 1 dimensional space. We have found that for two-dimensional cases, simplices rapidly become linear, which results in undirectional “optimization” of the response surface. Owing to the adaptive nature of the modifications developed by Nelder and Mead, WCM is usually able to contract repeatedly until the simplex is of small enough size that other directions may be searched in the usual manner. This repeated contraction before continuing the search results in a large number of wasted functional evaluations. Occasionally, the simplex becomes so nearly degenerate that the convergence criteria are met before the simplex can change direction. This inopportune “convergence” will be referred to as convergence to a false minimum.

IIB - CII That is, y is the ratio of the length of the segment from the normal centroid to the weighted centroid, with the length of the segment from the normal centroid to the point giving the most favorable response, B. The parameter can take on values from zero to one; zero indicates that no point on the hyperface is favored over any other point and one indicates that B is infinitely favored over all other points. We now define a controlled, weighted centroid, C,: C,=C, i f O l y l x (64 C, = (1 - x ) C xB if y > x (6b) where 0 Ix I1. We can therefore control the effect of the weighted centroid by the choice of x . In the actual controlled, weighted procedure CWC, x is fixed externally (we have found x = 0.3 to be a good compromise value) so that the algorithm proceeds with x as a constant. Thus, CWC tries to take advantage of the benefits of WCM while diminishing the probability of degeneracy. The second method of dealing with the degeneracy problem stems from the observation that the simplex approach to degeneracy occurs gradually. We have found that the simplex becomes more and more distorted from iteration to iteration before i t loses the ability to search a given direction. The degeneracy problem could be overcome if one were able to predict that a given simplex configuration will produce a degenerate simplex. One method of ascertaining the relative degeneracy of the simplex is to determine the linear dependence of the vectors from the worst point to each of the others. There are n such vectors which must span the n-dimensional space of the optimization. If these vectors are linearly dependent, a t least one dimension is not included in the search. One method of measuring the linear dependence of n vectors in an n-dimensional space is to calculate the determinant of those vectors. Linear dependence implies that the determinant is identically zero. A small magnitude for this number indicates near, but not complete, degeneracy. We have chosen this criterion for ascertaining approaches to degeneracy. A t each k iterations, the magnitude of the determinant of the n vectors using W as the origin is calculated. If this value falls below some preset value, the simplex in deemed degenerate and the following algorithm is used to remove the degeneracy. The equation of the hypersurface determined by the simplex points, excluding W, is ascertained (26). From this equation, the vector normal to the hypersurface is easily generated. We then choose a point, P , say the centroid, on the hypersurface. From this point, we choose two other points: 1 P+ = P - RAVG * U (74

+

+

& 1

P- = P - - RAVG

6

*U

(7b)

ANALYTICAL CHEMISTRY, VOL. 52, NO. 9, AUGUST 1980

1463

Table I. Comparison of Modified Simplex Method (MSM), Super-Modified Simplex Method (SMS), Controlled-Weighted Centroid Method (CWC), and Orthogonal Jump Weighted Centroid Method (OJWC) for a Two-Dimensional ParaboloidaVb M SM

avg no.

a

ISS

eval

0.05 0.10 0.50

113

1.00

88

2.00 5.00 10.00 20.00

83 78 76 76

RTOL =

106 94

std non-conv/ dev, crit. 90 non-conv 9 9 9 9 9 8 5 5

010 010 010 010 010 010 010 010

Region -1000