Exchange of comments on the simplex algorithm culminating in

Steven. Brumby. Anal. Chem. , 1989, 61 (15), pp 1783–1786. DOI: 10.1021/ac00190a038. Publication Date: August 1989. ACS Legacy Archive. Cite this:An...
0 downloads 0 Views 440KB Size
1783

Anal. Chem. 1989, 6 1 , 1783-1786

search under Contract DE-AC05-840R21400 with Martin Marietta Energy Systems, Inc., and the ORNL Director R & D Fund. This research was also supported in part by the appointment of J.M.B. to the Postgraduate Research Training Program under Contract DE-AC05-760R00033 between the

U.S. Department of Energy and Oak Ridge Associated Universities and of D.L.S. to the SCUU Science Semester under Contract DE-AC05-840R21400 between the U.S. Department of Energy and Martin Marietta Energy Systems, Inc.

CORRESPONDENCE Exchange of Comments on the Simplex Algorithm Culminating in Quadratic Convergence and Error Estimation Sir: When they proposed their well-known simplex algorithm for minimizing an arbitrary nonlinear function of n parameters, Nelder and Mead (1) suggested that a suitable way to end their procedure would be to fit the n + 1function values of the final simplex to a quadratic equation. The necessary theory had been derived earlier by Spendley et al. (2). Recently, Phillips and Eyring (3) have reiterated this suggestion, stressing that, for least-squares problems, the quadratic approximation provides a sound basis for estimating the uncertainties in the optimized parameter values. The chief purposes of this contribution are (a) to correct a number of serious errors in the literature; (b) to draw attention to a particularly efficient method of implementing the quadratic approximation that is applicable in dealing with least-squares problems; (c) to stress the value of the quadratic approximation in locating the precise minimum after the simplex algorithm has progressed to a suitable point; and (d) to point out that under certain circumstances the quadratic approximation is unsuccessful, although the simplex algorithm itself very rarely fails. ERRORS I N THE LITERATURE The expression given by Phillips and Eyring (3) in their Table I, for calculating the elements of the matrix Q, is incorrect and should be eq 1. In other words the ith column

Table I. Rules for Calculating the Elements of Vector a and Matrices B and Q a, = YO,

- y1- YO

bt, = 2(Y1 + YO - ~ Y O J 4 1 = 2(Y,] + Yo - Yo1 - YO]) Q,l = P,]- PO]

i = 1, ..., n i = 1, ..., n ifj i = 1, ..., n;j = 1, ..., n

The parameter vector corresponding to the minimum of the quadratic function is given by eq 2.

(2) Pmin = PO- f/2QB-'a If the quadratic approximation is satisfactory, the calculated pminought to lie within the parameter space adequately mapped by the simplex. This requirement is expressed by the inequality (3). The constant 4 in inequality 3 corresponds (B-la)t(B-la)< 4 (3) to a suggestion made in ref 2: The inequality could be made slightly less restrictive by substituting a larger constant, say 16. For a least-squares problem, u2 is calculated as y m h / ( Nn) (where N is the number of observations), and the variance-covariance matrix, e, is calculated by using eq 4. The

(1)

c = u2QB-'Qt (4) uncertainties in the parameters may now be calculated as the square roots of the diagonal elements of e.

of Q is equal to j i - &,. Equations 5-7 in the paper of Phillips and Eyring are also incorrect. To be consistent with their definition of I, the term QB% in eq 5 ought to be multiplied by lI2. In eq 6, the constant should be 4, not lI4. Equation 7 reiterates an error, or possibly a misprint, in ref 1; the multiplier 2 is not required. It may be shown that the matrix (Q-l)tBQ-l is equal to the curvature matrix, a,as defined by Bevington ( 4 ) , not as incorrectly defined in ref 3. It follows on the basis of equations derived in ref 4 that the multiplier 2 is not required. These corrections are incorporated in Table I and eq 2-4 of this paper. Here and subsequently, the symbols used by Nelder and Mead (I) are mainly adhered to. For example, the parameter vector at vertex i is p i = (pil,...,pin),and the corresponding function value is yi. An exception is a, which conforms with the definition of Spendley et al. (2), not that of Nelder and Mead (I). Before a, B, and Q are calculated according to Table I, the indexing is adjusted so that yo is the minimum function value (the other yi need not be sorted), and the function values y& # j ) at the halfway points (Le. midway along the sides of the simplex) are computed.

LEAST-SQUARES PROBLEMS Experimentalists are frequently concerned with the differences between observed and calculated quantities. These differences are sometimes named residuals. In least-squares problems, the aim is to minimize the sum of the squares of the residuals. Equations 2-4 may be used to deal with least-squares problems, but their use is inefficient. For this type of problem, the quadratic approximation implies that the residuals vary linearly with respect to the parameters. It follows that function evaluations at the n + 1 points of the simplex are sufficient to define the quadratic function, and it is unnecessary to calculate the additional n(n + 1)/2 function values at the halfway points. The following procedure is based on theory given by Spendley (5).The simplex procedure may require minor surgery, since it is necessary to retain in memory the N residuals, fJpJ (k = 1, .. ., N), as well as the function value yl = Cf;',l(fk(pi)]2, associated with each of the n + 1vertices of the simplex. On exit from the simplex algorithm, the first step is to adjust the indexing so that yo is the minimum function value. One then calculates elements ML@)(i = 1, . .., n;k = 1, . .., N) using eq 5. The

qji = ei, - e,,

0003-2700/89/0361-1783$01.50/0

0 1989 Amerlcan Chemical Society

1784

ANALYTICAL CHEMISTRY, VOL. 61, NO. 15, AUGUST 1, 1989

(5) elements Mi(k)may be stored in the array previously used for the residuals, ensuring that the f k ( P 0 ) are not overwritten. Next, one constructs an n X n matrix, Q,using eq 6 and an N Q 1. 1, =

k=l

Mi(k)Mj(k)

(6)

n-element column vector, Fo, using eq 7. Like B and a, Q N

(7) and Focontain the coefficients of a quadratic approximation in a convenient coordinate system, but whereas Q is directly comparable to B, Fo is comparable to -1/2a. In terms of Q and Fo, eq 2-4 are replaced by eq 8-10, respectively Equa-

0, 27r0(x1,x2) = arctan ( x z / x l ) ; when x1 0, 2n0(x1,x2)= 7c + arctan ( x 2 / x l ) ;when x1 = 0 and x2 = 0, y = 1OOOO; starting point -l,O.O. Test function 3, ref 6. ( 5 ) y = ~ ~ ~ 1 xstarting , 4 ; point 1,1, . . ., 1. Test function 4, ref 6.

ANALYTICAL CHEMISTRY, VOL. 61, NO. 15, AUGUST 1, 1989

1785

Assign values to certain veriW6,e.g. the maximum number of luncbn evaluabns. N-MAX. and me maximum final variance of fundlon values, REQMN. verUen8. pb, of the initY simplex. and the Wcu!ate correspondingfuncllon values, y) Determine me lnlthl h snd /, Such that yh 16 the maximum and ylthe minlmum functbn value.

4

r

Calculate the cantrold,

j,the reketed vertex,

r r - +~ a ) F - w h

and the amesponding functlon value. .,’

I Wculate the contracted vertex,

Calculate the expanded vertex,

7 ’ = W + ( l - r )P,

and hcorrespondingfunction value,

L” No

I

I

I

I

Determine the new h and 1.

I

Output the new pb yb if required. Determine the current number of fundon evaluations. N-CURR, and the ament variance of function values. CURMIN. ~

------.-_-________-----------------

Yes

Swap pland

end swap ypnd y1. Calculate

and B, using Table I.

approximation has felled.

cB]cuiate a. catcuiate and output pmh (eq 2 ) and the cwresponding ym,,,

(eq 4) and output the uncenaintles In the pcvameten.

Figure 1. Flow diagram showing a slightly modified Nelder-Mead simplex algorithm with an addendum (below the broken line) exploiting eq 2-4.

Anal. Chem. 1089, 61 1786-1787

1786

I

Table 111. Performance of Programs P1, P2, and P3 with Test Function 6 no. of function

program REQMIN spproximmion hw failed.

I \

Yes

I

I

Figure 2. An addendum to the simplex algorithm (above thebroken line in Figure 1) that exploits eq 8 and 10.

Table 11. Performance of Programs P1 and P2 with Five Test Functions no. of

function

program

REQMIN

1

P1

10-13

P2

10-10

P1

10-13 10-10 10-13 10-10 10-13 10-10 10-13 10-10

2

P2 3

P1 P2

4

P1 P2

5

P1 P2

a

function evaluations

function min

60 49 151 142 206 203 238 230 457 349

3.8 X 9.1 x 1043 3.1 X 10% 1.7 X 10% 1.1 x 10-7 a

1.9 x 10-7 5.5 x 10-10 8.2 x 10-7 a

Quadratic approximation failed.

(6) y = CKI(Aj- xl(l - exp(-x2tj))]2;starting point 1,l. This corresponds to the least-squares problem defined in ref 12, where the 18 observed values of A and t are listed. For these test runs the side length of the initial simplex was 1, and the maximum number of function evaluations allowed (N-MAX) was 1000. For the maximum allowed variance of function values, REQMIN, various values were tried: It was found that with program P1 a suitable value was and with programs P2 and P 3 a suitable value was Results with functions 1-5 are reported in Table 11. These functions all have minima of zero. The table shows that whenever the quadratic approximation did not fail, P2 achieved a lower function minimum after a smaller number of function evaluations than P1. Of course, the value of REQMIN could be adjusted so as to achieve a different trade-off between speed and accuracy. Functions 3 and 5 do not approximate to a quadratic form, even in the close vicinitv of the minimum. In both cases the failure of the quadratic approximation was clearly shown by the fact that the inequality (3) was not satisfied. I t seems likely that this kind of situation would be rare in dealing with least-squares problems; if it did arise, one would clearly need to disregard any errors derived via eq 4 or eq 10. Phillips and Eyring (3) did not use the quadratic approximation to help locate the function minimum, but only as a

Sir: In an earlier paper (I),we discussed a method proposed by Nelder and Mead (2) for estimating parameter errors in nonlinear least-squares data analysis using the sequential

P1

1043

P2 P3

10-10 10-10

evaluations

76 67 64

function min

param uncertainties

0.0036029589 0.0036028120 0.0088;0.0104 0.0036028090 0.0083:0.0098

means of estimating errors. With their method, a relatively small value of REQMIN must be used to locate the minimum precisely. As Table I1 shows, the method is less efficient than the kind of strategy used in P2, except when the quadratic approximation fails. Because of the smallness of REQMIN, the variations in yi (at any rate when calculated by using single precision) may be strongly influenced by round-off errors, necessitating enlargement of the simplex before proceeding with the quadratic approximation, as originally suggested by Nelder and Mead (I). Such an enlargement compounds the inefficiency of the method, because of the additional function evaluations needed. Results with function 6 are reported in Table 111. This function has a minimum of 0.0036028045.. . . Clearlv P2 and P3 are superior to P1, both in rate of convergence and in providing estimates of the errors. P2 and P 3 give similar results, but P2, unlike P3, requires n(n + 1)/2 function evaluations at the halfway points. This is not a problem for simple functions such as 6, but for complex functions of many parameters it can be a problem. We regularly use a program similar to P3 to adjust spectral parameters so as to minimize the deviation between calculated and observed electron spin resonance spectra (13). In a typical example the number of parameters might be 13, and a function evaluation might take 40 s, using an IBM PC. In this situation the additional n(n + 1)/2 function evaluations required by P2 would take about 1 h.

LITERATURE CITED (1) Nelder, J. A.; Mead, R. Comput. J. 1965, 8 , 308. (2) Spendley, S.N.; Hext, G. R.; Himsworth, F. R. Technometrics 1962, 4 , 441. (3)Phillips, G. R.; Eyrlng, E. M. Anal. Chem. 1988. 60, 738. (4) Bevington, P. R. beta Reduction and Error Analysis for the Physical Sciences; McGraw-Hill: New York, 1969;eq 8-24, 11-13,8-30. (5) Spendley, W. I n Optimization: Symposium of the Institute of Methematics and its AppNcations, University of Keele, England, 7968; Fletcher, R.. Ed.: Academic: New York, 1969;p 259. (6) O'Neill, R. Appi. Stat. 1971, 2 0 , 338. 1978, 7 , (7) Demlng, s. N.; Parker, I . R. CRC Crit. Rev. A&.

187.

(8) Benyon, P. R. Appi. Stat. 1976, 25, 97. (9) Aberg, E. R: Gustavsson, A. G. T. Anal. Chim. Acta 1982, 744, 39. (IO) Daniels, R. W. Introduction to Numerical Methods and Optimization Techniques; North-Holland New York, 1978. (11) Chambers, J. M.; Ertel, J. E. Appi. Stat. 1974, 23, 250. (12) Deming, S.N.; Morgan, S.L. Anal. Chem. 1973, 4 5 , 278A. (13)Beckwith, A. L. J.; Brumby, S. J. Magn. Res. 1987, 73, 252.

Steven Brumby Research School of Chemistry Australian National University G.P.O. Box 4, ACT 2601, Australia RECEIVEDfor review July 28, 1988. Accepted April 17, 1989.

simplex method. There were several errors in the paper, and a correction has appeared in print (3). In the preceding paper Brumby ( 4 ) discusses a second approach to this problem due

0003-2700/89/0361-1786$01.50/00 1989 American Chemical Society