Nonlinear least squares methods: A direct grid search approach

J. G. Becsey,¹ Laszlo Berke,² and James R. Callan¹
U.S. Air Force, Wright-Patterson AFB, Ohio 45433

The functional relations which describe the behavior of an experimental response y and a set of inputs x in physical chemistry are quite often nonlinear, in the sense that the parameters occur in a nonlinear fashion. A number of examples are listed in Table 1. Some experimentalists are tempted to linearize these relations and to obtain parameter estimates by linear least squares methods, a temptation which can lead to gross inaccuracies. Even in the common example of the Arrhenius relation y = A exp(ax), the linearized and unweighted form ln(y) = ax + ln(A) will not, in general, produce the best estimates of the parameters a and A in the least squares sense (1). Quite often the problem cannot be formulated in a linear form at all, as in the case of relations 1, 3, 4, 6, and 7 of Table 1. In these cases the problem is to find the minimum of S(A,a), the sum of the squared deviations regarded as a surface over the parameter space. The parameters which produce the minimum on the S(A,a) surface give the best fit in the least squares sense, a problem which in most cases is not easily solved numerically. The purpose here is to recommend a simple iterative process to solve minimization problems in chemistry when the number of parameters is small, as is the case for most empirical laws.

There is a sizable body of powerful algorithms available for finding unconstrained local minima of functions of several variables (2, 3). Among these, Fletcher and Powell's elegant method appears to be one of the most efficient minimization techniques (4). A disadvantage of this method is that it requires analytical expressions for the gradient of the function to be minimized with respect to its parameters; in many applications these expressions can be cumbersome to obtain. Recently Powell (5, 6) and Stewart (7) modified the method to generate finite difference gradients during the iterative minimum search, removing this limitation.
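The difference between the linearized and the direct least squares estimates is easy to demonstrate numerically. The following sketch is a modern illustration, not part of the original paper; it assumes NumPy and SciPy are available and fits synthetic Arrhenius-type data both ways:

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic data from y = A exp(a x) with additive noise in y itself
rng = np.random.default_rng(0)
A_true, a_true = 2.0, 0.8
x = np.linspace(0.1, 5.0, 40)
y = A_true * np.exp(a_true * x) + rng.normal(0.0, 5.0, x.size)

# Linearized, unweighted fit: ordinary least squares on ln(y) = a x + ln(A)
mask = y > 0                       # ln is undefined for noisy points that went negative
a_lin, lnA = np.polyfit(x[mask], np.log(y[mask]), 1)
A_lin = np.exp(lnA)

# Direct nonlinear fit: minimize the sum of squared deviations in y
(A_nl, a_nl), _ = curve_fit(lambda x, A, a: A * np.exp(a * x), x, y, p0=(1.0, 1.0))

print(f"linearized: A = {A_lin:.3f}, a = {a_lin:.3f}")
print(f"nonlinear:  A = {A_nl:.3f}, a = {a_nl:.3f}")
```

The two procedures minimize different objective functions, so their estimates differ: taking logarithms effectively reweights the data (roughly by 1/y²), which is why the unweighted linearized fit is not the least squares optimum in y.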
The method of Marquardt (8) is particularly suited to least squares applications, as is the modified Newton-Raphson method of Strand, et al. (9). Papers published in THIS JOURNAL also discuss numerical (10) and graphical (11) methods to minimize the sum of the squared deviations in nonlinear curve fitting problems. Unfortunately each comes with various disadvantages. In most cases it is assumed that the neighborhood of the minimum can be approximated by a quadratic Taylor expansion, and hence the ultimate convergence is highly dependent upon the initial estimates for the parameters. In least squares problems where the experimentalist has little or no idea about the size of the parameters, this condition may be hard to satisfy. In addition, these methods are sensitive to varying degrees of ill-conditioning and interaction between parameters.

The method proposed here can be recommended because: it provided results to problems which had resisted solution by more elegant analytical methods; it provided otherwise unavailable starting values when a gradient method was used; the time involved was so small that more elegant methods were not justified; it is virtually unaffected by scale conditions or interaction between parameters; and in some instances it provided better fits than other methods. On the other hand, it is recognized that when no conditioning problems arise and the number of parameters is large, the more sophisticated methods are clearly superior in computational efficiency. For least squares purposes, this method becomes too time consuming for more than four variables; however, this restriction is seldom exceeded in physical chemistry. The method is easy to adapt to high speed electronic computers, since it uses repeated arithmetic operations and provides an approximate solution at each stage of the calculations.

Table 1. Typical Nonlinear Functional Relations in Physical Chemistry

1. y = A exp(ax)                                  Arrhenius-type relations
2. [relation illegible in source]                 Lambert-Beer's law, kinetics, diffusion, sedimentation
3. [relation illegible in source]                 Vapor pressure, viscosity, Arrhenius-type relation with constant temperature error
4. y = A + B erf(...) [argument illegible]        Diffusion with zero-time correction
5. [relation illegible in source]                 Transport phenomena, thermodynamics
6. [relation illegible in source]                 Viscosity
7. f(t) = ∫ h(t') g(t − t') dt'                   Convolution integrals; deconvolution of spectra, luminescence lifetimes, instrumental response, etc.

¹ Aerospace Research Laboratories, Office of Aerospace Research, WPAFB, Ohio.
² Flight Dynamics Laboratory, WPAFB, Ohio.

728 / Journal of Chemical Education

The Direct Grid Search Method

Many examples of nonlinear curve fitting in physical chemistry are similar to relation 1 of Table 1. In these

relations some parameters occur in a linear form (A, B) while a, b, c, and d occur in a nonlinear fashion. If one separates the linear parameters from the nonlinear parameters, the problem can be simplified to some extent. The relation

    ŷᵢ = A f(a,b,...; xᵢ) + B

where ŷᵢ are the computed values, A and B are the linear parameters, and f(a,b,...; xᵢ) is an expression containing the nonlinear parameters and the input values xᵢ, can be cast conveniently in matrix notation as

    Ŷ = XC

where Ŷ is the n × 1 vector of the computed values, C is the 2 × 1 vector of the linear parameters, and X is the n × 2 matrix whose i-th row is [f(a,b,...; xᵢ), 1]. The sum of the squared deviations can be written in a quadratic form (assuming that the observed values yᵢ have the same variance) as

    S(a,b,...) = [Y − XC]'[Y − XC]    (1)

where Y is the n × 1 vector of the observed values yᵢ, [Y − XC] is the n × 1 vector of the deviations between the observed values Y and the computed values XC, and [Y − XC]' is the transpose of the deviation vector. By taking the first variation of S(a,b,...) with respect to C (or C') and setting it equal to zero, one obtains the normal equations for the linear parameters

    X'XC = X'Y

and hence

    C = (X'X)⁻¹X'Y    (2)
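In modern terms, eqn. (2) is simply a linear least squares subproblem. As a minimal sketch (our own illustration, assuming NumPy; the function name is ours, and lstsq is used in place of the explicit inverse for numerical stability), the linear parameters and the value of S can be computed for any trial set of nonlinear parameters:

```python
import numpy as np

def linear_params_and_S(f_vals, y):
    """Given f(a,b,...; x_i) evaluated at each input x_i (for one trial set of
    nonlinear parameters) and the observations y, solve the normal equations
    X'X C = X'Y for C = [A, B] and return C together with the sum of squared
    deviations S = [Y - XC]'[Y - XC]."""
    f_vals = np.asarray(f_vals, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([f_vals, np.ones_like(f_vals)])  # i-th row: [f(...; x_i), 1]
    C, *_ = np.linalg.lstsq(X, y, rcond=None)            # C = (X'X)^-1 X'Y
    r = y - X @ C
    return C, r @ r

# Example: trial nonlinear parameter a for the model y = A exp(a x) + B
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(0.5 * x) + 1.0          # exact data with A = 2, B = 1
C, S = linear_params_and_S(np.exp(0.5 * x), y)
print(C, S)   # C near [2, 1] and S near 0 when a = 0.5 is the true value
```

Because C is obtained analytically, the iterative search has to explore only the nonlinear parameters.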

(X'X being singular only in the trivial case). Unfortunately this expression permits the calculation of the linear parameters only when X is known, and the nonlinear parameters cannot be calculated analytically. Therefore, the strategy will be to assume various values for the nonlinear parameters, use eqn. (2) to determine the linear parameters, and minimize eqn. (1) by an iterative process. The direct search suggested here utilizes a "stretching" feature.

For definiteness, the discussion will relate to four nonlinear parameters. The minimization of eqn. (1) with respect to the nonlinear parameters a, b, c, and d is accomplished by a four dimensional direct grid search. Using a set of initial estimates a₀, b₀, c₀, d₀ as the center and given initial stepsizes Δa, Δb, Δc, Δd, a four dimensional grid is constructed with grid points a₀ ± kΔa, b₀ ± kΔb (k = 0, 1, ..., m), etc. The nonlinear parameters take the values associated with the grid points, while the linear parameters are calculated analytically in accordance with eqn. (2). Then the value of S(a,b,c,d) is calculated at each grid point in accordance with eqn. (1).

The next step is the determination of the least value of the S(a,b,c,d) grid values. There are three possibilities: the least value, Sᵢ, may occur at the center point of the grid, at an interior point other than the center, or at a boundary point of the grid. In the first case, the overall size of the grid is reduced by reducing the grid stepsize and the process is repeated. In the second case, the size of the grid is retained by keeping the grid stepsize constant, but the grid is shifted so that the center takes the values of the grid components of Sᵢ, and the process is repeated. If Sᵢ is a boundary point, the boundary components of the grid value Sᵢ at aᵢ, bᵢ, cᵢ, dᵢ are noted and the process continues as in case two. If Sᵢ₊₁ is found to have the same set of boundary components, the grid is shifted and stretched in this direction by increasing the corresponding grid stepsize. If any of the boundary components are central values, the grid is not stretched in this direction. The more often Sᵢ falls on the same boundary, the more the grid stretches. In this manner the grid stretches and shifts in the direction of the line connecting two consecutive minimum points on the boundary. By this iterative procedure the grid soon arrives, without much oscillation, at a position where the minimum lies within the grid, even if the initial guesses were quite distant from the best values or the initial stepsizes were too small. Once the true minimum is inside the grid, the grid shrinks rapidly.
The process is terminated by a limit on the computation time, the minimum stepsize, the value of S(a,b,c,d), or other criteria set up by the experimentalist. In our program we used a grid consisting of seven equidistant steps in each of four directions. The program is also constructed so that when one of the parameters becomes inactive (because its stepsize becomes smaller than the predetermined limit), the four-parameter grid search is reduced to a three, two, or one parameter search.
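The three cases above translate almost directly into code. The following is a minimal sketch of the search for a general n-parameter objective (our own modern rendering with NumPy, not the authors' program; the stretching rule is simplified to doubling the stepsize of any axis whose boundary held the minimum twice in succession, and the paper's reduction to fewer parameters when a stepsize goes inactive is omitted):

```python
import numpy as np
from itertools import product

def grid_search(S, center, step, m=3, tol=1e-6, max_iter=500):
    """Direct grid search with shrinking, shifting, and stretching.
    S: objective function of a parameter vector; center, step: initial
    estimates and stepsizes; m: the grid extends m steps either side of
    the center (7 points per axis when m == 3, as in the paper)."""
    center = np.asarray(center, dtype=float)
    step = np.asarray(step, dtype=float)
    n = len(center)
    prev_boundary = None
    for _ in range(max_iter):
        # Evaluate S at every point of the (2m+1)^n grid
        offsets = list(product(range(-m, m + 1), repeat=n))
        pts = [center + np.array(k) * step for k in offsets]
        vals = [S(p) for p in pts]
        k_best = np.array(offsets[int(np.argmin(vals))])
        if not k_best.any():                    # case 1: minimum at the center
            step *= 0.5                         # shrink the grid
            if np.all(step < tol):
                break
        else:
            on_boundary = np.abs(k_best) == m
            if on_boundary.any():               # case 3: minimum on the boundary
                if prev_boundary is not None and np.array_equal(on_boundary,
                                                                prev_boundary):
                    step[on_boundary] *= 2.0    # stretch toward the minimum
                prev_boundary = on_boundary
            else:                               # case 2: interior point, keep step
                prev_boundary = None
            center = center + k_best * step     # shift grid to the new minimum
    return center, S(center)

# Example: a simple quadratic bowl with minimum at (2, -1), started far away
center, Smin = grid_search(lambda p: (p[0] - 2.0)**2 + (p[1] + 1.0)**2,
                           center=[10.0, 10.0], step=[1.0, 1.0])
print(center, Smin)
```

The number of function evaluations per cycle grows as (2m+1)^n, which is consistent with the authors' remark that the method becomes too time consuming beyond about four parameters.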


Numerical Results

To test the direct grid search method in finding the minimum of a function, Table 2 presents the minimization of three test functions which frequently appear in the quoted literature, and a fourth which could not be minimized using the Fletcher-Powell method and some other methods when the initial estimates were some distance from the actual minimum. The computation was carried out on an IBM 7094 computer.

Table 2. Test of the Direct Grid Search Method to Find Local Minima of Functions of Several Variables

(a) Navy Chapeau: analytical minimum f(0,0) = -54.58; starting value f(5,5) = 1.3 × 10^?; 42 iterations, approx. 1 sec; computed minimum f(2.9 × 10^-?, 1.4 × 10^-?) = -54.8
(b) Rosenbrock's Valley (3): analytical minimum f(1,1) = 0; starting value f(3,3) = 3.6 × 10³; 162 iterations, approx. 3 sec; computed minimum f(1,1) = 0
(c) Fletcher-Powell: analytical minimum f(1,0,0) = 0; starting value f(-2.5, 1.8, 1.2) = 2.6 × 10^?; 86 iterations, approx. 10 sec; computed minimum f(1,0,0) = 7.6 × 10⁻¹⁸
(d) Function of four variables (4): analytical minimum f(0,0,0,0) = 0; starting value f(3,-1,0,1) = 215; 31 iterations, approx. 16 sec; computed minimum f(1.4 × 10⁻³, -1.6 × 10⁻⁴, 3.2 × 10⁻⁵, 3.2 × 10^-?) = 1.3 × 10⁻⁸
(Exponents marked ? are illegible in the source.)

Volume 45, Number 11, November 1968 / 729

As an example of its use in least squares applications, the direct grid search was applied to the problem of fitting the viscosity of water (12) as a function of temperature with an Arrhenius-type relation (Table 1, relation 1) over a wide temperature range, in order to determine the energy of activation for viscosity

    η = η₀ exp(E/RT)

Here η is the viscosity of water in cP, η₀ is a linear parameter to be determined, and E is taken as

    E = E(T) = E₀ + E₁T + E₂T² + E₃T³

where T is the absolute temperature and the Eᵢ are the nonlinear parameters. From this relation the effective energy of activation, E_eff = R d(ln η)/d(1/T), is defined as (13)

    E_eff = E₀ − E₂T² − 2E₃T³

Since E(T)/RT = E₀/RT + E₁/R + E₂T/R + E₃T²/R, the constant term E₁/R can be absorbed into the pre-exponential factor; Table 3 thus shows a four-parameter fit in order to determine η₀' = η₀ exp(E₁/R), E₀, E₂, and E₃.

Table 3. Least Squares Fit of Viscosity Data as a Function of Temperature Using the Direct Grid Search Method

    [table body illegible in source]

From these data the energy of activation for self-diffusion is

    E_D = E_eff + RT

which, evaluated at 25°C, gives

    E_D = E₀ − E₂T² − 2E₃T³ + RT = 4.7 kcal/mole
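The form of the effective activation energy can be checked symbolically. For η = η₀ exp(E(T)/RT) with E(T) = E₀ + E₁T + E₂T² + E₃T³, applying the usual definition E_eff = R d(ln η)/d(1/T) reproduces the cubic expression. A minimal sketch, assuming SymPy is available (this check is ours, not part of the original paper):

```python
import sympy as sp

T, u, R, eta0, E0, E1, E2, E3 = sp.symbols('T u R eta0 E0 E1 E2 E3', positive=True)

# ln(eta) for eta = eta0 * exp(E(T) / (R*T)) with a cubic E(T)
E = E0 + E1*T + E2*T**2 + E3*T**3
ln_eta = sp.log(eta0) + E / (R * T)

# Effective activation energy: E_eff = R * d(ln eta)/d(1/T)
ln_eta_u = ln_eta.subs(T, 1/u)                        # rewrite in terms of u = 1/T
E_eff = sp.expand(R * sp.diff(ln_eta_u, u)).subs(u, 1/T)

print(sp.simplify(E_eff))   # equals E0 - E2*T**2 - 2*E3*T**3
```

Note that the E₁ term drops out of the derivative, which is why it can be folded into the pre-exponential factor in the fit.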

which agrees fairly well with the experimental results (14). The results (Table 3) also show some systematic distribution of the error, which indicates that the attempted relation does not describe the phenomenon accurately. The uncertainty of the parameters in Table 3 refers to the precision of the numerical determination of the parameters, which is the smallest corresponding stepsize in the grid. The Gauss criterion of goodness of fit (15) is defined as the sum of the squared deviations divided by the difference between the number of observed values and the number of parameters (the degrees of freedom).

The direct grid search method is also suggested for solving convolution problems (for example, to determine short lifetimes in luminescence studies), for finding the minimum of nonlinear functions, and for solving ill-conditioned simultaneous linear equations and nonlinear and transcendental equations, where the number of parameters is small and analytical methods are not readily usable. The basic advantages of the method presented here over alternative approaches are its extreme reliability and accuracy, even in the case of severely ill-conditioned problems and with poor initial estimates of the solution.

Literature Cited

(1) Deming, W. Edwards, "Statistical Adjustment of Data," Dover Publications, Inc., 1964, p. 195.
(2) Spang, H. A., SIAM Review, 4, 343 (1962).
(3) Rosenbrock, H. H., Comp. J., 3, 175 (1960).
(4) Fletcher, R., and Powell, M. J. D., Comp. J., 6, 163 (1963).
(5) Powell, M. J. D., Comp. J., 7, 155 (1964).
(6) Powell, M. J. D., Comp. J., 8, 303 (1965).
(7) Stewart, G. W., J. Assoc. Comp. Mach., 14, 72 (1967).
(8) Marquardt, Donald W., J. Soc. Ind. Appl. Math., 11, 431 (1963).
(9) Strand, T. G., Kohl, D. A., and Bonham, R. A., J. Chem. Phys., 39, 1307 (1963).
(10) Wentworth, W. E., J. Chem. Educ., 42, 96, 162 (1965).
(11) Christian, S. D., J. Chem. Educ., 42, 604 (1965).
(12) Stokes, R. H., and Mills, R., "Viscosity of Electrolytes and Related Topics," The Int. Enc. of Phys. Chem. and Chem. Phys., Pergamon Press, 1965, Vol. 3, Topic 16, p. 74.
(13) Lielmezs, J., and Watkinson, P., J. Chem. Phys., 45, 2809 (1966).
(14) Wang, J. H., Robinson, C. B., and Edelman, I. S., J. Am. Chem. Soc., 75, 466 (1953).
(15) Worthing, A. G., and Geffner, J., "Treatment of Experimental Data," John Wiley & Sons, Inc., New York, 1960, p. 261.