Figure 1. Sample commands for program MATH. (The screen listing shows the allowed Fortran-style operators and functions, a CALC example, and the syntax of the IF-ELSE command.)

Figure 3. Potential surface plot generated by SURFUN.

... scatterplots and store fitted values. Figure 2 shows a sample terminal session with SIMFUN, in which a set of structure-activity data is fit to the nonlinear relationship (9):

log(Activity) = B1·π − B2·log(B3·10^π + 1.0) + B4·I + B5

The activity and lipophilicity (π) values are input (steps 1 and 2), the lipophilicity values are transformed and an indicator variable (I) is generated (steps 3-6), and the fitting command is given (step 7). Numeric constants in the FIT command are taken as initial parameter estimates, which are optimized in the fitting process (steps 8 and 9). Because of this, the constant 1.0 in the above equation must be stored in C4 (step 4) rather than typed in the FIT command; otherwise it would be optimized along with the other parameters. After fitting, the predicted activity values are stored (step 10), and they can be plotted or examined as desired. This program also permits piecewise, weighted, and constrained fitting.

The results of the analysis in Figure 2 are essentially the same as those obtained using a standard nonlinear least-squares approach (10). The t statistics, although not strictly valid for nonlinear models, are printed since the program can also be used for linear fitting. The run in Figure 2 consumed about 20 seconds of CPU time on a PRIME 750 minicomputer.
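A fit of this bilinear form can be sketched in a few lines of modern Python with SciPy; this is an illustration only, not the SIMFUN FIT command. The data arrays below are synthetic stand-ins (the actual values appear in the Figure 2 session), and log is assumed to be log10.

    import numpy as np
    from scipy.optimize import curve_fit

    # Bilinear structure-activity model from the text (log taken as log10):
    # log(Activity) = B1*pi - B2*log10(B3*10**pi + 1.0) + B4*I + B5
    def bilinear(X, b1, b2, b3, b4, b5):
        pi, ind = X  # lipophilicity and indicator variable
        return b1 * pi - b2 * np.log10(b3 * 10.0 ** pi + 1.0) + b4 * ind + b5

    # Synthetic stand-in data generated from known parameters.
    pi = np.linspace(0.0, 4.0, 12)
    ind = np.tile([0.0, 1.0], 6)
    act = bilinear((pi, ind), 0.8, 1.0, 0.05, 0.4, 0.6)

    # Initial guesses play the role of the numeric constants typed in the FIT
    # command; a bound keeps B3 positive so the log argument stays valid.
    params, cov = curve_fit(bilinear, (pi, ind), act,
                            p0=[1.0, 1.0, 0.1, 0.5, 0.5],
                            bounds=([-10, -10, 1e-6, -10, -10], 10))
    print(params)  # recovers approximately (0.8, 1.0, 0.05, 0.4, 0.6)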

The third program, SURFUN, combines the function translator with a standard three-dimensional surface plotting algorithm (11). By typing the expression

Z = 1.2 * EXP(-.5 * (X + .03) ** 2 - .5 * (Y + .03) ** 2)

the potential function surface of Figure 3 was automatically generated, with both screen (Tektronix 4010) and plotter (Tektronix 4662) output, the former for immediate viewing and the latter for hardcopy. The run in Figure 3 required 43 seconds of CPU time.
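For modern readers, the effect of SURFUN's function translation and surface plotting can be reproduced with NumPy and matplotlib; the sketch below is illustrative only (the grid range is an arbitrary choice, and this is not SURFUN's algorithm).

    import numpy as np
    import matplotlib.pyplot as plt

    # Evaluate the expression typed into SURFUN over a grid of (X, Y) points.
    x = np.linspace(-3.0, 3.0, 60)
    X, Y = np.meshgrid(x, x)
    Z = 1.2 * np.exp(-0.5 * (X + 0.03) ** 2 - 0.5 * (Y + 0.03) ** 2)

    # Draw the surface on screen; savefig() would provide the "hardcopy".
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(X, Y, Z)
    plt.show()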

Acknowledgment

The authors gratefully acknowledge financial support from the U.S. Environmental Protection Agency. The PRIME 750 computer used in the development of these programs was purchased with the aid of funds from the National Science Foundation.

Figure 2. Sample terminal session with program SIMFUN. User input occurs in the numbered steps, which are discussed in the text. (The session ends with a table of fitted parameters B(i) with their standard errors, t statistics, and significance levels.)

Numerical Optimization on a Microcomputer

G. Brink, L. Glasser, R. A. Hasty, and P. Wade

University of the Witwatersrand, Johannesburg 2001, South Africa

Optimization is the process of adjustment of the set of numerical parameters used in the description of a system. The objective function (dependent variable) which describes the system, and whose value is controlled by the parameters, is adjusted to its optimum value. The objective function may be described by a mathematical expression, or it may be the sum of the squares of the differences between given values and corresponding values generated by a fitted equation (as in least-squares analysis), or it may simply be an instrumental response resulting from a particular choice of experimental parameters. In all such instances, it is desirable to have some rapid and efficient procedure by which the optimum may be found.

The classical analytical procedure uses differential calculus. The derivative of a function of one independent variable is set to zero and the resulting equation solved for the corresponding value of the independent variable. In the case of multiple independent variables, a system of simultaneous partial differential equations is established and solved. The latter procedure is used, for example, in multiple linear regression.
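As a concrete illustration of the multi-variable case, fitting a straight line y = a + bx to n data points (xᵢ, yᵢ) by least squares means minimizing S(a, b) = Σᵢ (yᵢ − a − bxᵢ)². Setting ∂S/∂a = 0 and ∂S/∂b = 0 yields the pair of simultaneous "normal equations"

Σᵢ yᵢ = n·a + b·Σᵢ xᵢ        Σᵢ xᵢyᵢ = a·Σᵢ xᵢ + b·Σᵢ xᵢ²

which are solved directly for the optimum a and b; this is precisely the simplest case of multiple linear regression.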

Such a procedure is not always feasible. The equations may not be differentiable, or the set of differential equations cannot be solved, or the equations are nonlinear and intractable. In such cases it is appropriate to use numerical methods. Many such methods are known and available on mainframe computers. Unfortunately, there is no one technique that is best for all problems, and the user must make a judicious choice from among the many procedures now available, guided by the experience of experts (13, 14).

With the advent of sufficiently powerful microcomputers, it has become attractive to consider using these machines for optimizations of limited scope, i.e., with a small number of variables and/or an objective function of reasonable simplicity. There is already a commercial offering (15) in BASIC of a nonlinear least-squares analysis program, NLLSQ, based on Marquardt's method (16). Such BASIC programs can be speeded up by a factor of three to five times if the programs are compiled into machine code, using one of the BASIC compilers now available. The result is a powerful tool readily accessible to researchers and to undergraduates for tackling problems of a complexity which could not have been considered in the recent past without access to major computing resources.

It is the purpose of this article to introduce and offer for distribution three useful and important optimization programs and an impressive graphic demonstration of the progress of one of the methods to an optimum.

Uni-Dimensional Direct Search: Method of Davies, Swann, and Campey (17, 18)

A familiar difficulty in the analysis of first-order kinetic data is reliance on an unreliable "infinite-time" value (rather than zero) against which differences must be taken. The usual method by which such reliance is avoided is due to Guggenheim, but this carries its own problems in train (18). Thus, Holt and Norris suggest accepting the "infinite-time" value as another adjustable parameter in the least-squares analysis and optimizing its value to get the most favorable least-squares plot.

The DSC optimization method (13, 18) uses a direct search along the range of "infinite-time" values until a minimum in the weighted least-squares deviation of experimental against fitted data points is bracketed. Then a quadratic interpolation is used to refine the estimate of the minimum. Since the curve will not generally be quadratic, the step-size must now be reduced and a new search initiated in the vicinity of the estimated minimum. This is repeated until the step-size reaches some pre-set lower limit. This procedure is guaranteed to converge to a unique and correct solution (18). Since no derivative is required, this is termed a "direct" search.
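A minimal sketch of the DSC idea follows, in Python; the function name, the ten-fold step reduction, and the stopping tolerance are illustrative choices, and this is not the distributed BASIC program.

    def dsc_minimize(f, x0, step=1.0, tol=1e-6):
        """Davies-Swann-Campey-style direct search in one variable.

        March along x until a minimum is bracketed, refine the estimate by
        quadratic interpolation through three equally spaced points, then
        shrink the step and repeat until it reaches a preset lower limit.
        """
        x = x0
        while step > tol:
            # Pick a downhill direction (if both ways are uphill, x is
            # already bracketed by its neighbours and we interpolate).
            d = step if f(x + step) < f(x) else -step
            a, b = x, x + d
            while f(b) < f(a):                # march until the function rises
                a, b = b, b + d
            x1, x2, x3 = a - d, a, b          # three points straddling the minimum
            f1, f2, f3 = f(x1), f(x2), f(x3)
            # Vertex of the parabola through the three equally spaced points.
            denom = f1 - 2.0 * f2 + f3
            x = x2 + 0.5 * d * (f1 - f3) / denom if denom != 0.0 else x2
            step *= 0.1                       # reduce the step size, search again
        return x

    # Example: locate the minimum of a shifted parabola.
    print(dsc_minimize(lambda t: (t - 1.234) ** 2, x0=0.0))  # ~1.234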

The BASIC program offered performs this search and reports the results in an appropriate and convenient form. Holt and Norris (18) offer a Fortran version of this program; it should be noted that the BASIC program offered here is not simply a re-coding of that program, but rather a new formulation of their procedure.

Multi-Dimensional Direct Search: Simplex Method

One method for a direct search is to evaluate the objective function at an array of points on a predetermined grid and then simply choose the most favorable response. This method is crude and does not use the experience of the function already generated to guide the further progress of the search. A most successful alternative is the simplex search.

A simplex in n dimensions is a convex geometric figure having n + 1 vertices. In two dimensions the simplex is a triangle, while in three dimensions the simplex is a tetrahedron. Coordinates are arbitrarily chosen for n + 1 points in the n dimensions of the space and the objective function evaluated at each of these points, which constitute the vertices of the simplex. The worst-response vertex is then replaced by reflecting this vertex through the centroid of the remaining vertices to produce a new vertex, "inverting" the simplex in the process. The response at the new vertex is then evaluated. The process is repeated and the simplex moves towards the optimum. Precautions must be taken to avoid oscillation of the search in "valleys" or "ridges" of the objective function. Acceleration or deceleration of the movement of the simplex in favorable or unfavorable directions may be arranged (21).

The simplex method has the feature that no objective function need be evaluated: it is only necessary to supply to the program the response values for the vertices that it selects. Hence, the simplex can be used empirically; for example, to optimize a synthesis with respect to reagent concentrations, temperature, contact time, etc.
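The basic reflection step is easy to state in code. The Python sketch below (for minimization) illustrates the geometry only; it omits the acceleration, weighting, and oscillation-avoiding devices mentioned in the text and is not the authors' program.

    import numpy as np

    def simplex_step(vertices, responses):
        """Reflect the worst vertex through the centroid of the others."""
        worst = np.argmax(responses)                  # vertex to discard
        others = np.delete(vertices, worst, axis=0)
        centroid = others.mean(axis=0)
        reflected = 2.0 * centroid - vertices[worst]  # "inverts" the simplex
        return worst, reflected

    # Illustrative search on a simple quadratic bowl with optimum at (1, -2):
    f = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # starting triangle
    for _ in range(50):
        resp = np.array([f(v) for v in verts])
        i, new_v = simplex_step(verts, resp)
        if f(new_v) >= resp[i]:        # no improvement: shrink toward the best
            verts = (verts + verts[np.argmin(resp)]) / 2.0
        else:
            verts[i] = new_v
    print(verts.mean(axis=0))          # close to the optimum (1, -2)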

One modification introduces an original device to move away from a valley or ridge, by weighting the vertices according to their responses. This has, however, proved troublesome in higher-dimensional spaces, and is not recommended for n > 5.

A graphical demonstration is offered to illustrate the motion of the simplex in two-dimensional space. The user chooses arbitrary coordinates in the allowed range, then sees the simplex (displayed on a contour map of the function) invert itself repeatedly as it travels purposefully towards the optimum, altering shape as circumstances demand. The demonstration may be repeated after the optimum has been reached, using a new set of starting points. Figure 4, taken from the screen, shows the trace left by a simplex progressing to the optimum.

Figure 4. Screen display after a simplex has progressed toward the optimum of the function whose contour map is displayed.

Multi-Dimensional Gradient Search

Function linearization and steepest descent (23) are two classes of techniques for finding an optimum response which both use partial derivatives of the optimizing function with respect to the parameters to be determined. Function linearization involves an approximating linear expansion (Taylor expansion) of the function, while the steepest descent method seeks a minimum by moving along a vector whose components are proportional to the corresponding partial derivatives of the function at that point. Both methods require iterative solutions and converge rapidly only if near a minimum, where the assumptions used in their derivation are realized. In general, however, the nature of the multidimensional hyperspace is such that convergence to a minimum may fail or be slow. Numerous modifications have been made to the basic


techniques to ensure, and speed up, convergence (23, 24). Perhaps the best-known method is that due to Marquardt (16), which attempts to use the best features of linearization and steepest descent. The particular method we have used is that due to Meiron (25), as described by Pitha and Jones (24), who found it to converge the most rapidly of seven methods examined with their test data. The program offered is only suitable for least-squares optimization. The partial derivatives required are estimated numerically, using differences; errors in the fitted parameters are estimated as well.
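A bare-bones gradient search with difference-estimated derivatives can be sketched as follows; this is only an illustrative fragment (the step rate and iteration count are arbitrary), not the Meiron method actually used in the program offered.

    import numpy as np

    def num_grad(f, q, h=1e-6):
        """Central-difference estimate of the gradient of f at q."""
        g = np.zeros_like(q)
        for i in range(q.size):
            e = np.zeros_like(q); e[i] = h
            g[i] = (f(q + e) - f(q - e)) / (2.0 * h)
        return g

    def steepest_descent(f, q0, rate=0.1, iters=200):
        """Step along the negative gradient; reliable mainly near a minimum."""
        q = np.asarray(q0, dtype=float)
        for _ in range(iters):
            q -= rate * num_grad(f, q)
        return q

    # Minimize a two-parameter sum of squared residuals; converges to (3, -1).
    print(steepest_descent(lambda q: (q[0] - 3) ** 2 + 2 * (q[1] + 1) ** 2, [0.0, 0.0]))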

Availability

The programs offered are written for the Apple II Plus microcomputer with 48K of memory and a single disk drive. As the programs are written in the Applesoft dialect of BASIC, they should be readily translatable to other Microsoft BASICs, e.g., PET or TRS-80, except for the graphic demonstration of the simplex method, which uses the unique features of the Apple's graphics. The programs are available for US$7 each, all from Professor L. Glasser at the above address. The disk is also available through Project SERAPHIM.

A Marquardt Nonlinear Least-Squares Program for the Apple II or PET


Carl Trindle

University of Virginia, Charlottesville, VA 22901

Sometimes there is no simple way to plot chemical data so that a linear relation can be tested. In other cases the transformation to variables which produces a linear relation weights some data unfairly (26). In those cases a fit of data to some nonlinear form is necessary. Programs to fit simple higher-order polynomials to data are commonplace; it is a simple exercise to program a microcomputer to produce a least-squares polynomial representation of nonlinear data. However, many theoretical representations of nonlinear data do not take the polynomial form. I have found the need to evaluate the best parameters in rate laws, computed potential functions in atoms, equilibrium expressions, convolution integral representations of fluorescence decay, and low-temperature heat capacity expressions; none of these functions is a polynomial.

A number of numerical methods for nonlinear function fitting are known (27). These may be distinguished by their use of derivatives of the functional form:

Need Derivatives: Fletcher-Powell-Davidon, Marquardt, Steepest Descents
No Derivatives: Simplex, Pattern Search

by their "robustness," which is an indication of how tolerant they are of poor initial guesses:

Robust: Marquardt, Simplex
Delicate: Pattern Search, Steepest Descents, FPD

and by their speed of convergence:

Fast: Marquardt
Slow: Simplex

The Marquardt algorithm is attractive, fast, and reliable; its only drawback is that derivatives (of sometimes awkward forms) are needed. At some cost in speed, all derivatives can be approximated by finite differences. The Marquardt algorithm can be summarized as follows (28):

A) Evaluate the residual error RE; if it is small enough, stop.

RE(q) = Σᵢ [dᵢ − F(xᵢ; q)]²

Here dᵢ is a data point, xᵢ is the ith set of independent variables, and q is the set of parameters which are to be determined by the fit.

B) Evaluate the residual error's response to changes in the q as a Taylor series through quadratic terms. The second term in the second derivative expansion (eqn. 4) is dropped, as an approximation. Then if B is a vector (of first derivatives of RE) and A the matrix (of second derivatives), while Δ is a vector of the displacements dqⱼ in each parameter, the residual error is

RE(q) = RE(q⁰) + Δᵀ·B + ½·Δᵀ·A·Δ = RE(q⁰) + Δᵀ·(B + ½·A·Δ)    (5)
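The dropped term can be made explicit. Writing RE = Σᵢ rᵢ² with residuals rᵢ = dᵢ − F(xᵢ; q), the exact second derivatives are

∂²RE/∂qⱼ∂qₖ = 2 Σᵢ [ (∂F/∂qⱼ)(∂F/∂qₖ) − rᵢ·(∂²F/∂qⱼ∂qₖ) ]

and it is the second, residual-weighted term that is discarded, so that A is built from first derivatives of the model only (the familiar Gauss-Newton approximation).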

If q⁰ corresponds to a minimum in RE(q), the coefficient of Δ would be zero.

C) Choose Δ so that

B + A·Δ = 0;    Δ = −A⁻¹·B

and consider q⁰ + Δ = q to be a better description of the least-squares optimized set of parameters.

D) If need be, revise the step length Δ, defined in the "steepest descents" direction; that is, altering q⁰ by Δ will produce the most drastic decrease in RE. The matrix A may be ill-conditioned, in which case inversion is unreliable. In the Marquardt method, A is replaced by A + λI. The parameter λ is chosen so that A + λI is non-singular (det(A + λI) ≠ 0), and so that the step Δ(λ) = −(A + λI)⁻¹·B is guaranteed to decrease the residual error.

Figure 5 may clarify the last remark. Let the RE surface be represented by a contour map. The original point q⁰ has a steepest-descents direction defined locally. A very short step will surely lead to a lower RE, but a long step may in fact increase RE. The step length is short if λ is large, while it could be long if λ → 0. The parameter λ provides a way for the method to pick its way through steep, jagged terrain.

E) Return to step A.

A program employing the Marquardt algorithm is usually available as part of the library of any computer center (NLWOOD is the name likely to be familiar to computer-center staff). A few similar programs are available (from commercial vendors) for popular microcomputers. We have made our Apple II+ version, which has been developed and tested over the past year, available through Project SERAPHIM (NSF-DISE), Eastern Michigan University, Ypsilanti, MI 48197. The program requires APPLESOFT in ROM, 48KB memory, DOS 3.3, and one disk drive. Please enclose a check for $4 made out to "Project SERAPHIM, acct. #20350." A PET/VIC cassette tape is available as well, but no provision has been made for disk storage on the PET/VIC.
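Steps A-E translate almost line-for-line into code. The Python sketch below is illustrative only, not the distributed SERAPHIM program; the residual threshold, the ten-fold λ updates, and the finite-difference step are arbitrary choices.

    import numpy as np

    def marquardt(f, q0, d, x, lam=1e-3, iters=50, h=1e-6):
        """Minimal Levenberg-Marquardt cycle (steps A-E) for least squares.

        f(x, q) gives model values, d is the observed data, q0 holds the
        initial parameter guesses; derivatives are taken by differences.
        """
        q = np.asarray(q0, dtype=float)
        residuals = lambda p: d - f(x, p)
        for _ in range(iters):
            r = residuals(q)
            if r @ r < 1e-12:                  # step A: RE small enough, stop
                break
            J = np.empty((r.size, q.size))     # Jacobian of the residuals
            for j in range(q.size):
                dq = np.zeros(q.size); dq[j] = h
                J[:, j] = (residuals(q + dq) - r) / h
            B = J.T @ r                        # ~ first derivatives of RE (step B)
            A = J.T @ J                        # ~ second derivatives, with the
                                               #   residual-weighted term dropped
            improved = False
            while lam < 1e12:                  # steps C/D: find a workable step
                delta = np.linalg.solve(A + lam * np.eye(q.size), -B)
                if residuals(q + delta) @ residuals(q + delta) < r @ r:
                    q, lam, improved = q + delta, lam * 0.1, True
                    break                      # the step decreased RE: accept it
                lam *= 10.0                    # otherwise shorten the step
            if not improved:
                break                          # no decreasing step could be found
        return q                               # step E: the loop returns to step A

    # Fit a two-parameter exponential decay, d = a*exp(-k*x):
    x = np.linspace(0.0, 5.0, 40)
    d = 2.0 * np.exp(-1.3 * x)
    print(marquardt(lambda x, q: q[0] * np.exp(-q[1] * x), [1.0, 1.0], d, x))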
