J. Phys. Chem. 1993, 97, 9973-9976


Global Geometry Optimization of Clusters Using Genetic Algorithms

Bernd Hartke

Department of Chemistry, University of Bielefeld, Universitätsstrasse 25, 33615 Bielefeld, Germany

Received: May 11, 1993*

A genetic algorithm is used to find the global minimum energy structure for Si4 on an empirical potential energy surface. Given a suitable encoding of the cluster geometry, and an exponential scaling of the potential energy values to obtain a fitness function, the genetic algorithm can successfully optimize all degrees of freedom. With the number of potential energy function evaluations as a measure, the genetic algorithm is more economical than either a set of traditional, local minimizations or a molecular dynamics simulated annealing approach.

1. Introduction

Atomic clusters are an active field of current experimental and theoretical research.1 Theoretical prediction of their structures is a difficult task, because the number of local minima grows exponentially with the number of atoms.2 In particular, a set of local minima in itself does not properly characterize the potential energy surface (PES); only knowledge of the global minimum enables one to address the question of the relative importance of all minima found. There are several general methods for finding global minima (or, in general, optima), for example: enumeration of local minima found by traditional minimization,3,4 simulated annealing5,6 in conjunction with molecular dynamics or Monte Carlo methods, threshold accepting,7 great deluge and record-to-record travel,8 and genetic algorithms.9-12 In some form or other, some of them have already been applied to the problem of geometry optimization of chemical species.13-16

The efficiency of (global) optimization methods has traditionally been measured by the number of evaluations of the function to be optimized.3,4 In most applications, function evaluation dominates the overall computational load, but it is still possible to do many function evaluations, and accordingly most optimization methods in present use require many of them. Calculation of PES points is relatively "cheap" with an empirical PES function; hence those methods could also be utilized for cluster geometry optimization. But many empirical PES functions lead to wrong PES features when used for clusters, and thus one would like to calculate the PES ab initio. This, however, introduces a huge additional computational burden, which makes it very difficult to use standard methods even for small systems.
Therefore, it will ultimately be necessary to counter this increase in computational load by qualitatively different methods, e.g., by introducing problem-specific knowledge or "artificial intelligence".11 And even for designing a preliminary approach in this direction, it becomes of paramount importance to make the most of the limited PES information available. With this argumentation in mind, the global minimization strategies mentioned above can be characterized as follows: In the method of enumeration of local minima, it is possible to choose a local minimization method which makes use of previously generated PES information for further steps (usually within the framework of a local quadratic approximation3,4). External, problem-specific knowledge can be fed in by way of an intelligent choice of the starting configuration for each local minimization run. There is, however, no guarantee whatsoever that the global minimum will be found, no matter how many local minimizations are performed. Also, one is obviously throwing away PES information, since one local minimization knows nothing of any

* Abstract published in Advance ACS Abstracts, September 1, 1993.

of the others. Furthermore, this approach has been shown to be inferior to simulated annealing for specific examples.6 It has been proven theoretically6 that simulated annealing always finds the global minimum, with only fairly general requirements on the form of the PES and on the cooling schedule, but with an infinitely long trajectory. It can be argued that this translates into a still reasonable chance of finding the global minimum for sufficiently long trajectories in practical calculations. But when dealing with an ab initio PES, one would like to have short trajectories. It is possible to treat small clusters with ab initio simulated annealing and even to show that the final minimum found is truly independent of the starting conditions,17 but larger clusters are beyond the scope of the method. There are several attempts at making simulated annealing faster,18,19 but a completely convincing recipe has not been found yet. And independent of this issue, there is clearly a waste of PES information: imagine the system in the basin of attraction of the global minimum at t = 0 and having enough energy to escape. Between t = 0 and t = Δt it is cooled down, such that at t ≥ Δt it cannot escape anymore (a phase change19). All the parts of the trajectory at t < 0 are arbitrary as long as they ensure that the system is in the basin of attraction by t = 0. Related methods like threshold accepting and the great deluge algorithm may rival simulated annealing in effectiveness, but they do not eliminate its two basic deficiencies (long trajectories and waste of PES information).

Genetic algorithms (GAs), however, are based on different principles and thus manage to alleviate some of these problems: PES information is preserved and used through simulation of the genetic learning process of biological species and through the built-in information interchange via genetic crossover. A further speedup is provided by the "inherent parallelism"9,11 of the algorithm.
Further advantages are robustness with respect to changes in the problem, simplicity of programming and modification, and the ease of including any sort of external knowledge (couplings to "artificial intelligence" methods like classifier systems11 are well studied). There are also a few disadvantages: genetic algorithms are not good at fine-tuning toward a located minimum, but the same holds for simulated annealing, and in both cases this is not serious, since it is straightforward to add a local optimization of the final result. Two difficult problems of GAs are that they work well only with a good problem encoding (one which satisfies the requirement for short, meaningful building blocks11) and that a suitable fitness function scaling has to be found. The latter can be problematic, in particular if the range of possible function values is not known in advance. Both problems will be discussed below. Chemical applications of GAs so far have been limited largely to analytical chemistry,20 to conformational optimization (dihedral angles only, no bond distances or angles) mostly of biochemical

0022-3654/93/2097-9973$04.00/0 © 1993 American Chemical Society


molecules21 and only a few others,16,22 and to a distribution optimization in a binary alloy model.23

This paper is a model study of the feasibility of all-parameter optimization of cluster geometries (that is, not only dihedral angles but all degrees of freedom). The final aim is to do this on an ab initio PES, but this is much too expensive for initial tests. Hence, an empirical PES of high quality is used, in the hope of capturing the most important features of the ab initio surface (like the "density" and number of minima). The fully developed method will then need only a substitution of the empirical PES subroutine by an ab initio calculation (one point per call) but no new parameter tuning or other testing. Si4 is chosen as a test case, since it provides a fairly good balance between being too trivial and too large. There is also a very good empirical PES24 which is tuned to reproduce small silicon clusters correctly. In particular, it gives the rhombus as the global minimum for Si4, in contrast to other established empirical PES models. Furthermore, there is an ab initio molecular dynamics simulated annealing study17 for comparison. Of course, a full evaluation of performance calls for larger systems than this small test case.

2. Computational Method

The "standard GA" is employed as described by Goldberg.11 Each cluster geometry is encoded as a bit string, which in turn is interpreted as the genome of an individual. The genetic material of an initial population of individuals is chosen at random. After appropriate scaling, the potential energy of each cluster geometry is interpreted as the fitness of the corresponding individual. Then, pairs of individuals are chosen as parents for the following generation, where one parent is chosen according to fitness and the other at random. Each parent pair generates two children, with a fixed probability for mutation of each bit and with a fixed probability for a simple, one-point crossover between the parent strings. The children constitute the following generation, while the parents are discarded; hence, the population size is constant. An elitist strategy,10,11 where the best individual of each generation is guaranteed to survive into the next generation, was tested and found not to be essential for the present problems.

For the parameter values, common literature choices were adopted and checked by varying one parameter at a time with the others held fixed. This led to a value of 5% for the mutation probability and a value of 60% for the crossover probability. A population of 20 individuals propagated through 20 generations usually was sufficient for convergence; see below.

The empirical PES of Bolding and Andersen24 was used without changes. It has its global minimum at a planar rhombus, with an energy of -13.0149 eV, while arbitrary geometries have energies up to 10^5 eV. A simple linear scaling of these potential energy values does not lead to useful fitness values. There are several important minima below -5.0 eV, but with a linear scaling including values up to 10^5 eV, the GA would be unable to discriminate between geometries in the range -13.0149 to 0.0 eV.
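As a concrete illustration, the generational scheme just described (one parent selected in proportion to fitness, the other at random; one-point crossover; per-bit mutation; parents discarded) can be sketched in Python, together with an exponential fitness scaling of the kind used in this work. The bit-string decoding, the parameter ranges, and the placeholder energy function below are illustrative assumptions only; the actual calculations used the Bolding-Andersen PES and the geometry models described later in this section.

```python
import random

BITS = 8                       # bits per parameter (illustrative choice)
P_MUT, P_CROSS = 0.05, 0.60    # mutation and crossover rates from the text

def fitness(energy_ev):
    """Exponential scaling pinned to fitness(-13.5 eV) = 1.0 and
    fitness(0.0 eV) = 0.01; energies above 0 eV map to nearly zero."""
    return 0.01 * 100.0 ** (-energy_ev / 13.5)

def toy_energy(params):
    """Placeholder PES (NOT the Bolding-Andersen surface): a separable
    bowl with its global minimum of -13.0 at params = (1, 1, ...)."""
    return -13.0 + sum((p - 1.0) ** 2 for p in params)

def decode(genome, n_params, lo=-4.0, hi=4.0):
    """Map a bit string to n_params real values in [lo, hi]."""
    vals = []
    for i in range(n_params):
        chunk = genome[i * BITS:(i + 1) * BITS]
        frac = int("".join(map(str, chunk)), 2) / (2 ** BITS - 1)
        vals.append(lo + frac * (hi - lo))
    return vals

def run_ga(n_params=4, pop_size=20, n_gen=20, seed=1):
    rng = random.Random(seed)
    length = n_params * BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = float("inf")                            # best energy seen so far
    for _ in range(n_gen):
        energies = [toy_energy(decode(g, n_params)) for g in pop]
        best = min(best, min(energies))
        fits = [fitness(e) for e in energies]
        total = sum(fits)
        children = []
        while len(children) < pop_size:
            # one parent chosen in proportion to fitness (roulette wheel),
            # the other at random, as described in the text
            r, acc, p1 = rng.uniform(0.0, total), 0.0, pop[-1]
            for g, f in zip(pop, fits):
                acc += f
                if acc >= r:
                    p1 = g
                    break
            p2 = rng.choice(pop)
            c1, c2 = p1[:], p2[:]
            if rng.random() < P_CROSS:             # one-point crossover
                cut = rng.randrange(1, length)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):                 # per-bit mutation
                for i in range(length):
                    if rng.random() < P_MUT:
                        child[i] ^= 1
            children.extend([c1, c2])
        pop = children[:pop_size]                  # parents are discarded
    return best

best_energy = run_ga()
```

With a population of 20 propagated through 20 generations (400 energy evaluations in total), this sketch reliably drives the best-seen energy of the toy surface well below zero; the fine-tuning to the actual minimum is left to a local optimizer, as discussed in section 3.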
Geometries in the range between -13.0149 and 0.0 eV would correspond, under such a linear scaling, to individuals with high, but essentially equal, fitness values; therefore, these individuals would have essentially equal numbers of children. In order to remedy this situation, an exponential fitness scaling is employed. It is defined by setting the fitness of an individual with a potential energy of Epot = -13.5 eV to 1.0 and the fitness of an individual with a potential energy of Epot = 0.0 eV to 0.01. With this scaling, all cluster geometries with Epot > 0.0 correspond to a fitness of almost zero, and the corresponding individuals are quickly weeded out. Also, this scaling differentiates the reproductive rates of individuals with Epot < 0.0, allowing for effective exploration and evaluation of the corresponding PES parts.

Another essential ingredient to making the GA work is a proper geometry model for the translation of a cluster geometry into a set

[Figure 1. Geometry models for Si4: (a) the four-parameter model; (b) the six-parameter model, in which atom pair (1,2) is shifted by dx, dy, and dz relative to the origin.]



Figure 3. Convergence to the global minimum: for the set of 100 runs of a population of 20 individuals through 20 generations for the six-parameter geometry model, the cumulative percentage of runs for which the best individual has already found the basin of attraction of the global minimum is shown against the number of generations on the abscissa. A traditional Powell algorithm is used to locally optimize the best individual in each generation. After 20 generations, 92 runs have successfully located the basin of attraction of the global minimum.


Figure 2. Performance of the genetic algorithm: both panels depict the potential energy value in eV for the best individual, out of a total of 20, for each generation up to the propagated maximum of 20 generations. The middle curve, labeled with x's, is the average over 100 independent runs; the top curve, labeled with empty squares, shows the single worst run; and the bottom curve, labeled with empty triangles, stands for the single best run out of the same set of 100 runs. For panel a, the four-parameter geometry model was used; for panel b, the six-parameter model. Clearly, the convergence is much better for the restricted model in panel a but still satisfactory for the general model in panel b. For clarity, generation 0 (initialization) is not shown; generation 1 is the first generation of children, where the genetic algorithm has already weeded out many unfit geometries.

ranges of ±4.0 Å and ±180°, respectively. The decimal values of these parameters are encoded into the genome by converting them to binary values, using seven bit positions and one sign bit. From visual inspection of Figure 1a, two important observations are easily made: (1) this geometry model does not allow for all possible geometries (e.g., T-shaped structures, which are well known as local minima,14,17,24 are excluded; however, the model still reflects the known bias of Si4 toward planar geometries14,17), and (2) the particular coordinates chosen are separable to a large extent (e.g., specific values of φ2 will be better than others, independent of the values of the other coordinates). The six-parameter model of Figure 1b allows for arbitrary geometries. The difference from the previous model is simply that the parameter φ2 is replaced by shifts dx, dy, and dz of atom pair (1,2) relative to the origin. Since these shift parameters are very similar to Cartesian coordinates, the parameters in this model do not separate as well as in the four-parameter model (inspection of Figure 1b shows, e.g., that optimization of φ is clearly influenced by the current values of dx and dy).

3. Test Results

Figure 2a shows the performance of the GA on the four-parameter model. For this model, 100 test runs for the same geometry model and GA parameter set were done in order to

eliminate artificial effects from the random number generator. The average energy of the best individuals in the last generation is -12.95 eV, very close to the best individual of the best-performing run, which ends up at -13.0076 eV. Both values are very close to the true global minimum of -13.0149 eV, which can be obtained from any of the final geometries using any traditional local optimization routine. Here, the Powell algorithm of the "Numerical Recipes"4 was used. Note that using the Powell algorithm on arbitrary geometries in general does not lead to the global minimum but to local minima (see below). The best individual of the worst-performing run only ends up at -11.30 eV, but the closeness of the average to the best run qualifies this result as an outlier.

Figure 2b shows analogous results for the six-parameter model. Here, the GA clearly does not perform as well (presumably because of the combined effect of an enlarged search space and less separability of the coordinates; cf. section 2). In particular, the percentage of bad runs has clearly increased, with the best individual of the worst run ending up at only -9.98 eV. The average performance (-12.08 eV), however, is not much worse than in the four-parameter model, while the top performer comes very close to the above result with -13.0028 eV.

Clearly, both results indicate that there is still room for improvement of the GA parameters; in particular, one could increase the population size until the performance on the six-parameter model is no worse than that on the four-parameter one. But this is not the point here. A GA is not meant to do fine tuning; it is at its best when exploring the PES in an "overview mode" and should be supplemented by a local traditional optimization routine which does the fine tuning. Here, the same Powell method is used again to complement the GA: the best geometry of each generation is taken as input to the Powell routine.
This combined GA plus Powell algorithm is used with the six-parameter model to show how the convergence to the basin of attraction of the global minimum proceeds. Again for a set of 100 runs under the same conditions, Figure 3 shows the cumulative percentage of runs having reached this basin by each generation. After three generations, 61% have arrived; after 10 generations, it is 83%; and after 20 generations, only 8% are still left out. That means we have a chance of greater than 90% of finding the global minimum with 20 individuals propagated through 20 generations, which is equivalent to 400 function evaluations (plus the roughly 600-1000 function evaluations needed by the Powell algorithm to bring the final geometry to the actual minimum to within a relative error of 10^-6).
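The hand-over from the GA to a gradient-free local optimizer can be sketched as follows. Here `refine` is a crude axis-wise search with a shrinking step, a mere stand-in for the Powell direction-set routine of the "Numerical Recipes" used in this work, and `toy_energy` is a placeholder bowl, not the Bolding-Andersen PES; the starting point stands for a hypothetical GA-best geometry already inside the basin of attraction.

```python
def toy_energy(params):
    """Placeholder PES (NOT the Bolding-Andersen surface): a separable
    bowl with its global minimum of -13.0 at params = (1, 1, ...)."""
    return -13.0 + sum((p - 1.0) ** 2 for p in params)

def refine(energy, start, step=0.5, tol=1e-6, max_eval=5000):
    """Derivative-free local refinement: try +/-step moves along each
    coordinate axis, keep strict improvements, and halve the step when
    no axis improves. Returns (geometry, energy, function evaluations)."""
    x = list(start)
    best = energy(x)
    n_eval = 1
    while step > tol and n_eval < max_eval:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                e = energy(trial)
                n_eval += 1
                if e < best:
                    x, best, improved = trial, e, True
                    break
        if not improved:
            step *= 0.5            # no axis improved: tighten the search
    return x, best, n_eval

# refine a hypothetical GA-best geometry down to the basin's minimum
x_opt, e_opt, n_eval = refine(toy_energy, [0.3, 1.6, 0.8, 1.9])
```

Like Powell's method, this refinement needs only energy values, no gradients, so the comparison by number of function evaluations in the next paragraph remains meaningful for the combined algorithm.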


The GA result above has to be contrasted with the following two scenarios: A collection of traditional, local Powell minimizations started from the same number (20) of randomly chosen geometries takes a total of 47 000 function evaluations, or on average 2600 function evaluations per starting geometry, and only 15% of them actually end up in the global minimum. Note that it would be unfair to use intelligently chosen starting geometries here, since we do not really feed any intelligence into the GA either. Also, one could make use of more efficient gradient methods (like quasi-Newton or conjugate gradient algorithms), but again this would be unfair, since the simple GA used here also does not take into account any gradient information. A global minimum search for Si4 on the corresponding ab initio PES using ab initio molecular dynamics simulated annealing14,17 used on the order of 5000 function evaluations (9000 steps were barely sufficient; energy and gradient were not calculated at every step, and gradient evaluations are converted here into a time-equivalent amount of energy evaluations), plus the added traditional, local optimization of the final geometry. The ab initio PES and the empirical one used here can quite confidently be called similar in the aspects important to this comparison, due to the high quality of the empirical surface.24

4. Conclusions

It has been shown that the full geometry of Si4 can be optimized effectively by a GA. Compared to two other, prototypical global optimization methods (A, collecting local minima by a traditional method, and B, simulated annealing), the GA clearly wins in terms of needing fewer function evaluations. This is due to its inherent information economy: the individuals in the GA are similar to the independent runs in method A, but in the GA they are not independent and exchange information via genetic crossover. And unlike in method B, the previously collected information is not lost but inherited into the following generations.

One should be careful in drawing conclusions from such a simplistic test case as the one presented here. Yet the extensive studies of GAs in the literature10-12 support the hope that this approach will not break down for larger systems. Hence, the above finding warrants further development of the GA, in particular in two areas: (i) Generalizability of the method hinges upon finding good geometry representations for larger clusters; this will need some work but should not be totally impossible. Many researchers25 already use a cluster buildup strategy successfully, which means that an analogous geometry representation of larger clusters should work and that a fairly general representation with a high degree of building-block character can be found. (ii) Of course, as for all other methods in global optimization, the GA cannot guarantee finding the global minimum in any practical application. However, it seems reasonable to hope for a substantial reduction in computational cost, in particular when the GA is coupled to intelligent problem-specific heuristics as hinted at in the Introduction.

Future work in this area will also include full parameter tuning, further improvement of the GA by literature-known means,10-12 and, in particular, coupling of the GA to ab initio PES calculation.

Acknowledgment. I am very grateful to Dr. Barry C. Bolding for sending me a ready-made Fortran subroutine for the Bolding-Andersen PES (originally written by Lionel M. Raff and Ronald D. Kay). I also wish to thank Prof. Richard Judson for helpful correspondence and for sending preprints prior to publication. It is a pleasure for me to thank Tom Ngo for a lively discussion on the GA molecule e-mail conference (ga-molecule@tammy.harvard.edu).

References and Notes

(1) Cohen, M. L.; Knight, W. D. Phys. Today 1990, 43, 42. Cohen, M. L.; Chou, M. Y.; Knight, W. D.; de Heer, W. A. J. Phys. Chem. 1987, 91, 3141. Koutecký, J.; Fantucci, P. Chem. Rev. 1986, 86, 539.
(2) Stillinger, F. H.; Stillinger, D. K. J. Chem. Phys. 1990, 93, 6106. Wille, L. T.; Vennik, J. J. Phys. A 1985, 18, L419, L1113.
(3) Gill, P. E.; Murray, W.; Wright, M. H. Practical Optimization; Academic Press: London, 1981.
(4) Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T. Numerical Recipes; Cambridge University Press: Cambridge, 1989.
(5) Kirkpatrick, S.; Gelatt, C. D., Jr.; Vecchi, M. P. Science 1983, 220, 671.
(6) van Laarhoven, P. J. M.; Aarts, E. H. L. Simulated Annealing: Theory and Applications; Reidel: Dordrecht, 1987.
(7) Althöfer, I.; Koschnick, K. U. Appl. Math. Optim. 1991, 24, 183. Dück, G.; Scheuer, T. J. Comput. Phys. 1990, 90, 161. Moscato, P.; Fontanari, J. F. Phys. Lett. A 1990, 146, 204.
(8) Dück, G. J. Comput. Phys. 1993, 104, 86.
(9) Holland, J. H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, 1975.
(10) Davis, L., Ed. Genetic Algorithms and Simulated Annealing; Pitman: London, 1987.
(11) Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: Reading, MA, 1989.
(12) Proc. Second Int. Conf.
on Genetic Algorithms; Grefenstette, J. J., Ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, 1987. Proc. Third Int. Conf. Genetic Algorithms; Schaffer, J. D., Ed.; Morgan Kaufmann: San Mateo, CA, 1989. Proc. Fourth Int. Conf. Genetic Algorithms; Belew, R. K.; Booker, L. B., Eds.; Morgan Kaufmann: San Mateo, CA, 1991.
(13) Car, R.; Parrinello, M. Phys. Rev. Lett. 1985, 55, 2471. Remler, D.; Madden, P. A. Mol. Phys. 1990, 70, 921. Jones, R. O. Angew. Chem., Int. Ed. Engl. 1991, 30, 630. Pastore, G.; Smargiassi, E.; Buda, F. Phys. Rev. A 1991, 44, 6334.
(14) Hartke, B.; Carter, E. A. J. Chem. Phys. 1992, 97, 6569.
(15) Morales, L. B.; Garduño-Juárez, R.; Romero, D. J. Biomol. Struct. Dyn. 1992, 9, 951.
(16) Judson, R. S.; Jaeger, E. P.; Treasurywala, A. M.; Peterson, M. L. J. Comput. Chem., submitted for publication.
(17) Hartke, B.; Carter, E. A. To be published.
(18) Szu, H.; Hartley, R. Phys. Lett. A 1987, 122, 157. Ingber, L. Math. Comput. Modelling 1989, 12, 967.
(19) Basu, A.; Frazer, L. N. Science 1990, 249, 1409.
(20) For example: Fontain, E. Anal. Chim. Acta 1992, 265, 227.
(21) For example: Blommers, M. J. J.; Lucasius, C. B.; Kateman, G.; Kaptein, R. Biopolymers 1992, 32, 45. Judson, R. J. Phys. Chem. 1992, 96, 10102.
(22) McGarrah, D. B.; Judson, R. S. Preprint, 1993.
(23) Smith, R. W. Comput. Phys. Commun. 1992, 71, 134.
(24) Bolding, B. C.; Andersen, H. C. Phys. Rev. A 1990, 41, 10568.
(25) For example: Poteau, R.; Spiegelmann, F. J. Chem. Phys. 1993, 98, 6540. Vlachos, D. G.; Schmidt, L. D.; Aris, R. J. Chem. Phys. 1992, 96, 6880.