14. Optimization of Sampling Algorithms in Monte Carlo Calculations on Fluids

JOHN C. OWICKI
Downloaded by CORNELL UNIV on May 22, 2017 | http://pubs.acs.org Publication Date: June 1, 1978 | doi: 10.1021/bk-1978-0086.ch014
Department of Chemistry, Stanford University, Stanford, CA 94305
The analysis of simple model classical fluids by Monte Carlo (MC) techniques now is fairly routine. This is not so in more complicated systems, where computational expense may limit the precision of the results or the number of calculations performed. Examples include liquid water, dilute solutions, and systems with phase boundaries. Even in "simple" systems, some properties (notably free energies) are very difficult to calculate precisely.

The scope of feasible calculations can be expanded by reducing the cost of computing. Since computer technology continues to improve, this factor is significant. MC standard errors are asymptotically proportional to the reciprocal of the square root of the length of the computation, however, so improvements in precision will not be as great as drops in computational costs. Another approach is to improve sampling procedures, since it is a maxim for MC calculations in general that a properly designed sampling algorithm can reduce statistical errors dramatically (1).

Therefore, this chapter deals with the optimization of sampling algorithms for MC calculations on bulk molecular systems, particularly fluids. Part I is a discussion of some developments in the mathematical theory of MC sampling using Markov chains. The scope is purposely limited, and important topics such as umbrella sampling and sampling in quantum mechanical systems are given short shrift, because they are well-reviewed elsewhere (2, 3). Nor do we discuss the work of Bennett (4), which deals primarily with efficient data analysis in MC free energy calculations, not with the sampling algorithms per se. Part II of the chapter is a generalization of a sampling algorithm (5) which we have developed for MC calculations on dilute solutions, with some calculations to illustrate its usefulness.

I. Theoretical Developments
0-8412-0463-2/78/47-086-159$05.00/0 © 1978 American Chemical Society. Lykos; Computer Modeling of Matter; ACS Symposium Series; American Chemical Society: Washington, DC, 1978.

The Metropolis Algorithm. For reference purposes and to establish notation, the standard sampling algorithm of Metropolis et al. (6) is summarized below. We assume that the reader is already fairly familiar with it. Underlining denotes a vector or matrix, and, for typographical convenience, subscripts are often indicated in parentheses.

The system is assumed to have a large but finite number of states labeled i = 1, 2, ..., S, corresponding to different molecular configurations. States occur according to the probability distribution vector π, where the ith element π(i) is the normalized Boltzmann factor for state i. Transitions between states are made according to the stochastic transition matrix P, which must satisfy the eigenvalue equation

π = πP    (1)
There are other conditions on P, such as irreducibility (1), but we will not discuss them here. The matrix element p(i,j) is the probability of moving to state j in the next step, given that the system currently is in state i. The Metropolis choice of P is

p(i,j) = q(i,j) · min[1, π(j)/π(i)],   i ≠ j
p(i,i) = 1 − Σ_{k≠i} p(i,k)    (2)
The elements q(i,j) of Q are defined by the way trial configurations j are generated. Usually one of the N molecules is selected at random, and each of its coordinates is given a perturbation distributed uniformly on some fixed interval [−Δ, Δ] (for a spherically symmetric molecule). Thus the state j lies in some domain D(i) in configuration space centered on state i, with volume V(D) = N(2Δ)³. More concisely,

q(i,j) = 1/V(D),   j ∈ D(i)
       = 0,        otherwise    (3)
Other relevant features of Q are

q(i,j) ≥ 0    (4)

Σ_j q(i,j) = 1    (5)

q(i,j) = q(j,i)    (6)
The trial step generated by Q is accepted with probability min[1, π(j)/π(i)]. A variation on this scheme, usually attributed to A. A. Barker (7), is to use a step acceptance probability of 1/[1 + π(i)/π(j)]. It has been noted (3) that this algorithm
actually was used earlier by Flinn and McManus (8). To compute ensemble average properties (e.g., the mean potential energy), the relevant mechanical variable is formally represented by a vector f, where f(i) is the value of the variable in state i. A Markov chain of M states {X(m)}, m = 1, 2, ..., M, is then generated using P, and the average of f is estimated to be
f̄ = (1/M) Σ_{m=1}^{M} f[X(m)]    (7)
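Eqs. (1)-(3) and (7) can be exercised on a toy system. In the sketch below, the ten-state harmonic energy ladder, the temperature, the chain length, and the symmetric nearest-neighbor trial move are our illustrative choices, not values from the chapter; trial states outside the allowed range have π = 0 and are simply rejected.

```python
import math
import random

# Toy discrete system: states i = 0..9 with energies U(i); pi(i) ∝ exp(-U(i)/kT).
U = [0.5 * (i - 4.5) ** 2 for i in range(10)]
kT = 2.0
boltz = [math.exp(-u / kT) for u in U]          # unnormalized pi(i)
Z = sum(boltz)
exact_mean_U = sum(u * b for u, b in zip(U, boltz)) / Z

def metropolis_chain(n_steps, seed=0):
    """Run the Metropolis algorithm of eq. (2) and return the estimator
    f_bar of eq. (7) for f = U."""
    rng = random.Random(seed)
    i = 5                                        # arbitrary starting state
    total = 0.0
    for _ in range(n_steps):
        j = i + rng.choice((-1, 1))              # symmetric trial move, eq. (3)
        if 0 <= j < len(U):                      # states outside the range have pi = 0
            if rng.random() < min(1.0, boltz[j] / boltz[i]):
                i = j                            # accept with prob min[1, pi(j)/pi(i)]
        total += U[i]                            # accumulate f at the *current* state
    return total / n_steps                       # eq. (7)

estimate = metropolis_chain(200_000)
```

With a chain this long the estimate agrees with the exact ensemble average Σ f(i)π(i) to within a few statistical standard errors; note that a rejected trial still contributes the current state to the sum, as eq. (2) requires.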
The Hastings Formalism. Hastings (9) has published a generalization of the Metropolis algorithm. The essential features are that Q no longer need be symmetric as in eq. (6), and that any of a whole class of matrix functions α may be used to determine step acceptance probabilities. More specifically,

p(i,j) = q(i,j) α(i,j),   i ≠ j
p(i,i) = 1 − Σ_{k≠i} p(i,k)    (8)
We will not describe the allowed functional forms for α except to note that the Metropolis and Barker algorithms, generalized for nonsymmetric Q, represent two special cases:

α_M(i,j) = min[1, t(j,i)]    (9)

α_B(i,j) = 1/[1 + t(i,j)]    (10)

where

t(i,j) = π(i)q(i,j)/[π(j)q(j,i)]    (11)
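The two acceptance functions and the ratio t of eq. (11) translate directly into code; here π and Q are passed in as a plain list and a nested list, and the two-state numbers in the usage lines are illustrative only.

```python
def t(i, j, pi, q):
    """t(i,j) = pi(i)q(i,j) / [pi(j)q(j,i)], eq. (11)."""
    return pi[i] * q[i][j] / (pi[j] * q[j][i])

def alpha_metropolis(i, j, pi, q):
    """Generalized Metropolis acceptance, eq. (9)."""
    return min(1.0, t(j, i, pi, q))

def alpha_barker(i, j, pi, q):
    """Barker acceptance, eq. (10)."""
    return 1.0 / (1.0 + t(i, j, pi, q))

# For symmetric q these reduce to min[1, pi(j)/pi(i)] and 1/[1 + pi(i)/pi(j)],
# and alpha_metropolis(i,j) >= alpha_barker(i,j) whenever i != j.
pi = [0.7, 0.3]
q = [[0.0, 1.0], [1.0, 0.0]]   # always propose the other state
```

For this symmetric q, α_M(0,1) = min[1, 0.3/0.7] = 3/7 while α_B(0,1) = 1/(1 + 7/3) = 0.3, illustrating the inequality discussed in the next section.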
Hastings (and Metropolis et al.) considered only P which were reversible, i.e.,

π(i)p(i,j) = π(j)p(j,i)    (12)

There exist non-reversible P which still satisfy eq. (1), but reversible P's generally are simpler to construct and deal with mathematically.

The Metropolis and (especially) the Hastings formalisms in principle give the MC practitioner wide latitude in choosing P for a particular problem. We now turn to the question of computational efficiency. In other words, how should α and Q be chosen to produce a low error (i.e., variance) in f̄ for a given investment in computer resources? In addition to empirical and intuitive rules, there now is some useful theoretical guidance on this question.
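For a small discrete π one can assemble the full P of eq. (8) with the Metropolis α and verify numerically that it satisfies both eq. (12) and eq. (1). A sketch; the three-state π and the uniform symmetric q below are arbitrary test values of ours.

```python
def metropolis_P(pi, q):
    """Assemble the transition matrix P of eq. (8) with the Metropolis
    acceptance alpha_M of eq. (9); the diagonal comes from normalization."""
    S = len(pi)
    P = [[0.0] * S for _ in range(S)]
    for i in range(S):
        for j in range(S):
            if i != j and q[i][j] > 0.0:
                accept = min(1.0, (pi[j] * q[j][i]) / (pi[i] * q[i][j]))
                P[i][j] = q[i][j] * accept
        P[i][i] = 1.0 - sum(P[i][k] for k in range(S) if k != i)
    return P

pi = [0.5, 0.3, 0.2]
q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
P = metropolis_P(pi, q)

# Detailed balance, eq. (12): pi(i)p(i,j) == pi(j)p(j,i) for every pair,
# which in turn implies stationarity, eq. (1): pi = pi P.
```

Checking eq. (12) pairwise is cheaper than checking eq. (1) directly, which is one practical reason reversible P's are easier to deal with.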
Optimization of Step Acceptance Matrix α. It has been widely conjectured that the Metropolis α is superior to the Barker α, since α_M(i,j) ≥ α_B(i,j) when i ≠ j if the same Q is used for both. The Metropolis scheme will reject fewer trial steps; it will sample more states and presumably lead to smaller statistical errors. Peskun (10) has proved that this is so, and he has also obtained some much stronger results. In this mathematical work, optimization refers to minimizing the asymptotic variance of f̄, using P to sample from π, where the asymptotic variance is defined as

v(f,π,P) = lim_{M→∞} M · var[ Σ_{m=1}^{M} f(X(m))/M ]    (13)
The basic theorem is as follows. For two choices P₁ and P₂, assume that P₁ ≥ P₂ in the sense that p₁(i,j) ≥ p₂(i,j) for all (i,j) where i ≠ j. That is, all transitions to different states are at least as likely for P₁ as for P₂. Then v(f,π,P₁) ≤ v(f,π,P₂). It is clear that P_M ≥ P_B and, hence, that P_B gives an asymptotic variance which is at best as low as that for P_M. For a discussion of the special case of sampling in a two-state system, see (11). Peskun also showed that P_M ≥ P for any other choice of α within the Hastings formalism, so that the Metropolis scheme is asymptotically the optimum choice of α.

Optimization of Transition Matrix Q. In another paper (12), Peskun has investigated how the choice of Q affects the asymptotic variance, again within the Hastings formalism. The optimum Q, which may not be unique (13), is in general asymmetric, and is reversible:

π(i)q(i,j) = π(j)q(j,i)    (14)
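Peskun's ordering of the Metropolis and Barker choices can be checked in closed form for the two-state case cited above (11). For a reversible two-state chain the autocorrelation of f decays geometrically with the second eigenvalue λ = 1 − p(0,1) − p(1,0), so eq. (13) reduces to v = π(0)π(1)[f(0) − f(1)]²(1 + λ)/(1 − λ). The sketch below uses this closed form; the π = (0.7, 0.3) target and the always-propose-the-other-state Q are our illustrative choices.

```python
def two_state_asymptotic_variance(pi0, p01, p10, f0, f1):
    """Closed-form v(f, pi, P), eq. (13), for a reversible two-state chain.
    The autocorrelation of f decays as lam**k with lam = 1 - p01 - p10."""
    pi1 = 1.0 - pi0
    lam = 1.0 - p01 - p10                 # second eigenvalue of P
    var_f = pi0 * pi1 * (f0 - f1) ** 2    # equilibrium variance of f
    return var_f * (1.0 + lam) / (1.0 - lam)

# pi = (0.7, 0.3); the trial move always proposes the other state (symmetric q).
# Metropolis, eq. (9): p01 = min(1, 0.3/0.7) = 3/7, p10 = 1.
v_metropolis = two_state_asymptotic_variance(0.7, 3 / 7, 1.0, 0.0, 1.0)
# Barker, eq. (10): p01 = 1/(1 + 7/3) = 0.3, p10 = 1/(1 + 3/7) = 0.7.
v_barker = two_state_asymptotic_variance(0.7, 0.3, 0.7, 0.0, 1.0)
```

Here v_M = 0.084 while v_B = 0.21: the more mobile Metropolis chain (λ = −3/7, i.e., anticorrelated steps) beats the Barker chain (λ = 0) by a factor of 2.5 in asymptotic variance, consistent with the theorem.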
If Q is reversible, then α_M(i,j) = 1 for all (i,j), and P = Q. In other words, it is asymptotically optimum to place the whole burden of the sampling on Q so that no trial steps are rejected. Eq. (14) does not give sufficient constraints to fix Q. Peskun has shown further that the Q which minimizes v(f,π,P) is a complicated function both of f and π. Even though a rigorous optimization is impractical, by making a simplifying approximation he was able to come up with qualitative guidelines for choosing Q to reduce v(f,π,P). The prescription is to relate [q(i,j) − π(j)] roughly linearly to

[f(i) − f(j)]² π(i),          π(i) ≥ π(j)
or
[f(i) − f(j)]² π(j)²/π(i),    π(i) < π(j)    (15)
At the same time, Q should approximately satisfy eq. (14) and must rigorously satisfy eq. (4) and eq. (5). The prescription implies that transitions should be encouraged to states of high π and to states with values of f which differ greatly from that of f in the current state.

Next we will discuss how the usual choice of Q can be analyzed in terms of the guidelines. If Q is chosen according to eq. (3), f is ignored completely. The current state i of the system probably will have high π, as will nearby states j, "nearby" meaning states with similar molecular coordinates. We wish to sample these states more frequently than distant states, about which we have no information except that most of them have π ≈ 0 (in a dense fluid). A simple procedure is to set q(i,j) = 0 except for states lying in D(i), where D(i) corresponds to the perturbation of the coordinates of one or more molecules at state i. V(D) should be made as large as possible, consistent with including a high proportion of states with high π. In practice, V(D) is adjusted empirically by increasing Δ until the step acceptance fraction drops to ~0.5. Thus, there is some optimization of Q with respect to π.

Within D(i), q(i,j) should be made reversible. Unfortunately, this requires extensive knowledge of the behavior of π in D(i), which is computationally expensive to obtain. A zero-order approximation is that π is constant in D(i); this leads to the usual symmetric choice q(i,j) = q(j,i). It often is not too expensive to obtain analytical spatial derivatives of potential energy functions, as long as the energy is being calculated anyway. One can make the first-order approximation that the gradient of the energy is constant in D(i) and is equal to its value at state i, then construct q(i,j) to be reversible under this assumption.
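A minimal one-dimensional sketch of this first-order idea: within D(i) = [x − Δ, x + Δ] the trial step δ is drawn from a density proportional to exp[−(β/2)U′(x)δ], i.e., the linearized energy change with a half-strength bias, and the now-nonsymmetric q is corrected for with the Hastings acceptance of eq. (9). The half-weighting, the quadratic model potential, and all numerical values are our assumptions for illustration, not prescriptions from the chapter.

```python
import math
import random

beta, delta = 1.0, 1.0          # inverse temperature and half-width of D(i)
U  = lambda x: 0.5 * x * x      # model potential; pi(x) ∝ exp(-beta * U(x))
dU = lambda x: x                # its analytical gradient

def trial_density(x, step):
    """Normalized density q(x -> x+step) for the linear bias
    w(step) ∝ exp(-(beta/2) * dU(x) * step) on [-delta, delta]."""
    a = -0.5 * beta * dU(x)
    if abs(a) < 1e-12:
        return 1.0 / (2.0 * delta)              # zero gradient: uniform, eq. (3)
    norm = (math.exp(a * delta) - math.exp(-a * delta)) / a
    return math.exp(a * step) / norm

def draw_step(x, rng):
    """Inverse-CDF sample of the biased step."""
    a = -0.5 * beta * dU(x)
    u = rng.random()
    if abs(a) < 1e-12:
        return (2.0 * u - 1.0) * delta
    lo, hi = math.exp(-a * delta), math.exp(a * delta)
    return math.log(lo + u * (hi - lo)) / a

def smart_move(x, rng):
    """One gradient-biased trial move, accepted with alpha_M of eq. (9)."""
    step = draw_step(x, rng)
    y = x + step
    # Hastings ratio pi(y)q(y,x) / [pi(x)q(x,y)]; q is no longer symmetric.
    ratio = (math.exp(-beta * (U(y) - U(x)))
             * trial_density(y, -step) / trial_density(x, step))
    return y if rng.random() < min(1.0, ratio) else x
```

Because the nonsymmetric q is compensated exactly in the acceptance step, the chain still samples π; a long run reproduces the Gaussian mean and variance of this model potential.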
Pangali et al. (14, and in this Symposium) have used essentially this technique to improve the rate of convergence of MC calculations on liquid water. A related method was used earlier for a quantum fluid, but with less success (15). The extension to higher derivatives of the energy is obvious. Another approach is to improve Q by using information about the current configuration to select the size, shape, and position of D(i) at each step. For example, Rossky et al. (16) used the first derivative of the energy to select the origin of a Gaussian perturbation of the molecular coordinates.

In optimizing Q for a dense fluid, the importance of f usually will be less than that of π. The primary reason is that π is a strongly varying exponential function of the molecular coordinates, differing significantly from zero only in a small fraction of the states. The variation of most other functions (e.g., the potential energy) is weak by comparison. Since the true ensemble average of f is the sum of f(i)π(i) over all states, states with high π dominate f̄. Also, MC runs typically involve
the calculation of several properties of the system; one Q generally will not optimize all of the estimates. Nevertheless, optimizing Q with respect to f̄ sometimes can be important. For example, the Helmholtz free energy difference A(1) − A(0) between two systems 1 and 0 can be written as

A(1) − A(0) = −kT · ln⟨exp{−[U(1) − U(0)]/kT}⟩₀    (16)

where U(1) and U(0) are the potential energies and ⟨·⟩₀ is the ensemble average in the 0 system. Here f, which is the exponential of the energy difference, may be as strongly varying as π.
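Eq. (16) can be exercised on a pair of systems whose free energy difference is known analytically, e.g., two one-dimensional harmonic wells with force constants 1 and 2, for which A(1) − A(0) = (kT/2)·ln 2. This toy choice, and the shortcut of sampling the 0 system exactly with Gaussian draws rather than with a Markov chain, are ours.

```python
import math
import random

kT = 1.0
U0 = lambda x: 0.5 * x * x            # harmonic well, force constant 1
U1 = lambda x: 1.0 * x * x            # harmonic well, force constant 2
exact_dA = 0.5 * kT * math.log(2.0)   # (kT/2) ln(k1/k0) for harmonic wells

def fep_estimate(n_samples, seed=0):
    """Estimate A(1) - A(0) via eq. (16), averaging in the 0 system.
    (Here the 0 system is sampled exactly; in a real calculation the
    configurations would come from a Markov chain.)"""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                   # x ~ exp(-U0/kT), sigma = sqrt(kT)
        acc += math.exp(-(U1(x) - U0(x)) / kT)    # f = exp(-dU/kT): strongly varying
    return -kT * math.log(acc / n_samples)

dA = fep_estimate(200_000)
```

Even in this benign example the integrand f varies over orders of magnitude across the sampled configurations, which is exactly why eq. (16) is so sensitive to the choice of sampling distribution.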
Figure 4. Comparison of standard errors for g(r), when g(r) is computed with different sampling algorithms. (O) uniform sampling of solvent particles, solute perturbed every tenth step; (+) preferential sampling of solvent particles, solute perturbed as above; (#) uniform sampling of solvent particles, solute fixed. For clarity, only every other data point is plotted. Estimates of errors are based on fluctuations in g(r) between blocks of 20,000 configurations (21).
computing costs, compared with uniform sampling, were about 20 seconds of IBM 370/168 CPU time per 10⁵ steps.
Digression: How Important Is It to Move the Solute? Perturbing a solvent molecule generates only one new solute-solvent distance for calculating g(r), while moving the solute generates N new distances. This suggests that one should move the solute more often than a solvent molecule, on the average. Note that it is not necessary to move the solute at all, since only relative molecular coordinates are important and thus the coordinate system can be fixed on a molecule (e.g., the solute).

To test the importance of this effect, the two computations discussed above were repeated, except that the solute was not perturbed. The standard errors for the uniform sampling run are plotted (#) in Figure 2. They usually lie well above the errors from both runs in which the solute moved. For clarity, the solute-fixed/preferential sampling results are not plotted. They are not too different from the solute-moved/preferential sampling results for small r, but are significantly higher for large r. Not surprisingly, it is important to perturb the solute.

Acknowledgements

Thanks go to P. Peskun for a copy of his paper (12) as well as for interesting discussions of the relevance of his results to calculations on dense fluids. The author is supported by an Institutional Research Fellowship under NIH Grant No. GM 07026.
Literature Cited

1. Hammersley, J. M., and Handscomb, D. C., "Monte Carlo Methods," especially Chapter 9, Methuen, London, 1964.
2. Valleau, J. P., and Torrie, G. M., Chapter 5 in Berne, B. J. (ed.), "Statistical Mechanics, Part A," Plenum, New York, 1977.
3. Wood, W. W., and Erpenbeck, J. J., Ann. Rev. Phys. Chem., (1976), 27, 319.
4. Bennett, C. H., J. Comp. Phys., (1976), 22, 245.
5. Owicki, J. C., and Scheraga, H. A., Chem. Phys. Lett., (1977), 47, 600.
6. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E., J. Chem. Phys., (1953), 21, 1087.
7. Barker, A. A., Austral. J. Phys., (1965), 18, 119.
8. Flinn, P. A., and McManus, G. M., Phys. Rev., (1961), 124, 54.
9. Hastings, W. K., Biometrika, (1970), 57, 97.
10. Peskun, P. H., Biometrika, (1973), 60, 607.
11. Valleau, J. P., and Whittington, S. G., Chapter 4 in Berne, B. J. (ed.), "Statistical Mechanics, Part A," Plenum, New York, 1977.
12. Peskun, P. H., "Guidelines for Choosing the Transition Matrix in Monte Carlo Sampling Methods Using Markov Chains," presented at the Fourth Conference on Stochastic Processes and Their Applications, York Univ., Toronto, August 5-9, 1974. For an abstract, see Peskun, P. H., Adv. Appl. Probab., (1975), 7, 261.
13. Peskun, P. H., private communication.
14. Pangali, C., Rao, M., and Berne, B. J., Chem. Phys. Lett., in press.
15. Ceperley, D., Chester, G. V., and Kalos, M., Phys. Rev. B, (1977), 16, 3081.
16. Rossky, P. J., Doll, J. D., and Friedman, H. L., J. Chem. Phys., in press.
17. Torrie, G., and Valleau, J. P., J. Comp. Phys., (1977), 23, 187.
18. Owicki, J. C., and Scheraga, H. A., J. Am. Chem. Soc., (1977), 99, 7413.
19. Owicki, J. C., and Scheraga, H. A., J. Phys. Chem., (1978), 82, 1257.
20. Squire, D. R., and Hoover, W. G., J. Chem. Phys., (1969), 50, 701.
21. Wood, W. W., Chapter 5 in Temperley, H. N. V., Rowlinson, J. S., and Rushbrooke, G. S. (eds.), "Physics of Simple Liquids," North-Holland, Amsterdam, 1968.
RECEIVED September 7, 1978.