4799
J . Phys. Chem. 1987, 91, 4799-4801 0.
TABLE VI: I
0.
g
0.1 0.5 1.0 2.5 5.0 7.5
0.
and J. Values Obtained from y* of CaBrl so $ RMSD"
-3.4784 0.1639 0.3500 0.1593 0.3541 0.5425
80.11 0.5913 -0.3501 -0.08433 -0.1013 -0.01237
2.086 8.364 5.101 1.838 8.194 6.107
X (2.837 X lo-') X (3.655 X X (2.390 X IO-') X lo-' (4.598 X lo-') X lo-' (4.961 X IO-') X lo-' (9.419 X lo-')
OThe RMSD values in parentheses refer to the existing values. = 0.07 and $ = -0.007. -0.31
1
2. 0
0.0
1
1
4.0
TABLE VII: Trace Activity Coefficients for NaBr and CaBr, in NaBr-CaBr2 Mixtures
1
8.0
6. 0 1
I 0.1 0.5
Figure 3. log y" vs. I: (+) NaBr and (*) CaBrz.
1.0
-0.15
-
t
0
.
2
~
and $
I
log Y k B r -0.1052 -0.1286 -0.1630
log yEaBr2 -0.1989 -0.2566 -0.2588
2.5 5.0 7.5
log Y&aBr
log YZaBr2
-0.1402 -0.02153 0.1052
-0.1922 0.1617 0.5054
values. If we use the weighted averages for and $ with being weighted proportional to I and $ proportional to 12, we get different values from the 7 N a B r and 7 C a B r 2 data. NaBr (Table V) gives = -0.0321 and $ = 0.0209 while CaBr, (Table VI) gives = 0.3808 and $ = -0.0356. The experimental and calculated y N a B r at I = 2.5 m are plotted in Figure 4 by using the three different sets of (%, $) described above. From the system studied in our laboratory, we conclude that and $ are also distinct functions of ionic strength. We are unable to explain this behavior due to the lack of sufficient lit~ ~ mixing parameters ~ ~ well as several ~ erature on these as other~ systems. To facilitate a more detailed comparison of the effect of one electrolyte on the other in the mixture, the trace activity coefficients of both NaBr and CaBr2 were calculated by using the following equations.
which can be obtained from eq 9 and 10 by substituting YB = 1 and YA = 1, respectively. These trace activity coefficients of NaBr and CaBrz as a function of ionic strength are given in Table VI1 and are plotted in Figure 3. ( 2 5 ) Rogers,
P. S. Z. Ph.D. Dissertation, University of California,
Berkeley, 1980.
Acknowledgment. We acknowledge the support of this research by a consortium of 11 petroleum production and service companies. Registry No. NaBr, 7647-15-6; CaBrz, 7789-41-5.
Umbrella Sampling: Avoiding Possible Artifacts and Statistical Biases Stephen C. Harvey* and M. Prabhakaran Department of Biochemistry, University of Alabama at Birmingham, Birmingham, Alabama 35294 (Received: January 15, 1987) The umbrella sampling procedure for Monte Carlo or molecular dynamics simulations allows the examination of structural and energetic questions in regions of conformational space that are not near the minimum-energy structure. This paper discusses biases that can occur in the determination of thermodynamic parameters in umbrella sampling simulations in multidimensionalconformational space. Two tests are described for determining whether such biases exist in a given simulation, and criteria for selecting appropriate umbrella sampling potentials are suggested. Introduction Among the most popular computer modeling methods for examining molecular structure are the Monte Carlo'-2 (MC) and (1) Valleau, J. P.; Whittington, S. G. In Stafistical Mechanics, Parr A : Equilibrium Techniques; Berne, B. J . , Ed.; Plenum: New York, 1977; pp 137-168.
0022-3654/87/2091-4799$01.50/0
molecular dynamics3" (MD) algorithms. Both of these allow one to survey conformational space in the neighborhood of a particular (2) Valleau, J. P.; Torrie, G. M. In Statistical Mechanics, Part A : Equilibrium Techniques; Berne, B. J., Ed.; Plenum: New York, 1977; pp 169-194. (3) McCammon, J. A,; Karplus, M. Annu. Reu. Phys. Chem. 1980,31,29.
0 1987 American Chemical Society
~
4800
The Journal of Physical Chemistry, Vol. 91, No. 18, 1987
structure, generating a data base of thousands of configurations for investigating problems of structure and energetics. The MD algorithm also provides explicit information on the dynamics of the system. In many problems, one would like to examine conformational space in a region that is not near a minimum-energy structure; an example would be the transition state between two local minimum-energy conformations. Unrestrained MC and MD simulations are not well suited for this when E , the free energy of the structure relative to the system’s minimum free energy, is greater than a few times kT, where k is Boltzmann’s constant and T i s temperature. This is because the probability that the simulation will spontaneously produce the structure of interest is on the order of exp(-E/kT). The umbrella sampling procedure was developed to overcome this difficulty.2 In this method, an auxiliary or “umbrella” potential is added to the potential energy function to hold the system in the neighborhood of conformational space near the structure of interest. Simulations are carried out with this biased potential, and the resulting thermodynamic parameters can be corrected to eliminate the effects of the umbrella potential. Umbrella sampling has been used in a variety of studies on liquids, solids, and small it has also been applied to studies of the structure, dynamics, and energetics of biological macrom01ecules.l~ In a free, unconstrained MD simulation one measures p ( x ) , the probability density along the coordinate of interest, x. The potential of mean force along x, W ( x ) ,is determined by W ( x ) = -kTln [ p ( x ) ] If an umbrella sampling potential, U ( x ) ,is introduced, the relationship between the potential of mean force and the biased probability distribution p * ( x ) is W ( x ) = -kT In [ p * ( x ) ] - U ( x ) - C (2) where C is a constant normalization factor. Another way of expressing the relationship between the biased and unbiased probability distributions = P exp[-PU(x)I (exp[-PUI
(3) where @ = (k7‘)-’ and the last term is a normalization constant, with the angular brackets indicating an average in the case of a simulation with an unbiased potential. In this note we discuss a problem that can occur in umbrella sampling when the umbrella sampling potential is incorrectly chosen. The auxiliary potential can push the model system into inappropriate regions of conformational space. This can lead to erroneous values for the calculated thermodynamic parameters. We describe a test for determining whether or not a particular umbrella sampling simulation contains such errors, and we discuss precautions that can be taken to avoid the problem. P*
)-I
Discussion From the standpoint of statistical mechanics, eq 2 and 3 are perfectly general. However, they provide no guidelines to how one should select a biasing potential, a shortcoming that was emphasized early in the development of the umbrella sampling procedure.* What is required is that the biasing potential produce a set of configurations that allows one to calculate correct ther(4) Levitt, M. Annu. Rev. Biophys. Bioeng. 1982, 11, 251. (5) Van Gunsteren, W. F.: Berendsen, H. J. C. Biochem. Soc. Trans. 1982, 10, 301. (6) McCammon, J . A.; Harvey, S . C. Dynamics of Proteins and Nucleic Acids; Cambridge University Press: Cambridge, 1987; Chapter 4. (7) Pangali, C.; Rao, M.; Berne, B. J. J . Chem. Phq’s. 1979, 71, 2975. (8) Ravishanker, G.; Mezei, M.: Beveridge, D. L. Faraday Symp. Chem. Soc. 1982, No. 17, 79. (9) Jorgensen, W. L. J . Phys. Chem. 1983, 87, 5304. ( I O ) Berkowitz, M.; Karim, 0.A.; McCammon, J. A,; Rossky, P. J. Chem. Phvs. Lett. 1984. 105. 577. ‘(1 1 ) Chandrasekhar, J.; Smith, S. F.; Jorgensen, W. L. J . A m . Chem. SOC. 1985, 107, 154. (12) Mezei, M.; Mehrotra, P. K.; Beveridge, D. L. J . A m . Chem. S O C . 1985, 107, 2239. (13) Northrup, S . H.; Pear, M. R.; Lee, C.-Y.; McCammon, J. A,; Karplus, M. Proc. ’Vatl. Arad. Sei. U.S.A. 1982, 79. 4035.
Harvey and Prabhakaran
Figure 1. Polar scatter plotI5of pucker amplitude (radial coordinate) and pucker phase angle (angular coordinate) for the 3000 ribose structures generated in a free molecular dynamics simulation. Each point represents one structure. The outer circle corresponds to a n amplitude Om = 70°, while the inner circle represents Om = 3 5 O .
modynamic averages over all coordinates other than the coordinate of interest, x. It is therefore necessary that the sampling distribution be broader than the usual Boltzmann distribution, which is why the resulting distribution is called an umbrella distribution.2 While it is often easy to choose an umbrella sampling potential that gives an appropriately broad sampling distribution along x , unexpected shifts in the sampling distributions along other coordinates can lead to errors. As an example of the problems that can arise, we present a case that occured during our investigations on potentials of mean force for sugar puckering in nucleic acids.I4 The puckering of fivemembered rings is traditionally defined in terms of the five endocyclic torsions of the ring.ls A planar ring would have all five torsions in the cis (6= 0) configuration. The deviation from ring planarity can be defined by two parameters, a puckering phase angle, P, defining which of the torsions is farthest from the cis configuration, and a puckering amplitude, 8,, which is a measure of the extent to which the torsions differ from the all cis planar configuration. The exact relationships are given elsewhere.14J5 We are examining the effects of different potential functions on the potential of mean force as a function of pucker phase angle. In the language of the previous paragraph, the coordinate of interest is pucker phase angle (x = P). Figure 1 shows a scatter diagram for 3000 structures generated during a 300-ps unconstrained MD simulation using a previously described p ~ t e n t i a l , ’ ~ , ’indicating ~.’~ the pattern of the probability ~ , most ’ ~ densely popdistribution function. As e ~ p e c t e d , ’ ~ Jthe ulated regions of conformational space are the northern (N) and southern (S) quadrants, corresponding to the C3’-endo (3E) and CY-endo (*E) configurations, respectively, and transitions between these regions are by way of the eastern (E) pathway, Le., 04’-endo
(OE). Our goal was to determine the potential of mean force W(P) over all values of pucker angle, 0 IP < 360’. We had previously observedL4that, with our potential function, the principal contribution to the repuckering energy barriers arises from the term describing the torsional energy (rotations about covalent bonds), C E ( & ) where , the sum is taken over the torsions defined by the five covalent bonds in the ring. We consequently chose an umbrella sampling potential that was equal to, and of opposite sign to, CE((p); in other words, we simply removed all torsional contributions to the potential function. This should allow the ring to repucker much more freely and, we hoped, to sample all of pucker space. (14) Harvey, S. C.; Prabhakaran, M. J . A m . Chem. SOC.1986, 108,6128. (15) Altona, C.; Sundaralingam, M. J . Am. Chem. Soc. 1982 104,8205. (16) Harvey, S . C.; Prabhakaran, M.; McCammon, J. A. Biopolymers 1985, 24, 1169. (17) Tung, C.-S.; Harvey, S. C.; McCammon, J. A. Biopolymers 1984, 23, 2173. (18) Levitt, M.; Warshel, A. J . A m Chem. SOC.1978, 100, 2607. (19) Olson, W. K.; Sussman, J. L. J . A m . Chem SOC.1982, 104. 270.
Umbrella Sampling Procedure
The Journal of Physical Chemistry, Vol. 91, No. 18, 1987 4801 0.5 0.4
E
I
I
l
=n2 t 0.3
(II
0.2
n
0.1 0.0
0
36
72
~OB
144
iao
Pucker Phase Angle (deg) Figure 2. Polar scatter plot for the 3000 ribose structures generated during a molecular dynamics simulation with the umbrella sampling potential described in the text.
The resulting biased probability distribution function for a 300-ps simulation is indicated by the scatter diagram in Figure 2. One of the goals-sampling over all values of +was obtained, but it is clear that the umbrella potential has also produced large and unexpected biases in the pucker amplitude. Whereas the unbiased simulation of Figure 1 has an average amplitude of about 40°, the amplitudes of ribose pucker in the biased simulation are much smaller. To determine whether or not these biases in amplitude would generate errors in the calculated potential of mean force, we calculated the probability density, p ( P ) , over the range -36' Q P Q 180° for the biased simulation, using the corrections specified by eq 3. This result was compared to p ( P ) for the unbiased simulation. If the umbrella sampling procedure is error free, the shapes of those two curves should be identical. But there are large differences between them (Figure 3), and the umbrella potential that we chose has introduced biases that cannot be corrected by the standard procedure. Let us examine these results with the criteria described in the first paragraph of this section. When we compare the biased simulation (Figure 2) with the unbiased simulation (Figure l ) , we see that the biased sampling distribution is broader along the coordinate of interest, P, as required, but it has been unacceptably shifted along a second coordinate (Om), over which statistical averages are to be taken. Almost no configurations with large puckering amplitude were generated, and one cannot make the appropriate thermodynamic corrections to a set of nonexistent configurations. These unacceptable results are almost certainly a consequence of the fact that P is not an internal coordinate of the system of interest, so it was not possible to choose an umbrella sampling potential that was a simple function of P and not of any other coordinates. Fortunately, the error was detectable for two reasons. First, there existed another coordinate, Om, along which we could examine the sampling distribution. Second, a comparison of the corrected p(P) from the biased simulation with p ( P ) from the unbiased simulation over values of P near the maximum in p(P) showed marked differences. We propose that these two tests should be a routine part of any M C or M D study using the umbrella sampling method.
Conclusions MC and MD simulations using the umbrella sampling procedure are subject to possible errors if the auxiliary umbrella potential prevents one from sampling appropriate regions of conformational space along coordinates other than the coordinate of interest. The following guidelines should be useful in minimizing the probability of such errors and in detecting them when they occur: (1) Whenever possible, the umbrella sampling coordinate should be chosen to coincide with one of the internal coordinates of the system, so that the umbrella potential can be defined by simply changing one of the terms in the potential energy function. An
Figure 3. Probability distribution for sugar pucker phase angle, in ranges of 1 8 O , for the free ribose simulation (open bars), compared with that predicted by the correction procedure of eq 3 for the simulation with an
umbrella sampling potential (crosshatched bars). example of this would be the rotational isomerization of a linear alkane, where one or more torsions is taken as the reaction co~ r d i n a t e . ~ *Somewhat ~~ more complex cases can sometimes be treated by choosing a suitable linear combination of internal coordinates. (2) There is a simple test for examining umbrella sampling simulations to detect errors in the resulting thermodynamic functions. The potential of mean force for the biased simulation (eq 2) should be identical, within statistical error, with that for the unbiased simulation (eq 1 ) over the range covered by the free simulation. An alternative and equivalent test is to compare the unbiased probability density with the corrected probability density (eq 3). Differences such as those shown in Figure 3 are unacceptable. (3) Since it is only possible to carry out the test described in the previous paragraph when the conformational space is examined in a single umbrella sampling simulation, the multiple window method7%I3 should be avoided whenever possible. (A single simulation also avoids the problems of determining the series of normalization constants that are required to piece together the results from the multiple window method.) To cover the entire range of the reaction coordinate in a single simulation, one can estimate the shape of the potential of mean force curve in preliminary investigations and then use an umbrella sampling potential that is of equal magnitude and opposite sign to the potential of mean force. This should produce a uniform probability distribution function in the biased simulation, optimizing sampling statistics. We have recently described a method for doing this, based on a Fourier analysis of the estimated potential of mean force.*j (4) In those cases where one can identify conformational coordinates other than the umbrella sampling coordinate, the probability distribution functions along those other coordinates should be examined. It should be verified that, in the umbrella sampling simulation, those distributions adequately sample that part of conformational space that is covered in an unbiased simulation. Differences such as those observed along the radial coordinate in Figures 1 and 2 are not acceptable. The one-dimensional Fourier analysis method mentioned above can be generalized to a multidimensional treatment to guarantee appropriate sampling statistics along all identifiable coordinates.
Acknowledgment. W e thank Dr. C. D. Bell for helpful discussions. This research was supported by grants from the NSF (DMB-8417001) and the NIH (GM-34015). (20) Rebertus, D. W.; Berne, B. J.; Chandler, D. J . Chem. Phys. 1979, 70, 3395. (21) Karplus, M.; Kushick, J. N. Macromolecules 1981, 14, 325. (22) Northrup, S . H.; Cumin, M. S. J . Phys. Chem. 1985, 89, 4707. (23) Bell, C. D.; Harvey, S. C. J . Phys. Chem. 1986, 90,6595.