Binless Estimation of the Potential of Mean Force - The Journal of

Sep 13, 2008 - This result provides a “binless” method for estimating the potential of mean force, Φ = −β−1 ln ρ, eliminating the need to c...
0 downloads 0 Views 195KB Size
12722

J. Phys. Chem. B 2008, 112, 12722–12729

Binless Estimation of the Potential of Mean Force Jodi E. Basner† and Christopher Jarzynski*,†,‡ Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, UniVersity of Maryland, College Park, Maryland 20742 ReceiVed: April 25, 2008; ReVised Manuscript ReceiVed: August 7, 2008

For a system in thermal equilibrium, described by classical statistical mechanics, we derive an unbiased estimator for the marginal probability distribution of a coordinate of interest, F(x). This result provides a “binless” method for estimating the potential of mean force, Φ ) -β-1 ln F, eliminating the need to construct histograms or perform numerical thermodynamic integration. In our method, the distribution that we seek to compute is expressed as the sum of a reference distribution, F0(x)sessentially an initial guess or estimate of F(x)sand a correction term. While the method is valid for arbitrary F0, we speculate that an accurate choice of the reference distribution improves the convergence of the method. Using a model molecule, simulated both in vacuum and in solvent, we validate our proposed approach and compare its performance with the histogram and thermodynamic integration methods. We also discuss and validate an extension in which our approach is used in combination with a biasing force, meant to improve uniform sampling of the coordinate of interest. 1. Introduction When performing atomic-level simulations of complex molecular systems, one is often interested in determining the marginal probability distribution of a selected degree of freedom, or reaction coordinate, x. In equilibrium, at temperature T, this distribution F(x) is conveniently represented through the potential of mean force (PMF):1

Φ(x) ) -kBT ln F(x)

(1)

When the reaction coordinate is chosen well, Φ(x) distills the statistics of many degrees of freedom into a single free energy profile that provides useful qualitative and quantitative information, such as the existence and the heights of barriers between metastable conformations. An estimate of the marginal distribution F(x) can be obtained by constructing a histogram of reaction coordinate values sampled during an equilibrium simulation. Dividing the x-axis into bins of width ∆x, and tallying the number of counts in each bin, we have

F(xb)∆x ≈

nb N

(2)

where xb is a representative value of the reaction coordinate at the bth bin, nb is the number of times that bin was sampled, and N is the total number of samples generated during the simulation. In this approach, there is an inevitable competition between statistical and systematic errors: for a fixed total simulation time, by increasing the bin size ∆x, we improve the sampling of each bin, at the cost of a coarser, biased representation of F(x) (see section 4).

An alternative approach uses the thermodynamic integration identity

-

〈 〉

∂V ∂Φ (xk) ) ∂x ∂x

xk

(3)

which equates the derivative of the PMF with the mean force acting along the reaction coordinate, at a particular coordinate value xk. Using independent simulations to estimate the mean force at discrete points along a grid with spacing δx ) xk+1 xk, the PMF is then obtained by numerical integration of ∂Φ/ ∂x. The fineness of the grid spacing δx is limited by the amount of sampling, though to a lesser extent than the bin size ∆x in the histogram approach (again, see section 4). In this paper we introduce a binless method of estimating the marginal distribution F(x), i.e., one that does not rely on the explicit division of the reaction coordinate axis into finite intervals (bins or integration steps). In our method, we begin with an arbitrary, known reference distribution F0(x), which can be viewed as an initial guess for F(x). We then define a correction term, ζ(x) ) (F - F0)/F0, which quantifies the relative amount by which the actual distribution differs from the reference distribution. For a given value of the reaction coordinate, we derive an unbiased estimator of this correction term (eqs 10 and 17), which ultimately provides an explicit expression for the marginal distribution, F, at that value (eq 16). This approach in similar in spirit to that of ref 2, where a binless method for estimating radial distribution functions, in both uniform and nonuniform fluids, was developed. Our paper is organized as follows. We first develop the theoretical foundations of our method (section 2) and describe the model system we used to investigate its performance (section 3). In section 4 we present numerical results. Finally, we end with a brief discussion in section 5. 2. Theory

* Corresponding author. E-mail: [email protected]. † Department of Chemistry and Biochemistry. ‡ Institute for Physical Science and Technology.

We consider a classical system with N degrees of freedom, and we wish to construct the marginal probability distribution

10.1021/jp803635e CCC: $40.75  2008 American Chemical Society Published on Web 09/13/2008

Potential of Mean Force

J. Phys. Chem. B, Vol. 112, No. 40, 2008 12723

of a selected one of those degrees, when the system is in equilibrium at temperature T. Letting x denote the selected degree of interest and b y a vector specifying the remaining N 1 degrees, the equilibrium probability distribution to be in a microscopic conformation Γ ) (x, b y) is given by the Boltzmann-Gibbs formula

peq(x, b y ) ) Z-1e-βV(x,yb) Z)

∫ dΓ e-βV(x,yb) ) ∫ dx dyb e-βV(x,yb)

(4)

∫ dyb e-βV(x,yb) ∝ e-βΦ(x)

(6)

This equation defines the potential of mean force, Φ(x), up to an additive constant. Now let us decompose the potential energy into terms V0 and V1, where only the latter depends on the variables b y:

(7)

This decomposition is not unique, and the analysis that follows remains valid for any choice of V0 and V1 satisfying eq 7. However, it is useful to think of V0(x) as a reference potential that in some sense represents our best guess of Φ(x). From this perspective, the normalized reference distribution

F0(x) )

(5)

where V(x, b y) denotes the interaction potential energy, β ) (kBT)-1, and Z is the classical partition function. The marginal distribution of the degree of interest, x, is obtained from eq 4 by projecting out the remaining degrees, b y:

F(x) ) Z-1

V(x, b y ) ) V0(x) + V1(x, b y)

1 -βV0(x) e Z0

(8)

represents a guess of F(x). In terms of these quantities, eq 6 becomes, identically,

F(x) ) F0(x)

Z0 Z

∫ dyb e-βV (x,yb) ≡ F0(x)[1 + ζ(x)] 1

(9)

We now proceed to derive an unbiased estimator of the correction term, ζ. That is, we construct a function ζˆ (Γ; jx) such that

TABLE 1: Unbiased Estimator Algorithm To Determine Marginal Distribution

12724 J. Phys. Chem. B, Vol. 112, No. 40, 2008

ζ(xj) ) 〈ζˆ (Γ;xj)〉 ≡

Basner and Jarzynski

∫ dΓ peq(Γ) ζˆ (Γ;xj)

(10)

Here, jx denotes a fixed value of the reaction coordinate, at which we wish to evaluate the marginal distribution, Γ ) (x, b y) is a microscopic configuration of the system, and angular brackets specify an average over configurations sampled from the equilibrium distribution: 〈 · · · 〉 ) ∫dΓ peq · · · . The expression we derive below for ζˆ (Γ; jx) depends on the choice of reference distribution, F0; we will discuss how this choice affects the convergence of the estimate. Let x- and x+ denote the minimum and maximum of the range of values over which the coordinate x is defined. (These may be (∞ in the absence of a box or periodic boundaries.) Now choosing an arbitrary but fixed value jx, between x- and x+, let us introduce the function

u(x;xj) )

F0(xj) [ψ (x) - θ(x - jx)] F0(x) 0

(11)

(12)

-





∫ dyb ∫x

x+ -

where 〈 · · · 〉x ) ∫dy b peq(x, b y) · · · denotes an equilibrium average, with the coordinate of interest held fixed at a value x. Combining eq 18 with a similar identity for the reference distribution, ∂F0/ ∂x ) -βF0(∂V0/∂x) (see eq 8), we get

〈 〉

∂(F/F0) 1 ∂F F ∂F0 F ∂V1 ) - 2 ) -β ∂x F0 ∂x F0 ∂x F0 ∂x

∂ dx (upeq) ) 0 ∂x

x

(19)

We now use this result to evaluate the equilibrium average of the function ζˆ (Γ; jx) (eq 17): +

∫xx dx′ F0(x′)

(18)

x

-

is the cumulative distribution function associated with the reference distribution F0(x). We view u as a function of x, parametrized by the value jx. By construction, this function vanishes at the boundaries: u(x(; jx) ) 0. These boundary conditions give us the identity

1 ∂ (upeq) ) peq ∂x

〈 〉

∂V ∂F ) -βF ∂x ∂x

∫ dΓ peq(Γ) ζˆ (Γ;xj) ) ∫xx

where θ(x - jx) is the Heaviside step function, and

ψ0(x) ≡

An alternative derivation of our central result, eq 16, proceeds as follows. We first substitute eq 1 into the thermodynamic integration identity, eq 3, to obtain

dx F(x)〈ζˆ 〉x

(20)

〈 〉 〈 〉

F ∂V1 F0 ∂x x x+ F ∂V1 dx βψ0 (21) xF0 ∂x x

∫ dΓ peq(Γ) ζˆ (Γ;xj) ) ∫jxx

+

dx β



∂(F/F0) + ∂x ∂(F/F0) x+ dx ψ0 (22) x∂x

∫ dΓ peq(Γ) ζˆ (Γ;xj) ) - ∫jxx

+

dx



(13) Integrating by parts and using ∂ψ0/∂x ) F0, ψ0(x-) ) 0, and ψ0(x+) ) 1 (eq 12), we get

Using eqs 4, 7, and 11, we also have, after some algebra

∂V1 1 ∂ (upeq) ) F0(xj) - δ(x - jx) - βu eq ∂x ∂x p

(14)

Combining eqs 13 and 14 with the identity 〈δ(x - jx)〉 ) F(xj) then gives us

〈 〉

F(xj) ) F0(xj) - β u

∂V1 ∂x

[ 〈

) F0(xj) 1 + β

θ - ψ0 ∂V1 F0 ∂x

(15)

〉]

(16)

In other words, the unbiased estimator of ζ(xj) is

ζˆ (Γ;xj) ) β

[θ(x - jx) - ψ0(x)] ∂V1 (Γ) F0(x) ∂x

(17)

F(xj) -1 〈ζˆ (Γ;xj)〉 ) F0(xj)

(23)

which after rearrangement becomes F(xj) ) F0(xj)[1 + 〈ζˆ (Γ; jx)〉]. This provides an alternative proof that ζˆ (Γ; jx) is an unbiased estimator of the correction term ζ(xj) (eq 10) and (through eq 18) establishes an explicit connection with thermodynamic integration. The results derived above suggest a method to determine the marginal probability distribution at a given value jx: 1. Choose a reference distribution F0 (or a reference potential V0). 2. Construct the function ζˆ (Γ; jx) given by eq 17. 3. Numerically generate a trajectory Γt that samples the equilibrium distribution peq(Γ). 4. Construct the estimate of F(xj) using the unbiased estimator:

[

Fest(xj) ) F0(xj)[1 + ζest(xj)] ) F0(xj) 1 +

1 τ

∫0τ dt ζˆ (Γt;xj)] (24)

While the value jx enters this expression only through the Heaviside function θ, the choice of reference distribution F0 (or reference potential V0) enters in three places: explicitly in the denominator on the right side and through the quantities ψ0 and ∂V1/∂x (eqs 7 and 8).

where τ denotes the duration of the trajectory generated during the numerical simulation. A more detailed outline of this method is given in Table 1. As discussed in the following sections, we have illustrated this method and compared its performance with

Potential of Mean Force

J. Phys. Chem. B, Vol. 112, No. 40, 2008 12725

both the histogram and thermodynamic integration methods (eqs 2 and 3), for a simple model molecule. It is useful to consider, briefly, the situation in which the coordinate of interest is uncoupled from the other degrees of freedom, i.e., the potential energy function takes the form

V(x, b y ) ) Vx(x) + Vy(y b)

(25)

If we choose the reference potential V0 ) Vx, then from eq 7 we get V1(x,y b) ) Vy(y b); hence ∂V1/∂x ) 0. Equation 16 then gives us (correctly!)

F(xj) ) F0(xj)

(26)

Note that ζˆ is a zero-variance estimator in this situation: no actual sampling is required to obtain the marginal distribution, since ζˆ (Γ; jx) ) 0, identically (eq 17). Because the decoupling of the reaction coordinate (eq 25) is not likely to arise in any realistic computation of a potential of mean force, the consideration of this situation is primarily of pedagogical interest. It both provides a simple case for which we can verify immediately that eq 16 gives the correct result, and it illustrates the importance of the choice of reference potential on the convergence properties of the unbiased estimator, ζˆ . A different choice of V0 would give ∂V1/∂x * 0, and sampling would be needed to evaluate F(xj). This suggests, at least tentatively, that the more accurately the choice of reference potential matches the actual potential of mean force, the better the convergence of the binless method. While intuitively plausible, it is not clear to us whether this conclusion remains valid when the coordinate of interest is coupled to the other variables. In the approach that we have described up to this point, the choice of reference, F0 or V0, does not affect the actual sampling in step 3 given above: we use the reference a posteriori in the analysis to construct the unbiased estimate of F(x). In an alternative implementation, the reference potential F0(x) can be used as a bias during the simulation itself, to alter sampling probabilities, for instance to enhance transitions over large free energy barriers. In this scenario, a biasing force +∂V0/∂x, acting along the direction of the coordinate x, is included during integration of the equations of motion in a molecular dynamics simulation. This is equivalent to subtracting V0(x) from the potential energy function. The coordinate x is then sampled from a distribution that is the normalized ratio of the unbiased marginal distribution, F(x), and the reference distribution, F0(x). When taking this approach, the final expression for the marginal probability distribution that we wish to estimate is modified as follows, to account for the biasing:

[ 〈

∂V1(x, b y) 1 (θ(x - jx) - ψ0(x)) c ∂x g(xj) ≡ F0(xj) 1 + c

F(xj) ) F0(xj) 1 +

[

]

〉] ′

(27)

The notation 〈 · · · 〉′ denotes an average over configurations sampled during the biased simulation, and c is a normalization constant associated with the biased distribution, namely

1 ) c

) 〈1/F0〉 ∫ dx FF(x) 0(x)

(28)

Figure 1. Model molecule used in simulations. The dihedral angle shown is the degree of freedom of interest for all our simulations. See text for the definition of the internal potential energy.

The equivalence between eq 27 and our earlier expression (eq 16) can be established by noting that the biased and unbiased averages are related by the importance sampling identity, 〈O〉′ ) 〈O/F0〉/〈1/F0〉, where O is an arbitrary observable. An expression for c is determined self-consistently, by substituting eq 27 into eq 28:

c)

1 [1 (x+ - x-)

∫xx

+

-

dxj g(xj)]

(29)

where g ) 〈(θ - ψ0)∂V1/∂x〉′, as defined in eq 27. 3. Model Details and Simulation Methods To validate the binless method we have proposed, and to compare its performance with the histogram and thermodynamic integration methods, we have carried out constant-temperature simulations of a model molecule (Figure 1) both in the gas phase (isolated molecule) and in a Lennard-Jones solvent. The configuration of this molecule can be represented using the atomic Cartesian coordinates, Γm ) (r b1, b r2, b r3, b r4). Alternatively, we can use six translational and rotational degrees of freedom and six internal coordinates: three bond lengths bij specifying distances between adjacent atoms, two bond angles θijk (formed by the triplets 123 and 234), and one dihedral angle φ ) φ1234. We have applied the binless, histogram, and thermodynamic integration methods to compute the marginal probability distribution of the dihedral angle of this model molecule. Thus, the variable φ corresponds to the coordinate of interest x of section 2. [Since we have implicitly assumed Cartesian variables (x, b y) in section 2, see e.g. eq 5, a certain amount of care must be taken when extending to non-Cartesian, internal variables to account for Jacobian factors involved in the coordinate transformation. For the dihedral angle, φ, the relevant Jacobian factor is unity; thus our central result (eq 16) can be applied without modification.] The internal energy of our model molecule is given by the following expression:

V int )



bonds

1 b k (bij - beq)2 + 2



angles

1 θ k (θijk - θeq)2 + 2

5

∑ Cn cos(φ - π)n + fVLJ(r14) (30)

n)0

where r14 ) |r b1 - b r4|. The first three terms on the right represent contributions from the three bonds, two bond angles, and the dihedral angle, taken from the molecular dynamics software GROMACS.3 Here kb, kθ and beq, θeq are the force constants and equilibrium bond length/angle, respectively, for a pair or triplet of atoms, and the Cn ’s are the Ryckaert-Bellemans parameters for the dihedral term. The final term in eq 30 describes a Lennard-Jones interaction between atoms 1 and 4

12726 J. Phys. Chem. B, Vol. 112, No. 40, 2008

[( σr ) - ( σr ) ] 12

VLJ(r) ) 4ε

6

Basner and Jarzynski

(31)

scaled by a dimensionless factor, f ) 0.1. This last term couples the dihedral angle, φ, to the other internal coordinates, the three bij’s and the two θijk’s. If we had excluded this term (i.e., set f ) 0), then we would be in the “uncoupled” situation briefly discussed in section 2 (eq 25). The values assigned to the parameters appearing in eqs 30 and 31 were chosen to give barriers along the dihedral coordinate that are greater than the thermal energy (kBT), but not large enough to prevent adequate sampling. Simulations with solvent also included Lennard-Jones interactions between pairs of solvent particles and between solvent particles and each of the four atoms in our model molecule. The values of all assigned parameters are given in the Appendix. During a given simulation of this model molecule, an initial relaxation interval was followed by a longer sampling interval (see Appendix for details), during which we collected the following data every 0.01 ps: (1) the value of the coordinate of interest, φ, and (2) the partial derivative of the potential with respect to this coordinate, ∂V/∂φ. These data were stored for later, postsimulation analysis to determine the potential of mean force, using the histogram, thermodynamic integration, and binless methods. To compute ∂V/∂φ, the Cartesian forces on the atoms (-∂V/∂r bi) were taken from GROMACS and linearly transformed, according to the general expression4,5

∂V ) ∂φ

M

b r

M

∑ ∂∂Vbr i · ∂φi ) ∑ ∂∂Vbr i · ((br i - br b) × be bp) (32) i)1

i)1

The sum is taken over the M atoms affected by the rotation of the rotable base-partner (bp) bond. In our model atom 2 is the “base” and atom 3 is the “partner” in the bp bond for the dihedral angle φ; thus b rb ) b r2, and b ebp is a unit vector along that bond. Rotations around this bond are defined by holding fixed the positions of atoms 2, 3, and 4. Therefore only atom 1 is affected by the rotation (M ) 1). The first step in the implementation of our method is the specification of the reference potential, V0. We tested our method using three choices of the reference potential, shown along with the corresponding reference distributions in (Figure 2). The first choice was a flat reference potential

V (a) 0 (φ) ) 0

(33)

corresponding to an equiprobable distribution of dihedral angles. For this choice our unbiased estimate (eq 16) takes the simple form

F(φ j) )

〈[

] 〉

1 φ ∂V j) + θ(φ - φ 2π 2π ∂φ

(34)

Our second choice of reference potential was the dihedral term in eq 30: 5

V (b) 0 (φ) )

∑ Cn cos(φ - π)n

n)0

(35)

The functions F0(φ) and ψ0(φ) were computed by onedimensional numerical integration of the Boltzmann factor e-βV0 along the variable φ and were stored in memory. Our third reference potential included both the dihedral term and a “slice” of the 1-4 LJ interaction term: (b) eq V (c) 0 (φ) ) V 0 (φ) + fVLJ(r14(φ))

(36)

where the distance req 14(φ) is defined by performing a dihedral rotation, while holding the three bond lengths and two bond angles fixed at their equilibrium values (beq and θeq). As shown in Figure 2, the addition of the Lennard-Jones term raises the potential energy by a significant amount in the region near φ ) 0 (2π). This range of dihedral angles describes the cis conformation, in which steric interactions between atoms 1 and 4 become relevant; if we had kept an unscaled Lennard-Jones interaction (f ) 1 in eq 30), then the repulsion between these atoms would have created a ∼50kBT barrier at φ ) 0, which in turn would have caused the sort of sampling difficulties that we wanted to avoid in our simulations. 4. Results We first performed a 100 ns simulation of the model molecule in vacuum. This simulation was divided into 1000 trajectories of duration 0.1 ns each. The marginal probability distribution F(φ) was then estimated for each of these trajectories, using five methods: (a)-(c) our unbiased estimator, with all three choices of reference potential described in section 3; (d) the histogram method; and (e) thermodynamic integration. For (a)-(c), F(φ) was evaluated at 100 equally spaced values of the dihedral angle; when implementing (d) and (e), 100 bins or integration steps were used. Figure 3a shows the resulting estimates (averaged over all 1000 trajectories). The excellent agreement among these methods serves as a proof-of-principle illustration of the binless method. Figure 3b compares the relative efficiencies of the methods, by showing the standard deviations of the various estimates. The best convergence is achieved with (c) and (e) and the worst with (a) and (d). The differences are substantial. For instance, at φ ) π the standard deviation of the histogram estimate is about 60 times that of the thermodynamic integration or binless estimates, using V (c) 0 , roughly indicating a factor 602 ) 3600 in relative convergence rates. Comparing the choices of reference potential, we see noticeably improved convergence as we proceed from the (b) simplest, flat reference, V (a) 0 , to the dihedral reference, V 0 , and then to the inclusion of the Lennard-Jones interaction term in V (c) 0 . Not surprisingly, the better our initial guess of the potential of mean force, the more rapidly the binless estimate converges. We next simulated our model molecule immersed in a Lennard-Jones solvent, generating 40 trajectories, each of duration 1 ns. Figure 4a shows the estimates of the dihedral probability distribution, F(φ), using the histogram method, thermodynamic integration, and the binless method with the dihedral-plus-Lennard Jones reference potential, V (c) 0 . The three methods agree and reveal that the solvent has little effect on the distribution of the dihedral angle. Figure 4b compares efficiencies. While the differences here are not as great as in the unsolvated case (Figure 3b), the binless method converges more rapidly than thermodynamic integration, which in turn is more efficient than the histogram approach. At a given value of the dihedral angle, φ j , the variance of the estimator ζˆ (Γ; φ j ) is broader for the solvated system than for the unsolvated system. This is due to the factor ∂V1/∂φ appearing

Potential of Mean Force

J. Phys. Chem. B, Vol. 112, No. 40, 2008 12727

Figure 2. Reference potentials V(a, b, c) and corresponding distributions.

Figure 3. Comparison of methods for the model molecule in vacuum. Note the logarithmic scale on the vertical axis on the right.

Figure 4. Comparison of methods for the model molecule in Lennard-Jones solvent.

in our expression for ζˆ (eq 17). In the presence of solvent, this factor includes terms corresponding to the forces that the solvent particles exert on the dihedral angle; these terms contribute to

the variance of ζˆ . Thus, even when the reference distribution F0 is a very accurate guess of the marginal distribution F, considerable sampling might be required to establish that 〈ζˆ 〉

12728 J. Phys. Chem. B, Vol. 112, No. 40, 2008

Basner and Jarzynski

Figure 5. Comparing of convergence of the binless method both with and without biasing.

≈ 0. An alternative to explicit simulation of solvent particles is the use of implicit solvent. In addition to speeding up the simulation itself, the use of implicit solvent would presumably reduce the variance of ζˆ , improving the convergence relative to simulations with explicit solvent. Finally, we have performed simulations with a force bias, in the manner described at the end of section 2. We used the dihedral reference potential, eq 35, and we added a force +∂V0/ ∂φ during the simulations. This had the effect of removing barriers associated with the dihedral potential and improving the sampling of low-probability dihedral angles. We ran 500, 0.1 ns simulations of an unsolvated molecule and 12, 1 ns simulations of a solvated molecule, with other parameters as in the simulations described above. From these trajectories, we used eq 27 to estimate the potential of mean force and found agreement (data not shown) with the estimates of F(φ) previously obtained using the unbiased simulations. We also computed the standard deviations of the estimates obtained from the biased simulations. Figure 5 compares these standard deviations with the corresponding results for unbiased runs using the binless method. Thus, the solid and dashed lines were computed with the same (dihedral) reference potential, F0, but in one case this reference was used only in the postsimulation analysis (eq 16), while in the other it was also used to bias the simulation. For the unsolvated case, we see that the force bias provides substantial improvement in the convergence of the estimate. For the solvated case the difference is less evident: while convergence of the lower probability dihedral angles (the cis conformation) seems to be enhanced by biasing, at other angles the results are mixed. The issues discussed above affecting convergence for the solvated system are clearly also relevant when implementing a force bias. 5. Discussion We have derived and validated a “binless” method for estimating a marginal probability distribution, F(x), or equivalently a potential of mean force, Φ(x). Our approach, which can be implemented with either molecular dynamics or Monte Carlo sampling, relies on an unbiased estimator of the marginal distribution (eq 16) and requires an initial guess or reference distribution F0(x). In comparisons using a simple model molecule, we find that the binless method generally converges faster than a direct, histogram construction of F(x) and performs

on par with a thermodynamic integration estimate of Φ(x). A flexible feature of our method is that the reference distribution F0 can be used either in the a posteriori evaluation of data sampled from the equilibrium state (eq 16) or else to bias the sampling during the simulation (eqs 27 and 29). Of course, as with any method based on sampling, the presence of free energy barriers between important regions of phase space can significantly slow down the convergence of the desired average; moreover, if the system gets trapped in some region of phase space, then the method can seem to converge but will in fact produce an incorrect result. Both the histogram and the thermodynamic integration methods are susceptible to systematic as well as statistical error. With the histogram method, the optimization of bin size is complicated by the inevitable competition between these two sources of error: with a larger bin size we reduce statistical error but increase systematic error. [For instance, the histogram method underestimates the value of the marginal distribution at a local maximum, F(xmax), since in reality it provides an average of F(x), smoothed over a finite bin width containing the point xmax.] With thermodynamic integration the situation is more subtle. The force-averaging is performed within bins of finite width, which generates some systematic error. (Alternatively, constrained molecular dynamics simulations can be performed at a fixed number of values of the reaction coordinate.1,6-8) The numerical integration of these estimates, to get a potential of mean force, introduces further systematic error, since the integration steps are the finite bin widths. However, since the statistical error is (roughly) unaffected by the size of the bins,9 a small bin size can be chosen, provided each bin is sampled at least once during the simulation. The binless method we introduce is susceptible only to statistical (and not systematic) error, since it uses an identically unbiased estimator of F at a given value of the coordinate of interest. Moreover, the number of points at which we evaluate the marginal distribution is essentially uncoupled from the amount of sampling. Once we have generated the simulation datas specifically, once we have stored the values of x and ∂V/∂x along the trajectoryswe can use these data to estimate the marginal distribution at however many values of the reaction coordinate we desire, since the estimation of F at any one value, jxi, is carried out independently of the estimation at any other value, jxj.

Potential of Mean Force We conclude by identifying several issues that merit further investigation. First, we have assumed in our presentation that the coordinate of interest, x, is a microscopic degree of freedom, such as a Cartesian coordinate of a given atom or the dihedral angle of our model molecule. It is often of interest, however, to obtain the PMF along a generalized coordinate, ξ(Γ), a collective variable defined in terms of many microscopic degrees of freedom.6,7,1 An open question is whether the method that we propose in this paper can be extended to the case of generalized coordinates. Second, since convergence seems to correlate with how accurately the reference distribution F0 approximates the marginal distribution F, the binless approach seems well suited to adaptive methods.7,9,10 Specifically, the running estimate of F can be used to update F0 in an iterative manner. We have not yet tested this idea or investigated how to implement it optimally. Alternatively, one might imagine using our method in combination withsrather than instead ofsthermodynamic integration. [We thank an anonymous referee for this suggestion.] That is, the potential of mean force would first be estimated from a preliminary simulation, using thermodynamic integration; then this estimate would be used as the reference potential V0(x) in a production run using the binless method. In the model system we have chosen to illustrate our method, the presence of the solvent has little effect on the actual potential of mean force, despite relatively strong solute-solvent interactions. It will be important to test our method using more complicated systems, particularly ones for which the solvent substantiallyalterstheequilibriumsamplingofsoluteconfigurations. Finally, while we have developed the binless method for a single coordinate of interest, it would of course be interesting to extend its applicability to a multidimensional set of statistically correlated coordinates (x1, x2, · · · ). Acknowledgment. We gratefully acknowledge useful discussions with Saar Rahav and financial support provided by the University of Maryland. Appendix. Simulation Details For the bond, angle, and dihedral terms in eq 30 we used the following values, adapted from the Amber force field:11 beq ) 0.152 60 nm, kb ) 259 408.0 kJ/(mol nm2), θeq ) 109.5°, kθ ) 334.720 kJ/(mol rad2), and C(0, · · · , 5) ) 2.845 12, -4.100 32, 16.736 00, 2.510 40, -16.736 00, and 0.000 00 kJ/mol. All four atoms in our model molecule were assigned LennardJones parameters  ) 0.457 73 kJ/mol and σ ) 0.339 967 nm. Solvent molecules were assigned parameters s ) 0.77 57 kJ/

J. Phys. Chem. B, Vol. 112, No. 40, 2008 12729 mol and σs ) 0.3542 nm. As discussed in section 3, the only pairwise Lennard-Jones interaction among the four atoms of our molecule was the scaled interaction between atoms 1 and 4. However, in the simulations with solvent, each of the four atoms interacted with every solvent molecule, and the solvent molecules interacted with one another, via the pairwise LennardJones interaction as determined via the usual combination rule.12 For the simulations of the model molecule in vacuum, starting from an initial configuration, we performed energy minimization using steepest descent, until the maximum force on any atom was below 1 kJ/(mol nm). Using the Andersen thermostat13 at T ) 289 K and a 0.1 fs time step for leapfrog Verlet integration, an equilibration stage of 30 ns was followed by 100 ns of sampling. For the simulations of a solvated molecule, we used 850 solvent atoms. Starting from an initial configuration, we performed energy minimization using steepest descent, until the maximum force on any atom was below 10 kJ/(mol nm). Using the Andersen thermostat at T ) 289 K and a 0.1 fs time step for leapfrog Verlet integration, an equilibration stage of 1 ns was followed by 40 ns of sampling, at constant pressure P ) 107.43 kJ/(mol nm3). This sampling interval was divided into 40 trajectories, each of 1 ns duration. Periodic boundary conditions were used with a cutoff radius of 1.6 nm (shift function to 1.7 nm) for the Lennard-Jones calculations. References and Notes (1) Darve, E. In Chipot, C.; Pohorille, A., Eds.; Free Energy Calculations; Springer: Berlin, 2007. (2) Adib, A. B.; Jarzynski, C. J. Chem. Phys. 2005, 122, 014114: 1014114:6. (3) Spoel, D. V. D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A.; Berendsen, H. J. Comput. Chem. 2005, 26, 1701–1718. (4) Ponder, J.; Richards, F. J. Comput. Chem. 1987, 8, 1016–1026. (5) Abe, H.; Braun, W.; Noguti, T.; Go, N. Comput. Chem. 1984, 8, 239–247. (6) den Otter, W. K. J. Chem. Phys. 2000, 112, 7283–7292. (7) Darve, E.; Pohorille, A. J. Chem. Phys. 2001, 115, 9169–9183. (8) Trzesniak, D.; Kunz, A.; van Gunsteren, W. ChemPhysChem 2007, 8, 162–169. (9) Fasnacht, M.; Swendsen, R.; Rosenberg, J. Phys ReV E 2004, 69, 056704. (10) Barducci, A.; Bussi, B.; Parrinello, M. Phys. ReV. Lett. 2008, 100, 020603:1–4. (11) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M.; Xiong, G.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J.; Kollman, P. J. Comput. Chem. 2003, 24, 1999–2012. (12) van der Spoel, D.; Lindahl, E.; Hess, B.; van Buuren, A.; Apol, E.; Meulenhoff, P.; Tieleman, D.; Sijbers, A.; Feenstra, K.; van Drunen, R.; Berendsen, H. www.gromacs.org, 2005. (13) Frenkel, D.; Smit, B. Understanding Molecular Simulation; Academic Press: New York, 2001.

JP803635E