8166
J. Phys. Chem. B 2010, 114, 8166–8180
New Estimators for Calculating Solvation Entropy and Enthalpy and Comparative Assessments of Their Accuracy and Precision Matthew A. Wyczalkowski, Andreas Vitalis, and Rohit V. Pappu* Department of Biomedical Engineering and Center for Computational Biology, Washington UniVersity in St. Louis, One Brookings DriVe, Campus Box 1097, St. Louis, Missouri 63130 ReceiVed: April 5, 2010
We present two new methods for estimating the entropy and enthalpy decomposition of free energy calculations. These methods are based on temperature derivatives of the Bennett Acceptance Ratio and the Multistate Bennett Acceptance Ratio estimators, respectively. We test the accuracy of these new estimators using a simple one-dimensional model. A detailed assessment of their performance is reported by studying the solvation of N-methylacetamide. Finally, we quantify the free energies of solvation for 11 model compounds using the OPLS-AA force field and a variation of this force field. Thermodynamic decompositions of these calculated free energies are obtained to highlight the utility of these quantities for refining force field parameters by comparing computed free energies and their decompositions to their experimental counterparts. 1. Introduction Free energies of solvation play a fundamental role in selfassembly processes such as protein folding, binding, and aggregation. The free energy of solvation (∆F at fixed volume) quantifies the change in free energy of a system associated with the transfer of a solute from the gas phase into solvent.1 This quantity provides a thermodynamic assessment of the interaction between a solute and its surroundings. One can obtain thermodynamic decompositions of this free energy into energy and entropy (or, in the isothermal-isobaric ensemble, of the free energy ∆G into enthalpy and entropy). From a theoretical/ computational perspective, such decompositions have at least two important applications. First, they provide thermodynamic insights regarding the mechanism of solvation.2,3 The sign of ∆F is governed by the combination of entropy and energy contributions; knowing the relative favorability of each component allows us to develop mechanistic models for the solvation process.4 Second, there is growing interest in improving the accuracy of molecular mechanics parameters to ensure better agreement with experimental free energies of solvation.5-9 Thermodynamic decompositions are experimentally accessible quantities, and reproducing them accurately in a simulation provides an additional point of validation for force field parameters.10 These goals were recognized in early work in the field.11 N-Methylacetamide (NMA), a secondary amide that mimics the peptide backbone, provides a pertinent example. At 25 °C, the free energy of solvation for NMA (∆G ) -10 kcal/mol) is the result of a balance between favorable enthalpy (∆H ) -20 kcal/mol) and negative entropy (-T∆S ) 10 kcal/mol),12 so that the large negative entropy offsets at least half the favorable enthalpy. Graziano has proposed that this entropic contribution derives mainly from the excluded volume penalty associated with the creation of a solute-sized cavity in water.13 Interestingly, NMA has been a challenging system for free energy of solvation calculations. Force field parametrization paradigms that recapitulate experimentally measured free energies of solvation for a range of model compounds, including * Corresponding author. E-mail:
[email protected].
primary amides,6,8 nevertheless yield ∆F estimates for NMA that are 2.0-2.5 kcal/mol lower in magnitude8 than experimental assessments.14 Attempts to remedy this discrepancy have focused on reparameterizing the partial charges for NMA.8 However, while such reparameterizations may yield the correct free energy of solvation, they may do so by overestimating the favorable enthalpy and entropic penalty. This, in turn, may lead to incorrect behavior when such parameters are used in simulations of polypeptides with explicit representation of water. Access to reliable thermodynamic decompositions can play an important role in guiding the refinement of force field parameters. Calculations of free energies of solvation have advanced to the point where computational errors are on par with experimental ones.6,15 In contrast, obtaining thermodynamic decompositions with similar robustness has proven challenging.16 Part of the problem is the inherent difficulty of such calculations, which suffer from statistical errors that are 10-100 times larger than errors in free energy calculations.17 Another problem is methodological in nature. While all estimators are formally correct and should yield the same converged value, some are better able to take advantage of limited data and yield adequately converged results from shorter simulations. Free energy calculations can take advantage of estimators such as the Bennett Accepance Ratio (BAR)18 and Multistate Bennett Acceptance Ratio (MBAR)19 methods that are bridging estimators because they utilize information from multiple simulations, thus improving the reliability of estimates obtained from finite data. Currently available entropy and enthalpy estimators utilize information from only one simulation at a time, thereby discarding data that could be used to improve the estimate. In this work, we address the issue of methodology and present the derivation and assessment of two estimators for thermodynamic decompositions. These estimators are based on temperature derivatives of the BAR and the MBAR free energy estimators and, like their free energy counterparts, utilize information from multiple simulations. We assess the accuracy of the new estimators using a simple model for which the free energy and the corresponding entropy and energy changes are known exactly. Next, we calculate the thermodynamic decomposition of the solvation free energy of
10.1021/jp103050u 2010 American Chemical Society Published on Web 05/26/2010
New Estimators for Solvation Entropy and Enthalpy
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8167
NMA in water. An analysis of the statistical errors associated with this calculation illustrates the relative advantages of the BAR and MBAR entropy estimators over existing estimators but also serves to highlight the difficulties in obtaining thermodynamic decompositions which are of comparable quality to the free energy estimates. Finally, to demonstrate the utility of thermodynamic decompositions of free energies of solvation, we perform such calculations for 11 model compounds using two force fields, OPLS-AA20 and a modified OPLS-AA force field mOPLS-AA,21 that differ in their Lennard-Jones parameters. Ten of the model compounds are analogues of amino acid side chains and were chosen because experimental data are available for the thermodynamic decompositions of their free energies of solvation. The Lennard-Jones parameters in the mOPLS-AA force field were designed for the ABSINTH implicit solvation model21 model which uses free energies of solvation for model compounds to define reference states for computing free energies of solvation of complex solutes in a mean field solvent. Thus, an additional motivation of the comparative calculations is to quantify the degree of agreement between experimental data and those obtained using mOPLS-AA. Here, the primary quantity of interest is ∆F, as these values alone are used in the ABSINTH model, and the thermodynamic decompositions do not enter the implicit solvation paradigm. 2. Methods In calculations designed to estimate the free energy of solvation, the solute molecule is transferred from the gas phase into the solvent using the device of the Kirkwood coupling parameter, λ.22 Introduction of the λ parameter23 into potential functions allows one to vary the degree of coupling between specific molecules in a dense fluid. λ modulates solute-solvent interactions, with the limits λ ) 0 and λ ) 1 corresponding to the pure solvent and solvent plus inserted solute, respectively; intermediate values of λ interpolate between these limits. In our approach, independent equilibrium simulations are performed for a series of λ values drawn from a prescribed λ schedule. These simulations were performed in the canonical (NVT) ensemble. Different estimators were used to analyze data gathered from independent simulations and compute the Helmholtz free energies of solvation, ∆F, and their decompositions into entropies (∆S) and energies (∆U). Operationally, we define
δF ) F(λ1) - F(λ0)
Acceptance Ratio (BAR) and Multistate Bennett Acceptance Ratio (MBAR) estimators for the more general case of arbitrary temperatures and simulation lengths per replica. For a given solute-solvent system, let x denote the position coordinates of all the atoms in the system. The potential energy for this configuration x is written as V(x, λi), where λi refers to a specific value of the coupling parameter. In concise notation, this potential energy is denoted as Vi(x), and the work associated with changing the scaling parameter from λ0 to λ1 for a given x is
δV(x) ≡ V1(x) - V0(x)
2.1.1. Direct Energy Estimator. The direct estimator calculates ∆U as the difference in ensemble averaged potential energies between two states using the relationship U ) 〈V〉. Specifically, with λ0 ) 0 and λK ) 1
∆U ) 〈VK〉λK - 〈V0〉λ0
(4)
The direct estimator, by construction, does not utilize information obtained from simulations that use intermediate λ values. One can define δU(λi) ) 〈Vi+1〉i+1 - 〈Vi〉i to construct the energy profile as a function of λ. 2.1.2. Free Energy Perturbation. The free energy perturbation (FEP) method24 extrapolates information obtained in one simulation to calculate the free energy difference at another λ value. Because this estimator is directional, two independent free energy estimatessforward and reversescan be constructed for a pair of simulations
δFF ) -β-1ln〈exp(-βδV)〉0
(5a)
δFR ) +β-1ln〈exp(+βδV)〉1
(5b)
A temperature derivative of these equations yields the forward and reverse FEP (also known as thermodynamic perturbation) energy estimators10
δUF ) +
〈V1 exp(-βδV)〉0 - 〈V0〉0 〈exp(-βδV)〉0
(6a)
δUR ) -
〈V0 exp(+βδV)〉1 + 〈V1〉1 〈exp(+βδV)〉1
(6b)
(1)
as the change in free energy between any two adjacent λ values, λ1 and λ0, and ∆F is the cumulative sum of the δF across the λ schedule. δU, ∆U, δS, and ∆S are defined in a similar fashion. 2.1. Free Energy and Enthalpy Energy Estimators. There exist different methods to estimate the free energy changes across a λ schedule. For each of these, an analytical temperature derivative of the δF equation yields a formula for either δU or δS. With two of these thermodynamic quantities known, the third is determined from the relationship
(3)
2.1.3. BAR. The BAR free energy estimator for two simulations of equal length and temperature may be written as
〈g*〉 + 0 ) 〈g*〉 - 1
(7a)
(2)
g*+ ≡ [1 + exp(+βδV - βδF)]-1
(7b)
The exception to this is the direct method, which estimates only ∆U and has no corresponding free energy estimator. The estimators in this section are written in terms of dimensional quantities with the same temperature and length assumed for all simulations. Section 5.1 derives expressions for the Bennett
g*- ≡ [1 + exp(-βδV + βδf)]-1
(7c)
δF ) δU - TδS
with
These equations yield an implicit formula for δF and are solved iteratively. Details of this and subsequent derivations are found
8168
J. Phys. Chem. B, Vol. 114, No. 24, 2010
Wyczalkowski et al.
in Section 5, as are formulas for arbitrary simulation lengths Ni and temperatures Ti. Here, the * symbol indicates that all Ni and Ti are equal. A temperature derivative of eq 7 yields an explicit equation for the entropy
TδS )
R*0 - R*1 - δF 〈g*g + *〉 - 0 + 〈g*g + *〉 - 1
(8a)
where
R*0 ) 〈g*V + 0〉0 - 〈g*〉 + 0〈V0〉0 + 〈g*g + *δV〉 0
(8b)
R*1 ) 〈g*V - 1〉1 - 〈g*〉 - 1〈V1〉1 - 〈g*g + *δV〉 1
(8c)
and the converged BAR δF estimate from eq 7 is used in eq 8a. 2.1.4. MBAR. Shirts and Chodera19 recently introduced the MBAR free energy estimator which uses information from all K simulations along a λ schedule to construct an estimate of the free energies Fi all at once, rather than calculating each pairwise δF individually. With an equal number of observations drawn from all simulations, and all simulations performed at the same temperature, the MBAR estimator may be written as (see Section 5) K
∑ 〈µ*〉i j ) 1 for i ) 1, 2, ..., K
(9a)
j
where for convenience we define
(9b)
K
∑ exp[βFk - βVk(x)] k
Equation 9 is a set of K equations which are solved simultaneously for all Fi, with F0 held fixed at 0. The temperature derivative of eq 9 (see Section 5) yields the entropy estimator K
TSi ) -Fi +
K
∑ 〈µiVi〉j + ∑ (〈µiVj〉j - 〈µi〉j〈Vj〉j) j
+
j
K
K
K
K
k
j
k
j
∑ (Fk + TSk) ∑ 〈µiµk〉j - ∑ ∑ 〈µiµkVk〉j for i ) 1, 2, ..., K (10)
Like eq 9, eq 10 is a set of K equations which yield Si. In practice, the Fi are obtained first, then held fixed as the Si are calculated iteratively. 2.2. Validation of Estimators. We tested the accuracy of the new BAR and MBAR estimators by calculating the thermodynamic quantities associated with a simple, onedimensional model system. The Sun model is a dimensionless potential given as25
Vsun(x, λ) ) x4 - 16(1 - λ)x2
This model is useful as a test case because it can be integrated analytically to obtain the exact values of ∆F, ∆U, and T∆S associated with changing λ from 0 to 1. For this system, ∆F quantifies the free energy change associated with a transition from a bistable state to a state with a single minimum (see Figure 1). With β ) 0.02, analytical integration (see Section 5.4) yields the nondimensional thermodynamic quantities ∆F ) 65.8878, ∆U ) 53.1957, and T∆S ) -12.6921. With the exact values of the thermodynamic quantities known, we can unambiguously assess the validity and relative accuracy of different estimators. Since the quantities ∆S and ∆U are not independent, we analyze the estimators in terms of ∆F and ∆U alone. The error F is defined for M independent estimates ∆Fi as εsun
∑ M
exp[βFi - βVi(x)]
≡ µ*(x) i
Figure 1. Sun model potential, eq 11, as a function of coordinate x for various λ between 0 and 1. The potential is symmetric about x ) 0.
(11)
F εsun )
(∆Fi - ∆Fexact)2 /M
(12)
i
where the exact solution ∆Fexact is known; the corresponding U , is defined in analogous fashion. error for ∆U, εsun A simple Monte Carlo sampler was used to construct a data set of 1 million observations for each of 11 λ values, spaced uniformly between 0 and 1. The inverse nondimensional temperature β, used in the sampler as well as the thermodynamic estimates, is 0.02. For each observation at a given λ value, foreign energies corresponding to all other λ values were also computed. A second data set, consisting of 3 λ values (λ ) 0, 0.5, 1) was constructed by discarding data from the 11 λ data set. Being one-dimensional, sampling is rapid, and sequential observations, output every 10 Monte Carlo steps, are fully decorrelated. F U and εsun were calculated For both ∆F and ∆U, the errors εsun as a function of the size of the data set. One hundred consecutive samples, each of size ranging from 1 to 104 observations, were drawn from the entire data set of 1 million observations. ∆Fi and ∆Ui, with i ) 1-100, were calculated for each sample using different estimators, and errors were calculated with eq 12. The performance of the estimators in terms of this error was compared for a range of sample sizes, and we consider the 11 λ and 3 λ schedules separately. Figure 2 plots the error as a function of sample size for ∆F and ∆U at the two different schedules. Most estimators perform similarly, yielding a series of coincident lines of a uniform slope on the log-log plot. As a result, BAR, MBAR, and the FEP estimators yield approximately the same errors for the same sample size N.
New Estimators for Solvation Entropy and Enthalpy
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8169
F U Figure 2. Sun model errors εsun and εsun (see eq 12), which quantify the absolute error of the various estimators for different λ schedules. The 3 λ schedule is (0.0, 0.5, 1.0), and the 11 λ schedule is (0.0, 0.1, ..., 0.9, 1.0).
Increasing the number of λ values from 3 to 11 results in lower errors, so that a Sun model simulation with the finer λ schedule needs to be only about 1/3 as long to achieve the same absolute error. For the 3 λ schedule, both FEP estimators, particularly the FEP R variant, tend to have larger errors than either BAR or MBAR. While the direct ∆U method yields errors on par with others for the coarse 3 λ schedule, its errors are significantly larger for the 11 λ case. The direct ∆U results are identical in both plots because they only use data from the λ ) 0 and λ ) 1 simulations. Figure 2 demonstrates the accuracy of the newly derived BAR and MBAR ∆U estimators. To establish consistency between the εsun and standard errors, these same data were analyzed with block averaging using 100 blocks.26 We find that the standard and εsun errors are virtually identical, and all the conclusions drawn from the Sun model εsun error analysis hold for a block average analysis as well. The one-dimensional Sun model was used to test the accuracy of different free energy and entropy-energy estimators. It also demonstrates that, in theory, it is possible to obtain ∆S and ∆U estimates that are as accurate as ∆F, at least for simple onedimensional systems. Next, we turn to assessing the performance of the new estimators in the context of solvation calculations. 2.3. Calculation of Free Energies of Solvation. We used the homegrown CAMPARI package27,28 to calculate free energies of solvation for different model compounds. Two sets of calculations were carried out for each of the model compounds: the first set used parameters from the OPLS-AA molecular mechanics force field20,29 and the second the modified OPLSAA (mOPLS-AA) parameters, as detailed below. The MBAR analysis for both the free energies and its decompositions was performed with a modified version of the freely available PyMBAR package.19 2.3.1. Scaled Atomic Potentials. The solvation calculation takes place in two legs: first λLJ is scaled from 0 to 1 with λC
) 0, and then λC increases from 0 to 1 with λLJ ) 1. For convenience, in the context of the solvation calculations, we will refer to a single combined λ ) λLJ + λC parameter increasing from 0 to 2. The potential energy of the entire atomic system V(x, λ) may be written as bonded nonbonded Vtot(x, λLJ, λC) ) VW (x) + VSbonded(x) + VW-W (x) nonbonded nonbonded (x, λLJ, λC) + VS-S (x, λLJ, λC) + VW-S
(13) where W and S refer to solvent and solute atoms, respectively. The bonded and solvent-solvent interactions are not modified by λ, whereas the scaled solute-solvent and solute-solute potentials are given by
nonbonded VW-S (x, λLJ, λC) )
NS
NW
i
j
∑ ∑ [VLJ(rij, λLJ) + VC(rij, λC)] (14a) NS
nonbonded VS-S (x, λLJ, λC)
)
∑ ∑ [VLJ(rij, λLJ) + VC(rij, λC)] i
j
(14b) Here, NS and NW denote the number of solute and solvent atoms, respectively, and rij is the distance between atoms i and j. The index j in the VS-S term iterates over all solute atoms that participate in nonbonded interactions with atom i, typically those which are separated by four or more covalent bonds. Scaling of solute-solute interactions allows for a simplified implementation of reaction field electrostatics, as discussed in the next
8170
J. Phys. Chem. B, Vol. 114, No. 24, 2010
Wyczalkowski et al.
TABLE 1: Parameters for the OPLS-AA and mOPLS-AA Force Fieldsa OPLS-AA
mOPLS-AA
atom
σ
ε
σ
ε
q
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 H1 H2 H3 H4 H5 H6 N1 N2 O1 O2 O3
3.500 3.500 3.500 3.500 3.500 3.500 3.500 3.500 3.550 3.550 3.750 2.000 2.000 2.000 2.000 2.500 2.420 3.250 3.250 2.960 3.120 3.070
0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.070 0.070 0.105 0.000 0.000 0.000 0.000 0.030 0.030 0.170 0.170 0.210 0.170 0.170
3.300 3.300 3.300 3.300 3.300 3.300 3.300 3.300 3.000 3.000 3.000 2.000 2.000 2.000 2.000 2.000 2.000 2.700 2.700 2.700 3.000 3.000
0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.025 0.025 0.025 0.025 0.025 0.025 0.150 0.150 0.200 0.150 0.150
0.020 0.145 0.085 -0.180 -0.120 -0.060 -0.240 -0.065 0.150 -0.115 0.500 0.380 0.300 0.418 0.435 0.060 0.115 -0.760 -0.500 -0.500 -0.683 -0.585
a Column 1 shows the atom types as defined in Figure 3. Columns 2-4 show the Lennard-Jones parameters that make the two force fields different from each other, and column 5 shows the partial charges for each atom, which are identical between the two force fields. The units for different parameters are as follows: σ in [Å], ε in [kcal/mol], and q in [e].
section. VLJ is a modified soft-core Lennard-Jones potential30 given by
VLJ(r, λLJ) ) 4εijλLJ(R-2 - R-1)
(15)
2 R ) 0.5(1 - λLJ ) + r6 /σij6
(16)
where
and
VC(r, λC) ) λC
qiqj r
(17)
The combined Lennard-Jones parameters σij and εij are constructed with geometric combination rules (σij ) (σiσj)1/2, εij ) (εiεj)1/2) for the OPLS-AA parameters and LorentzBerthelot rules (σij ) (σi + σj)/2, εij ) (εiεj)1/2) for the mOPLSAA parameters. These parameters, along with the atomic partial charges qi, are listed for both force fields in Table 1 and are discussed in Section 2.3.2. The solvent water was modeled using the rigid, three-site TIP3P model of Jorgensen et al.31 2.3.2. Force Field Parameters. Eleven model compounds were used in this study; ten of these are analogues of amino acid side chains, and one, NMA, is an analogue of the repeating unit in a polypeptide backbone. For all of these solutes, experimental data are available for free energies of solvation as well as their decompositions into entropies and enthalpies. These compounds constitute a subset of those
studied by the Pande group.6,32 The free energies of solvation and their decompositions obtained from experiment are listed in Table 2. Figure 3 illustrates the molecular structures of the 11 model compounds used in this work. Many of the molecules in this set are small and rigid solutes. Bond lengths and angles were held fixed in the equilibrium values prescribed by the all-atom OPLS-AA force field. We used an internal coordinate stochastic dynamics integrator to sample conformational space for the solute-solvent system, details of which are presented in Section 2.3.3. Vitalis and Pappu21 modified the Lennard-Jones parameters in the OPLS-AA force field to develop the mOPLS-AA parameters for Monte Carlo simulations of polypeptides using the ABSINTH implicit solvent model where sampling is carried out in torsional space using fixed bond lengths and bond angles. Table 1 details the Lennard-Jones parameters and charges associated with each atom type. In general, the σ parameters in mOPLS-AA are smaller than in OPLS-AA. The mOPLS-AA well-depth parameters are larger than the OPLS-AA values for carbon atoms and smaller for nitrogen, oxygen, and nonpolar hydrogen atoms. Additionally, polar hydrogen atoms have nonzero interaction parameters in mOPLS-AA but are “sizeless” in OPLS-AA. In the OPLS paradigm, parameters for LennardJones potentials are derived to reproduce the properties of neat liquids, with the σ parameters aiming to capture liquid densities and the ε parameters heats of vaporization.29 The modified σ and ε mOPLS-AA parameters are based on the estimates of Pauling,33 which reproduce crystal packing and heats of fusion of neutral model compounds. 2.3.3. Simulation Details. For each model compound, we performed two sets of solvation calculations, one based on the OPLS-AA and the other on the mOPLS-AA force fields. These comparative calculations allow us to demonstrate how free energies of solvation and their decompositions into entropies and energies are useful for assessing the accuracy of different force field parameters. Integrator. Stochastic dynamics simulations based on the integration of Langevin equations of motion were used for conformational sampling. The center-of-mass translations and rigid body rotations of all molecules, as well as torsion angles for the flexible solutes, were sampled using the impulse integrator of Skeel and Izaguirre.34 For all of the degrees of freedom, we used a frictional coefficient of γ ) 5 ps-1 and a time step of 2.0 fs. Integrator stability was tested by assessing the convergence of a system of TIP3P water to a target temperature of 298 K, as well as the reproduction of accurate distribution functions and response functions such as the heat capacity. Details of the adaptation of the algorithm of Skeel and Izaguirre for internal coordinate Langevin dynamics will be published elsewhere. In all simulations, an equilibration period of 1.6 ns was followed by a 8 ns production run, with energy output every 0.2 ps, for a total of 40 000 energy observations. For all λi values, the autocorrelation time of the potential energy is less than 0.4 ps. Cutoffs. All simulations were performed using periodic boundary conditions in cubic cells 32 Å to a side. Twin-range cutoffs were employed for both the Lennard-Jones and electrostatic interactions: the full set of interactions was computed if the pairwise distance was below 10 Å; for distances in the 10-14 Å range, we computed the interactions once every four time steps using neighbor lists which were also updated once every four steps. The reaction field method35 was used to calculate corrections due to truncation of long-range electrostatic
New Estimators for Solvation Entropy and Enthalpy
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8171
Figure 3. Visual representation of the molecules for which solvation thermodynamic quantities were calculated. Table 2 provides expanded descriptions of the molecules, and the atom labels reference specific atomic force field parameters in Table 1. Visualization with VMD.58
TABLE 2: Experimental Data for Free Energies of Solvation and Their Thermodynamic Decompositionsa experimental model compd acetamide ethyl alcohol isobutane methane methyl alcohol n-butane N-methylacetamide 4-methyl phenol propionamide n-propane toluene
name ACA EOH IBU MET MOH NBU NMA PCR PPA PRP TOL
AA
∆G
Asn Thr Leu Ala Ser Ile BB Tyr Gln Val Phe
-9.71 -5.0541 2.3241 2.0041 -5.1041 2.0841 -10.0913 -6.1442 -9.3854 1.9641 -0.8841 54
∆H
T∆S
-16.32 -12.05 -4.83 -2.75 -10.25 -5.66 -19.36 -14.18 -17.45 -4.83 -8.10
-6.61 -7.00 -7.15 -4.75 -5.16 -7.74 -9.27 -8.05 -8.07 -6.78 -7.21
NVT (∆V)P 42
55.82 55.1242 83.1055 37.3042 38.2542 76.60.55 74.0442 103.2356 71.5442 67.0042 106.8657
∆NVT
∆U
T∆S
2.259 2.231 3.363 1.510 1.548 3.100 2.996 4.178 2.895 2.711 4.325
-18.58 -14.28 -8.19 -4.26 -11.80 -8.76 -22.36 -18.36 -20.35 -7.54 -12.43
-8.87 -9.23 -10.51 -6.26 -6.70 -10.84 -12.26 -12.22 -10.97 -9.49 -11.54
a
The table also shows the conversions of the decompositions into the NVT ensemble. The units for free energy, enthalpy, and entropy (∆G, ∆H, and T(∆S)P) are kcal/mol, and units for partial molar volumes (∆V)P are cm3/mol. All values are reported in Ben-Naim convention.41 We applied a correction factor of ∆NVT ) (∆V)PTR/κ (see eq 20a) to convert experimental ∆H and T∆S into NVT ensemble values, namely, ∆U and T∆S. Here, TR/κ ) 0.04047 [kcal/cm3] was obtained from experimentally obtained properties of liquid water43 at 298 K.The Helmholtz free energy ∆F in the NVT ensemble is numerically equal to Gibbs free energy ∆G. Columns 1 and 2 list the model compound names and abbreviations (see Figure 3) used in this work. Ten of the eleven model compounds are analogues of amino acid side chains, and three-letter codes of these amino acids are listed in column 3; N-methylacetamide is an analogue of the main repeating unit of polypeptide backbones and is denoted as “BB” in column 3.
interactions. As all of our solutes are neutral, use of the reaction field is appropriate.36 Analytical corrections were applied to account for the Lennard-Jones effects beyond this distance, as discussed in Section 2.3.5. Ensemble and System Preparation. All solvation simulations were performed in the NVT ensemble, as stochastic dynamics based on the Langevin dynamics formalism reliably yields converged statistics from the canonical ensemble.37 Experimental data, specifically the enthalpy and entropy decompositions, were converted to the NVT ensemble values prior to comparison to
results from simulations. Details of these corrections are discussed in the next section. For liquid water with a target density of 1 g/cm3, the central simulation cell is comprised of 1086 TIP3P water molecules. In preparing simulations with the solute, the number of water molecules removed was based on the van der Waals volume of the single solute and neat water density. We evaluated the error associated with this method by performing two additional simulations on the NMA system, one with an additional water molecule and a simulation with one fewer water molecules. It
8172
J. Phys. Chem. B, Vol. 114, No. 24, 2010
Wyczalkowski et al.
TABLE 3: Uncorrected Simulation Data and Corrections for Free Energy and Entropy/Enthalpy Calculated Using MBARa uncorrected
self correction
name
FF
∆F
∆U
T∆S
∆F
∆U
T∆S
LJLR
ACA ACA EOH EOH IBU IBU MET MET MOH MOH NBU NBU NMA NMA PCR PCR PPA PPA PRP PRP TOL TOL
OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA
-37.828 -9.202 -2.961 -5.198 4.102 0.827 2.286 1.793 0.777 -4.974 4.825 1.105 -22.052 -7.487 -6.259 -5.942 -38.154 -9.085 4.983 1.180 4.442 -2.904
-49.612 -18.244 -15.653 -14.913 -11.183 -11.241 -4.341 -4.577 -8.052 -13.777 -9.921 -11.403 -38.062 -21.368 -25.126 -22.358 -52.924 -21.906 -7.580 -9.163 -12.590 -19.462
-11.784 -9.042 -12.692 -9.714 -15.285 -12.068 -6.627 -6.371 -8.829 -8.803 -14.746 -12.507 -16.009 -13.882 -18.867 -16.416 -14.770 -12.822 -12.563 -10.344 -17.032 -16.559
-29.356 -0.384 1.240 -0.160 1.544 -0.600 0.000 0.000 5.109 -0.068 2.244 -0.322 -15.722 -0.227 -0.863 -0.347 -29.795 -0.132 2.490 -0.334 5.157 -0.334
-29.364 -0.384 1.196 -0.195 1.210 -0.624 0.000 0.000 5.109 -0.068 1.894 -0.511 -15.766 -0.212 -0.766 -0.347 -29.853 -0.322 2.352 -0.339 5.157 -0.334
-0.008 -0.000 -0.044 -0.035 -0.334 -0.024 0.000 0.000 -0.000 -0.000 -0.350 -0.189 -0.045 0.015 0.097 -0.000 -0.058 -0.191 -0.138 -0.004 -0.000 0.000
-0.077 -0.056 -0.059 -0.059 -0.087 -0.074 -0.027 -0.021 -0.039 -0.034 -0.087 -0.074 -0.100 -0.073 -0.137 -0.104 -0.097 -0.073 -0.067 -0.056 -0.126 -0.091
a
LJLR is the long-range Lennard-Jones correction.
was found that the errors for both ∆F and ∆U associated with this pressure perturbation were less than the standard errors associated with the simulation. 2.3.4. λ Schedule. The λ schedule for the free energy of solvation calculations was constructed as described previously.38 The rate at which the error of a free energy estimate between two equilibrium simulations decreasessin particular, the error associated with FEP estimatorssis governed by the Fermi swap probability, defined as
〈pswap〉 )
〈〈
1 1 + exp(βδV[x0] - βδV[x1])
〉〉
(18)
0 1
where x0 and x1 are drawn from the λ0 and λ1 ensembles, respectively, and δV(x0) - δV(x1) is the change in the combined potential energy of both systems due to a replica exchange swap. Free energy calculations converge more slowly and have larger errors if the swap probability is low, and we will show that this relationship applies to all ∆F and ∆U estimators. The convergence of such calculations may be improved by performing additional simulations where the swap probability, as calculated from preliminary simulations, is low; equivalently, simulations may be added where var.(∂V/∂λ) is large.38 Shenfeld et al.39 have obtained a similar relationship between the replica exchange swap probability and the convergence rate of ∆F and related both to a measure of thermodynamic length.40 For this work, the λ schedule was refined by adding simulations in the interval 0.2 e λ e 1.0, based on the analysis of a preliminary set of simulations for NMA, giving a 29 λ schedule. This schedule, used for all model compounds unless indicated otherwise, consists of simulations at λ values (0.0, 0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, and 2.0). 2.3.5. Correction Terms and ConWersions between Ensembles. The estimates of thermodynamic quantities obtained from simulations are not immediately comparable to experimental results. Corrections are needed to account for method-
ological artifacts, and results obtained from different ensembles must be converted to a common ensemble to permit comparison. Also, corrections may need to be applied to experimental results to bring them into line with standard states. Self Corrections. In our implementation, the interactions between atoms of a solute are scaled, and the contributions that these self-interactions make to the free energy estimates must be taken into account. This is done by repeating the simulations in the absence of solvent, which yields self-correction terms that are subtracted from the thermodynamic quantities obtained in the presence of solvent. Self-correction terms for all simulations are listed in Table 3. Long-Range Corrections to Lennard-Jones Interactions. These corrections account for errors in the potential energy of the system imposed by cutoffs in the Lennard-Jones potential. Taking into account only the r-6 term (because it decays more slowly than the r-12 term), the correction is calculated for the entire solute molecule as26
Vlr )
∑ i
8πNW ε σ6 3Vrc iW iW
(19)
for each solute atom i, where εiW and σiW represent the combined Lennard-Jones parameters between atom i and TIP3P water oxygen; NW is the number of water molecules; and V is the volume of the box. With rc ) 14 Å, the correction term for each solute molecule is relatively small, varying between -0.02 kcal/mol for methane simulated using the mOPLS-AA force field and -0.13 kcal/mol for toluene simulated using the OPLS-AA force field. This correction is added to both ∆F and ∆U, and its values are listed in Table 3. Standard State Corrections. Experimentally obtained free energies of solvation and their decompositions are reported using a variety of standard states. To make them comparable to computational values, they must be converted to the Ben-Naim standard state convention.41 Such corrections generally need to be applied to the free energies, entropies, and enthalpies.5
New Estimators for Solvation Entropy and Enthalpy
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8173
In practice, we find that free energies reported in the literature frequently have the corrections already applied or else the transfer process is defined such that no such corrections are necessary. Cabani et al.42 report standard state free energies, and only an additional term of β-1(1 - TR) ) 0.549 kcal/mol needs to be added to tabulated ∆H and T∆S values,16 where R is the coefficient of thermal expansion for liquid water. Ensemble Corrections. The solvation calculations, performed in the NVT ensemble, yield values for the Helmholtz free energy ∆F as well as the energy ∆U and the entropy, denoted in this ensemble as (∆S)V. Experimental values are typically given for the isothermal-isobaric (NPT) ensemble, yielding the Gibbs free energy ∆G, the enthalpy ∆H, and the constant pressure entropy, (∆S)P. To compare computational results to experimental values, we converted the latter values into estimates for the NVT ensemble. In the thermodynamic limit, the Helmholtz free energy obtained in an NVT simulation at a given volume is numerically equal to the Gibbs free energy obtained in an NPT simulation whose average volume (and pressure) is the same as the NVT simulation.10 As a result, we compare the numerical values of experimental ∆G and numerical ∆F directly. However, thermodynamic decompositions of free energies are ensemble dependent, and the NVT values can be obtained from the NPT quantities using the relationships5
∆U ) ∆H - (∆V)P
TR κ
T(∆S)V ) T(∆S)P - (∆V)P
(20a)
TR κ
(20b)
where (∆V)P is the partial molar volume of the solute; R ) V-1(∂V/∂T)P,N is the thermal expansion coefficient; and κ ) -V-1(∂V/∂P)T,N is the isothermal compressibility. In principle, it is equally valid to convert either the computational quantities into the NPT ensemble or the experimental values into NVT. However, the calculated TIP3P properties R and κ needed for the NPT to NVT conversion are subject to large errors,31,43,44 and the partial molar volume (∆V)P is only known experimentally, making this conversion unreliable. Conversely, the values of R and κ for liquid water are known to a relatively high degree of accuracy. Thus, we convert all experimental results to the NVT ensemble for comparison to calculated values. For liquid water, R ) 25.7 × 10-5 and κ ) 45.8 × 10-6.43 2.3.6. Error Estimates. Estimates of thermodynamic quantities based on simulations of finite length are subject to two types of error.45 The first is statistical error, which concerns issues of precision and reproducibility. This error quantifies the expected distribution of results given repeated simulations. The second type of error is bias error, which relates to the accuracy or correctness of the estimate. In certain instances (such as nonlinear averages), bias error is observed as a systematic dependence of the mean on the data size.46 In the context of evaluating estimator performance, the bias error is an important consideration, but its calculation generally requires knowledge of the exact value of the quantity of interest, which is typically unavailable. The exact free energies and their decompositions are unknown for the Hamiltonians used in solvation calculations as this requires computing multidimensional configurational integrals; hence, we calculate only statistical errors. We estimate the
TABLE 4: N-Methylacetamide Results Calculated with OPLS-AA Parameters for Various Estimators, Along with Standard Errors Calculated with Block Averaginga Direct FEP F FEP R BAR MBAR
∆F [kcal/mol]
∆U [kcal/mol]
T∆S [kcal/mol]
s -6.422 ( 0.018 -6.453 ( 0.023 -6.430 ( 0.019 -6.431 ( 0.018
-20.751 ( 0.689 -22.167 ( 0.857 -22.251 ( 1.081 -22.542 ( 0.826 -22.396 ( 0.768
s -15.745 ( 0.857 -15.798 ( 1.082 -16.111 ( 0.826 -15.965 ( 0.769
a These results include the effects of applying self- and long-range corrections to Lennard-Jones interactions.
standard error using block averaging over 100 blocks.26 For simplicity, we calculate the block averages of ∆F, ∆U, and T∆S as if they were independent quantities. In fact, only two of these three are independent, but in practice the standard errors of ∆U and T∆S are very similar. 3. Results 3.1. Evaluation of Estimators. Section 2.1 provided an overview of different estimators for calculating free energies based on data from equilibrium simulations for every λ value along a given schedule. Known entropy/energy estimators were presented, and we derived two new entropy estimators based on temperature derivatives of the BAR and MBAR free energy estimators. The new estimators were judged to be accurate based on results from the simple one-dimensional Sun model. In this section, we evaluate the performance of all the estimators for solvation calculations. We first tested the performance of the estimators by investigating the solvation process for N-methylacetamide (NMA). Table 4 lists the free energy of solvation for NMA and its decomposition using the various estimators, with errors estimated by block averaging. All ∆F estimators are relatively well converged, with statistical errors smaller than 0.02 kcal/mol. The errors in ∆U, however, are 50-80 times larger. The Direct method appears to have the lowest statistical error, but its energy estimate is significantly different from the consensus of the other estimators. This suggests that significant bias error may underlie the smaller statistical error observed for this estimator, but no conclusions may be drawn from this analysis alone. Of the remaining estimators, the MBAR method has the lowest ∆U statistical error, followed in order of increasing error by the BAR, FEP F, and FEP R methods. For all estimators but the Direct method, the quantities ∆F and ∆U correspond to the summation of δF and δU estimates; the errors in ∆F and ∆U are an unknown combination of the errors of δF and δU. To better understand the nature of the different estimators, and to identify regions of the λ schedule that might contribute to the collective error, we analyzed δF and δU as a function of λ. Figures 4(a) and (b) plot δF/δλ and δU/δλ versus λ, respectively. The quantity δF/δλ is the finite difference approximation to ∂F/∂λ, the derivative of the potential of mean force,47 and δU/δλ is its energy decomposition. ∆F and ∆U then correspond to the areas underneath the curves of Figures 4(a) and (b), respectively. In Figure 4(a), we find that for all of the estimators δF is converged to the point that differences between them are not discernible by eye, an observation consistent with the results of Table 4. This is not the case for the energy plot, Figure 4(b), which shows large differences in the degree of smoothness of the δU/δλ profile. The direct estimator has a serrated contour, suggesting a large degree of statistical error in the estimate of δU between adjacent simulations. The FEP estimators are
8174
J. Phys. Chem. B, Vol. 114, No. 24, 2010
Wyczalkowski et al.
Figure 4. Plots of δF/δλ and δU/δλ for the solvation of NMA as obtained with various estimators. These quantities are the slope of the potential of mean force and its energy decomposition, along the λ schedule. For the free energies (panel a), all the estimators are consistent, and the differences are nearly imperceptible. Significant differences among the estimators are observed for the energies (panel b), where the direct estimator is subject to large statistical error, while the MBAR estimator yields a smooth profile.
Figure 5. δF error diagnostics for the solvation of NMA under different λ schedules. The average Fermi swap probability (eq 18) is calculated from all data and plotted using the red scale on the right. The different λ schedules, detailed in the text, are indicated with tick marks on the top and bottom axes. All errors tend to be large in regions of the λ schedule where the swap probability is low.
somewhat smoother, followed by BAR, and the δU/δλ profile for the MBAR estimator is by far the smoothest of all. The “smoothness” observed in Figure 4 is quantified for δF in Figure 5 and for δU in Figure 6, where the statistical error of the estimates, obtained by block averaging, is plotted against λ. These figures illustrate the performance of estimators, as quantified by statistical error, along various λ schedules. Four different schedules are considered: the 29 λ schedule, as defined previously, has a uniform spacing of δλ ) 0.1 plus additional simulations in the region 0.2 < λ < 1.0, so that δλ ) 0.05 in that region. The remaining plots have a uniform schedule across the λ range: δλ ) 0.1 for 21 λ; δλ ) 0.2 for 11 λ, and δλ ) 0.5 for 5 λ. The latter three schedules are constructed from a subset of data obtained in the full 29 λ simulation. In Figure 5, the MBAR δF estimator generally has the lowest standard error, followed in increasing order by BAR, FEP F,
and FEP R. This error for all estimators increases considerably with coarsening of the λ schedule. The same observations hold for the δU results in Figure 6, even though the statistical error of the δU estimates is roughly 100 times larger than for δF. Here, the direct method is worse than all other estimators for the 29 λ schedule, but it improves in relation to the other methods as the schedule coarsens. The error of the direct method in fact stays constant, while other techniques deteriorate with fewer λ points. In addition to the standard error, Figures 5 and 6 also plot the average Fermi swap probability which, as described in Section 2.3.4, is a useful diagnostic of the rate of convergence of thermodynamic calculations. The swap probability is predictive of the statistical error of all estimators, for both δF and δU (with the exception of the direct method). All estimators
New Estimators for Solvation Entropy and Enthalpy
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8175
Figure 6. δU error diagnostics for the solvation of NMA under different λ schedules; see Figure 5 for details. Like the δF errors in Figure 5, δU errors for all estimators (except direct) are large in regions of the λ schedule where the swap probability is low. Additional simulations in these regions increase the swap probability and significantly reduce estimator errors. The direct estimator is independent of the schedule and becomes worse in relation to the other estimators as additional simulations are added to the λ schedule.
converge more quickly in regions of the λ schedule where the swap probability is large. 3.2. Application to Model Compounds. In the preceding sections, we introduced and tested new methods for computing entropy/energy decompositions for free energies of solvation that are based on the BAR and MBAR estimators. Here we carry out comparative calculations to quantify differences in ∆F, ∆S, and ∆U for 11 model compounds using the OPLSAA and mOPLS-AA force fields, which differ in the LennardJones parameters used to describe intrasolute and solute-solvent van der Waals interactions, as described in Section 2.3.2. In addition, we also compare the computed values to experimental data. This allows us to dissect the differences between two different paradigms for parametrizing van der Waals interactions, given a common set of the partial charges. Table 5 shows the values obtained for ∆F, ∆U, and T∆S for each of the 11 model compounds using the two force fields, and Table 2 gives the corresponding experimental data. Figure 7 provides a visual comparison of the results for the two force fields, and Figure 8 plots the computational values versus experimental data. To quantify the similarity between calculated and experimental values for ∆F, we computed global measures of similarity. For the OPLS-AA force field, the rms difference between experiment and simulation is 1.6, with the maximal unsigned deviation 3.66 for NMA, and the minimal unsigned deviation 0.04 for toluene. For the mOPLS-AA force field, the rms difference is 1.2, the maximal unsigned deviation 2.75 for NMA, and the minimal unsigned deviation 0.05 for ethanol. Data in Table 5 and Figure 7(a) indicate that the OPLS-AA parameters systematically yield more positivesor less favorablesfree energies of solvation than those obtained using the mOPLS-AA parameters. As a result, agreement between com-
TABLE 5: Results of the Solvation Calculations Calculated Using MBAR with Long-Range Lennard-Jones and Self Corrections Applied name
FF
∆F
∆U
T∆S
ACA ACA EOH EOH IBU IBU MET MET MOH MOH NBU NBU NMA NMA PCR PCR PPA PPA PRP PRP TOL TOL
OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA OPLS-AA mOPLS-AA
-8.549 ( 0.017 -8.875 ( 0.014 -4.260 ( 0.015 -5.098 ( 0.013 2.471 ( 0.016 1.353 ( 0.015 2.259 ( 0.008 1.772 ( 0.007 -4.371 ( 0.013 -4.941 ( 0.011 2.493 ( 0.018 1.353 ( 0.014 -6.431 ( 0.018 -7.333 ( 0.015 -5.533 ( 0.023 -5.700 ( 0.020 -8.455 ( 0.018 -9.027 ( 0.015 2.426 ( 0.013 1.458 ( 0.013 -0.841 ( 0.017 -2.661 ( 0.018
-20.324 ( 0.829 -17.916 ( 0.700 -16.908 ( 0.667 -14.777 ( 0.741 -12.480 ( 0.721 -10.691 ( 0.665 -4.368 ( 0.443 -4.599 ( 0.357 -13.200 ( 0.650 -13.744 ( 0.581 -11.902 ( 0.727 -10.966 ( 0.695 -22.396 ( 0.768 -21.230 ( 0.789 -24.497 ( 1.033 -22.116 ( 0.892 -23.167 ( 0.780 -21.658 ( 0.902 -9.999 ( 0.650 -8.881 ( 0.545 -17.872 ( 0.919 -19.220 ( 0.769
-11.776 ( 0.830 -9.042 ( 0.700 -12.648 ( 0.667 -9.679 ( 0.742 -14.951 ( 0.719 -12.045 ( 0.665 -6.627 ( 0.443 -6.371 ( 0.356 -8.829 ( 0.650 -8.803 ( 0.579 -14.396 ( 0.725 -12.319 ( 0.695 -15.965 ( 0.769 -13.896 ( 0.788 -18.964 ( 1.038 -16.416 ( 0.890 -14.712 ( 0.781 -12.631 ( 0.901 -12.425 ( 0.650 -10.339 ( 0.547 -17.031 ( 0.919 -16.559 ( 0.769
putational ∆F values and experimental data improves with the mOPLS-AA parameters for the polar compounds (acetamide, ethanol, methanol, p-cresol, N-methylacetamide, and propionamide). For nonpolar compounds, specifically linear hydrocarbons, the computational ∆F values obtained using the mOPLSAA parameters are consistently more negative than experimental values, whereas with OPLS-AA parameters they are generally more positive. Finally, ∆F for toluene is too negative with the mOPLS-AA parameters, while the OPLS-AA value matches that from experiment. These findings suggest that the Lennard-Jones parameters in mOPLS-AA lead to systematically more favorable
8176
J. Phys. Chem. B, Vol. 114, No. 24, 2010
Wyczalkowski et al.
Figure 7. Comparison of thermodynamic quantities calculated using OPLS-AA and mOPLS-AA parameters for all compounds listed in Table 5 using the MBAR estimators, with error bars indicating standard errors.
Figure 8. Comparison of calculated thermodynamic quantities to experimental values. Both OPLS-AA and mOPLS-AA results were calculated with MBAR.
free energies of solvation. On average, the σ values are smaller in mOPLS-AA versus OPLS-AA, and one may hypothesize that the observed trends are due to a diminution in overall hydrophobicity. Access to reasonably accurate entropy/energy decompositions allows us to directly test this hypothesis. Panels (b) and (c) of Figures 7 and 8 show the decompositions of ∆F into T∆S and ∆U, and these data are tabulated in Table 5. As with the case of NMA discussed previously, the statistical errors for the decompositions are about 2 orders of magnitude larger than they are for ∆F, complicating the interpretation of results. The source of these errors is discussed in the next section. Nevertheless, a few systematic trends are revealed in the comparisons of the two sets of decompositions, both with respect to each other and to experimental data. With the exception of toluene, the ∆U values are systematically more negative for OPLS-AA than for mOPLS-AA. Conversely, in all cases, the OPLS-AA parameters make more negative entropic contributions to the free energy of solvation. Comparing the computational values for T∆S to experimental data, we find that for the OPLS-AA force field the rms difference is 3.92, the maximal unsigned deviation 6.74 (p-cresol), and the minimal unsigned deviation 0.37 (methane). For the mOPLSAA force field, the rms difference with experiment is 2.30, the maximal unsigned deviation 5.02 (toluene), and the minimal unsigned deviation 0.11 for methane. This latter model compound illustrates the central difference between the two force fields: methane is the closest approximation to a spherical molecule, and the entropic penalty for creating a methane-sized spherical cavity in TIP3P water is smaller for the mOPLS-AA parameters.
4. Discussion 4.1. Thermodynamic Estimators. The simplicity of the direct estimator is appealing, and its statistical error measure in Table 4 is relatively small. However, the ∆U estimate deviates significantly from the consensus of all other estimators, suggesting caution in gauging its performance by this analysis alone. The deficits of the direct energy estimator for the related problem of obtaining a ∂U/∂λ profile are illustrated in Figures 4 and 6. Since the direct estimator is independent of the schedule, its error for all δU is about the same as the ∆U error. Where ∆U of a process is large, the direct estimator may be a good choice, particularly if the λ schedule is coarse. On the basis of Figures 5 and 6, the FEP estimators for both δF and δU suffer from significant errors which decrease with an improved λ schedule. In general, FEP R is worse than FEP F, a known result which stems from the fact that the insertion of a particle (forward direction) constrains the phase space of a system and leads to faster convergence. This is true for both the free energy and energy calculations.17,48 The BAR and MBAR methods perform better than the FEP estimators for both δF and δU calculations, with MBAR yielding lower statistical errors than BAR in all cases. The distinction between the two methods becomes more pronounced with an increasing number of simulations along the λ schedule, where the MBAR estimator can take advantage of information from non-neighbor simulations to improve its estimate. For a coarse schedule, non-neighboring ensembles are sufficiently different that little additional information can be gleaned from
New Estimators for Solvation Entropy and Enthalpy them, and BAR, which considers only neighboring simulations, performs similarly to MBAR. While the MBAR estimator obtains significantly lower errors for δU estimates and thus a smooth δU/δλ profile, its performance improvement versus other estimators as measured by the error of the ∆U estimate in Table 4 is modest. By construction, the MBAR free energy estimator minimizes the variance of the δF estimates, and this is reflected in the internal consistency of the δF/δλ and δU/δλ profile. However, this does not lead to a decrease of the same magnitude in the combined ∆U error, as the individual δU estimates are not independent. This effect is even more pronounced for ∆F where the lowest statistical error is found for FEP-F, whereas Figure 5 suggests MBAR should provide a clear benefit on account of lower statistical errors in the δF. In general, we expect that the quality of the estimates using all of the nondirect estimators, specifically BAR and MBAR, can be improved through longer sampling or additional simulations along the λ schedule. The performance of MBAR for free energy calculations and its promise for thermodynamic decompositions must also be considered from a computational efficiency standpoint. During a calculation with K simulations, each with a different λ value, the FEP estimators require one foreign energy to be calculated and written by each simulation during each evaluation step, and the BAR method requires two. The MBAR method, by contrast, requires K - 1 foreign energies to be evaluated and saved to disk. The postprocessing of such data is also more timeconsuming for the MBAR method than for the others. Such considerations will need to be taken into account when considering the total performance of the estimators. 4.2. Convergence Rates. In solvation calculations, the standard errors of ∆U are 50-100 times larger than ∆F, consistent with observations in the literature.49 For the Sun model, however, the errors of the two quantities are nearly identical. We can understand the difference between the systems by considering the source of the distinct ∆F and ∆U convergence rates.17 Most of the interactions in the solvation calculations are between pairs of water molecules, which dominate the potential energy of the system and are independent of λ. The thermal fluctuations of these interactions swamp out the relatively small energy changes due to the scaled potential, requiring long simulations to make out the signal from the scaling process. By contrast, free energy calculations generally consider ensemble averages of δV, where the contributions which are independent of λ tend to cancel. Consequently, the free energies for the solvation calculations converge much more quickly than do the energies. (This remains true for all the estimators considered here, including MBAR.) In the case of the Sun model, however, this distinction does not hold; there, the λ scaling directly affects the sole degree of freedom, and both the free energy and the energy converge at the same rate. 4.3. Solvation Calculations. Computational values for ∆F are statistically reliable and can be used to make quantitative comparisons to experimental data.32 Conversely, the thermodynamic decompositions into entropies and energies are typically 2 orders of magnitude larger, irrespective of the specific model compound.17 Despite the lower reliability of the thermodynamic decompositions, we can discern trends to rationalize the ∆F comparisons between experimental data and the computational values. On the basis of Figure 7 and Table 5, we find that: • ∆FOPLS-AA > ∆FmOPLS-AA (mOPLS-AA more favorable), • ∆UOPLS-AA < ∆UmOPLS-AA (OPLS-AA more favorable), and • T∆SOPLS-AA < T∆SmOPLS-AA (mOPLS-AA more favorable).
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8177 It appears that the source of the improved overall agreement between experimental data and computational values obtained using the mOPLS-AA force field is due to the reduced entropic penalty associated with the modified force field. The σ parameters tend to be smaller for the mOPLS-AA parameter set, resulting in smaller atomic volumes from which the water must be displaced. This results in a less negative solvation entropy, which is governed by the cavity size and appears to rationalize the results for n-butane, isobutane, and propane. This difference between the OPLS-AA and mOPLS-AA LennardJones parameters is consistent with previous work16 and theoretical predictions.50 In the OPLS-AA force field, polar hydrogen atoms typically are “sizeless” in that their σ values are zero. This is not the case with the mOPLS-AA parameters and provides an explanation for many of the observed differences in ∆U estimates for polar molecules. Beyond this, the ε parameters for mOPLS-AA can be larger or smaller than for OPLS-AA, with most molecules a mix of increased and reduced dispersion forces; this prevents a straightforward interpretation of the observation that ∆U for mOPLS-AA tends to be more positive and less favorable. For NMA, the computational ∆F values for both force fields disagree with experimental data. However, the disagreement is smaller for the mOPLS-AA force field. The entropy-energy decomposition shows that both force fields yield similar estimates, within error, for ∆U and that this result is close to the experimental value of -22.4 kcal/mol. The main contribution to the difference in ∆F values comes from the entropic terms: for OPLS-AA, T∆S ) -16.0 ( 0.8 kcal/mol, whereas for mOPLS-AA, T∆S ) -13.9 ( 0.8 kcal/mol. Both these values are more negative than the experimental value of -12.3 kcal/ mol, but the mOPLS-AA is closer, resulting in an improved ∆F estimate and suggesting directions for further force field refinement while also highlighting the need for a more holistic approach to parameter optimization instead of focusing solely on the reparameterization of partial charges.8 5. Derivations 5.1. Definitions and General Identities. The free energy, energy, entropy, and potential energy of simulation isFi, Ui, Si, and Vi, respectivelysare normalized by the inverse temperature βi ) (kBTi)-1, to yield the nondimensional forms of these quantities
fi ) βiFi,
ui ) βiUi,
si ) βiSi,
Vi ) βiVi
(21a) With ∂Fi/∂T ) -Si, temperature derivatives of these include
∂Vi Vi )∂T Ti
(21b)
∂δf 1 ) - (τf1 - f0 + T0δs) ∂T T0
(21c)
∂fi fi ) - - si, ∂T Ti and with δf ) f1 - f0
where τ ) T1/T0. The Boltzmann probability distribution
8178
J. Phys. Chem. B, Vol. 114, No. 24, 2010
Fi(x) )
exp(-Vi(x))
∫ dy exp(-Vi(y))
Wyczalkowski et al.
(22a)
N0[〈g+V0〉0 - 〈g+〉0〈V0〉0 + 〈g+g-(τV1 - V0)〉0 (τf1 - f0 + T0δs)〈g+g-〉0] ) N1[τ〈g-V1〉1 - τ〈g-〉1〈V1〉1 - 〈g+g-(τV1 - V0)〉1 +
has the temperature derivative
∂Fi(x) Fi(x) (V - 〈Vi〉i) ) ∂T Ti i
(τf1 - f0 + T0δs)〈g+g-〉1] (27)
(22b) Solving for δs yields
with 〈A〉i ) ∫dxFi(x)A(x). The Fermi function is defined18 as
g(a) ) 1/[1 + exp(a)]
T0δs )
N0R0 - N1R1 + f0 - τf1 N0〈g+g-〉0 + N1〈g+g-〉1
(28a)
(23a) where
and its derivative is
R0 ) 〈g+V0〉0 - 〈g+〉0〈V0〉0 + 〈g+g-(τV1 - V0)〉0 ∂g(a) exp(a) ∂a 1 ∂a )) -g(a)g(-a) ∂T 1 + exp(a) 1 + exp(a) ∂T ∂T (23b) 5.2. BAR. Bennett18 derived the following free energy estimator (eq 9 in that reference)
Q0 〈g(V0 - V1 + C)〉1 ) exp(C) Q1 〈g(V1 - V0 - C)〉0
(24a)
(28b) R1 ) τ〈g-V1〉1 - τ〈g-〉1〈V1〉1 - 〈g+g-(τV1 - V0)〉1 (28c) Equations 28 reduce to eqs 8 for N0 ) N1 and T0 ) T1. 5.3. MBAR. The MBAR free energy estimator is defined as eq 11 in ref 19 Nj
K
Q0 N1 C ) ln Q1 N0
fi ) -ln
(24b)
∑∑ j)1 n)1
exp[-Vi(xjn)] K
∑ N exp[f k
k
for i ) 1, 2, ..., K
- Vk(xjn)]
k)1
(29) with δf ) ln(Q0/Q1), which holds for an arbitrary number of observations per simulation Ni and temperatures Ti. Equations 24a are typically solved iteratively until C converges. Here, however, we substitute 24b into 24a directly to obtain
N0〈g+〉0 ) N1〈g-〉1
where K is the number of simulations and Nj is the number of observations drawn from the jth simulation. Substituting the empirical estimator ΣnNjAn ) Nj〈A〉j and defining
(25a) µi(x) ≡
with
exp(fi - Vi(x)) K
(30a)
∑ Nk exp(fk - Vk(x)) k
g+(x) ≡ [1 + exp(+V1(x) - V0(x) - δf - ln N1 /N0)]-1 (25b) g-(x) ≡ [1 + exp(-V1(x) + V0(x) + δf + ln N1 /N0)]-1 (25c) Equations 7 are specific to N0 ) N1 and T0 ) T1, as indicated by the * symbol. We obtain the entropy equation by taking the temperature derivative of eq 25a and multiplying by T0
T0
∂ {N ∂T 0
∫ dxF0(x)g+(x) ) N1 ∫ dxF1(x)g-(x)} (26)
which, with identities from Section 5.1, becomes
Equation 29 can be written as K
∑ Nj〈µi〉j ) 1
for i ) 1, 2, ..., K
(30b)
j
with x drawn from a probability distribution corresponding to Vj(x). We obtain the MBAR entropy estimator from the temperature derivative of eq 30b
Ti
∂ { ∂T
With the identity
K
∑ Nj ∫ dxFj(x)µi(x) ) 1} j
(31)
New Estimators for Solvation Entropy and Enthalpy
[
∂µi 1 ) - (fi + Tisi)µi - µiVi ∂T Ti
K
J. Phys. Chem. B, Vol. 114, No. 24, 2010 8179
T
∑ Nk Tki {(fk + k
]
Tksk)µiµk - µiµkVk}
(32)
we find
Tisi ) -fi + +
K
K
j
k
∑∑
K
K
j
j
T
∑ Nj〈µiVi〉j + ∑ Nj Tji (〈µiVj〉j - 〈µi〉j〈Vj〉j) Ti NjNk [(fk + Tksk)〈µiµk〉j - 〈µiµkVk〉j] Tk for i ) 1, 2, ..., K
(33) Equations 9 and 10 in the body of the paper correspond to eqs 30 and 34 for all Ni and Ti the same. 5.4. Sun Model. Given a one-dimensional potential V(x, λ) free energy is given as
F(λ) ) -β-1 ln
∫-∞∞ dx exp[-βV(x, λ)]
(34)
and the free energy change ∆F ) F(1) - F(0). For the Sun model, eq 11, we can integrate eq 34 and obtain ∆F analytically51,52
∆F ) β-1 ln(√2πe32β(I-1/4(32β) + I1/4(32β))) -1
β
( ) ( 45 )
2Γ
ln
4
√β
(35)
where Γ is the gamma function and In is a modified Bessel function of the first kind. We can obtain the energy change by differentiating ∂ β∆F ∂β 32(I-3/4(32β) + I-1/4(32β) + I1/4(32β) + I3/4(32β)) ) I-1/4(32β) + I1/4(32β)
∆U )
(36) For β ) 0.02, ∆F ) 65.8878, ∆U ) 53.1957, and T∆S ) -12.6921. The analytical derivations for the Sun model were performed with Mathematica.53 Acknowledgment. We thank Lev Gelb, Nathan Baker, and Albert Mao for useful discussions. This work was supported by grant MCB 0718924 from the National Science Foundation. References and Notes (1) Ben-Naim. A. SolVation Thermodynamics; Plenum Press: New York, 1987. (2) Smith, D. E.; Haymet, A. D. J. Free energy, entropy and internal energy of hydrophobic interactions: Computer simulations. J. Chem. Phys. 1993, 98 (8), 6445–6454. (3) Durell, S. R.; Wallqvist, A. Atomic-scale analysis of the solvation thermodynamics of hydrophobic hydration. Biophys. J. 1996, 71 (4), 1695– 706.
(4) Mobley, D. L.; Dill, K. A. Binding of small molecule ligands to proteins: “what you see” is not always “what you get”. Structure 2009, 17 (4), 489–498. (5) Kubo, M. M.; Gallicchio, E.; Levy, R. M. Thermodynamic decomposition of hydration free energies by computer simulation: Application to amines, oxides and sulfides. J. Phys. Chem. B 1997, 101, 10527– 10534. (6) Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins. J. Chem. Phys. 2003, 119 (11), 5740–5761. (7) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; JosephMcCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102 (18), 3586–3616. (8) Udier-Blagovic´, M.; De Tirado, P. M.; Pearlman, S. A.; Jorgensen, W. L. Accuracy of free energies of hydration using CM1 and CM3 atomic charges. J. Comput. Chem. 2004, 25 (11), 1322–1332. (9) Horinek, D.; Mamatkulov, S. I.; Netz, R. R. Rational design of ion force fields based on thermodynamic solvation properties. J. Chem. Phys. 2009, 130 (12), 124507. (10) Levy, R. M.; Gallicchio, E. Computer simulations with explicit solvent: Recent progress in the thermodynamic decomposition of free energies and in modeling electrostatic effects. Annu. ReV. Phys. Chem. 1998, 49, 531–567. (11) Fleischman, S. H.; Brooks, C. L., III. Thermodynamics of aqueous solvation - solution properties of alcohols and alkanes. J. Chem. Phys. 1987, 87 (5), 3029–3037. (12) Makhatadze, G. I.; Lopez, M. M.; Privalov, P. L. Heat capacities of protein functional groups. Biophys. Chem. 1997, 64 (1-3), 93–101. (13) Graziano, G. Hydration thermodynamics of N-methylacetamide. J. Phys. Soc. Jpn. 2000, 69 (11), 3720–3725. (14) Wolfenden, R. Interaction of the peptide bond with solvent water: A vapor phase analysis. Biochemistry 1978, 17 (1), 201–204. (15) Mobley, D. L.; Dumont, E.; Chodera, J. D.; Dill, K. A. Comparison of charge models for fixed-charge force fields: Small-molecule hydration free energies in explicit solvent. J. Phys. Chem. B 2007, 111, 2242–2254. (16) Gallicchio, E.; Kubo, M. M.; Levy, R. M. Enthalpy-entropy and cavity decomposition of alkane hydration free energies: Numerical results and implications for theories of hydrophobic solvation. J. Phys. Chem. B 2000, 104, 6271–6285. (17) Lu, N.; Kofke, D. A.; Woolf, T. B. Staging is more important than perturbation method for computation of enthalpy and entropy changes in complex systems. J. Phys. Chem. B 2003, 107 (23), 5598–5611. (18) Bennett, H. Efficient estimation of free energy differences from Monte Carlo data. J. Comput. Phys. 1976, (22), 245–268. (19) Shirts, M. R.; Chodera, J. D. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 2008, 129 (12), 124105. (20) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 1996, 118, 11225– 11236. (21) Vitalis, A.; Pappu, R. V. Absinth: A new continuum solvation model for simulations of polypeptides in aqueous solutions. J. Comput. Chem. 2009, 30 (5), 673–699. (22) Kirkwood, J. Statistical mechanics of fluid mixtures. J. Chem. Phys. 1935, 3 (5), 300–313. (23) Mezei, M.; Beveridge, D. L. Free energy simulations. Ann. N.Y. Acad. Sci. 1986, 482 (1), 1–23. (24) Zwanzig, R. W. High-temperature equation of state by a perturbation method. I. Nonpolar gases. J. Chem. Phys. 1954, 22 (8), 1420–1426. (25) Sun, S. X. Equilibrium free energies from path sampling of nonequilibrium trajectories. J. Chem. Phys. 2003, 118 (13), 5769–5775. (26) Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Oxford University Press: New York, 1987. (27) Vitalis, A.; Pappu, R. V. Methods for Monte Carlo simulations of biomacromolecules. Ann. Rep. Comput. Chem. 2009, 5, 49–76. (28) Vitalis, A. Steffen, A. Lyle, N. Mao, A. Pappu. R. V. Campari: A software suite for modeling and computational analysis of macromolecular properties across resolutions and interfaces, in preparation. (29) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 2001, 105 (28), 6474–6487. (30) Beutler, T. C.; Mark, A. E.; van Scheik, R. C.; Gerber, P. R.; van Gunsteren, W. F. Avoiding singularities and numerical instabilities in free energy calculations based on molecular mechanics simulations. Chem. Phys. Lett. 1994, 222, 529–539.
8180
J. Phys. Chem. B, Vol. 114, No. 24, 2010
(31) Jorgensen, W.; Chandrasekhar, J.; Madura, J.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79 (2), 926–935. (32) Shirts, M. R.; Pande, V. S. Solvation free energies of amino acid side chain analogs for common molecular mechanics water models. J. Chem. Phys. 2005, 122, 134508. (33) Pauling, L. General Chemistry; W. H. Freeman Press: San Francisco, CA, 1970. (34) Skeel, R. D.; Izaguirre, J. A. An impulse integrator for Langevin dynamics. Mol. Phys. 2002, 100 (24), 3885–3891. (35) Onsager, L. Electric moments of molecules in liquids. J. Am. Chem. Soc. 1936, 58, 1486–1493. (36) Garde, S.; Hummer, G.; Paulaitis, M. E. Free energy of hydration of a molecular ionic solute: Tetramethylammonium ion. J. Chem. Phys. 1998, 108 (4), 1552–1561. (37) Hunenberger, P. Thermostat algorithms for molecular dynamics simulations. AdV. Polym. Sci. 2005, 173, 105–147. (38) Wyczalkowski, M. A.; Pappu, R. V. Satisfying the fluctuation theorem in free-energy calculations with Hamiltonian replica exchange. Phys. ReV. E 2008, 77 (2), 026104. (39) Shenfeld, K.; Xu, H.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Minimizing thermodynamic length to select intermediate states for freeenergy calculations and replica-exchange simulations. Phys. ReV. E 2009, 80 (4), 046705. (40) Crooks, G. E. Measuring thermodynamic length. Phys. ReV. Lett. 2007, 99 (10), 100602. (41) Ben-Naim, A.; Marcus, Y. Solvation thermodynamics of nonionic solutes. J. Chem. Phys. 1984, 81 (4), 2016–2027. (42) Cabani, S.; Gianni, P.; Mollica, V.; Lepori, L. Group contributions to the thermodynamic properties of non-ionic organic solutes in dilute aqueous-solution. J. Solution Chem. 1981, 10 (8), 563–595. (43) Jorgensen, W. L.; Jenson, C. Temperature dependence of TIP3P, SPC, and TIP4P water from NPT Monte Carlo simulations: Seeking temperatures of maximum density. J. Comput. Chem. 1998, 19 (10), 1179– 1186. (44) Mahoney, M. W.; Jorgensen, W. L. A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. J. Chem. Phys. 2000, 112, 8910–8922.
Wyczalkowski et al. (45) Lu, N. D.; Kofke, D. A. Accuracy of free-energy perturbation calculations in molecular simulation. I. modeling. J. Chem. Phys. 2001, 114 (17), 7303–7311. (46) Zuckerman, M.; Woolf, T. B. Systematic finite-sampling inaccuracy in free energy differences and other nonlinear quantities. J. Stat. Phys. 2004, 114 (5-6), 1303–1323. (47) Roux, B.; Simonson, T. Implicit solvent models. Biophys. Chem. 1999, 78, 1–20. (48) Wu, D.; Kofke, D. A. Asymmetric bias in free-energy perturbation measurements using two hamiltonian-based models. Phys. ReV. E 2004, 70 (6), 066702. (49) Trzesniak, D.; van Gunsteren, W. F. Pathway dependence of the efficiency of calculating free energy and entropy of solute-solute association in water. Chem. Phys. 2006, 330 (3), 410–416. (50) Pratt, L. R.; Chandler, D. Theory of the hydrophobic effect. J. Chem. Phys. 1977, 67, 3683. (51) Oberhofer, H.; Dellago, C.; Geissler, P. L. Biased sampling of nonequilibrium trajectories: can fast switching simulations outperform conventional free energy calculation methods. J. Phys. Chem. B 2005, 109 (14), 6902–15. (52) Nummela, J.; Yassin, F.; Andricioaei, I. Entropy-energy decomposition from nonequilibrium work trajectories. J. Chem. Phys. 2008, 128 (2), 024104. (53) Wolfram Research. Mathematica 7.0.; Wolfram Research Inc.: Champaign, IL, 2008. (54) Avbelj, F.; Luo, P.; Baldwin, R. L. Energetics of the interaction between water and the helical peptide group and its role in determining helix propensities. Proc. Natl. Acad. Sci. U.S.A. 2000, 97 (20), 10786–91. (55) Moore, J. C.; Battino, R.; Rettich, T. R.; Handa, Y. P.; Wilhelm, E. Partial molar volumes of gases at infinite dilution in water at 298.15 K. J. Chem. Eng. Data 1982, 27 (1), 22–24. (56) Hnedkovsky, L.; Degrange, S.; Cibulka, I. Partial molar volumes of organic solutes in water. I. O-, m-, and p-cresol at temperatures 298 K to 573 K. J. Chem. Thermodyn. 1998, 30 (5), 557–569. (57) Wilhelm, E.; Battino, R.; Wilcock, R. Low-pressure solubility of gases in liquid water. Chem. ReV. 1977, 77, 219. (58) Humphrey, W.; Dalke, A.; Schulten, K. VMD - Visual Molecular Dynamics. J. Mol. Graphics 1996, 14, 33–38.
JP103050U