Article pubs.acs.org/JCTC
Improving the Efficiency of Non-equilibrium Sampling in the Aqueous Environment via Implicit-Solvent Simulations Hui Liu,† Fu Chen,† Huiyong Sun,† Dan Li,† and Tingjun Hou*,†,‡ †
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China State Key Lab of CAD & CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
‡
ABSTRACT: By means of estimators based on non-equilibrium work, equilibrium free energy differences or potentials of mean force (PMFs) of a system of interest can be computed from biased molecular dynamics (MD) simulations. The approach, however, is often plagued by slow conformational sampling and poor convergence, especially when the solvent effects are taken into account. Here, as a possible way to alleviate the problem, several widely used implicit-solvent models, which are derived from the analytic generalized Born (GB) equation and implemented in the AMBER suite of programs, were employed in free energy calculations based on non-equilibrium work and evaluated for their abilities to emulate explicit water. As a test case, pulling MD simulations were carried out on an alanine polypeptide with different solvent models and protocols, followed by comparisons of the reconstructed PMF profiles along the unfolding coordinate. The results show that when employing the non-equilibrium work method, sampling with an implicitsolvent model is several times faster and, more importantly, converges more rapidly than that with explicit water due to reduction of dissipation. Among the assessed GB models, the Neck variants outperform the OBC and HCT variants in terms of accuracy, whereas their computational costs are comparable. In addition, for the best-performing models, the impact of the solventaccessible surface area (SASA) dependent nonpolar solvation term was also examined. The present study highlights the advantages of implicit-solvent models for non-equilibrium sampling.
■
INTRODUCTION Quantitative estimation of the free energy difference between two thermodynamic states or the free energy changes of a system along selected degrees of freedom is a key topic in computational molecular science and has been of long-standing interest.1 This is due to the fact that free energy is a central quantity that governs the direction and extent of a chemical or physical change, thereby providing a thermodynamic underpinning for the mechanistic interpretation of such an event. On the basis of statistical mechanics and computer simulations, a variety of theoretically rigorous methods have been devised for calculating free energies over the past few decades.2−5 Of those established methods, the so-called non-equilibrium work method5 is seemingly special. It is based on the nonequilibrium work theorems put forth by Jarzynski5 and generalized by Crooks.6 Unlike traditional approaches that explore the phase space in an equilibrium process, this method relies on sampling of a system driven away from equilibrium. In practice, its implementations are usually achieved via pulling molecular dynamics (MD) simulations (also referred to as steered MD simulations),7,8 which mimic the single-molecule force spectroscopy9 in silico. In such a simulation, an external guiding potential, usually time-varying, is imposed along a chosen direction of conformational space (i.e., a reaction coordinate (RC)), causing the system to switch from one thermodynamic state to another. A number of simulations have to be conducted repeatedly while measuring the accumulated work, and then these ensembles of realizations of the switching © 2017 American Chemical Society
process can be collected to recover the equilibrium free energy profile along the predefined RC by means of free energy estimators derived from the non-equilibrium work relations.5,6 Compared with some others, the non-equilibrium work method is embarrassingly parallelizable in nature and therefore even well-suited for high-throughput computing on a distributed infrastructure or heterogeneous hardware. It has been utilized in studies of mechanical unfolding/folding of biological macromolecules,10,11 facilitated transport of substrates,12 association of proteins with ligands,13 and so on. In spite of the rapid development of computer hardware, estimating free energy at a high level of theoretical rigor is still a computationally demanding task nowadays, thus hindering its routine use. The high computational cost is mainly ascribed to the time-consuming stage of conformational sampling based on atomistic simulations, especially when the effects of bulk solvent are taken into account. Nevertheless, accurate description of the aqueous environment is critical for the delineation of chemical and biological phenomena at the molecular level. Indeed, the most detailed and thorough representation of the solvent effects entails explicit inclusion of solvent molecules in simulations. However, it is sometimes too computationally costly. In this scenario, it appears that implicitsolvent models, which implicitly represent solvent as a continuum medium as opposed to individual solvent molecules, Received: November 22, 2016 Published: March 15, 2017 1827
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation may be a more efficient alternative for speeding up the sampling.14−16 However, because these approximate models are actually resulted from a compromise between loss of accuracy and gain in speed, preliminary assessment of their performance in free energy calculations is undoubtedly necessary. Despite the fact that there exist numerous implicit-solvent models, the ones derived from the analytic generalized Born (GB) equation17 have emerged as the most commonly used models in the field of atomistic simulations. Accordingly, considerable efforts have been invested in recent years to explore the applicability and capability of these models in free energy calculations. For instance, different GB models have been used with some success in reproducing hydration free energies of small molecules via alchemical transformations.18−20 Moreover, for biological macromolecules, Zacharias and co-workers have evaluated the accuracies of several GB models in estimating absolute binding free energies of protein−peptide complexes with umbrella sampling21 and relative changes of binding free energy associated with mutations in protein−peptide complexes using free energy perturbation,22 and reasonable results have been observed in both cases. In these studies, the employed free energy methods are all based on equilibrium sampling, and as for the non-equilibrium sampling based methods, Bureau et al. have recently modeled the stretching of deca-alanine using the adaptive steered MD simulations23 in both explicit and implicit water.24 However, only one GB model25 was used in their work,24 and some widely used or newly developed models have not yet been evaluated. Hence, it still remains unclear how well these GB models perform in nonequilibrium free energy calculations. To this end, here, we examined the accuracy and efficiency of several popular GB models, which are implemented in the AMBER molecular simulation package,26 for conformational sampling in non-equilibrium work based free energy calculations. As a paradigmatic model for validating non-equilibrium work based protocols,10,24,27−29 the mechanical unfolding of an alanine polypeptide from a compact α-helix to an extended coil was simulated in the aqueous environment represented by different solvent models and at different pulling velocities and, in some cases, with or without the nonpolar solvation contributions. The resulting free energy profiles along the unfolding pathway were reconstructed and compared. Our investigation highlights the advantages of implicit-solvent models for non-equilibrium sampling and provides insight into future development of the models.
Figure 1. Representative α-helical (left) and extended (right) conformations of the deca-alanine peptide. The N- and C-termini are in blue and red, respectively. The conformational transition between these two states was simulated and enforced by exerting external pulling forces in this work. The RC (ξ) for biased MD simulations was defined by the interatomic distance between the nitrogen atom in the first alanine residue at the N-terminus and another nitrogen atom in the capping group at the C-terminus.
barostat,31 respectively. A time step of 2 fs was used to integrate the equations of motion, permitted by constraining the fast stretching of the hydrogen atoms covalently linked to heavy atoms.32,33 The Coulomb potentials were handled by the smooth particle-mesh Ewald (PME) method34 with a directspace cutoff of 10 Å. The same threshold value was also used for truncation of the Lennard-Jones potentials, while long-range analytic corrections were applied to the energy and pressure. All of the unbiased and subsequent biased MD simulations were performed with the AMBER 14 suite of programs,26 adopting the AMBER ff14SB force field.35 Umbrella Sampling. For comparison, the equilibrated system was subjected to umbrella sampling (US)4 simulations to calculate the potential of mean force (PMF) profiles of the helix−coil transition of deca-alanine. As illustrated in Figure 1, the RC used here was the end-to-end distance (ξ) which was specifically defined by the distance between the two nitrogen atoms at two ends: one is in the first alanine residue at the Nterminus, and the other is in the capping group at the Cterminus. Thirty-nine umbrella windows were evenly spaced and centered from ξ0 = 14 to 33 Å along the RC, and the harmonic force constant for restraining the end-to-end distance was generally set to 10 kcal mol−1 Å−2. Each umbrella window was propagated for 5.5 ns in the canonical ensemble (T = 300 K) with the TIP3P water, and the first 0.5 ns was discarded as equilibration. Other computational details were the same as the conventional MD simulations described above. The biased probability distributions of states were reweighted using the weighted histogram analysis method (WHAM),36 by which the unbiased PMFs were recovered. The WHAM 2.0.9 program was utilized for the reweighting. Pulling Simulations. Pulling simulations were carried out by elongating the end-to-end distance of the equilibrated decaalanine helix, as defined in the US simulations (Figure 1), from 14 to 33 Å. Elongations were conducted at constant velocities (v = 0.1, 1, 10, and 100 Å/ns) for an individual simulation by imposing a moving harmonic potential with a force constant of 7.2 kcal mol−1 Å−2. The temperatures in all simulations were maintained constant (T = 300 K), and the collision frequency of Langevin dynamics was 5 ps−1 if not otherwise stated. For explicit-solvent simulations, the computational details were the same as in the canonical-ensemble equilibration phase described above. For implicit-solvent simulations, up to five GB models were employed, which are hereafter denoted as
■
MATERIALS AND METHODS System Construction and Equilibration. The starting structure of the benchmark model was a deca-alanine helix consisting of 10 alanine residues (Ac-Ala10-NH2, Figure 1), which was derived from the one previously used by Park et al.10 We appended an acetyl group to the N-terminus and a neutral amino group to the C-terminus using the LEaP program of AmberTools 15, followed by energy minimization in vacuum. The relaxed model was immersed in an isometric truncated octahedral box filled with explicit water molecules described by the TIP3P model,30 leaving at least a 15 Å buffering zone away from any boundary of the box. After a series of energy minimization and thermalization, the system was equilibrated in the isothermal−isobaric ensemble (T = 300 K and P = 1 bar) and subsequently the canonical ensemble (T = 300 K). The temperature and pressure were controlled by Langevin dynamics (γ = 5 ps−1) and the Berendsen weak-coupling 1828
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation GBHCT,37 GBOBCI,25 GBOBCII,25 GBNeck,38 and GBNeck239 and invoked by specifying igb =1, 2, 5, 7, and 8 in the input options of the PMEMD program of AMBER, respectively. For each model, the recommended intrinsic atomic radii set was used, according to the corresponding original publication; that is, the “mbondi” set for GBHCT,37 the “mbondi2” set for GBOBCI and GBOBCII,25 the “bondi” set for GBNeck,38 and the “mbondi3” set for GBNeck2.39 When the nonpolar contributions to solvation free energy were taken into account, a solvent-accessible surface area (SASA) dependent term was employed, and the SASA of the peptide was estimated using the linear combinations of pairwise overlaps (LCPO) algorithm,40 with a surface tension coefficient of 0.005 kcal mol−1 Å−2. In all implicit-solvent simulations, the interior and exterior dielectric constants were set to 1.0 and 78.5, respectively, and an infinite cutoff was used for calculating both the nonbonded interactions and the effective Born radii. Reconstruction of PMFs from Pulling Simulations. The mechanical work done on the system was measured during the pulling simulations, and the unperturbed free energy or PMF profiles were reconstructed with respect to the end-to-end distance. For quasi-equilibrium pulling, the switching process occurred very slowly, and thus the free energy difference (ΔF) was given by the average of accumulated mechanical work (W) over N realizations of the process: ΔF ≈
1 N
Figure 2. PMFs of the deca-alanine peptide along its helix−coil transition pathway calculated using the umbrella sampling method with different durations of sampling. All simulations were performed in explicit water described by the TIP3P model.
with a 5 ns/window sampling as the reference PMF for comparison throughout this work. As can be seen from Figure 2, the yielded PMF is roughly flat along the majority of the unfolding coordinate, differing significantly from the previously reported one that was generated from the simulations in vacuum using a different force field.10 Besides, the free energy barrier between the pure helix and extended conformations is ∼3 kcal/mol and much lower than the value in vacuum with a different force field.10 The choice of the deca-alanine peptide as the benchmark model is for the sake of simplicity and due to its extensive use in previous studies on free energy calculations. At room temperature, its stable conformation is a compact α-helix. Therefore, the helix−coil transformation simulated here is not a spontaneous process, implying a positive value for the free energy change between the final and initial states (i.e., the unfolding free energy), which agrees well with our result. Comparison of Explicit- and Implicit-Solvent Models for Quasi-equilibrium Pulling Simulations. Next, we performed pulling simulations to enforce the unfolding of deca-alanine repeatedly by imposing moving harmonic restraints and reconstructed the unfolding PMFs from the repetitions of the transition process. Initially, the switching process from the folded (α-helix) to unfolded (extended coil) states was simulated at a very low velocity (v = 0.1 Å/ns) so that it was possible to mimic a quasi-reversible thermodynamic process, and thus the mechanical work done on the system was approximately equal to the free energy change of the transition.10 The pulling simulations were carried out in the TIP3P water and then repeated with up to five different GB models to evaluate their performance. The generated PMFs are shown in Figure 3A. According to the resulting profiles, it is suggested that when using explicit solvent, the PMF yielded from the pulling simulations fits well with that from the US simulations, thus confirming the validity of quasi-equilibrium pulling. On the other hand, when implicit solvents were employed, some of the evaluated GB models could reproduce the result of their explicit counterpart well, and there are only slight deviations. For quantitative measurement of the accuracy, we utilized the root-mean-squared deviation (RMSD, σ) between the calculated (F) and reference (Fref) free energies as a metric,41 which is defined as
N
∑ Wi
(1)
i=1
For non-equilibrium pulling, as the pulling force constant was large, which obeys the stiff-spring approximation,10 the Jarzynski equality5 was thus directly employed: ⎛ ΔF ⎞ exp⎜ − ⎟= ⎝ kBT ⎠
⎛ W ⎞ exp⎜ − ⎟ ⎝ kBT ⎠
≈
1 N
N
⎛
Wi ⎞ ⎟ ⎝ kBT ⎠
∑ exp⎜− i=1
(2)
where kB is the Boltzmann constant, T is the initial temperature of the system, and ⟨...⟩ denotes an ensemble average. Python 2.6.6 and NumPy 1.4.1 were used for data analysis.
■
RESULTS AND DISCUSSION Reference PMF Calculated from Umbrella Sampling Simulations in Explicit Water. In order to get a reference for later assessment, we first employed US,4 which is known to be rigorous and reliable, to calculate the PMF of deca-alanine with respect to the chosen RC of unfolding (i.e., the distance between the two terminal nitrogen atoms). The simulations were performed in explicit water, and the TIP3P model30 was chosen because it is arguably the most extensively used water model in the field of MD simulations. Although there exist other explicit water models, most of the implicit-solvent models tested here are parametrized by comparing the simulation results with the TIP3P model, so we chose it as the reference explicit-solvent model. It should be noted that, for a different explicit water model, the results could be slightly different. Several resulting PMFs with increasing durations of sampling for each umbrella window are presented in Figure 2. It is evident that the differences between adjacent profiles are inconspicuous after a 2 ns/window sampling phase and there is no observable difference between the profiles after a 4 ns/ window sampling phase, suggesting that the result is convergent. Hence we decided to use the profile generated
σ= 1829
1 N
N
∑ (F(ξi) − Fref (ξi) + C)2 i=1
(3) DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation
Figure 3. PMFs of the deca-alanine peptide along its helix−coil transition pathway reconstructed from (A) quasi-equilibrium and (B−D) nonequilibrium pulling simulations with different explicit- and implicit-solvent models. For quasi-equilibrium simulations, the PMFs were obtained by arithmetic averaging over the accumulated work, whereas, in the cases of non-equilibrium simulations, the PMFs were calculated by exponential averaging based on the Jarzynski estimator.
Comparison of Explicit- and Implicit-Solvent Models for Non-equilibrium Pulling Simulations: Accuracy and Efficiency. Although the quasi-equilibrium pulling simulations could yield reasonable results, the implementations are computationally expensive (a duration of 190 ns for an individual realization and totaling 570 ns for triplicate simulations), which appears to be impractical in most real-life cases. Pulling with a higher velocity, which is relatively fast and drives the helix−coil transition away from the thermal equilibrium regime, can be easily made parallelized and thus be affordable. Thanks to the non-equilibrium work relations,5,6 the equilibrium PMFs can be recovered from the realizations of the irreversible process. For comparison, we conducted the non-equilibrium pulling simulations at three increasing velocities that are 1−3 orders of magnitude larger than that used for the quasi-equilibrium pulling simulations. For an individual velocity, different numbers of realizations were assigned (Table 1) in order to make the total computational cost approximately the same for all. A sufficiently large moving harmonic constant (i.e., a “stiff spring”)10,24 was used throughout, thus ensuring the direct application of the Jarzynski equality (eq 2). The yielded PMFs are presented in Figure 3B−D, and the related RMSDs are listed in Table 1. Compared with the results of the quasi-equilibrium simulations, more significant deviations from the reference value can be observed. Overall, for most of the models, either explicit or implicit, worsening of the accuracy with increasing pulling velocities can be noted in PMFs and in
where N is the number of selected data points, and C is an arbitrary constant for minimizing the RMSD and can be determined from C=
1 N
N
∑ (Fref (ξi) − F(ξi)) i=1
(4)
For each PMF, we picked a data point every 0.2 Å along the RC and compared it with the reference PMF. As listed in Table 1, most of the calculated RMSDs between the implicit-solvent simulations and the reference explicit-solvent simulations are below 1 kcal/mol, indicating that the solvent effects could be approximated by these models with high accuracy in terms of free energy. Among all, the PMF generated with the GBNeck model shows the lowest deviation (0.3 kcal/mol) from the reference, which is very close to the result of the TIP3P water model (0.2 kcal/mol), followed by the GBNeck2 model. Two variants of the GBOBC model generated nearly identical PMFs with a RMSD of ∼1 kcal/mol. The worst one comes from the GBHCT model which gives a 1.8 kcal/mol deviation from the reference PMF. Additionally, we evaluated the unfolding free energies (ΔFunfold) and made a comparison with the calculated value of the US simulations. As summarized in Table 1, the GBNeck model gives the best result, whereas the result of the GBNeck2 model is also good and below 1 kcal/mol. The GBHCT model still produced the largest deviation in terms of the unfolding free energy. 1830
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation
such as the GB models, the nonbonded interactions exerted by the external media on the simulated target are emulated by effective potential energies that depend solely on the atomic coordinates of the solute. The effective potentials can be hypothetically decomposed into two decoupled components, that is, the electrostatic (Esol,es) and nonpolar (Esol,np) parts. For most of the GB models, the former is calculated according to the formulation introduced by Still et al.:17
Table 1. Comparison of Accuracies of Different Explicit- and Implicit-Solvent Models for Free Energy Calculations with Pulling Simulations solvent model TIP3P
GBHCT
GBOBCI
GBOBCII
GBNeck
GBNeck2
GBNeck/SA
GBNeck2/SA
pulling velocity (Å/ns)
no. of simulations
RMSDa (kcal/mol)
ΔΔFunfoldb (kcal/mol)
0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0 0.1 1.0 10.0 100.0
3 10 100 1000 3 10 100 1000 3 10 100 1000 3 10 100 1000 3 10 100 1000 3 10 100 1000 3 10 100 1000 3 10 100 1000
0.2 1.1 4.3 9.3 1.8 1.8 2.9 4.4 1.0 1.0 1.1 4.1 0.9 1.1 1.6 3.8 0.3 0.2 0.3 2.0 0.6 0.7 0.6 3.3 0.4 0.4 0.5 2.8 0.9 1.1 1.4 4.1
0.3 2.3 10.8 26.3 4.0 3.2 6.1 11.8 1.8 1.9 1.7 10.8 1.8 2.0 2.6 9.7 −0.5 −0.7 −0.7 4.3 0.8 0.9 0.0 9.1 0.3 0.3 0.2 6.6 1.5 2.1 2.4 9.9
1⎛ 1 ⎞ Esol,es = − ⎜1 − ⎟∑ 2⎝ εwater ⎠ i , j
qiqj ⎛ rij 2 ⎞ rij 2 + αiαj exp⎜ − 4α α ⎟ ⎝ i j⎠ (5)
in which εwater denotes the relative permittivity of bulk water, qi and qj are partial charges of atoms i and j, respectively, αi and αj are effective Born radii of these atoms, and rij is the interatomic distance. In fact, different GB models being evaluated here differ primarily in the strategies for computing the effective Born radii, which represent the extent of burial for individual atoms in an arbitrarily shaped solute molecule. The GBHCT model is the oldest one that uses an efficient pairwise descreening method to approximate the effective Born radii;37 however, it was found to be not good for large macromolecules. Later modification by Onufriev et al. produced two variants of GBOBC which are suitable for applications with proteins.25 Compared with them, a so-called “Neck” correction included in the GBNeck38and GBNeck239 models makes a further improvement to the GBHCT model. Apparently, the accuracies of these models in calculating free energies are in accordance with their performance in evaluating the effective Born radii. Both accuracy and efficiency are important criteria for evaluation of methodologies in computational chemistry. Thus, we also compared the computational speed for all solvent models, and a quantitative analysis of computational costs is given in Figure 4. In our test case, the non-equilibrium simulations using the GB models are ∼3−4 times faster than
a
The PMF obtained from umbrella sampling simulations in explicit water was used as the reference. bDifference between the calculated and reference free energies of unfolding, where the reference value (2.7 kcal/mol) was calculated by using umbrella sampling in explicit water.
the related RMSDs. When making a comparison among different implicit models, the general trend of computational accuracy is similar for all the three runs, which is shown to be GBNeck ≥ GBNeck2 > GBOBCI ≈ GBOBCII > GBHCT. According to Zacharias and co-workers,21,22 it has been found that, for either pathway or alchemical free energy methods, the GBNeck variants performed better than the other GB models available in the AMBER package. Our results show that a similar trend also holds for the non-equilibrium work method. It is worth noting that for the variants of the GBNeck models the results are reasonable even with a pulling velocity of 10 Å/ns. Interestingly, compared with the explicit water, it can be observed that all GB models, even the least accurate GBHCT model, yielded smaller deviations when the switching process was relatively fast and driven away from equilibrium; given the fact that the numbers of simulations were the same for all models, it means that using GB models substantially improved the convergence of sampling. To better understand the differences in accuracy for the evaluated GB models, we now make a brief introduction to the theoretical background. In typical continuum solvation models,
Figure 4. Computational speeds of different explicit- and implicitsolvent models for non-equilibrium pulling simulations. The metric, relative speed, is defined as ⟨tref⟩/⟨tsim⟩, in which tref and tsim denote the elapsed wall-clock time for a reference explicit-solvent (∼2800 TIP3P water molecules) and specified simulations with the same pulling velocity (v = 10.0 Å/ns), respectively, and both are averaged over 100 repetitions. All of the simulations were carried out with a NVIDIA Tesla K20 graphic card, utilizing the CUDA version of PMEMD in AMBER 14. For more computational details, please refer to Materials and Methods. The real speedup of conformational sampling by using an implicit-solvent model can be much larger than the values presented in this figure, because it is attributed to not only the algorithmic speedup but also the improved convergence, as discussed in the text. 1831
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation the one using the TIP3P water. The variants of the HCT and OBC models show a similar and the fastest speed, while the Neck variants are a little slower; however, the difference in speed among the tested GB models is less than 25%. The speedup is not surprising, because the implicit treatment of solvent is thought to be algorithmically faster than the explicit one, due to the absence of the solvent−solvent and solvent− solute interactions which are computationally intensive. On the other hand, more importantly, improved efficiency for nonequilibrium sampling is not only attributed to the fast computational speed but also the increased convergence by using implicit solvents, as is mentioned above. So, why does using an implicit-solvent model accelerate the convergence for the non-equilibrium work method? When a system is perturbed by the variation of external parameters, the process is generally accompanied by the dissipated work (Wdiss), which is given by Wdiss = W − ΔF
(6) 42
and reflects the Hamiltonian lag that develops between the actual state of the system and the equilibrium state corresponding to the current parameter value.43 Figure 5
Figure 6. Ensemble averages of work done on the system during nonequilibrium pulling simulations with different solvent models and pulling velocities showing the differences of dissipation.
Figure 5. Schematic distributions of work values for a microscopic system subjected to putative irreversible processes, for which there is ⟨Wdiss,s1⟩ < ⟨Wdiss,s2⟩ < ⟨Wdiss,s3⟩. The gray dashed line shows the exact value of free energy difference of interest.
helix−coil transition of the deca-alanine peptide. Evidently, increasing of pulling velocities yielded larger ⟨Wdiss⟩, thus making the estimates converge more slowly. More importantly, as shown in Figure 6, it is clear that the accelerated convergence by using an implicit-solvent model is due to reduction of dissipation in comparison with the explicit counterpart. It still remains a question why using an implicit-solvent model can reduce dissipation, and we seek to give a rationale for this. From the definition of Helmholtz free energy,
shows three putative distributions of work done on the same microscopic system during irreversible processes, for which the initial and final states are identical. We use it to illustrate the relation between dissipation and convergence when using the Jarzynski equality (eq 2). As is well-known, the exponential average in the equality depends crucially on a small fraction of realizations that yield a negative dissipated work and transiently violate the second law of thermodynamics (i.e., the work distributes on the left side of the dashed line in Figure 5).44 As is illustrated, a larger ⟨Wdiss⟩ makes the distribution of work wider, and thus the occurrence probabilities of the required work samples become smaller, which in turn requires more realizations to be simulated. In other words, it makes the estimates converge more slowly. Hence, it can be concluded that the convergence of the non-equilibrium work method when using the Jarzynski equality is related to ⟨Wdiss⟩ and reducing dissipation is a way to accelerate the convergence. Now, we come back to our test case. From eq 6, it can be simply derived that ⟨Wdiss⟩ = ⟨W ⟩ − ΔF
(8)
F ≡ U − TS
we can simply get (9)
ΔF = ΔU − T ΔS
By putting the first law of thermodynamics, ΔU = Q + W
(10)
and eq 6 into eq 9, we can derive a linear relationship between the dissipated work and the increase of entropy, ⎛ Q⎞ Wdiss = T ⎜ΔS − ⎟ = T ΔStotal ⎝ T⎠
(7)
Given that ΔF is the same for all, ⟨Wdiss⟩ can thus be simply inferred from ⟨W⟩ according to eq 7. Figure 6 shows the ensemble-average work (⟨W⟩) done on the system during the
(11)
and thus ⟨Wdiss⟩ = T ⟨ΔStotal⟩ 1832
(12) DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation
Figure 7. PMFs of the deca-alanine peptide along its helix−coil transition pathway reconstructed from (A) quasi-equilibrium and (B−D) nonequilibrium pulling simulations. The selected GB models with/without the SASA-dependent nonpolar solvation term were used in the simulations. For quasi-equilibrium simulations, the PMFs were obtained by arithmetic averaging over the accumulated work, whereas, in the cases of nonequilibrium simulations, the PMFs were calculated by exponential averaging based on the Jarzynski estimator.
hand, when comparing the speed of computation, it is found that, for both models, the inclusion of the nonpolar term brought a large amount of extra computation which hurts the performance with an ∼65% loss in speed, even though they are still faster than the use of explicit water (Figure 4). This is due to the fact that we were using the GPU implementation of AMBER,45 but currently the SASA term is still handled by a CPU, leading to frequent synchronization between the GPU and CPU memory. With the pure CPU implementation, however, the extra cost of SASA calculation would not be so significant. Despite its extensive use, the SASA dependence of the nonpolar part of solvent effects is actually problematic. When comparing the SASA based forces with the nonpolar forces generated by explicit solvent, Wagoner and Baker have found no significant correlation between the two.46 Chen and Brooks have also argued that this simple model tended to overestimate nonpolar interactions.47 While both repulsive and attractive components are involved in nonpolar solvation effects, the SASA based model seems to only account for the former. Although there are more accurate schemes available for dealing with the nonpolar solvation contributions,48,49 the lack of analytical derivatives or high computational expense renders them inappropriate for MD simulations. In this context, development of fast analytic methods for accurately describing the nonpolar solvation effects is necessary, yet challenging. Fortunately, it has been shown that even only the electrostatic part can achieve reasonable results in previous reports39,50 as well as our simulations, which can be partly ascribed to the fact that the magnitude of the nonpolar term is typically much smaller than the electrostatic contribution.46 Influence of Collision Frequency of Langevin Dynamics Thermostat. In our simulations, the Langevin dynamics thermostat was used to couple the simulation temperature. According to previous studies,51,52 the collision frequency (γ) of Langevin dynamics has an impact on the equilibrium
For simulations in explicit solvent, the increase of entropy of the solvent (⟨ΔSsolvent⟩) makes a considerable contribution to ⟨ΔStotal⟩; however, due to the absence of explicit-solvent molecules, this contribution is totally omitted in implicitsolvent simulations, thus leading to less dissipation in comparison with explicit-solvent simulations. Influence of the Nonpolar Solvation Term on the Accuracy and Speed of Computation of Implicit-Solvent Models. As mentioned above, the effective potential introduced by the GB models can be supplemented with an additional term representing the nonpolar interactions of solvent effects. However, the nonpolar term is often ignored in practical MD simulations. As a result, we only incorporated the electrostatic part of solvent effects into evaluation of potential energies and forces in the former simulations. To ascertain whether inclusion of a nonpolar solvation free energy term (Esol,np) makes things better, we also conducted simulations by including a SASA-dependent term, which is the only available one in the AMBER package for MD simulations, in the potential energy function:17 N
Esol,np = γ ∑ Ai i=1
(13)
where γ is the surface tension coefficient and Ai is the SASA of the ith atom. The two best-performing GB models, namely, the GBNeck and GBNeck2 models, were selected for the additional simulations, where they were denoted as GBNeck/SA and GBNeck2/SA, respectively. As shown in Figure 7, the inclusion of the nonpolar term generally enlarged the free energy value as well as the deviations from the reference PMF. This is confirmed quantitatively by the RMSDs between the PMF estimates and the reference PMF, as listed in Table 1. Hence, in our test case, it appears that inclusion of the SASA based nonpolar term did not make a significantly positive contribution to the accuracy. On the other 1833
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation
Figure 8. PMFs of the deca-alanine peptide along its helix−coil transition pathway reconstructed from (A) quasi-equilibrium and (B−D) nonequilibrium pulling simulations. Different values of collision frequency (γ) of Langevin dynamics were used in the simulations. For quasi-equilibrium simulations, the PMFs were obtained by arithmetic averaging over the accumulated work, whereas, in the cases of non-equilibrium simulations, the PMFs were calculated by exponential averaging based on the Jarzynski estimator.
conformational sampling for implicit-solvent simulations; that is, a lower value leads to a faster sampling. In order to investigate its influence on non-equilibrium sampling, we conducted additional simulations for the two best-performing GB models, namely, GBNeck and GBNeck2, by using a much lower collision frequency (γ = 0.01 ps−1) compared with the value (γ = 5 ps−1) used throughout the above-mentioned simulations. The resulting PMFs are shown in Figure 8. It is found that when the pulling velocity is low or moderate (Figure 8A−C), the results are nearly the same for different values of collision frequency; however, under the high pulling velocity (Figure 8D), the convergence is significantly improved by using a lower collision frequency. Then we compared the dissipated work. As shown in Figure 9, the dissipated work is significantly reduced by using a lower collision frequency under the high pulling velocity (Figure 9C), but it is not so significant under the low or moderate pulling velocity (Figure 9A,B). The results show that when conducting non-equilibrium sampling under a high velocity, the use of a low collision frequency could further improve the convergence of sampling in our test case.
■
CONCLUSION Implicit-solvent models, particularly the GB models, are now becoming increasingly popular for rapid free energy calculations, to which field we have paid special attention. Most notably, combined with molecular mechanics calculations, the models are often employed in less theoretically rigorous approaches to estimate the free energy of binding,53 and acceptable accuracy and affordable cost can be achieved according to our previous reports.54−59 For more rigorous methods, when the GB models are utilized for conformational sampling, recent studies are quite encouraging in the context of equilibrium sampling.18−22
Figure 9. Ensemble averages of work done on the system during nonequilibrium pulling simulations with different solvent models, pulling velocities, and collision frequencies (γ) of Langevin dynamics showing the differences of dissipation.
1834
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation
(12) Jensen, M. Ø.; Park, S.; Tajkhorshid, E.; Schulten, K. Energetics of Glycerol Conduction through Aquaglyceroporin Glpf. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 6731−6736. (13) Nicolini, P.; Frezzato, D.; Gellini, C.; Bizzarri, M.; Chelli, R. Toward Quantitative Estimates of Binding Affinities for Protein− Ligand Systems Involving Large Inhibitor Compounds: A Steered Molecular Dynamics Simulation Route. J. Comput. Chem. 2013, 34, 1561−1576. (14) Feig, M.; Brooks, C. L., III Recent Advances in the Development and Application of Implicit Solvent Models in Biomolecule Simulations. Curr. Opin. Struct. Biol. 2004, 14, 217−224. (15) Chen, J.; Brooks, C. L., III; Khandogin, J. Recent Advances in Implicit Solvent-Based Methods for Biomolecular Simulations. Curr. Opin. Struct. Biol. 2008, 18, 140−148. (16) Kleinjung, J.; Fraternali, F. Design and Application of Implicit Solvent Models in Biomolecular Simulations. Curr. Opin. Struct. Biol. 2014, 25, 126−134. (17) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. Semianalytical Treatment of Solvation for Molecular Mechanics and Dynamics. J. Am. Chem. Soc. 1990, 112, 6127−6129. (18) Mobley, D. L.; Dill, K. A.; Chodera, J. D. Treating Entropy and Conformational Changes in Implicit Solvent Simulations of Small Molecules. J. Phys. Chem. B 2008, 112, 938−946. (19) Shivakumar, D.; Deng, Y.; Roux, B. Computations of Absolute Solvation Free Energies of Small Molecules Using Explicit and Implicit Solvent Model. J. Chem. Theory Comput. 2009, 5, 919−930. (20) Knight, J. L.; Brooks, C. L. Surveying Implicit Solvent Models for Estimating Small Molecule Absolute Hydration Free Energies. J. Comput. Chem. 2011, 32, 2909−2923. (21) Zeller, F.; Zacharias, M. Evaluation of Generalized Born Model Accuracy for Absolute Binding Free Energy Calculations. J. Phys. Chem. B 2014, 118, 7467−7474. (22) Ostermeir, K.; Zacharias, M. Rapid Alchemical Free Energy Calculation Employing a Generalized Born Implicit Solvent Model. J. Phys. Chem. B 2015, 119, 968−975. (23) Ozer, G.; Valeev, E. F.; Quirk, S.; Hernandez, R. Adaptive Steered Molecular Dynamics of the Long-Distance Unfolding of Neuropeptide Y. J. Chem. Theory Comput. 2010, 6, 3026−3038. (24) Bureau, H. R.; Merz, D. R.; Hershkovits, E.; Quirk, S.; Hernandez, R. Constrained Unfolding of a Helical Peptide: Implicit Versus Explicit Solvents. PLoS One 2015, 10, e0127034. (25) Onufriev, A.; Bashford, D.; Case, D. A. Exploring Protein Native States and Large-Scale Conformational Changes with a Modified Generalized Born Model. Proteins: Struct., Funct., Bioinf. 2004, 55, 383−394. (26) Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. An Overview of the Amber Biomolecular Simulation Package. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2013, 3, 198−210. (27) Minh, D. D. L.; McCammon, J. A. Springs and Speeds in Free Energy Reconstruction from Irreversible Single-Molecule Pulling Experiments. J. Phys. Chem. B 2008, 112, 5892−5897. (28) Oberhofer, H.; Dellago, C. Efficient Extraction of Free Energy Profiles from Nonequilibrium Experiments. J. Comput. Chem. 2009, 30, 1726−1736. (29) Ozer, G.; Quirk, S.; Hernandez, R. Thermodynamics of Decaalanine Stretching in Water Obtained by Adaptive Steered Molecular Dynamics Simulations. J. Chem. Theory Comput. 2012, 8, 4837−4844. (30) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926−935. (31) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684−3690. (32) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of N-Alkanes. J. Comput. Phys. 1977, 23, 327−341.
Here, we focus on their performance in the other aspect, that is, the non-equilibrium sampling based free energy methods. For this reason, we evaluated the performance of five GB models with the non-equilibrium work method. As a result, we found that two evaluated models, GBNeck and GBNeck2, outperformed the others in terms of accuracy, which was comparable to the explicit-solvent model. On the other hand, as expected, substantial increases in the computational speed of sampling via implicit-solvent simulations over the explicit counterpart were observed. Furthermore, compared with the results of explicit-solvent simulations, the convergence of sampling was found to be significantly accelerated by using the implicit-solvent models due to reduction of dissipation, which is important for a reliable estimation of free energy and thus highlights the advantage of implicit-solvent models for non-equilibrium sampling. In view of the foregoing factors, we can arrive at the conclusion that using an accurate implicitsolvent model to mimic bulk solvent is an effective way to improve the efficiency of sampling for the non-equilibrium work method.
■
AUTHOR INFORMATION
Corresponding Author
*Tel.: +86-571-88208412. E-mail:
[email protected]. ORCID
Tingjun Hou: 0000-0001-7227-2580 Funding
This study was supported by the National Major Basic Research Program of China (Grants 2016YFA0501701 and 2016YFB0201700) and the National Science Foundation of China (Grants 21575128 and 81302679). Notes
The authors declare no competing financial interest.
■
REFERENCES
(1) Chipot, C. Frontiers in Free-Energy Calculations of Biological Systems. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2014, 4, 71−89. (2) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300−313. (3) Zwanzig, R. W. High-Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420−1426. (4) Torrie, G. M.; Valleau, J. P. Nonphysical Sampling Distributions in Monte Carlo Free-Energy Estimation: Umbrella Sampling. J. Comput. Phys. 1977, 23, 187−199. (5) Jarzynski, C. Nonequilibrium Equality for Free Energy Differences. Phys. Rev. Lett. 1997, 78, 2690−2693. (6) Crooks, G. E. Entropy Production Fluctuation Theorem and the Nonequilibrium Work Relation for Free Energy Differences. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 1999, 60, 2721− 2726. (7) Grubmuller, H.; Heymann, B.; Tavan, P. Ligand Binding: Molecular Mechanics Calculation of the Streptavidin Biotin Rupture Force. Science 1996, 271, 997−999. (8) Izrailev, S.; Stepaniants, S.; Balsera, M.; Oono, Y.; Schulten, K. Molecular Dynamics Study of Unbinding of the Avidin-Biotin Complex. Biophys. J. 1997, 72, 1568−1581. (9) Neuman, K. C.; Nagy, A. Single-Molecule Force Spectroscopy: Optical Tweezers, Magnetic Tweezers and Atomic Force Microscopy. Nat. Methods 2008, 5, 491−505. (10) Park, S.; Khalili-Araghi, F.; Tajkhorshid, E.; Schulten, K. Free Energy Calculation from Steered Molecular Dynamics Simulations Using Jarzynski’s Equality. J. Chem. Phys. 2003, 119, 3559−3566. (11) Colizzi, F.; Bussi, G. RNA Unwinding from Reweighted Pulling Simulations. J. Am. Chem. Soc. 2012, 134, 5173−5179. 1835
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836
Article
Journal of Chemical Theory and Computation (33) Miyamoto, S.; Kollman, P. A. SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952−962. (34) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577−8593. (35) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696−3713. (36) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. The Weighted Histogram Analysis Method for FreeEnergy Calculations on Biomolecules. I. The Method. J. Comput. Chem. 1992, 13, 1011−1021. (37) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized Models of Aqueous Free Energies of Solvation Based on Pairwise Descreening of Solute Atomic Charges from a Dielectric Medium. J. Phys. Chem. 1996, 100, 19824−19839. (38) Mongan, J.; Simmerling, C.; McCammon, J. A.; Case, D. A.; Onufriev, A. Generalized Born Model with a Simple, Robust Molecular Volume Correction. J. Chem. Theory Comput. 2007, 3, 156−169. (39) Nguyen, H.; Roe, D. R.; Simmerling, C. Improved Generalized Born Solvent Model Parameters for Protein Simulations. J. Chem. Theory Comput. 2013, 9, 2020−2034. (40) Weiser, J.; Shenkin, P. S.; Still, W. C. Approximate Atomic Surfaces from Linear Combinations of Pairwise Overlaps (LCPO). J. Comput. Chem. 1999, 20, 217−230. (41) Nicolini, P.; Procacci, P.; Chelli, R. Hummer and Szabo-Like Potential of Mean Force Estimator for Bidirectional Nonequilibrium Pulling Experiments/Simulations. J. Phys. Chem. B 2010, 114, 9546− 9554. (42) Pearlman, D. A.; Kollman, P. A. The Lag between the Hamiltonian and the System Configuration in Free Energy Perturbation Calculations. J. Chem. Phys. 1989, 91, 7831−7839. (43) Vaikuntanathan, S.; Jarzynski, C. Dissipation and Lag in Irreversible Processes. EPL 2009, 87, 60005. (44) Jarzynski, C. Equalities and Inequalities: Irreversibility and the Second Law of Thermodynamics at the Nanoscale. Annu. Rev. Condens. Matter Phys. 2011, 2, 329−351. (45) Götz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput. 2012, 8, 1542−1555. (46) Wagoner, J.; Baker, N. A. Solvation Forces on Biomolecular Structures: A Comparison of Explicit Solvent and Poisson−Boltzmann Models. J. Comput. Chem. 2004, 25, 1623−1629. (47) Chen, J.; Brooks, C. L., III. Implicit Modeling of Nonpolar Solvation for Simulating Protein Folding and Conformational Transitions. Phys. Chem. Chem. Phys. 2008, 10, 471−481. (48) Wagoner, J. A.; Baker, N. A. Assessing Implicit Models for Nonpolar Mean Solvation Forces: The Importance of Dispersion and Volume Terms. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 8331−8336. (49) Tan, C.; Tan, Y.-H.; Luo, R. Implicit Nonpolar Solvent Models. J. Phys. Chem. B 2007, 111, 12263−12274. (50) Nguyen, H.; Maier, J.; Huang, H.; Perrone, V.; Simmerling, C. Folding Simulations for Proteins with Diverse Topologies Are Accessible in Days with a Physics-Based Force Field and Implicit Solvent. J. Am. Chem. Soc. 2014, 136, 13959−13962. (51) Feig, M. Kinetics from Implicit Solvent Simulations of Biomolecules as a Function of Viscosity. J. Chem. Theory Comput. 2007, 3, 1734−1748. (52) Anandakrishnan, R.; Drozdetski, A.; Walker, R. C.; Onufriev, A. V. Speed of Conformational Change: Comparing Explicit and Implicit Solvent Molecular Dynamics Simulations. Biophys. J. 2015, 108, 1153− 1164. (53) Gohlke, H.; Kiel, C.; Case, D. A. Insights into Protein−Protein Binding by Binding Free Energy Calculation and Free Energy Decomposition for the Ras−Raf and Ras−RalGDS Complexes. J. Mol. Biol. 2003, 330, 891−913.
(54) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51, 69−82. (55) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the Molecular Mechanics/Poisson Boltzmann Surface Area and Molecular Mechanics/Generalized Born Surface Area Methods. II. The Accuracy of Ranking Poses Generated from Docking. J. Comput. Chem. 2011, 32, 866−877. (56) Xu, L.; Sun, H.; Li, Y.; Wang, J.; Hou, T. Assessing the Performance of MM/PBSA and MM/GBSA Methods. 3. The Impact of Force Fields and Ligand Charge Models. J. Phys. Chem. B 2013, 117, 8408−8421. (57) Sun, H.; Li, Y.; Tian, S.; Xu, L.; Hou, T. Assessing the Performance of MM/PBSA and MM/GBSA Methods. 4. Accuracies of MM/PBSA and MM/GBSA Methodologies Evaluated by Various Simulation Protocols Using PDBbind Data Set. Phys. Chem. Chem. Phys. 2014, 16, 16719−16729. (58) Sun, H.; Li, Y.; Shen, M.; Tian, S.; Xu, L.; Pan, P.; Guan, Y.; Hou, T. Assessing the Performance of MM/PBSA and MM/GBSA Methods. 5. Improved Docking Performance Using High Solute Dielectric Constant MM/GBSA and MM/PBSA Rescoring. Phys. Chem. Chem. Phys. 2014, 16, 22035−22045. (59) Chen, F.; Liu, H.; Sun, H.; Pan, P.; Li, Y.; Li, D.; Hou, T. Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 6. Capability to Predict Protein-Protein Binding Free Energies and Re-Rank Binding Poses Generated by Protein-Protein Docking. Phys. Chem. Chem. Phys. 2016, 18, 22129−22139.
1836
DOI: 10.1021/acs.jctc.6b01139 J. Chem. Theory Comput. 2017, 13, 1827−1836