Letter pubs.acs.org/JPCL
Achieving Rigorous Accelerated Conformational Sampling in Explicit Solvent Urmi Doshi and Donald Hamelberg* Department of Chemistry and the Center for Biotechnology and Drug Design, Georgia State University, P.O. Box 3965, Atlanta, Georgia 30302-3965, United States S Supporting Information *
ABSTRACT: Molecular dynamics simulations can provide valuable atomistic insights into biomolecular function. However, the accuracy of molecular simulations on general-purpose computers depends on the time scale of the events of interest. Advanced simulation methods, such as accelerated molecular dynamics, have shown tremendous promise in sampling the conformational dynamics of biomolecules, where standard molecular dynamics simulations are nonergodic. Here we present a sampling method based on accelerated molecular dynamics in which rotatable dihedral angles and nonbonded interactions are boosted separately. This method (RaMD-db) is a different implementation of the dual-boost accelerated molecular dynamics, introduced earlier. The advantage is that this method speeds up sampling of the conformational space of biomolecules in explicit solvent, as the degrees of freedom most relevant for conformational transitions are accelerated. We tested RaMD-db on one of the most difficult sampling problems − protein folding. Starting from fully extended polypeptide chains, two fast folding α-helical proteins (Trpcage and the double mutant of C-terminal fragment of Villin headpiece) and a designed β-hairpin (Chignolin) were completely folded to their native structures in very short simulation time. Multiple folding/unfolding transitions could be observed in a single trajectory. Our results show that RaMD-db is a promisingly fast and efficient sampling method for conformational transitions in explicit solvent. RaMD-db thus opens new avenues for understanding biomolecular self-assembly and functional dynamics occurring on long time and length scales. SECTION: Biophysical Chemistry and Biomolecules
M
advances have opened avenues to investigate processes that are of biological importance and occur on long time scales and length scales. Of particular interest has been the folding of proteins, a long-standing problem in biophysical chemistry.14 Starting from an unstructured polypeptide chain, a protein selfassembles into a well-defined 3-D conformation, as a result of simultaneous transitions of several degrees of freedom over large length scales. Protein folding thus presents a daunting challenge of conformational sampling and an ideal model process for testing enhanced sampling methodologies. Many natural and designed small, fast folding proteins (or domains) that typically fold on the microsecond time scale have now been characterized experimentally and investigated computationally.15−17 With improvements in the force fields, it has been possible to fold and unfold representatives from different
olecular dynamics (MD) simulations have now become a routine tool in understanding conformational dynamics in biomolecules and gaining atomic-level insights into mechanisms of biomolecular functions. However, the accuracy of the results from MD depends on two factors: how accurately the force field is able to model the intra- and intermolecular physical forces in solvated biomolecules and how accurately (and desirably efficiently) the relevant conformational sampling space is explored. Addressing the force field and sampling problems is interdependent and expected to continue for a long time, as there is room for substantial improvements in both areas. Ongoing efforts to extend MD to time scales beyond microseconds have resulted in many notable advanced sampling techniques,1,2 remarkable progress in specialized computational architectures,3−5 and parallel and distributed computing.6 Concurrently, continued validation of existing physical force fields for proteins7 and nucleic acids8,9 has led to notable improvements in key backbone10−12 and side-chain13 torsional parameters. These © 2014 American Chemical Society
Received: January 26, 2014 Accepted: March 12, 2014 Published: March 12, 2014 1217
dx.doi.org/10.1021/jz500179a | J. Phys. Chem. Lett. 2014, 5, 1217−1224
The Journal of Physical Chemistry Letters
Letter
the total potential energy (see later).47 This dual boost approach was shown to perform better in sampling the conformational space (i.e., backbone ϕ−ψ space) in model peptides with increased efficiency and faster convergence rates as compared with conventional MD. Moreover, the rate of loop formation in a model peptide consisting of nine residues was also increased as a result of speeding the diffusive motions. We present the RaMD method in combination with dual boosting that differs from the previous implementation. We further report on the performance evaluation of this RaMDdual boost (RaMD-db) method for the most notorious conformational sampling problem, that is, protein folding. The main focus of our current study is to address the problem of efficient conformational sampling. Therefore, we use the force field that has been proven to be reasonably robust and to successfully fold proteins.13,48 Specifically, we employ two αhelical proteins − the designed Trp-cage 20-residue miniprotein49 and the engineered double mutant (i.e., Lys65Nle/ Lys70Nle with Asn68His) of the 35-residue C-terminal subdomain of the villin headpiece (henceforth referred to as Villin-Nle/Nle)50 and two β-hairpins, the designed hairpin Chignolin variant51 and the β-hairpin-forming peptide derived from Nuclear factor erythroid 2-related factor 2 (Nrf2 hairpin).52 Experimentally, Trpcage exhibits a folding rate of 4 μs at 296 K53 and Villin-Nle/Nle is estimated to fold in 0.7 μs at 300 K50 (as compared with the wild-type villin that folds in 4.3 ± 0.5 μs at 300 K). The Chignolin variant has the same central sequence as the originally designed Chignolin, except for the terminal Gly residues that are modified to Tyr. Despite being a 10-residue peptide, Chignolin could be crystallized and has been regarded as a mini-protein.51 The 16-mer peptide derived from the Nrf2 protein has also been shown to independently form a stable β-hairpin structure.52 Because of their small sizes and fast folding nature, they are ideal candidates for atomistic simulations and have been the subject of several computational studies in implicit28,31−33,54,55 and explicit19,24−26,35,52,56−61 solvent. For each protein or peptide, we started with a fully extended structure in our all-atom explicit solvent simulations with the RaMD-db method. In brief, a continuous and non-negative bias potential, ΔV(r), is added to the potential energy, V(r), when it falls below a boost energy, E, which is set prior to starting the aMD run.37 The potential energy surface is not modified when V(r) ≥ E. In the aMD method, the bias potential is expressed as ΔV(r) = (E − V(r))2/(α + E − V(r)), where α is a preset tuning parameter that, along with E, defines the level of acceleration. This form of the bias potential prevents the derivative of the potential energy (and hence the force) from being discontinuous at points on r when V(r) = E. The potential energy functional most commonly used in fixed-charged physical force fields, that is, V(r) = VB(r) + Vθ(r) + VD(r) + VNB(r), includes terms arising from oscillations around equilibrium bonds, VB(r), oscillations around equilibrium angles, Vθ(r), rotations around torsional angles, VD(r), and pairwise nonbonded van der Waals and electrostatic interactions, VNB(r). In the dual-boost method previously introduced,47 all degrees of freedom were explicitly accelerated by adding a boost to the total potential, with an extra secondary boost applied only to the total dihedral angles: V*(r) = [VB(r) + Vθ(r) + [VD(r) + ΔVD(r)] + VNB(r)] +ΔV(r). In the RaMD-db method, a bias potential, ΔVRotD(r), is applied only to the rotatable torsional energy, VRotD(r), and a second, separate boost, ΔVNB(r), is added to only the potential * (r)= energy of the nonbonded interactions, that is, VRaMD‑db
structural classes in long atomistic MD simulations on the specialized supercomputer, Anton.18,19 Except for these and a handful of other unbiased MD studies,16,20 protein folding simulations on the more commonly available general-purpose computational resources have been implemented with the use of enhanced sampling methods, including replica exchange MD,21−23 transition path sampling,24 bias-exchange metadynamics,25,26 replica exchange with solute tempering,27 and integrated tempering MD.28 While some of the simulations utilized coarse-grained29,30 models or implicit solvation,31−34 others that were carried out in all-atom details have mainly employed replica exchange.24,35 Despite being the most extensively used method for protein folding, replica exchange suffers from the limitation of using large number of replicas for even small solvated proteins and spending vast amount of simulation time at undesired temperatures. We have developed an improved enhanced sampling approach36 based on accelerated MD37 that allows simulating without constraining the phase space of biomolecules. One needs not specify the reaction coordinates or collective variables before setting up an accelerated MD run because prior knowledge of the topography of the potential energy landscape is not required. By introducing a bias potential, the escape rates from minima on the potential energy surface are accelerated. Subsequently, thermodynamic properties on the original potential can be accurately recovered in a straightforward post-reweighting procedure. In principle, kinetic properties of the system on the original potential can also be retrieved if certain conditions are met.38,39 When acceleration is applied, typically the slower modes are sped up, thus allowing the investigation of long time-scale events. Accelerated MD has been extensively used for studying a variety of biological problems including native-state dynamics in proteins,40−42 catalytic mechanisms of HIV-1 protease,43 and cyclophilin A (a cis−trans isomerase)44 and the role of conformational dynamics in the catalytic function of cyclophilin A.45 It has been utilized for testing and refining AMBER force-field parameters for peptide bond torsions.11 One other major advantage of accelerated MD is that one can tune the level of acceleration applied, which is generally not possible in other biasing methods. This feature has permitted increasing the rate of conformational sampling in proteins and test the appropriate level of acceleration required to reproduce experimental NMR observables such as chemical shifts, S2 order parameters, residual dipolar and scalar J couplings.40,46 In the previous implementations of accelerated MD, the boost potential has been applied ordinarily to all torsional degrees of freedom.37 However, recently, we have shown that accelerating only the rotatable torsions (the RaMD method), the degrees of freedom most pertinent to conformational changes, one can reproduce the original free-energy surface with significantly fewer statistical errors, introduced due to reweighting of the configurations.36 In addition to the internal rotational dihedrals, the slow diffusion of the surrounding solvent can also impede conformational transitions, especially when large length-scale displacements are involved. The diffusive solvent motions are dominantly controlled by the nonbonded interactions, that is, the van der Waals and the pairwise electrostatic interaction energy terms of the force field. Therefore, in the past, an accelerated MD approach has been implemented in which the total torsional degrees of freedom and the diffusive motions were accelerated simultaneously by applying two bias potentials − one only to the total dihedral energy and the other one to 1218
dx.doi.org/10.1021/jz500179a | J. Phys. Chem. Lett. 2014, 5, 1217−1224
The Journal of Physical Chemistry Letters
Letter
Figure 1. Folded structure of Trp-cage and root-mean-squared deviation with respect to the experimental structure along the trajectory. (Left panel) Simulations starting from an extended configuration sampled a fully folded structure (red) with root-mean-squared deviation (RMSD) of 0.45 Å from the reference experimental structure (green), that is, the first structure from PDB ID: 1L2Y. (Right panel) RMSD to the native structure as a function of time in the 1.7 μs long trajectory. RMSDs reported in this Figure were calculated using Cα atoms of all residues and heavy atoms of Trp6 (shown as sticks in the left panel).
Figure 2. Reweighted 2-D free-energy landscapes of Trp-cage and test for convergence. Free-energy profiles projected onto (A) RMSDCα,Trp, that is, RMSD to native structure 1L2Y, calculated with Cα atoms of all residues and heavy atoms of Trp6 and radius of gyration, Rg, computed using all atoms. (B) RMSD of the polyproline region (green in the right panel) after rms fitting the backbone of all the residues (Supporting Information) and RMSD of the α-helical region (RMSDhelix) shown in red in the Trp-cage structure (top right). (C) RMSDCα,Trp and RMSDhelix. (D) RMSD of residues involved in forming the hydrophobic core (Trp6, Pro12, Pro17−19) shown in yellow in the Trp-cage structure and distance between Cγ and Cζ atoms (cyan spheres in the Trp-cage structure) of Asp9 and Arg16 (cyan sticks), respectively. Asp9 and Arg16 are known to form a stabilizing salt bridge (orange). Two-dimensional Gaussian kernel was used for probability density estimation (Supporting Information). Contour lines are drawn every 5 kcal/mol. (E) Folding free energy, ΔGf, as a function of simulation lengths converged to a value of −0.88 kcal/mol after 500 ns.
VT(r) + [VRotD(r)+ ΔVRotD(r)] + [VNB(r) + ΔVNB(r)]. Here VT(r) represents the total potential energy arising from oscillations around equilibrium bonds and angles as well as from nonrotatable and improper torsions, that is, VT(r) = VB(r) + Vθ(r) + VnonRotD(r) + VImprD(r). Therefore, only the rotatable
torsions and nonbonded interactions directly experience the acceleration, while the rest of the degrees of freedom are simulated on their original potential. The force on the modified * (r) = −∇VT(r) − potential is then given by F* = −∇VRaMD‑db [∇V RotD (r)(α RotD /(α RotD + E RotD − V RotD (r))) 2 ] − 1219
dx.doi.org/10.1021/jz500179a | J. Phys. Chem. Lett. 2014, 5, 1217−1224
The Journal of Physical Chemistry Letters
Letter
Figure 3. Superposition of experimental and folded structures of Villine-Nle/Nle and time series of Cα-RMSD with respect to the experimental structure. (Left) Starting from an extended polypeptide chain, Villin-Nle/Nle folded to a structure (red) with RMSD of 0.46 Å with respect to the PDB structure 2F4K (green). (Right) Cα-RMSD of residues 4−32 along two independent RaMD-db trajectories (top and bottom).
[∇VNB(r)(αNB/(αNB + ENB − VNB(r)))2]. Equilibrium properties on the original potential are retrieved by reweighting each configuration by β[ΔVRotD(r) + ΔVNB(r)], where β is 1/kBT. As compared with boosting all degrees of freedom, the advantage of accelerating only the diffusive degrees of freedom between protein−solvent, (intra) protein−protein, solvent− solvent, and rotational torsions of protein is the relatively reduced magnitudes of the statistical weights and thereby increased statistical accuracy in obtaining the true equilibrium properties. Moreover, applying acceleration to the already faster motions such as bond vibrations and angle rotations or nonrotatable torsions that do not alter significantly upon conformational changes will have less impact on speeding up the overall conformational sampling of a biomolecule. From a single RaMD-db trajectory, we could reproducibly observe several folding−unfolding transitions in Trpcage with an average folding time (Supporting Information) of 78 ns at 300 K for the trajectory of ∼1700 ns shown in Figure 1. A previous study on Trpcage using normal MD on Anton obtained an average folding time of 14 μs at 296 K.19 Therefore, under similar conditions (i.e., at 300 K), the RaMDdb method sped up the average folding time in Trpcage by a factor of ∼180 times. Configurations with RMSD (calculated with Cα and heavy atoms of Trp6)