Determining Protein Folding Pathway and Associated Energetics

Jan 25, 2017 - Replica exchange molecular dynamics (REMD) and integrated-tempering-sampling (ITS) are two representative enhanced sampling methods ...
0 downloads 0 Views 6MB Size
Article pubs.acs.org/JCTC

Determining Protein Folding Pathway and Associated Energetics through Partitioned Integrated-Tempering-Sampling Simulation Qiang Shao,*,† Jiye Shi,*,‡ and Weiliang Zhu† †

Drug Discovery and Design Center, CAS Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China ‡ UCB Biopharma SPRL, Chemin du Foriest, 1420 Braine-l’Alleud, Belgium S Supporting Information *

ABSTRACT: Replica exchange molecular dynamics (REMD) and integrated-tempering-sampling (ITS) are two representative enhanced sampling methods which utilize parallel and integrated tempering approaches, respectively. In this work, a partitioned integrated-tempering-sampling (P-ITS) method is proposed which takes advantage of the benefits of both parallel and integrated tempering approaches. Using P-ITS, the folding pathways of a series of proteins with diverse native structures are explored on multidimensional free-energy landscapes, and the associated thermodynamics are evaluated. In comparison to the original form of ITS, P-ITS improves the sampling efficiency and measures the folding/unfolding thermodynamic quantities more consistently with experimental data. In comparison to REMD, P-ITS significantly reduces the requirement of computational resources and meanwhile achieves similar simulation results. The observed structural characterizations of transition and intermediate states of the proteins under study are in good agreement with previous experimental and simulation studies on the same proteins and homologues. Therefore, the P-ITS method has great potential in simulating the structural dynamics of complex biomolecular systems.



INTRODUCTION Proteins are inherently dynamic which display a broad range of structural dynamics in functional motions spanning from protein folding, allosteric regulation, to ligand binding, etc.1−6 The associated free-energy landscapes are usually rugged,7−9 containing not only low-energy functional states which can be visualized experimentally to high resolution but also metastable states which are sparsely populated and thus difficult to be resolved experimentally. Proteins transit among states following the most easily accessible directions that require the least energy ascent on multidimensional free-energy landscapes. Therefore, computing the detailed maps of free-energy landscapes is indispensable for a deep understanding of the molecular mechanisms of protein functional motions that are not easily accessible from experimental measures. In computational biology, molecular dynamics (MD) is an attractive approach to produce time-resolved trajectory at an atomistic level which provides access to construct free-energy landscapes underlying protein motion. The full exploration of © 2017 American Chemical Society

slow functional motion occurring on a long time-scale is, however, insufficient for conventional MD simulation. It is noteworthy that recent encouraging advances in computational power promote MD simulation to reach an important milestone: the use of specialized hardware such as Anton10 or GPUs11,12 and a cloud-computing platform13 extends the simulation time-scale to hundreds of microseconds and even milliseconds. In such a time-scale level, biological events, e.g., the folding and unfolding of fast-fold proteins,14−19 the conformational transitions of specific enzymes, and GPCR proteins,13,20,21 as well as spontaneous drug binding to target proteins can be sampled repeatedly.22,23 However, although the application of specialized computer hardware and cloudcomputing platform may indicate the tendency of biomolecular simulation in the future, the contemporary availability of such initiatives in broad scientific community is limited. In light of Received: September 30, 2016 Published: January 25, 2017 1229

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

requirement of computational resources or the difficulty in determining converged parameters in the presence of a large collection of degrees of freedom. A partitioned ITS approach (hereinafter referred to as PITS) is proposed here to combine the potential energy modification technique from the original form of the ITS method (hereinafter referred to as single-trajectory ITS) and the conformation exchange idea from REMD. A large temperature range, which is integrated in single trajectory in the original form of ITS, is now divided into n continuous small ranges (T0 ∼ T1, ... Tn‑1 ∼ Tn, from low to high, see the schematic in Figure 1) and partitioned into n trajectories

this, a bunch of enhanced conformational sampling algorithms have been developed to sample the configuration space of protein in accelerated manners.24−27 Since the probabilities of global states follow a Boltzmann distribution in equilibrium, in a conventional MD simulation, all conformational states should be observed (visited) with populations substantially following their Boltzmann-weighted ones. Raising the temperature is a natural way of accelerating the dynamics of the simulated system,28,29 which, however, has a side effect: while the rate of process increases with temperature, the relative probabilities of state visiting and event occurring might be different than those at the original temperature of interest. “Tempering” based enhanced sampling methods have been proposed to remove such a side effect which use high temperature as a way to overcome barriers30,31 and meanwhile utilize specific protocols to reweight the equilibrium distribution that is possibly deviated from the canonical Boltzmann distribution. Replica exchange molecular dynamics (REMD) is a representative enhanced sampling method which employs a parallel tempering protocol to make extensive equilibrium sampling of biomolecules. In REMD, parallel trajectories (replicas) at discrete temperatures are performed. Simulated conformations from neighboring replicas are exchanged at set intervals which allow REMD to make a random walk on temperature space (and potential energy surface as well). Metropolis criterion is utilized to control the acceptance probability of conformation exchange which preserves the Boltzmann distribution at each temperature.32 In the original and most common form of REMD, only temperature and potential energy need to be communicated among replicas, making it simple to implement for computation. On the other hand, to get efficient sampling in REMD, the number of replicas in general increases as N1/2, where N is the number of degrees of freedom of the system.31,33 Therefore, a large amount of parallel replicas are embarrassingly needed in REMD simulation of complex systems containing a large number of degrees of freedom, leading to the huge requirement of computational resources accordingly. Besides the parallel tempering, an integrated tempering sampling (ITS) method has been also developed which utilizes the temperatures integrated in single MD trajectory to modify the sampling distribution over the desired potential energy range,34,35 similar to the enveloping distribution sampling method.36,37 To do so, a sum-over-temperature non-Boltzmann distribution factor is introduced in molecular simulation and a series of parameters involved are set to force the sampling evenly distributed in a broad range of potential energy and thus allow for the conformation collection on a smoothened potential energy surface. The real equilibrium distribution of each conformational state can be reweighted back to the Boltzmann-weighted one using a specific reweighting factor. In contrast to REMD, the ITS method avoids running parallel replicas and operating conformation exchange and thus requires much fewer computational resources.38,39 On the other hand, the ability of ITS in smoothening potential energy surface solely depends on the converged data of the parameters in the non-Boltzmann distribution factor, which is, however, difficult to obtain in the simulations of large-size and/or complex systems that usually contain rigid potential energies. Therefore, although both REMD and ITS methods have been widely used in small biomolecular systems,32,40−47 their application in largesize and complex systems is largely limited by either the huge

Figure 1. Schematic description of the partitioned ITS (P-ITS) method.

(replicas), accordingly. The ITS technique is then used in individual replicas containing partitioned temperature ranges to enhance the sampling in the corresponding potential energy ranges. To make a sufficient sampling with all relevant conformations involved, conformations from neighboring replicas could be exchanged at set intervals. The simulation data in all P-ITS replicas are combined eventually to cover the entire configuration space and construct global free-energy landscapes. In comparison to single-trajectory ITS, P-ITS alleviates the difficulties in smoothening the entire potential energy surface of the complex system through the separate sampling in individual partitioned replicas (covering shortened potential energy ranges). On the other hand, in comparison to REMD, P-ITS can reduce the number of replicas used in simulation since a range of temperatures but not a single temperature is covered in each replica. Therefore, in comparison to single-trajectory ITS and REMD, the P-ITS approach proposed here could be implemented for computation more conveniently and has a larger potential in modeling complex systems. To demonstrate the reliability of the P-ITS method in conformational sampling and free-energy evaluation, the folding of a variety of proteins/peptides was simulated at an atomiclevel using such a method (see proteins/peptides and their amino-acid sequences in Table S1 in the Supporting Information). First, three simple peptides including a K8A mutant of the thermostable Trpcage variant TC10b (TC10bK8A),48 a chignolin variant β-hairpin named CLN025,49 and the C-terminal β-hairpin (residues 41−56) from the B1 domain of protein G (GB 1p)50 were used as test models. The characterized structural properties and relevant energetics of folding pathways of the three peptides measured by P-ITS simulations were compared to those from accompanying singletrajectory ITS simulations, which clearly indicate better 1230

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

the total distribution. We define Pi,k = ni,k∫ re−βi,kVi′(r)dr and preselect the ratios between Pi,k’s for all k between 1 and N. N Using the normalized quantities ( pi , k = Pi , k /∑k = 1 Pi , k ), we then 0 }. For the desired even set fixed expectation values {pi,k th distribution of Pi,k’s in the i replica, pi,k is simply equal to 1/ N. In a preliminary iteration process, pi,k can be calculated iteratively. Accordingly, ni,k can be calculated and updated to the converged values according to the ratios between neighboring pi,k values. Let us define a series of number mi,k:

conformational sampling ability of P-ITS than single-trajectory ITS. In addition, the P-ITS simulations were also compared to various previously reported REMD simulations including Nguyen et al.51 and an accompanying explicit solvent REMD simulation, indicating that P-ITS requires much less computational resources than REMD to achieve consistent results. Second, the folding pathways of two more complex proteins, Fip35 WW domain variant GTT52 and the third Ig-binding domain of protein G from streptococcal bacteria (GB3),53 were explored on multidimensional free-energy landscapes by using P-ITS simulations to demonstrate the applicability of P-ITS in complex proteins and reveal their respective folding mechanisms. Finally, the folding/unfolding thermodynamic quantities of the proteins/peptides were compared directly to available experimental data, indicating the efficiency of the P-ITS method in free-energy evaluation.

⎧ 1 k=1 mi , k = ⎨ ⎩ ni , k + 1/ni , k 1 < k ≤ N ⎪

The original series of n i,k is related to m i,k by k ni , k = ni ,1 ∏ j = 1 mi , j . The preliminary iteration process for achieving converged values of discrete ni,k’s in each replica of P-ITS is performed as follows: 1. A set of initial values of mi,k’s (m0i,k) is chosen to calculate the initial values n0i,k, which are then used to run a preliminary MD simulation. 2. From the trajectory of preliminary MD, pi,k are calculated (p i,k (0)), and a new set of m i,k ’ s is achieved by



COMPUTATIONAL METHODS Basic Concept of P-ITS Simulation. The potential energy of the simulated system can be modified through the following integration function over temperature in an integrated tempering approach34,35 V′ = −

1 ln β0

where β =

1 kBT

∫β f (β)e−βV dβ

p (0)

mi , k (1) = mi , k (0) p i ,k

(1)

i ,k+1

pi , k (1)

(kB is the Boltzmann constant and T is the

pi , k + 1 (1)

temperature), β0 is the inverse temperature of the simulation system, and V is the “real” unmodified potential energy of a system given by a detailed force field. From an enhanced sampling perspective, the sampling range of potential energy V′ in eq 1 can be expanded large enough to cover the conformations with all energy levels. The expansion of the potential energy range solely depends on the temperature range β’s integrated in eq 1. To alleviate the sampling difficulty over a large energy range, the entire temperature range can be partitioned into n small ranges. The conformational sampling is then made separately in n replicas containing corresponding small temperature ranges. Then the modified energy can be rewritten as a summation over the integration in continuous small temperature ranges: ⎛ 1 V ′ = ∑ Vi′ = ∑ ⎜⎜ − ln β0 i=1 i=1 ⎝ n

n

⎞ f (βi )e−βiVi dβi ⎟⎟ βi ⎠

j−2

+

n

i=1

i=1

N ⎞ 1 ln ∑ ni , k e−βi ,kVi ⎟⎟ ⎝ β0 k = 1 ⎠

(2)

(6)

∑k = 1 βi , k ni , k e−βi ,kVi ∂Vi′ = Fi N ∂r β ∑k = 1 ni , k e−βi ,kVi

(7)

where Fi is the molecular force in the original system of ith replica. Conformation Exchange among Replicas in P-ITS. To ensure the configuration space of protein is fully sampled, the conformations from neighboring replicas are exchanged in the long-time production runs. Since all conformations in P-ITS simulation are sampled with non-Boltzmann distribution and their real visiting probabilities are supposed to be specifically reweighted for the statistical analysis of the ensemble average of any quantities (see “Statistical Analysis Constructing FreeEnergy Landscapes” below), in the present study, the conformation exchange is simply enforced at constant intervals (per 5.0 ns) in P-ITS simulations to optimally increase the sampling efficiency.

(3)

i , k Vi

k=1

⎤ pk (l)pk + 1 (l) )⎥ ⎥⎦

N

Fi′ = −

N

∑ ni ,ke−β

⎡ p (j − 1) ⎢ p (j − 1)p (j − 1) k k k+1 pk + 1 (j − 1) pk (l)pk + 1 (l) ) ⎢⎣

mi , k (j − 1) j−1 ∑l = 0

Until ni,k’s in each replica converge, the preliminary iteration process can be stopped. The long-time production run is then performed using the converged ni,k’s for each replica of P-ITS. The biased force in the simulation is

To fulfill the thorough sampling in the entire large energy range, the potential energies must be sampled evenly in all n partitioned temperature ranges, respectively. The potential energy distribution function in the ith replica is p(Vi ′) = e−βVi ′ =

∑ l=0



n

, in order to fulfill the requirement of

= 1 in the next iteration step (even distribution over the

mi , k (j) =



∑ Vi′ = ∑ ⎜⎜−

(0)

energy range sampled). A new set of values of ni,k’s is calculated from mi,k (1) and used in the next iteration step. 3. Repeating step 2 to update the values of mi,k’s as well as ni,k’s: From the second iteration step, history information on mi,k’s in earlier steps is involved in the calculation of new mi,k’s in the jth step:

As each of the n small temperature ranges contains N temperatures, the form of f (βi) can be taken as a sum of delta N functions ( f (βi ) = ∑k = 1 ni , k δ(βi − βi , k )), converting eq 2 into V′ =

(5)

(4)

In the case that an even distribution can be achieved in each of the N subsets, the parameters ni,k’s need to be determined by the requirement that each term in eq 4 contributes equally to 1231

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

equilibrated at 300 K for 1 ns using Langevin dynamics with the collision frequency of 2 ps−1. The values of ni,k’s were predetermined for individual replicas of P-ITS and single-trajectory ITS simulations, respectively. Starting from the final structure in the short-time equilibrium process, the long-time production run of each trajectory in either P-ITS or single-trajectory ITS simulation was performed with a step size of 0.002 ps and at the constant temperature of 300 K. The calculation data were collected every 2.0 ps. Simulation Settings for Explicit Solvent P-ITS and REMD Simulations. To indicate the less computational requirement of P-ITS compared to REMD, we used CLN025 as a protein model to run explicit solvent P-ITS and standard REMD simulations. The native structure of CLN025 was used as the initial structure, which was solvated in a cubic box containing 800 water molecules with the size of 36.0 × 34.2 × 33.0 Å3. Two Na+ cations were added to neutralize the protein charges. The force fields are the FF14SBonlysc all-atom force field51 for protein and the TIP3P model62 for water. The temperature range was set as 300−450 K.63,64 For such a system, only 2 replicas were needed in P-ITS simulation (300− 370 K, 370−450 K). In contrast, 16 replicas were required in REMD simulation with the temperature spacing determined by the Web server (http://folding.bmc.uu.se/remd/)65 and optimized by the algorithm of Nadler and Hansmann66 (300.0 K, 306.2 K, 316.7 K, 325.3 K, 334.3 K, 343.4 K, 352.8 K, 362.5 K, 372.4 K, 382.6 K, 393.1 K, 403.9 K, 414.9 K, 426.3 K, 439.0 K, 450.0 K). The constructed system of CLN025 in aqueous solution was initially minimized for 50000 steps with the protein being fixed using harmonic restraint (force constant = 10.0 kcal mol−1 Å−2). Then the system was heated upon 300 K and equilibrated for 5 ns with harmonic restraints applied on protein Cα atoms (force constant = 10.0 kcal mol−1 Å−2). The equilibrated system was subsequently used for production runs of P-ITS and REMD. In the former simulation, the production runs of two replicas were performed at constant temperature of 300 K and constant pressure of 1 atm (NPT ensemble) but with potential energy modified via the ITS method. The conformation exchange was enforced per 5.0 ns. Meanwhile, the latter simulation was run in the NVT ensemble, and the conformation exchanges between neighboring replicas were attempted every 1000 steps (the exchange acceptance probabilities were maintained at ∼20%). For both simulations, the SHAKE algorithm61 was used to fix all covalent bonds involving hydrogen atoms, and periodic boundary conditions were used to avoid edge effects. The Particle Mesh Ewald method was applied to treat long-range electrostatic interactions,67 and the cutoff distance for long-range terms was set as 10.0 Å. The Langevin dynamics with a collision frequency of 3.0 ps−1 was adopted to control the temperature of the system. The coordinates were saved every 5000 steps.

To indicate the influence of the manner of conformation exchange on simulation results, we also try the conformation exchange controlled by Metropolis criterion, which is supposed to guarantee the exchange process to converge toward the equilibrium distribution54−57 w(xi → xj) = min{1, e−[β0(Vj′(xi) − Vi ′(xi) + Vi ′(xj) − Vj′(xj))]}

(8)

where V′j (xi) − V′i (xi) is the difference of the modified potential energies calculated using the ITS parameters in jth and ith replicas, respectively, on the basis of the conformation from the ith replica, and V′i (xj) − V′j (xj) is the difference of the modified potential energies for the conformation from the jth replica. β0 is 1 the inverse temperature of the simulation system ( β0 = k T ). B 0

The conformation exchange is attempted every 5.0 ns. The comparison between the two conformation exchange manners can be seen in the “Results” section. Statistical Analysis Constructing Free-Energy Landscapes. Since the converged calculation of P-ITS simulation yields a biased non-Boltzmann distribution on the configuration space, after the simulation data is collected in production runs, the unbiased distribution function can be easily recovered by timing a reweighting factor: N

W = e β(Vi ′(r) − Vi(r)) = e−βVi(r)/p(Vi ′) = e−βVi(r)/ ∑ ni , k e−βi ,kVi k=1

(9)

The true reweighted ensemble average of a quantity A, ⟨A⟩, can be then achieved: ⟨A⟩ = ⟨AW⟩/⟨W⟩. The free-energy landscape along any n-dimensional reaction coordinate(s) X = (x1, ... xn) is given by ΔG(X ) = −

P(X ) 1 ln Pmax β

(10)

where P(X) is the unbiased probability distribution along the reaction coordinate(s) X which is obtained through a histogram analysis of the P-ITS snapshots taking into account their real visiting probabilities recovered by reweighting factor W; Pmax is the maximum of the distribution which is subtracted to ensure that ΔG(X) = 0 for the lowest free-energy minimum.58 Simulation Settings of P-ITS and Single-trajectory ITS for Protein Folding. All-atom molecular dynamics simulations were performed using the AMBER14 package.59 Protein atoms were modeled by the AMBER FF14SBonlysc all-atom force field,51 and the solvent effects were mimicked by the recently improved generalized Born (GB) implicit solvent model, GB-Neck2 (igb = 8).60 The combination of the abovementioned force field and implicit solvent model performed well in the folding simulations of proteins with diverse topologies.51 The salt concentration was set to 0, and the default surface tension was 0.005 kcal/mol/Å2. The SHAKE algorithm61 with a relative geometric tolerance of 10−5 was used to constrain all chemical bonds. The nonbonded cutoff of 30 Å was used in any simulations. For each protein/peptide, an initial extended structure generated by the tleap program in AMBER14 was first subjected to the preprocess performed by conventional MD including the minimization, heating-up, and short-time equilibrium at the desired temperature. After 3000 steps of minimization, the temperature of the simulation system was increased to 300 K within 200 ps using velocity rearrangement from a Maxwell−Boltzmann distribution. Then the system was



RESULTS The Effects of Parameter Selection in P-ITS Simulation. The crucial parameters in partitioned ITS (P-ITS) which may influence the simulation include the entire temperature range and the replica number n. A temperature range of 280−420 K was set for all proteins in the present simulations, which, as indicated in our previous studies, should be wide enough to cover all conformations with various energy levels for proteins with a size of 10−73 amino acids.43,44,46,47,68−71 To investigate the effects of the replica 1232

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation Table 1. Simulation Parameters for P-ITS and Single-Trajectory ITS Simulations under Study P-ITS

single-trajectory ITS

protein

replicas

temp (K)

Ni,k’s

simulation time (μs)

temp (K)

Nk’s

simulation time (μs)

CLN025

2

280−350 350−420 280−320 320−360 360−420 280−350 350−420 280−350 350−420 280−320 320−360 360−420 280−320 320−360 360−420 280−320 320−360 360−420

40 40 30 30 30 40 40 40 40 40 40 40 80 80 80 80 80 80

0.8 0.8 0.6 0.6 0.6 1.0 1.0 0.8 0.8 1.8 1.8 1.8 1.7 1.7 1.7 4.2 4.2 4.2

280−420

80

0.7

280−420

100

1.0

280−420

100

3.6

3

GB 1p

2

Trpcage

2 3

GTT

3

GB3

3

Figure 2. (A-C) Time series of potential energies of Trpcage in individual replicas of 3-replica and 2-replica P-ITS and single-trajectory ITS simulations, respectively. (D-F) The distribution of potential energies sampled in individual replicas of 3-replica and 2-replica P-ITS and singletrajectory ITS (green line indicates the distribution of potential energy sampled in overall energy range of P-ITS). (G-I) Time series of root-meansquare deviation (RMSD) in individual replicas of 3-replica and 2-replica P-ITS and single-trajectory ITS simulations, respectively.

We ran hierarchical clustering analysis72 to illuminate the populated structure ensembles of Trpcage and CLN025 in individual replicas of their 2-replica and 3-replica P-ITS simulations, respectively. As shown in Figure 3, various structures spanning from compact native conformations to extended and fully unfolded conformations can be collected, suggesting that the configuration space of protein folding can be sufficiently sampled under the present temperature range. In addition, similar populated structures can be commonly seen between P-ITS simulations with different replica numbers for each peptide. Next, we calculated one-dimensional free-energy profiles as the function of backbone RMSD with respect to the native structure for all P-ITS simulations of Trpcage and CLN025 (Figure S2). For each peptide, the free-energy profiles

number on P-ITS results, for Trpcage and CLN025 peptides, we partitioned the above-mentioned temperature range into two (280−350 K, 350−420 K) or three replicas (280−320 K, 320−360 K, and 360−420 K), respectively. The results of the corresponding 2-replica and 3-replica P-ITS simulations were then compared with each other. The discrete temperatures are spaced evenly in individual replicas, and the detailed numbers are given in Table 1. Figures 2A and 2B show the time series of the potential energies in individual replicas for 3-replica and 2replica P-ITS simulations of Trpcage, and Figures S1A and S1B are for CLN025. One can seen that no matter what the replica number is used in P-ITS, the potential energies are sampled evenly in the respective energy ranges. 1233

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

Figure 3. Top five clusters measured in individual replicas of (A) 3-replica and (B) 2-replica P-ITS simulations of Trpcage and (C) 3-replica and (D) 2-replica P-ITS simulations of CLN025, respectively. Under the representative structure of each cluster is shown the RMSD value. The native hydrophobic side-chains are shown with licorice representation as they are formed in structures.

share the same features in 2-replica and 3-replica P-ITS simulations. Therefore, as the configuration space is sufficiently sampled, the detailed number of replicas used in P-ITS should have little influence on the simulation results. We then chose 2 replicas for GB 1p (see the trajectories and populated structure ensembles in Figures S3 and S4) and 3 replicas for GTT and GB3 in their P-ITS simulations, respectively. Another issue of P-ITS is the conformation exchange manner. For all P-ITS simulations here, the conformation exchanges among replicas were enforced at constant intervals. Next, we also ran 2-replica P-ITS simulations on Trpcage and CLN025 with the conformation exchange attempted by Metropolis criterion (eq 8) as control tests. The simulation trajectories represented by time series of potential energies and RMSD, the distribution of potential energies, and the populated structure ensembles sampled in individual replicas are depicted in Figures S5 for Trpcage and S6 for CLN025. In addition, the calculated one-dimensional free-energy profiles are shown in Figure S2. One can see that for each peptide, the populated structure ensembles sampled in individual replicas and the features of the global free-energy profiles are very similar between the 2-replica P-ITS simulations with the conformation exchanges enforced at constant intervals or controlled by Metropolis criterion, suggesting that the two conformation exchange manners have similar contribution for the configuration space sampling of small and simple protein systems such as Trpcage and CLN025. P-ITS Folding Various Experimental Native Structures. Starting from extended structures without any formed native structural elements, the atomically detailed P-ITS simulations can fold all proteins and peptides into their experimental native structures successfully including solely α-helix (Trpcage), solely β-sheets (CLN025, GB 1p, and Fip35 WW domain variant GTT), and α/β mixed structure (GB3 domain). As shown in Figure 4, the structures with lowest RMSDs which have been captured in simulations are very similar to their native structures. To the best of our knowledge, the present study is a first report for the successful folding of GB3 domain in an all-

Figure 4. Comparison of the structures with lowest RMSDs (blue) in P-ITS simulations started from extended conformations to the experimentally measured native structures (red). Under each structure is shown the protein name, chain length, and backbone RMSD value.

atom MD simulation without introducing any a priori knowledge of native structure (the folding of GB3 has been only reported in a metadynamics simulation guided by NMR chemical shifts73). More Efficient Conformational Sampling in P-ITS than in Single-Trajectory ITS. The P-ITS method is supposed to improve the sampling ability of molecular simulation on the potential energy surface, providing effective and reliable access to explore the transition pathway on multidimensional freeenergy landscapes. The efficiency of the P-ITS method in potential energy sampling and conformation collection was assessed through comparison of the simulation results of Trpcage, CLN025, and GB 1p by P-ITS to those by singletrajectory ITS simulations covering the same overall temperature ranges. All simulation parameters can be seen in Table 1. In addition, the folding pathways and associated thermodynamic quantities of the tested peptides measured by P-ITS and single-trajectory ITS simulations were also compared to experimental data as well as the results of REMD simulations. From Figure 2C, one can see that the single-trajectory ITS can repeatedly sample the high and low potential energy regions in a broad range, but the energy distribution in the overall range in single-trajectory ITS is less even than those in 1234

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

lower free-energy values than the counterpart in the latter simulation, which is probably induced by better sampling of the metastable structures in P-ITS simulation (see Figure 2). Hierarchical clustering analysis72 was performed on the trajectories to determine the representative conformations of all important states in the free-energy profiles. Four states including the folded (F: RMSD ≤ 1.6 Å), unfolded (U: 7.1 Å ≤ RMSD ≤ 7.5 Å), and two partially folded states (P1: 4.2 Å ≤ RMSD ≤ 4.6 Å; P2: 2.9 Å ≤ RMSD ≤ 3.3 Å) were observed (Figure 5B), sharing exactly the same structures in the two kinds of simulations. Therefore, P-ITS simulation can capture the same folding pathway as single-trajectory ITS but meanwhile increase the statistical weights of all transient states relative to the folded one along the folding pathway and thus decrease the unfolding barrier of Trpcage. We calculated the average helicity (fraction of α-helix content) of individual residues within the N-terminal helix region (Asp1-Gly10). Interestingly, the site-specific helicity in P-ITS simulation (inset in Figure 5A) is much closer to that measured by explicitsolvent REMD simulation implemented with the RSFF2 force field.74 Combining Figures 5A and 5B, one can see that the folding of Trpcage is a multiple-state transition (U → P1 → P2 → F) in which the two partially folded states (P1 and P2) serve as intermediates. Starting from the fully extended structure, Trpcage readily collapses into an unfolded state in which only the N-terminal helix is partially formed and the hydrophobic side-chains of Trp6 and Pro12 are packed. The further folding of Trpcage into P1 and P2 intermediates includes the following: 1) the formation of 310-helix (Gly11-Ser14), 2) the optimization of the Trp6-Pro12 hydrophobic contact to its native configuration and the formation of Asp9-Arg16 saltbridge, and 3) the packing of the C-terminal polyproline II region against Tyr3 and Trp6. The N-terminal helix can be fully folded, and the native hydrophobic contacts between the Cterminal polyproline II region and N-terminal helix can be correctly formed unless the F state is reached. It is noteworthy that the presence of the folding intermediate state of Trpcage has been reported in multiple experiments.75,76 The experimentally suggested intermediate structure contains a large amount of α-helical content, the well packing between Trp6 and Pro12 and the formation of Asp9-Arg16 salt-bridge, which are consistent with the present observations. In addition, the structural features of the unfolded state of Trpcage have been also revealed by experiments. Besides the UVRS experiment that suggested the presence of the residual αhelical structure in the unfolded state,76 the Photo-CIDNP NMR experiment also observed that the unfolded state adopts a compact structure.77 This unfolded structure possesses not only notable non-native hydrophobic contacts but also a native one (Trp6-Pro12). Meanwhile, the native hydrophobic contacts among Trp6 and Pro18, Pro19 are absent, and the Asp9-Arg16 salt-bridge is not formed either. The representative structure of the unfolded state of Trpcage identified by present simulations is depicted in Figure S8. In this compact unfolded structure, a portion of the α-helical structure is folded and the Trp6-Pro12 hydrophobic contact is kept, whereas neither the native contacts between Trp6 and the C-terminal Pro18 and Pro19 nor the Asp9-Arg16 salt-bridge is formed. Therefore, the simulated unfolded state of Trpcage possesses the structural features consistent with the experimental measures.76,77 The room-temperature free-energy profile as the function of backbone RMSD was also drawn for two β-structured peptides

P-ITS simulations (black line in Figure 2F vs green lines in Figures 2D and 2E). The time series of RMSD with respect to the native structure of Trpcage was calculated in each replica: the smaller the RMSD is, the closer the structure is to the native one. Accordingly, although various structures with a large range of RMSD values can be captured in P-ITS (Figures 2G, 2H) as well as single-trajectory ITS (Figure 2I), the metastable structures with large RMSDs are visited more frequently in the former simulations than in the latter. Therefore, the conformational sampling is more effective in P-ITS simulation. More even distribution of the potential energy and more effective conformational sampling of P-ITS than single-trajectory ITS can be also seen in Figures S1 and S3 for CLN025 and GB 1p peptides. Comparison of the Folding Free-Energy Landscapes Explored by P-ITS and Single-Trajectory ITS. As the configuration space is well sampled, the free-energy landscape underlying protein folding can be mapped. Since no predefined collective coordinates are introduced in P-ITS simulation and only potential energy is modified to change the distribution of conformational sampling, the real visiting probability of each conformation can be reweighted by a factor with only temperature and potential energy involved (eq 9). The freeenergy profile can be spanned along any reaction coordinates. To indicate the simulation convergence, we calculated onedimensional free-energy profiles at different simulation times for P-ITS and single-trajectory ITS simulations of Trpcage, CLN025, and GB 1p. For the former two peptides, the 3replica P-ITS of Trpcage and 2-replica P-ITS of CLN025 were used in the detailed free-energy landscape analyses here and thereinafter. As shown in Figure S7, for each tested peptide, the free-energy profiles have constant features within a large range of simulation times for both P-ITS and single-trajectory ITS simulations, indicating the convergence of simulation results. More importantly, the converged results can be obtained within the same accumulated simulation times in the two kinds of simulations, suggesting that the use of more replicas in P-ITS than in single-trajectory ITS requires no extra simulation time. Figure 5A show the room-temperature (300 K) free-energy profile as a function of backbone RMSD of Trpcage. One can see that the free-energy profile generated by P-ITS has a similar feature to that of single-trajectory ITS except that the transient states along the folding pathway in the former simulation have

Figure 5. (A) One-dimensional free-energy profile as a function of backbone RMSD with respect to the native structure of Trpcage at 300 K measured by P-ITS and single-trajectory ITS simulations. Inset is the comparison of the average helicity of individual residues of Trpcage calculated in P-ITS and single-trajectory ITS simulations to that in the reported explicit-solvent REMD simulation (data from ref 74). (B) Representative conformations of the distinct states indicated in free-energy profiles (the hydrophobic side-chains involved in native hydrophobic contacts are shown with licorice representation). 1235

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

Figure 6. One-dimensional free-energy profile as a function of backbone RMSD with respect to the native structure of (A) CLN025 and (B) GB 1p peptides at 300 K measured by P-ITS and single-trajectory ITS simulations, respectively. (C, D) Representative conformations of the distinct states of CLN025 indicated in free-energy profiles by P-ITS and single-trajectory ITS simulations. (E, F) Representative conformations of the distinct states of GB 1p indicated in free-energy profiles by P-ITS and single-trajectory ITS simulations. Hydrophobic side-chains involved in native hydrophobic contacts are shown with licorice representation (green colored in one strand and orange colored in the other).

of CLN025 and GB 1p (Figures 6A and 6B). P-ITS simulation always generates a similar shaped free-energy profile as that by single-trajectory ITS simulation but meanwhile more heavily weighs transient states, leading to lower free-energies of such states. All important states along the folding pathways including unfolded (U), helix (H), transition (T), and folded (F) states again share the same structures in the two kinds of simulations for both peptides (Figure 6C-D for CLN025 and Figure 6E-F for GB 1p simulated by P-ITS and single-trajectory ITS, respectively). We notice that the folding free-energy barriers of the tested peptides are only of the order of ∼2kBT, allowing a conventional MD simulation on the time-scale of μs to capture the folding event of protein. However, it is very difficult to reach global equilibrium allowing the statistical weighting of all functionally correlated states in single conventional trajectory that requires the simulation much longer than the longest relaxation time of the system. Therefore, single observation(s) of the folding event in conventional MD simulation is certainly not enough to quantitatively identify the folding pathway and provide detailed folding/unfolding thermodynamic quantities. A main issue of the folding mechanism of β-hairpin is the formation order of its three structural elements: the turn configuration, the cross-strand hydrophobic core cluster, and the backbone hydrogen bond assembling.78,79 Multiple computational studies have suggested that the formation of the native turn configuration drives the folding of βhairpin.69,80−83 Here, we calculated the average turn formation probabilities for individual residues of CLN025 in the structures without the formation of any native structural elements. As shown in Figures 7A and 7B, P-ITS and singletrajectory ITS simulations indicate similar site-specific turn probabilities: even in the case of the overall configuration of the β-hairpin not being formed, the individual residues in the turn region (Pro4-Gly7) of CLN025 have high turn probabilities. The turn probabilities of turn residues reach maxima, whereas those for residues out of turn region reach minima as the native turn is formed (see the insets in Figures 7A and 7B). The P-ITS and single-trajectory ITS simulations generate similar results for

Figure 7. Probability of residue-level turn structure formation as no native structural elements are formed: (A, B) CLN025 and (C, D) GB 1p in P-ITS and single-trajectory ITS simulations, respectively. The corresponding probability when the native turn is formed is shown in the insets.

GB 1p as well in which the turn residues also have relatively high turn probabilities (Figures 7C and 7D). Meanwhile, however, the residues near to C-terminus also have high turn probabilities, which are probably induced by the presence of an artificial helix structure of GB 1p in which the C-terminal residues adopt turn configurations (Figure S9). Removing the effects from helix structures, the turn has intrinsically high forming propensity compared to the other structural elements in both CLN025 and GB 1p. The hydrogen bonding donor−acceptor distance of individual backbone hydrogen bonds along the strands was calculated as the function of the number of hydrogen bonds formed in the folding of CLN025 and GB 1p. As shown in Figure S10, for both peptides, the formation of detailed hydrogen bonds follows the order from the inner to outer along the strands in both P-ITS and single-trajectory ITS simulations, suggesting that the hydrogen bonds are assembled following a “zip-out” model.84,85 Accordingly, in the representative conformations of the transition states of CLN025 and GB 1p (Figure 6C−F), not 1236

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

the salt-bridge distance between D9 and R16 side-chains, which contribute to the native structure of Trpcage (Figure S14). Similar features can be seen between the P-ITS and singletrajectory ITS simulations here and the previous explicit solvent REMD simulation for Trpcage (Figure 2 in ref 86). The differentiation of P-ITS and REMD in computational requirement can be further amplified in the explicit solvent system. We used CLN025 as a protein model to run P-ITS and REMD simulations in explicit solvent, respectively. The detailed description of the simulations can be seen in the “Computational Methods” section. The initial configuration of the simulation system is shown in Figure S15A. Under the same temperature range, the P-ITS can be performed using only 2 replicas whereas REMD needs 16 replicas, and the corresponding accumulated simulation times are ∼0.9 μs and ∼1.2 μs, respectively. The overlapping of the potential energies among replicas, the simulation replicas represented by time series of RMSD, and the fraction of the folded state at different temperatures for P-ITS and REMD simulations are depicted in Figure 8. The temperature space sampled in a representative

only the turn configurations but also the hydrogen bonds nearby the turn region are formed, whereas the hydrogen bonds close to the termini are not formed. In summary, both P-ITS and single-trajectory ITS simulations suggest the same folding pathway of β-hairpin, which is initiated and driven by the turn formation, consistent with a large number of previous simulation studies.69,80−83 Finally, two-dimensional free-energy landscapes as the function of RMSD and Rg (radius of gyration) are depicted in Figure S11 for the three tested peptides. The free-energy landscapes generated by P-ITS and singletrajectories ITS simulations always have similar features, further showing the consistency between these two kinds of simulations. Less Requirement of Computational Resources in PITS than in REMD. In addition to the thermodynamics at the target temperature (e.g., 300 K), as a result of the extensively sampled temperature and energy ranges, ITS relevant simulation allows the assessment of the thermodynamic quantities of protein folding in a temperature range covered in simulation via the statistical analysis using a temperatureadjustable reweighting factor (eq 9). The folding/unfolding equilibrium can be also measured by any order parameters at various temperatures. Figure S12 indicates the distribution of backbone RMSD at multiple temperatures for the three tested peptides measured by P-ITS and single-trajectory ITS simulations. For each peptide modeled by either simulation, the RMSD distributions roughly exhibit two peaks: one peak located at low RMSD position corresponds to the folded state with its height decreasing with temperature, the other peak at large RMSD position corresponds to various unfolded states with the height increasing with temperature. Similar temperature effects on RMSD distribution can be also seen for GTT simulated by P-ITS (Figure S13). Using the same force field (AMBER FF14SBonlysc) and implicit solvent model (GB-Neck2), Nguyen et al. have performed REMD simulations on the folding of a series of proteins including CLN025, Trpcage (TC5B), and GTT.51 Six replicas covering the temperature range of 275−424 K were used for CLN025 (275.1 K, 300.0 K, 327.2 K, 356.8 K, 389.1 K, 424.3 K), 8 replicas with the temperature range of 264−413 K were used for Trpcage (264.0 K, 281.4 K, 300.0 K, 319.8 K, 340.9 K, 363.3 K, 387.3 K, 412.9 K), and 10 replicas with the temperature range of 250−389 K were used for GTT (250.0 K, 262.6 K, 275.8 K, 289.7 K, 304.3 K, 319.6 K, 335.6 K, 352.5 K, 370.3 K, 388.9 K), respectively. In contrast, only 2 replicas were used for CLN025, and 3 replicas were used for Trpcage and GTT in the present P-ITS simulations which cover similar temperature ranges as the above-mentioned REMD simulations. For each peptide, the RMSD distribution measured by REMD (See Figures S4 for CLN025, Figure S8 for Trpcage, and Figure 21 for GTT in the Supporting Information of ref 51.) has a similar feature which is influenced by temperature in the same manner as that measured by P-ITS simulation. Therefore, one can see that P-ITS simulation could achieve consistent results as REMD simulation but with less computational resources. Day et al. ran explicit solvent REMD simulation on Trpcage with the force fields of AMBER FF99SB for protein and TIP3P for water.86 A total of 40 replicas were used to cover the temperature range of 280−540 K. To make a comparison to this REMD simulation, we calculated the temperature dependent distributions of the hydrogen bonding distance between W6 side-chain Nε atom and R16 backbone O atom as well as

Figure 8. (A-B) The distribution of potential energies sampled in individual replicas and (C-D) time series of RMSD in individual replicas in explicit solvent P-ITS and REMD simulations of CLN025, respectively. (E) The temperature dependence of the fraction of the folded state of CLN025 measured in two simulations. For clarity, eight out of 16 replicas of REMD are displayed in part D.

replica of REMD is shown in Figure S15B. One can see from Figure 8 that in explicit solvent simulation, 2-replica P-ITS can yield the similar temperature dependent folded state fraction as that by 16-replica REMD. Comparison of Temperature Dependent Folding/ Unfolding Thermodynamics Evaluated by P-ITS and Experiments. The unfolding free-energy can be calculated by Pf 1 ΔGU = ln β 1 − Pf (11) where Pf is the fraction of the folded state. The folded state can be defined using the cutoff of RMSD from the free-energy profile (e.g., RMSD ≤ 1.6 Å for Trpcage in Figure 5A, RMSD ≤ 1.5 Å for CLN025 in Figure 6A, and RMSD ≤ 2.2 Å for GB 1p in Figure 6B). Pf at various temperatures can be measured from the reweighted probability in trajectories accordingly. The unfolding entropy and enthalpy can be then calculated by ΔSU = −(∂ΔGU/∂T)P and ΔHU = ΔGU + TΔSU, respectively. The unfolding enthalpy ΔHU is a key experimental quantity which 1237

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation can be used to evaluate the accuracy of MD simulation.14,15,74 Here, ΔHU of the three tested peptides was calculated at the temperature range of 280−370 K (Figure 9). One can see that

Table 2. Comparison of the Calculated Unfolding Thermodynamics by P-ITS and Single-Trajectory ITS Simulations to Experimental Data unfolding thermodynamics protein Trpcage GB 1p CLN025 GTT

P-ITS

single-trajectory ITS

expt

± ± ± ±

10.2 ± 0.2a 13.8 ± 0.2b 17.4 ± 0.1c

13 ± 2a48 39.0b85 11.1c49 31.8c87

16.8 24.1 10.4 34.2

0.6a 0.3b 0.2c 0.2c

ΔHU at 300 K (kcal/mol). bΔSU (cal/K/mol) at 297 K. cΔHU at melting temperature (kcal/mol). a

N

Q=

N

∑i =res1 ∑ j =i 1

1 0

1 + e10(dij − (dij + 1)) Nres ∑i = 1 Ni

(12)

where Ni’s are the contacts of residue i, dij is the Cα-Cα distance (measured in unit of Å) between residues i and j at any frame in simulation trajectory, and d0ij is the same distance in the native state. The native contacts are defined as the sum of longrange native Cα-Cα contacts between residues which are separated by at least 7 residues in sequence and whose Cα atoms are closer than 10 Å for more than 80% of the time in the folded state. The value of Q is unity for native state and becomes smaller for less native-like contact profiles. For the complex protein system, Q is supposed to be a better reaction coordinate than RMSD in describing the folding process.16,88 Figures 10A and S17 show one-dimensional free-energy profiles as the function of Q for GTT and GB3 at 300 K, respectively. One can see that among multiple states, the folded state with the largest Q value has the lowest free-energy, indicating that the force field used here models the intraprotein interactions for each protein suitably. The largest free-energy barrier involved in the folding transition of GTT is ∼3.2 kBT, at a similar level (1−5 kBT) as calculated on one-dimensional freeenergy surfaces in previous simulation reports of Fip35 WW domain.14,89,90 The folding of GTT consists of the formation of two antiparallel β-hairpins (hairpin 1 (Gly7-His23) and hairpin 2 (Tyr19-Gln29) from N- to C-terminus). To further indicate the folding pathway of GTT, we calculated a two-dimensional freeenergy landscape as the function of the RMSDs of individual hairpin structures in GTT (Figure 10B). The representative conformations of all distinct states in one-dimensional and twodimensional free-energy profiles were measured by clustering analyses, respectively. Among these states, two clusters of helix states (H1 and H2) with varied α-helix contents are presented. In addition, an unfolded (U) state is defined which consists of mainly coiled structures. On the other hand, besides the folded (F) state with all native structural elements formed, there is another state in which hairpin 1 is roughly folded, whereas hairpin 2 is overfolded with the original coiled part at the Cterminus forming an extra hairpin structure (0.5 < Q < 0.7 in Figure 10A; 3.0 Å ≤ RMSDhairpin1 ≤ 4.0 and 2.8 Å ≤ RMSDhairpin2 ≤ 3.4 Å in Figure 10B). This additional state, which is defined as an F′ state here, is probably induced by the use of an implicit solvent model which exacerbates the secondary structure bias of the force field in molecular simulation.91−93 Transition and intermediate states can be observed among various well-populated states. The presence of plenty of states makes the free-energy profile spanned along single reaction coordinate (Figure 10A)

Figure 9. Temperature dependence of the unfolding enthalpy ΔHU of (A) Trpcage, (B) CLN025, and (C) GB 1p, respectively.

at low temperature region, the value of ΔHU measured by PITS is slightly larger than that measured by a single-trajectory ITS method. This tendency is opposite at relatively high temperature region. The experimental ΔHU is available for Trpcage at 300 K (not melting temperature)48 and CLN025 and GTT at their melting temperatures (Tm = 343 K for CLN025 and 371 ± 2 K for GTT, respectively).49,87 In addition, the unfolding entropy (ΔSU) but not ΔHU has been reported for GB 1p at 297 K.85 Therefore, the ΔHU of Trpcage at 300 K, ΔSU of GB 1p at 297 K, ΔHU of CLN025, and GTT at their simulated melting temperatures were calculated and compared to experimental data accordingly. The melting temperatures of CLN025 and GTT were measured from the profiles of specific heat capacities as a function of temperature (CP = (⟨V2⟩ − ⟨V⟩2)/kT2, Figures S16A and S16B) where the peak positions suggest the melting temperatures of 326 and 345 K for CLN025 simulated by PITS and single-trajectory ITS and 360 K for GTT simulated by P-ITS. The single-trajectory ITS simulation thus gave a better evaluation for the melting temperature of CLN025. Then the ΔHU of CLN025 and GTT at their respective simulated melting temperatures were evaluated from the calculated ΔHU profiles (Figures 9B for CLN025 and S16C for GTT). The unfolding thermodynamic quantities of all the above-mentioned proteins are organized in Table 2. We can see that the results from P-ITS simulations are in great agreement with the experimental data. Dominant Folding Pathways of GTT and GB3 Explored by P-ITS Simulations. While the ability of P-ITS in conformational sampling and free-energy evaluation is validated in the tests of three peptides, we attempt to use this method to explore the folding pathways of more complex proteins. Q is a reaction coordinate defined by Shaw and co-workers which can be used to monitor the fraction of native contacts formed at each frame14 1238

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

Figure 10. (A) One-dimensional free-energy profile as a function of Q. (B) Two-dimensional free-energy landscape as the function of backbone RMSDs of the two hairpin regions of GTT. Both profiles are measured at 300 K from the data of P-ITS simulation, and the contours in the latter two-dimensional profile are spaced at intervals of 0.5 kBT. (C) Representative conformations of the distinct states indicated in free-energy profiles and the corresponding folding pathways of GTT.

inadequate for demonstrating a detailed folding pathway of GTT. Indeed, the analysis of the structural and energetic characterizations of these states in a two-dimensional freeenergy landscape reveals multiple parallel folding pathways for GTT: besides the formation of helix states (U → H1, H2) and the folding of the artificial overfolded F′ state (U → I′ → T′ → F′), there are two folding pathways leading to the correctly folded state (path 1: U → T1 → F; path 2: U → T2 → F, see Figure 10C). Hairpin 1 is readily folded, but hairpin 2 is not folded in the T1 transition state. In contrast, hairpin 1 is only partially folded, whereas hairpin 2 is readily folded in the T2 transition state. In addition, T1 has a lower free-energy value than T2 (Figure 10B), suggesting that path 1 is the dominant folding pathway of GTT. Therefore, GTT is folded via two parallel pathways, and the formation of hairpin 1 takes precedence over hairpin 2 in the dominant folding pathway. This folding mechanism is generally in agreement with scenarios reported in previous experiments and simulations of Fip35 WW domain.94−97 So far, only one piece of literature reported the folding simulation of α/β protein GB3, which has been performed at relatively high temperature (330 K) using a metadynamics technique incorporating NMR chemical shifts as collective variables.73 Here, without introducing any collective variables, we ran P-ITS simulation on GB3 and analyzed the folding pathway at room-temperature. The conformation sampled in the simulation can reach the lowest RMSD of 3.56 Å (Figure 4). In addition, the conformation ensembles with the lowest free-energy value in the one-dimensional free-energy profile are also native-like (Figure S17), suggesting that the present P-ITS simulation can predict the native-like structure of GB3 without the requirement of a priori knowledge. Using the RMSDs of individual hairpin structures in GB3 (hairpin 1 (Gln2-Lys19) at the N-terminus and hairpin 2 (Gly41-Glu56) at the C-terminus, respectively) as the reaction coordinates, we calculated a two-dimensional free-energy landscape (Figure 11). Multiple states exist in the free-energy landscape including the states containing a large amount of αhelical contents (defined as H state) and two misfolded states (M1 and M2) in which hairpin 1 and hairpin 2 are docked with each other in wrong directions. More importantly, a connection can be found between folded (F) and unfolded (U) states via a

Figure 11. Two-dimensional free-energy landscape as the function of backbone RMSDs of the two hairpin regions of GB3 at 300 K and representative conformations of the distinct states in the free-energy landscape. The contours are spaced at intervals of 0.5 kBT in the freeenergy landscape. In each protein conformation, colors code the secondary structural elements as defined in VMD software.

transition (T) state and an intermediate (I) state in the freeenergy landscape. Along the lowest free-energy folding pathway (U → T→I → F, as shown by the orange dashed line in Figure 11), the global structure of hairpin 1 is folded but with non-native features in the U state, which contains a two-amino-acid register shift with RMSD of ∼5.0 Å to the native state. This non-native hairpin structure is induced by the formation of a non-native turn configuration: as shown in a Ramachandran sampling distribution (Figure 12), not only the type I turn involving residues of N8-T11 in the native β-hairpin structure but also a non-native I′ turn exist. In contrast, no non-native features are observed on hairpin 2, and its native structure is folded gradually until the I state. Both β-hairpins are well folded in the F state. On the other hand, the α-helix (Ala23-Asn37) in the middle position of GB3 is not folded in the T state, is only 1239

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation

temperature range in single trajectory, and the equilibrium distribution at each temperature is reweighted back to a Boltzmann-weighted one using a specific reweighting factor.34,38 While the performance of REMD and ITS has been studied by many research groups focusing on various aspects, their application in the complex system is largely hindered by huge computational resource requirements from the former approach and the difficulty of calculation convergence in the latter in the presence of a large collection of degrees of freedom. In light of this, on the basis of the original form of the ITS method, we proposed a partitioned variant (P-ITS) which takes advantage of the benefits of both ITS and REMD. In such a method, a large temperature range is divided into multiple small ranges which are then partitioned in individual replicas, the ITS technique is utilized to make an enhanced sampling in each replica, and conformation exchange is tried among neighboring replicas. The combination of the ITS-induced enhanced sampling on the potential energy surfaces in individual replicas as well as the conformation exchange among replicas guarantees the sufficient sampling of global configuration space. Using the P-ITS method, we folded a series of proteins/ peptides with diverse native structures. For single proteins, we tested P-ITS simulations using varied parameter values (e.g., the number of replicas n and the manner of conformation exchange), which indicated that P-ITS yields consistent results which are not sensitive to the parameters involved. However, there are two caveats worth noting. First, in the present simulations, the number of replicas as well as the partitioned temperature ranges remains constant once they are determined. For small-size proteins under study whose conformation spanning along the folding pathways is not extremely extensive, the sampled conformations are overlapped among replicas; but for large-scale transition of the complex system which has extensive configuration space, the replica exchange setting in PITS may add new and previously unexplored regions of phase space into replicas. Then the continuous reoptimization of the number of replicas and temperature partitioning should be necessary. Second, the conformation exchange is enforced among replicas here, and the simulations generate consistent folding pathways in free-energy surfaces as those in the simulations controlled by Metropolis criterion for small-size proteins. However, we should note that the forced mixture of the conformations sampled in different replicas of P-ITS may probably induce inappropriate reweighting of the conformations, particularly for complex transitions. This potential flaw of the current approach could be hopefully resolved by using Metropolis criterion to control the conformation exchange. These factors will be taken into account in our future work on complex systems. Through the comparison of P-ITS and the original singletrajectory ITS on the folding simulations of three tested peptides (Trpcage, CLN025, and GB 1p), P-ITS indicates better sampling ability on the potential energy surface than single-trajectory ITS. Accordingly, while both methods reveal the same folding pathways on multidimensional free-energy landscapes, P-ITS can evaluate the folding/unfolding thermodynamic quantities (e.g., unfolding enthalpy) more consistent with the experimental data. On the other hand, in comparison to standard REMD using the same force field and solvent model (either implicit or explicit), P-ITS can significantly save computational resources and meanwhile generate similar

Figure 12. Two-dimensional free-energy landscapes as the function of dihedral angles (ϕ and ψ) of turn residues in the N-terminal hairpin 1 in GB3. The contours are spaced at intervals of 0.5 kBT.

partially folded in the I state, and can be fully folded only in the F state. It is noteworthy that Baxa et al. did experimental ψ analysis and Monte Carlo simulations (using TerItFix approach) on the folding of three α/β homologues (proteins G and L and NuG2b).98 Among these topologically similar proteins, protein G and NuG2b share 88% and 73%, and protein L shares 13% of the sequence identity of GB3. Intriguingly, agreements can be found between the present P-ITS simulation and the experimental and simulation studies by Baxa et al.: 1) The same non-native hairpin features of hairpin 1 in GB3 found here have been also observed for hairpin 1 of protein G and hairpin 2 of protein L in their transition state ensembles (TSEs) (Figure 5 in ref 98); 2) The native α-helix content is also largely absent in the TSEs of protein G and protein L and only partially presented in the TSE of NuG2b.98 The present study thus suggests that the folding of GB3 is a multiple-step transition which is initiated by the formation of hairpin 1, followed by the antiparallel pairing of the first β-strand of hairpin 1 with the C-terminal hairpin 2, and completed by the secondary structure formation of the central α-helix and hairpin 2. The NMR-guided metadynamics simulation by Granata et al.73 suggested a similar formation order of the native structural elements except that it is hairpin 2 that folds more easily than hairpin 1 in the folding pathway.



DISCUSSION AND CONCLUSIONS Temperature-based enhanced sampling methods have been proposed to operate molecular simulation in a large temperature range so that the simulated system can frequently visit not only low but also high temperatures (energies) where the sampling is ergodic. As a result, the sampling is accelerated across the potential energy surface and the relevant statistics is improved. Specifically, parallel tempering (REMD) approaches run multiple replicas at discrete temperatures simultaneously, and neighboring replicas exchange their temperatures according to certain acceptance probability which preserves the Boltzmann distribution at each temperature.30,32 On the other hand, integrated tempering approaches (e.g., ITS) generate non-Boltzmann distribution in an extensive potential energy range according to the integration over a certain 1240

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Journal of Chemical Theory and Computation



simulation results, e.g., on the folding of Trpcage, CLN025, and GTT. In addition, consistencies can be also seen between the present P-ITS simulations and previous explicit-solvent REMD simulations using different force fields. For instance, the average helicity of individual residues of Trpcage is similar to that measured by the REMD simulation of Zhou et al. using the RSFF2 force field.74 The temperature dependent folding equilibrium of Trpcage is consistent with that measured by the REMD simulation of Day et al. with the AMBER FF99SB force field.73 Using P-ITS simulations, the present study reveals parallel folding pathways of Fip35 WW domain variant GTT where either hairpin 1 or hairpin 2 forms first and the dominant folding pathway of α/β protein GB3 where the β-hairpin(s) forms more readily than α-helix. The observed structural characterizations of transition and intermediate states of the two complex proteins are in agreement with previous experimental and simulation studies on the same proteins and/or homologues (e.g., refs 94−97 for GTT, ref 73 for GB3, and ref 98 for its homologues). In summary, the present study indicates the conformational sampling ability of P-ITS using economized computational resources, implying its large potential in simulating the structural dynamics of complex biomolecule systems.



Article

AUTHOR INFORMATION

Corresponding Authors

*Phone: +86 21 50806600-1304. E-mail: [email protected]. cn (Q.S.). *E-mail: [email protected] (J.S). ORCID

Qiang Shao: 0000-0001-6460-3095 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We thank the National Natural Science Foundation of China (Grant No. 21373258), National Basic Research Program (Grant No. 2014CB910400), and Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase) for financial support. The simulations were run at TianHe 1 supercomputer in Tianjin and TianHe II supercomputer in Guangzhou, China.



REFERENCES

(1) Dill, K. A.; Ozkan, S. B.; Weikl, T. R.; Chodera, J. D.; Voelz, V. A. The protein folding problem: When will it be solved? Curr. Opin. Struct. Biol. 2007, 17, 342−346. (2) Grant, B. J.; Gorfe, A. A.; McCammon, J. A. Large conformational changes in proteins: Signaling and other functions. Curr. Opin. Struct. Biol. 2010, 20, 142−147. (3) Parak, F. G. Proteins in action: The physics of structural fluctuations and conformational changes. Curr. Opin. Struct. Biol. 2003, 13, 552−557. (4) Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. The protein folding problem. Annu. Rev. Biophys. 2008, 37, 289−316. (5) Skjaerven, L.; Reuter, N.; Martinez, A. Dynamics, flexibility and ligand-induced conformational changes in biological macromolecules: A computational approach. Future Med. Chem. 2011, 3, 2079−2100. (6) Campbell, S. J.; Gold, N. D.; Jackson, R. M.; Westhead, D. R. Ligand binding: Functional site location, similarity and docking. Curr. Opin. Struct. Biol. 2003, 13, 389−395. (7) Zhuravlev, P. I.; Papoian, G. A. Functional versus folding landscapes: The same yet different. Curr. Opin. Struct. Biol. 2010, 20, 16−22. (8) Levy, Y.; Cho, S. S.; Onuchic, J. N.; Wolynes, P. G. A survey of flexible protein binding mechanisms and their transition states using native topology based energy landscapes. J. Mol. Biol. 2005, 346, 1121−1145. (9) Frauenfelder, H.; Sligar, S. G.; Wolynes, P. G. The energy landscapes and motions of proteins. Science 1991, 254, 1598−1603. (10) Shaw, D. E.; Dror, R. O.; Salmon, J. K.; Grossman, J. P.; Mackenzie, K. M.; Bank, J. A.; Young, C.; Deneroff, M. M.; Batson, B.; Bowers, K. J.; et al. Millisecond-scale molecular dynamics simulations on anton. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC09) 2009, ACM: New York. (11) Salomon-Ferrer, R.; Gotz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh ewald. J. Chem. Theory Comput. 2013, 9, 3878−3888. (12) Gotz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized born. J. Chem. Theory Comput. 2012, 8, 1542−1555. (13) Kohlhoff, K. J.; Shukla, D.; Lawrenz, M.; Bowman, G. R.; Konerding, D. E.; Belov, D.; Altman, R. B.; Pande, V. S. Cloud-based simulations on google exacycle reveal ligand modulation of GPCR activation pathways. Nat. Chem. 2014, 6, 15−21. (14) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. How fast-folding proteins fold. Science 2011, 334, 517−520.

ASSOCIATED CONTENT

* Supporting Information S

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.6b00967. Amino-acid sequence of peptides and proteins in this work; simulation trajectories of CLN025; one-dimensional free-energy profiles of Trpcage and CLN025 measured by various P-ITS simulations; simulation trajectories of GB 1p and its populated structure clusters in P-ITS simulation; simulation trajectories of Trpcage and CLN025 simulated by P-ITS implemented with Metropolis criterion controlled conformation exchange; one-dimensional free-energy profiles calculated at different simulation times indicating the simulation convergence; representative structure of the unfolded state of Trpcage; helix structure of GB 1p with turn configuration adopted in the C-terminal region; hydrogen bonding donor−acceptor distances along the number of hydrogen bonds formed for CLN025 and GB 1p; two-dimensional free-energy landscape as the function of backbone RMSD and Rg for Trpcage, CLN025, and GB 1p; temperature dependence of backbone RMSD distribution of Trpcage, CLN025, GB1P, and GTT; temperature dependence of the distributions of D9-R16 salt-bridge distance and the hydrogen bonding distance between W6 side-chain Nε atom and R16 backbone O atom of Trpcage; initial configuration of the simulation system for explicit solvent P-ITS and REMD and temperature histories of one replica in REMD; temperature dependence of the heat capacity of CLN025 and GTT and unfolding entropy profile of GTT; one-dimensional free-energy profile as a function of Q of GB3 protein (PDF) 1241

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation (15) Piana, S.; Klepeis, J. L.; Shaw, D. E. Assessing the accuracy of physical models used in protein-folding simulations: Quantitative evidence from long molecular dynamics simulations. Curr. Opin. Struct. Biol. 2014, 24, 98−105. (16) Henry, E. R.; Best, R. B.; Eaton, W. A. Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 17880−17885. (17) Best, R. B. Atomistic molecular simulations of protein folding. Curr. Opin. Struct. Biol. 2012, 22, 52−61. (18) Lane, T. J.; Shukla, D.; Beauchamp, K. A.; Pande, V. S. To milliseconds and beyond: Challenges in the simulation of protein folding. Curr. Opin. Struct. Biol. 2013, 23, 58−65. (19) Voelz, V. A.; Bowman, G. R.; Beauchamp, K.; Pande, V. S. Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1−39). J. Am. Chem. Soc. 2010, 132, 1526−1528. (20) Shukla, D.; Meng, Y. L.; Roux, B.; Pande, V. S. Activation pathway of Src kinase reveals intermediate states as targets for drug design. Nat. Commun. 2014, 5, 3397. (21) Dror, R. O.; Arlow, D. H.; Maragakis, P.; Mildorf, T. J.; Pan, A. C.; Xu, H. F.; Borhani, D. W.; Shaw, D. E. Activation mechanism of the β2-adrenergic receptor. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 18684−18689. (22) Dror, R. O.; Pan, A. C.; Arlow, D. H.; Borhani, D. W.; Maragakis, P.; Shan, Y. B.; Xu, H. F.; Shaw, D. E. Pathway and mechanism of drug binding to G-protein-coupled receptors. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 13118−13123. (23) Shan, Y. B.; Kim, E. T.; Eastwood, M. P.; Dror, R. O.; Seeliger, M. A.; Shaw, D. E. How does a drug molecule find its target binding site? J. Am. Chem. Soc. 2011, 133, 9181−9183. (24) Vashisth, H.; Skiniotis, G.; Brooks, C. L. Collective variable approaches for single molecule flexible fitting and enhanced sampling. Chem. Rev. 2014, 114, 3353−3365. (25) Maximova, T.; Moffatt, R.; Ma, B.; Nussinov, R.; Shehu, A. Principles and overview of sampling methods for modeling macromolecular structure and dynamics. PLoS Comput. Biol. 2016, 12, e1004619. (26) Dror, R. O.; Dirks, R. M.; Grossman, J. P.; Xu, H. F.; Shaw, D. E. Biomolecular simulation: A computational microscope for molecular biology. Annu. Rev. Biophys. 2012, 41, 429−452. (27) Bernardi, R. C.; Melo, M. C. R.; Schulten, K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim. Biophys. Acta, Gen. Subj. 2015, 1850, 872−877. (28) Mayor, U.; Johnson, C. M.; Daggett, V.; Fersht, A. R. Protein folding and unfolding in microseconds to nanoseconds by experiment and simulation. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 13518−13522. (29) Jang, S. M.; Kim, E.; Shin, S.; Pak, Y. Ab initio folding of helix bundle proteins using molecular dynamics simulations. J. Am. Chem. Soc. 2003, 125, 14841−14846. (30) Earl, D. J.; Deem, M. W. Parallel tempering: Theory, applications, and new perspectives. Phys. Chem. Chem. Phys. 2005, 7, 3910−3916. (31) Park, S.; Pande, V. S. Choosing weights for simulated tempering. Phys. Rev. E 2007, 76, 016703. (32) Sugita, Y.; Okamoto, Y. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 1999, 314, 141−151. (33) Nymeyer, H.; Gnanakaran, S.; Garcia, A. E. Atomic simulations of protein folding, using the replica exchange algorithm. Methods Enzymol. 2004, 383, 119−149. (34) Yang, L.; Liu, C. W.; Shao, Q.; Zhang, J.; Gao, Y. Q. From thermodynamics to kinetics: Enhanced sampling of rare events. Acc. Chem. Res. 2015, 48, 947−955. (35) Gao, Y. Q. An integrate-over-temperature approach for enhanced sampling. J. Chem. Phys. 2008, 128, 064105. (36) Christ, C. D.; van Gunsteren, W. F. Enveloping distribution sampling: A method to calculate free energy differences from a single simulation. J. Chem. Phys. 2007, 126, 184110. (37) Lin, Z.; van Gunsteren, W. F. Enhanced conformational sampling using enveloping distribution sampling. J. Chem. Phys. 2013, 139, 144105.

(38) Yang, L.; Shao, Q.; Gao, Y. Q. Comparison between integrated and parallel tempering methods in enhanced sampling simulations. J. Chem. Phys. 2009, 130, 124111. (39) Yang, L.; Gao, Y. Q. A selective integrated tempering method. J. Chem. Phys. 2009, 131, 214109. (40) van der Spoel, D.; Seibert, M. M. Protein folding kinetics and thermodynamics from atomistic simulations. Phys. Rev. Lett. 2006, 96, 238102. (41) Lei, H.; Wu, C.; Liu, H.; Duan, Y. Folding free-energy landscape of villin headpiece subdomain from molecular dynamics simulations. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 4925−4930. (42) Garcia, A. E.; Onuchic, J. N. Folding a protein in a computer: An atomic description of the folding/unfolding of protein a. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 13898−13903. (43) Shao, Q. Probing sequence dependence of folding pathway of αhelix bundle proteins through free energy landscape analysis. J. Phys. Chem. B 2014, 118, 5891−5900. (44) Shao, Q.; Shi, J.; Zhu, W. Enhanced sampling molecular dynamics simulation captures experimentally suggested intermediate and unfolded states in the folding pathway of Trp-cage miniprotein. J. Chem. Phys. 2012, 137, 125103. (45) Yang, L.; Shao, Q.; Gao, Y. Q. Thermodynamics and folding pathways of trpzip2: An accelerated molecular dynamics simulation study. J. Phys. Chem. B 2009, 113, 803−808. (46) Shao, Q.; Gao, Y. Q. The relative helix and hydrogen bond stability in the B domain of protein A as revealed by integrated tempering sampling molecular dynamics simulation. J. Chem. Phys. 2011, 135, 135102. (47) Shao, Q.; Gao, Y. Q. Temperature dependence of hydrogenbond stability in β-hairpin structures. J. Chem. Theory Comput. 2010, 6, 3750−3760. (48) Barua, B.; Lin, J. C.; Williams, V. D.; Kummler, P.; Neidigh, J. W.; Andersen, N. H. The trp-cage: Optimizing the stability of a globular miniprotein. Protein Eng., Des. Sel. 2008, 21, 171−185. (49) Honda, S.; Akiba, T.; Kato, Y. S.; Sawada, Y.; Sekijima, M.; Ishimura, M.; Ooishi, A.; Watanabe, H.; Odahara, T.; Harata, K. Crystal structure of a ten-amino acid protein. J. Am. Chem. Soc. 2008, 130, 15327−15331. (50) Gronenborn, A. M.; Filpula, D. R.; Essig, N. Z.; Achari, A.; Whitlow, M.; Wingfield, P. T.; Clore, G. M. A novel, highly stable fold of the immunoglobulin binding domain of streptococcal protein G. Science 1991, 253, 657−661. (51) Nguyen, H.; Maier, J.; Huang, H.; Perrone, V.; Simmerling, C. Folding simulations for proteins with diverse topologies are accessible in days with a physics-based force field and implicit solvent. J. Am. Chem. Soc. 2014, 136, 13959−13962. (52) Piana, S.; Sarkar, K.; Lindorff-Larsen, K.; Guo, M. H.; Gruebele, M.; Shaw, D. E. Computational design and experimental testing of the fastest-folding β-sheet protein. J. Mol. Biol. 2011, 405, 43−48. (53) Ulmer, T. S.; Ramirez, B. E.; Delaglio, F.; Bax, A. Evaluation of backbone proton positions and dynamics in a small protein by liquid crystal NMR spectroscopy. J. Am. Chem. Soc. 2003, 125, 9179−9191. (54) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087−1092. (55) Hukushima, K.; Nemoto, K. Exchange monte carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 1996, 65, 1604− 1608. (56) Faraldo-Gomez, J. D.; Roux, B. Characterization of conformational equilibria through hamiltonian and temperature replicaexchange simulations: Assessing entropic and environmental effects. J. Comput. Chem. 2007, 28, 1634−1647. (57) Meng, Y. L.; Dashti, D. S.; Roitberg, A. E. Computing alchemical free energy differences with hamiltonian replica exchange molecular dynamics (H-REMD) simulations. J. Chem. Theory Comput. 2011, 7, 2721−2727. (58) Garcia, A. E.; Sanbonmatsu, K. Y. Exploring the energy landscape of a β-hairpin in explicit solvent. Proteins: Struct., Funct., Genet. 2001, 42, 345−354. 1242

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243

Article

Journal of Chemical Theory and Computation (59) Case, D. A.; Babin, V.; Berryman, J. T.; Betz, R. M.; Cai, Q.; Cerutti, D. S.; Cheatham, T. E., III; Darden, T. A.; Duke, R. E.; Gohlke, H. et al. AMBER 14; University of California: San Francisco, CA, 2014. (60) Nguyen, H.; Roe, D. R.; Simmerling, C. Improved generalized born solvent model parameters for protein simulations. J. Chem. Theory Comput. 2013, 9, 2020−2034. (61) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numericalintegration of cartesian equations of motion of a system with constraints - molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341. (62) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926−935. (63) Chaudhury, S.; Olson, M. A.; Tawa, G.; Wallqvist, A.; Lee, M. S. Efficient conformational sampling in explicit solvent using a hybrid replica exchange molecular dynamics method. J. Chem. Theory Comput. 2012, 8, 677−687. (64) Kannan, S.; Zacharias, M. Enhanced sampling of peptide and protein conformations using replica exchange simulations with a peptide backbone biasing-potential. Proteins: Struct., Funct., Genet. 2007, 66, 697−706. (65) Patriksson, A.; van der Spoel, D. A temperature predictor for parallel tempering simulations. Phys. Chem. Chem. Phys. 2008, 10, 2073−2077. (66) Nadler, W.; Hansmann, U. H. E. Optimized explicit-solvent replica exchange molecular dynamics from scratch. J. Phys. Chem. B 2008, 112, 10386−10387. (67) Darden, T.; York, D.; Pedersen, L. Particle mesh Ewald: An N· log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089−10092. (68) Shao, Q.; Shi, J. Y.; Zhu, W. Molecular dynamics simulation indicating cold denaturation of β-hairpins. J. Chem. Phys. 2013, 138, 085102. (69) Shao, Q. Folding or misfolding: The choice of β-hairpin. J. Phys. Chem. B 2015, 119, 3893−3900. (70) Shao, Q. Important roles of hydrophobic interactions in folding and charge interactions in misfolding of α-helix bundle protein. RSC Adv. 2015, 5, 4191−4199. (71) Shao, Q.; Zhu, W.; Gao, Y. Q. Robustness in protein folding revealed by thermodynamics calculations. J. Phys. Chem. B 2012, 116, 13848−13856. (72) Feig, M.; Karanicolas, J.; Brooks, C. L. MMTSB tool set: Enhanced sampling and multiscale modeling methods for applications in structural biology. J. Mol. Graphics Modell. 2004, 22, 377−395. (73) Granata, D.; Camilloni, C.; Vendruscolo, M.; Laio, A. Characterization of the free-energy landscapes of proteins by NMRguided metadynamics. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 6817− 6822. (74) Zhou, C. Y.; Jiang, F.; Wu, Y. D. Folding thermodynamics and mechanism of five Trp-cage variants from replica-exchange md simulations with RSFF2 force field. J. Chem. Theory Comput. 2015, 11, 5473−5480. (75) Rovo, P.; Farkas, V.; Hegyi, O.; Szolomajer-Csikos, O.; Toth, G. K.; Perczel, A. Cooperativity network of Trp-cage miniproteins: Probing salt-bridges. J. Pept. Sci. 2011, 17, 610−619. (76) Ahmed, Z.; Beta, I. A.; Mikhonin, A. V.; Asher, S. A. UVresonance Raman thermal unfolding study of Trp-cage shows that it is not a simple two-state miniprotein. J. Am. Chem. Soc. 2005, 127, 10943−10950. (77) Mok, K. H.; Kuhn, L. T.; Goez, M.; Day, I. J.; Lin, J. C.; Andersen, N. H.; Hore, P. J. A pre-existing hydrophobic collapse in the unfolded state of an ultrafast folding protein. Nature 2007, 447, 106− 109. (78) Du, D.; Zhu, Y.; Huang, C. Y.; Gai, F. Understanding the key factors that control the rate of β-hairpin folding. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 15915−15920.

(79) Du, D.; Tucker, M. J.; Gai, F. Understanding the mechanism of β-hairpin folding via Φ-value analysis. Biochemistry 2006, 45, 2668− 2678. (80) Enemark, S.; Kurniawan, N. A.; Rajagopalan, R. β-Hairpin forms by rolling up from C-terminal: Topological guidance of early folding dynamics. Sci. Rep. 2012, 2, 649. (81) Shao, Q.; Yang, L. J.; Gao, Y. Q. Structure change of β-hairpin induced by turn optimization: An enhanced sampling molecular dynamics simulation study. J. Chem. Phys. 2011, 135, 235104. (82) Shao, Q.; Wei, H.; Gao, Y. Q. Effects of turn stability and sidechain hydrophobicity on the folding of β-structures. J. Mol. Biol. 2010, 402, 595−609. (83) Best, R. B.; Mittal, J. Microscopic events in β-hairpin folding from alternative unfolded ensembles. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 11087−11092. (84) Zhang, J.; Qin, M.; Wang, W. Folding mechanism of β-hairpins studied by replica exchange molecular simulations. Proteins: Struct., Funct., Genet. 2006, 62, 672−685. (85) Munoz, V.; Thompson, P. A.; Hofrichter, J.; Eaton, W. A. Folding dynamics and mechanism of β-hairpin formation. Nature 1997, 390, 196−199. (86) Day, R.; Paschek, D.; Garcia, A. E. Microsecond simulations of the folding/unfolding thermodynamics of the Trp-cage miniprotein. Proteins: Struct., Funct., Genet. 2010, 78, 1889−1899. (87) Cobos, E. S.; Iglesias-Bexiga, M.; Ruiz-Sanz, J.; Mateo, P. L.; Luque, I.; Martinez, J. C. Thermodynamic characterization of the folding equilibrium of the human Nedd4-WW4 domain: At the frontiers of cooperative folding. Biochemistry 2009, 48, 8712−8720. (88) Best, R. B.; Hummer, G.; Eaton, W. A. Native contacts determine protein folding mechanisms in atomistic simulations. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 17874−17879. (89) Liu, F.; Nakaema, M.; Gruebele, M. The transition state transit time of WW domain folding is controlled by energy landscape roughness. J. Chem. Phys. 2009, 131, 195101. (90) Liu, F.; Du, D.; Fuller, A. A.; Davoren, J. E.; Wipf, P.; Kelly, J. W.; Gruebele, M. An experimental survey of the transition between two-state and downhill protein folding scenarios. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 2369−2374. (91) Roe, D. R.; Okur, A.; Wickstrom, L.; Hornak, V.; Simmerling, C. Secondary structure bias in generalized born solvent models: Comparison of conformational ensembles and free energy of solvent polarization from explicit and implicit solvation. J. Phys. Chem. B 2007, 111, 1846−1857. (92) Zhou, R. H.; Berne, B. J. Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water? Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12777−12782. (93) Bursulaya, B. D.; Brooks, C. L. Comparative study of the folding free energy landscape of a three-stranded β-sheet protein with explicit and implicit solvent models. J. Phys. Chem. B 2000, 104, 12378−12383. (94) Wirth, A. J.; Liu, Y. X.; Prigozhin, M. B.; Schulten, K.; Gruebele, M. Comparing fast pressure jump and temperature jump protein folding experiments and simulations. J. Am. Chem. Soc. 2015, 137, 7152−7159. (95) Beccara, S. A.; Skrbic, T.; Covino, R.; Faccioli, P. Dominant folding pathways of a WW domain. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 2330−2335. (96) Krivov, S. V. The free energy landscape analysis of protein (Fip35) folding dynamics. J. Phys. Chem. B 2011, 115, 12315−12324. (97) Best, R. B.; Hummer, G. Microscopic interpretation of folding Φ-values using the transition path ensemble. Proc. Natl. Acad. Sci. U. S. A. 2016, 113, 3263−3268. (98) Baxa, M. C.; Yu, W.; Adhikari, A. N.; Ge, L.; Xia, Z.; Zhou, R. H.; Freed, K. F.; Sosnick, T. R. Even with nonnative interactions, the updated folding transition states of the homologs proteins G & L are extensive and similar. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 8302− 8307.

1243

DOI: 10.1021/acs.jctc.6b00967 J. Chem. Theory Comput. 2017, 13, 1229−1243