Integrating Multiple Accelerated Molecular Dynamics To Improve

Feb 2, 2018 - ... to systematically test the accuracy, efficiency, and robustness of IaMD. .... generated using the 2D graphics package Matplotlib(54)...
0 downloads 3 Views 6MB Size
Article pubs.acs.org/JCTC

Cite This: J. Chem. Theory Comput. 2018, 14, 1216−1227

Integrating Multiple Accelerated Molecular Dynamics To Improve Accuracy of Free Energy Calculations Xiangda Peng,†,‡ Yuebin Zhang,† Yan Li,† QingLong Liu,† Huiying Chu,† Dinglin Zhang,† and Guohui Li*,† †

Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China ‡ Chinese Academy of Science, University of Chinese Academy Sciences, Beijing 100049, P. R. China S Supporting Information *

ABSTRACT: Accelerated Molecular Dynamics (aMD) is a promising enhanced sampling method to explore the conformational space of biomolecules. However, the large statistical noise in reweighting limits its accuracy to recover the original free energy profile. In this work, we propose an Integrated accelerated Molecule Dynamics (IaMD) method by integrating a series of aMD subterms with different acceleration parameters to improve the sampling efficiency and maintain the reweighting accuracy simultaneously. We use Alanine Dipeptide and three fast-folded proteins (Chignolin, Trp-cage, and Villin Headpiece) as the test objects to compare our IaMD method with aMD systematically. These case studies indicate that the statistical noise of IaMD in reweighting for free energy profiles is much smaller than that of aMD at the same level of acceleration and simulation time. To achieve the same accuracy as IaMD, aMD requires 1−3 orders of magnitude longer simulation time, depending on the complexity of the simulated system and the level of acceleration. Our method also outperforms aMD in controlling systematic error caused by the disappearance of the low-energy conformations when high acceleration parameters are used in aMD simulations for fast-folded proteins. Furthermore, the performance comparison between IaMD and the Integrated Tempering Sampling (ITS) in the case of Alanine Dipeptide demonstrates that IaMD possesses a better ability to control the potential energy region of sampling.



INTRODUCTION In molecular dynamics (MD) simulations for the biomolecules, sampling for the substantial conformational changes over a long time scale (usually in milliseconds or even longer time scale1) is typically necessary because these movements may be functionally related. However, the present MD simulation time only achieves the scale from hundreds-of-nanoseconds to tens-ofmicroseconds,2−4 posing a challenge for accurate estimation of statistical properties, such as free energy. To improve the sampling efficiency, much effort has been conducted to develop or improve various enhanced sampling algorithms.5−18 Accelerated Molecular Dynamics (aMD) is one of “unconstrained” enhanced sampling that does not depend on the predefined collective variable (CV). It can boost the exploration of biomolecular conformational space without the need for a priori guidance about the characteristics of structural change.14 By adding a boost potential, aMD flattens the potential energy surface and thus improves the probability of crossing between different potential energy minimums. Hundreds-of-nanosecond aMD simulations match the millisecond time scale sampling of conventional MD (cMD) for a variety of biomolecules.19−22 However, aMD suffers from large statistical noise generated by the “exponential average” reweighting. In this reweighting method, the estimated statistics (such as free energy) are © 2018 American Chemical Society

recovered based on the average of reweighting factors ⟨exp(βΔV(x))⟩s,23 where ΔV(x) is the boost potential of configuration x which belong to position s of selected CV and β = 1.0/kBT, kB is the Boltzmann constant, and T is the temperature in simulation. The average of reweighting factors is mainly dominated by the states with high boost ΔV. These states, which also have low potential energy, are rare in aMD simulations, and thus cause the fluctuating of the average reweighting factor. Moreover, if some of the states with high boost potential exist in the cMD simulation but “disappear” in aMD simulations, systematic errors in reweighting may also occur. To improve reweighting accuracy, a variety of improvements have been proposed, such as reducing the degree of freedom associated with modified potential energy,21,24,25 adjusting the form of bias,26,27 using the Maclaurin series or cumulant expansion to replace exponential reweighting,4,24 replacing exponential reweighting by population-based reweighting in scaled MD,28 and combining the Replica Exchange method with aMD.29−31 A critical factor in solving the reweighting problem is to yield sufficient conformations with low potential energy to estimate Received: December 1, 2017 Published: February 2, 2018 1216

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation

According to the theory of Hamelberg, the small Ne(s) reduces the accuracy of − kBT ln(Ne(s)).48 It also leads to an unreliable estimation of ΔVmax(s). In a well-equilibrated aMD simulation under NVT or NPT ensemble, the boost potential weakens the contribution of potential energy to the free energy change and thus enhances the contribution of conformational entropy S to free energy change (in the form of −TS). As a consequence, states tend to be distributed in high entropy regions in an aMD simulation. The conformational entropy of the high potential region is larger than that of the low potential region since the latter usually consists of a few states with specific structural features, especially for biomolecules. Hence, the boost potential of aMD also reduces the sampling for low potential states when the number of low entropy states decreases. As the molecular complexity or acceleration level increases, the number of low potential states and the effective sampling number Ne decrease, resulting in more statistical noise. If some of the states with low potential appear in the cMD simulation but disappear in aMD simulations, systematic errors in reweighting may occur. Integrated Accelerated Molecule Dynamics (IaMD). Inspired by ITS,15 we combined multiple aMDs with varying levels of acceleration to improve the sampling efficiency while maintaining a sufficient number of low potential conformations. In the original of ITS, a series of Boltzmann functions at different temperatures (e.g., T1 to TN) were integrated to a generalized distribution function to avoid multiple parallel tempering calculations and kinetic energy equilibration problems after each exchange event in Replica Exchange Molecular Dynamics (REMD),15 as shown in eq 6

the average of the reweighting factor accurately. In the Integrated Tempering Sampling (ITS) algorithm of Gao,15 the Boltzmann functions at different temperatures are integrated into one expression to facilitate the conformation sampling in a broad range from low to high potential energy. ITS and its variants32−37 have been applied to study protein folding,38−44 molecular clusters,45 path sampling,46 and chemical reactions.47 Thus, in this work, we utilize a similar framework that couple multiple aMDs with varying levels of acceleration together to ensure accurate reweighting. We name this new improvement aMD-type algorithm as the Integrated accelerated Molecule Dynamics (IaMD). It inherits the advantages of both aMD and ITS and avoids the defects of aMD in reweighting. Four systems, Alanine Dipeptide and three fast-folded proteins Chignolin, Trp-cage, and Villin Headpiece, are exploited to demonstrate parameter determination procedure and to systematically test the accuracy, efficiency, and robustness of IaMD.



THEORY AND METHOD Accelerated Molecular Dynamics and Its Reweighting. The boost potential ΔV of aMD is a continuous nonnegative function as shown in eq 1 ⎧ (E − V (x))2 ⎪ , E > V (x ) ΔV (x) = ⎨ α + E − V (x) ⎪ 0, E ≤ V (x) ⎩

(1)

where V(x) is the potential energy of state x, and E and α are preset tuning parameters which determine the level of acceleration of ΔV. E serves as threshold energy. When the original potential V is less than E, the simulation is performed with the effective potential V′(x) = V(x) + ΔV(x). Otherwise, the simulation is performed with the original potential. α is a tuning parameter that controls the flatness of the effective potential energy. After the simulation biased by aMD, the correct canonical probability distribution is obtained via multiplying the probability of states by a reweighting factor w. w(x) = e βΔV (x)

e−βV ′ (x) =

∑ x∈s

N (s)⟨w(x)⟩s w(x) = s wmax (s) wmax (s)

(2)

V ′(x) = V (x) + ΔV (x) = −

⎞ 1 ⎛ ln⎜⎜∑ nie−β(V (x) +ΔVi(x))⎟⎟ β ⎝ i ⎠ (7)

(3)

The total boost potential ΔV(r) is simply ΔV (x) = −

⎞ 1 ⎛ ln⎜⎜∑ nie−β ΔVi(x)⎟⎟ β ⎝ i ⎠

(8)

Eq 8 is a general form equation to combine the boost potentials with different levels of acceleration. For example, if we define ΔVi as λi(V1 − V0) where λi is the coupling parameter, V0 and V1 are the two “end-state” Hamiltonians, and eq 8 is actually equivalent to the Integrated Hamiltonian Sampling (IHS) protocol, which is another flexible and versatile ITS-type method for free energy estimating and conformational sampling.49 In fact, any form of ΔVi can be selected for special purposes. In the proposed IaMD method, we select the aMDtype boost potential (eq 1) as the boost subterm ΔVi to improve the accuracy of reweighting for the aMD simulation and share the advantages of ITS and aMD. The original force

(4)

where wmax(s) is the maximum w(x) at s. Based on eqs 3 and 4, we can also represent F(s) as a function of Ne(s) F(s) = −kBT ln(Ne(s)) − ΔVmax(s) + C

(6)

where ni is a weight factor which determines the contribution of term i to the total distribution, βi = 1/(kBTi), and V′ is the modified effective potential. Replace the original potential V with V′, ITS samples the conformations over a wide range of potential energy. Define the ΔVi as (βi/β − 1) × V, and eq 6 can be rewritten as

where s is a point on the CVs, and Ns(s) is the number of sampling points at s. As described above, states with different boost potentials have different weights for the calculation of ⟨w(x)⟩s. States, whose boost potentials are much smaller than the highest value, give negligible contributions to the average reweighting factor. Here, we use the effective sampling number Ne which is proposed by Shen and Hamelberg48 to measure the “equivalent number” in the calculation of ⟨w(x)⟩s. It is written as Ne(s) =

i

i

According to eq 3, the free energies along selected CVs can also be recovered F(s) = −kBT[ln(Ns(s)) + ln(⟨w(x)⟩s )] + C

∑ nie−β V (x)

(5) 1217

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation should be modified by the following equation in IaMD simulations ∑i ⎡⎣ki(x) ·ni ·e−β(ΔVi(x))⎤⎦ F ′(x) = F(x) × ∑i ⎡⎣ni ·e−β(ΔVi(x))⎤⎦

αi =

(9)

(10)

Similar to aMD and ITS, IaMD has little extra cost to the computational efficiency. Determine Acceleration Parameters. To efficiently sample conformations with low to high potential energy, the parameters (Ei, αi) and ni should be determined. The parameter determination process consists of two main parts. First, select a series of aMD subitems with different (Ei, αi). Then, balance the contribution of each aMD subitem on the total boost potential through the parameter ni. We set the first aMD subterm as a protocol without any acceleration so that enough low-energy conformations are generated in IaMD simulations. To this end, a short cMD simulation was carried out, and then E1 was set as a value smaller than the minimum potential energy of the cMD. α1 can be set to an arbitrary nonzero value. ΔV1 is always equal to 0.0 for all frames of IaMD simulations. Then, for simplicity, we set Ei of all other aMD subterms as the same value and use EN to refer to this value (N is the number of subterms). Since the potential energy beyond EN is not flattened, it determines the upper bound of the potential energy range that can be sampled in IaMD simulations. In aMD, the threshold energy E is set to a value above the average potential energy of the cMD simulation, based on the number of atoms or residues.50,51 In IaMD, EN can also be set to this value or a higher value to achieve higher acceleration. The parameter αi controls the acceleration level of each aMD subterm. In a separate aMD simulation, as the parameter α decreases, the location of the potential distribution peak moves from the low to high potential energy region, as shown in Figures S1A and S1C. It is difficult for the distribution peak to exceed threshold E because the boost potential does not work when the potential energy is greater than E. We find a value of α that leads the potential distribution peak to just approach the location of E and assign it to αN of IaMD. Supposing the lowest potential energy in the IaMD simulation approaches the Emin of cMD, the largest ΔVi of aMD subterm i is expressed as a function of αi ΔVi ,max =

(Ei − Emin)2 αi + Ei − Emin

(12)

Hereafter, a series of short aMD simulations with the determined parameters (αi, Ei) are implemented respectively to check the potential energy distributions and to obtain the ni value of each subterm. It is preferable that the potential energies from the short aMD simulations with a series of (Ei, αi) are evenly distributed over a broad interval (as schematically illustrated in the Figures S1A and S1C in the Supporting Information) to yield a homogeneous total distribution (the solid green line in Figures S1B and S1D). However, for the system in low dimensions, such as the torsional potential term of Alanine Dipeptide, these distributions of potential energies have a lot of overlaps in the low-energy area. To solve this problem, we use the cubic spline interpolation algorithm to recalculate the values of α2, α3, ..., and αN−1, based on the results of performed aMD simulations. The details of the implementation are described in Section 2 (step 7) of the Supporting Information. The weight parameter ni in IaMD controls the contribution of the aMD subterms to the total boost potential ΔV(x) in IaMD. In ITS,15,32 these parameters are determined by an iterative procedure. In IHS49 and the work of Yang,52 the weighted histogram analysis method (WHAM) is employed to facilitate the optimization of these parameters. In the approach of Lu and collaborators,53 the weight parameters can be obtained directly according to the intersections of potential distribution curves (or the midpoint of potential canonical averages) of adjacent two separate aMD simulations, avoiding the iterative process (the method is referred to as “direct method” in the following). All these methods are used to calculate the weight parameters of IaMD for Alanine Dipeptide. The results obtained by different methods are consistent with each other (Section 1 of the Supporting Information). Hence, the direct method using the intersections of potential distribution curves is selected for all cases in this work. Here is a brief introduction. For convenience, define Mi as −kBT ln(ni), and eq 8 can be derived as

where ki(r) is the scale factor of force in aMD subterms, as shown in eq 10. ⎧⎛ ⎞2 αi ⎪ ⎪⎜ ⎟ , Ei > V (x) ki(x) = ⎨ ⎝ αi + Ei − V (x) ⎠ ⎪ ⎪ 1.0, Ei ≤ V (x) ⎩

(Ei − Emin)2 − Ei + Emin ΔVi ,max

ΔV (x) = −

⎞ 1 ⎛ ln⎜⎜∑ e−β[ΔVi(x) + Mi]⎟⎟ β ⎝ i ⎠

(13)

Denote the potential energy at the intersection of two adjacent potential distribution curves from aMD simulations with (Ei, αi) and (Ei, αi+1) as Vxi,i+1 (the locations of the black dots shown in Figures S1A and S1C). To balance the contributions to the total boost potential of IaMD, ΔVi and ΔVi+1 can be set to the same value when the potential energy is equal to Vxi,i+1. Accordingly, the relationship between Mi and Mi+1 can be established Mi +

(11)

(Ei − Vix, i + 1)2 αi + Ei − Vix, i + 1

= Mi + 1 +

(Ei + 1 − Vix, i + 1)2 αi + 1 + Ei + 1 − Vix, i + 1 (14)

As mentioned earlier, ΔV1,max = 0 and ΔVN,max can be calculated using eq 11 and the determined (EN, αN). We set ΔV2,max, ΔV3,max, ..., and ΔVN−1,max as an arithmetic sequence from 0 to ΔVN,max and hope to increase the acceleration level of subterms evenly. Then α2, α3, ..., and αN−1 can be determined through eq 12.

or Mi + 1 = Mi +

(Ei − Vix, i + 1)2 αi + Ei − Vix, i + 1



(Ei + 1 − Vix, i + 1)2 αi + 1 + Ei + 1 − Vix, i + 1 (15)

1218

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation

Figure 1. Comparing the results of simulations with aMD (top), IaMD (middle), and ITS (bottom) for Alanine Dipeptide. (A) The distributions of torsional potential energies from seven levels of aMD, IaMD, or ITS simulations. (B) The standard deviation of free energy differences between αLhelical and extended β-sheet regions as a function of the average coverage. (C) The standard deviation of free energy differences between TS1 and extended β-sheet state as a function of the mean coverage. The marked points in (B) and (C), from left to right, represent the protocols from aMDA1 to aMD-A7, IaMD-A1 to IaMD-A7, or ITS-A1 to ITS-A7, respectively. The standard deviation and mean coverage of each protocol comes from eight independent 300 ns simulations with data collected every 0.1 ps. To count the coverage, φ−ψ space was divided into 20 × 20 blocks with the same size. The coverage can be calculated by dividing the number of sampled blocks by the total number of blocks.

All figures in this paper are generated using the 2D graphics package Matplotlib54 of Python. The images of the protein structures in Figures 4, 5, and 6 are prepared using the VMD program.55

Set M1 as 0.0, and then, all other Mi can be determined recursively by eq 15. Since ΔV1 and M1 are always equal to 0.0 and exp(−β(ΔV1(x)+M1)) ≡ 1.0, the sum of exp(−β(ΔVi(x)+Mi)) must be greater than 1.0. According to eq 13, ΔV is a negative value. For Alanine Dipeptide and Chignolin, the energy distributions of IaMD with the generated parameters are shown as the red lines of Figure S1B and Figure S1D, respectively. As expected, they are similar to the total distribution of all separated aMD simulations. The parameters of IaMD for Alanine Dipeptide and Chignolin are listed in Tables S2 and S6, respectively. The entire parameter determination process is summed up in eight steps and listed in Section 2 of the Supporting Information to facilitate the implementation. We also present a simplified version to reduce the subjective component of the parameter determination process. Simulation. For all simulations with aMD, IaMD, or ITS in this work, the boost potential was applied to all dihedrals of protein. In the case of Alanine Dipeptide, we respectively set up seven aMD protocols and seven IaMD protocols with different accelerations. These aMD parameters that refer to aMD-A1, aMD-A2, ..., aMD-A7, respectively, are listed in Table S1. The parameters of IaMD denoted as IaMD-A1, IaMD-A2, ..., IaMDA7, respectively, are listed in Table S2. Eight independent 300 ns simulations were performed with each level of aMD and IaMD. For Chignolin and Trp-cage, three levels of aMD or IaMD protocols (the parameters are listed Tables S5 and S6) were employed respectively to accelerate the folding process of them. For the folding of Villin Headpiece, only one aMD or IaMD protocol was selected. Five independent 1000 ns simulations were carried out with each aMD or IaMD protocol for these fast-folded proteins. The total simulation time of all system exceeds 100 μs. The details of the simulation are described in Section 3 of the Supporting Information.



RESULTS AND DISCUSSION Alanine Dipeptide. Alanine Dipeptide is a classic model system to evaluate the performance of enhanced sampling algorithms. Its Free Energy Profile (FEP) can be appropriately described by the dimensions dihedral φ and ψ. We first use this system to compare the efficiency and accuracy of aMD and IaMD. Seven levels of aMD or IaMD parameters were selected to accelerate conformational changes of Alanine Dipeptide. As α of aMD or αN of IaMD decreases, the conformations with higher torsional potential energy can be sampled (Figure 1A). However, the shapes of torsional potential distributions obtained by IaMD and aMD are significantly different. In IaMD simulations, the torsional potential energies are evenly distributed over a wide range from low to high energy. However, in aMD simulations, the torsional potential distributions are more concentrated in some regions, and as the distribution moves toward the high-energy region, the number of conformations with low torsional potential energy is significantly reduced. The difference of potential distributions leads to the difference of sampling efficiency and reweighting accuracy. We count the average coverage over ϕ−ψ space of eight simulations to measure the sampling efficiency and calculate the standard deviations of two free energy differences (ΔFαL−β and ΔFTS1−β) of eight parallel simulations to check the accuracy for each aMD or IaMD protocol. ΔFαL−β is the free energy difference between the αL-helical region (simplify denoted as αL) and global minimum (the extended β-sheet region, simplify 1219

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation

Figure 2. Free energy profiles of Alanine Dipeptide. (A) The result obtained from a single aMD-A6 simulation. (B) The result obtained from a single IaMD-A7 simulation. The FEPs are calculated from 300 ns simulations with data collected every 0.1 ps. To give a clear distinction of free energy for the low-energy regions, the range of free energy represented by different colors is 0 to 7.2 kcal/mol, and the regions with free energy greater than 7.2 kcal/mol are represented as red. Contour lines are every 0.36 kcal/mol. All ϕ−ψ plots are sorted in a bin size of 7.2° × 7.2°.

Figure 3. Comparison of aMD (top) and IaMD (bottom) for Alanine Dipeptide. (A) Boost potentials in the simulations with seven levels of aMD for Alanine Dipeptide. (B) Boost potentials in the simulations with seven levels of IaMD for Alanine Dipeptide. (C) The number of effective sampling points in a single 300 ns simulation with aMD-A6 as a function of ϕ and ψ. (D) The number of effective sampling points in a single simulation with IaMD-A7 as a function of ϕ and ψ. (E) The maximum boost potential in an aMD-A6 simulation as a function of ϕ and ψ. (F) The maximum boost potential in an IaMD-A7 simulation as a function of ϕ and ψ. Color bar of (C) and (D) represents the order of magnitude of the effective sampling point numbers. The unit for the color bar of (E) and (F) is kcal/mol. All ϕ−ψ plots are sorted in a bin size of 7.2° × 7.2°.

denoted as β) on 2-D ϕ−ψ FEP of Alanine Dipeptide. It is used to check the free energy calculation performance of minimum states. ΔFTS1−β, the free energy difference between TS1 and β, is used to examine the free energy calculations for transition states. The locations of β, αL, and TS1 are marked in Figure 2B. Figures 1B and 1C respectively show the standard deviations of ΔFαL−β and ΔFTS1−β as a function of the average coverage at each acceleration level with aMD or IaMD. From Figure 1B, we can see that the standard deviation of ΔFαL−β obtained from IaMD simulations is significantly less than that of aMD simulations under similar average coverage. As the average coverage exceeds 90%, the standard deviation of ΔFαL−β increases rapidly in the aMD simulation but maintains a small

value (below 0.1 kcal/mol) in the IaMD simulation. The standard deviations of ΔFTS1−β are larger than that of ΔFαL−β at same coverage in the case of aMD and IaMD, as shown in Figure 1C. The performance of IaMD is still superior to aMD. When the average coverage exceeds 95%, the standard deviation which is about 0.8 kcal/mol in simulations with IaMD and exceeds 1.25 kcal/mol in simulations with aMD. Using eq 3, the 2-D ϕ−ψ FEPs of Alanine Dipeptide are estimated from a single 300 ns simulation with aMD-A6 (Figure 2A) and IaMD-A7 (Figure 2B), respectively. The levels of acceleration of aMD-A6 (with 95.6% coverage) and IaMDA7 (with 96.0% coverage) are high and are similar to each other. Contour lines of FEP from an aMD-A6 simulation are significantly rougher than the results of IaMD-A7. In spite of 1220

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation the noise, the two FEPs are consistent with the result of 5 μs cMD (Figure S6A) or previous simulation results.24 In total with 5 μs cMD simulations, the coverage was up to 75%. While in 300 ns simulations with IaMD-A7, 75% ϕ−ψ space was sampled in merely 8 ns. The acceleration factor is about 500 times (Figure S6B). The outperformance of IaMD is mainly attributed to the characteristics of boost potential distributions. Figures 3A and 3B show the distributions of boost potential from the states at the β region. The number of conformations of boost potential near the highest value is the most in IaMD simulations but the least in aMD simulations. According to eq 5, the accuracy of the free energy F(s) is controlled by Ne(s) and ΔVmax(s). The boost potential distribution of IaMD helps to get large Ne. Figures 3C and 3D show the effective sampling numbers along ϕ−ψ space in single 300 ns aMD-A6 and IaMD-A7 simulations, respectively. The Ne(φ,ϕ) in IaMD-A7 is about 3 orders of magnitude larger than that in aMD-A6 at the same locations. At the bin of β, the Ne(φ,ϕ) of an aMD-A6 simulation is about 17 (Ne/Ns is about 0.013%), much smaller than the value at the same location in an IaMD-A7 simulation (Ne ≈ 24597 and Ne/ Ns ≈ 15%). In other words, to obtain the same effective sampling number Ne, the sampling number Ns in an aMD-A6 simulation needs about 1000 times that in an IaMD-A7 simulation. Since an aMD simulation yields very few states with high boost potential, the fluctuations of the estimated ΔVmax are also much larger than that in IaMD. As shown in Figures 3E and 3F, the ΔVmax(φ,ϕ) surface of aMD-A6 is significantly rougher than that of IaMD-A7. Furthermore, the ΔVmax(φ,ϕ) of aMD-A6 changes significantly with the coordinate φ or ϕ change, thus, its roughness has a great effect on the smoothness of the free energy profile. However, the free energy of IaMD-A7 is mainly affected by Ne(φ,ϕ) because the ΔVmax(φ,ϕ) variation of IaMD is within the range of 0.5 kcal/mol at most of the low free energy regions. It means that the regions with lower free energy can get more effective sampling points in an IaMD simulation. The large Ne and the small fluctuation of ΔVmax in IaMD simulations are beneficial to the accuracy of free energy. These properties are mainly derived from the sufficiently low potential states. Apart from this factor, a number of ΔV are close to the maximum value 0.0 in a considerable low torsional energy region that is dominated by the first aMD subterm of IaMD, as illustrated in Figure S3. It is because ΔV1 is always equal to 0 and the contributions from other subitems are very small in this region. That is to say, ΔV is not sensitive to potential energy change in the low potential energy region. To get more Ne in aMD simulations, a parameter with a low degree of acceleration should be selected, and the data used in reweighting should be frequently collected; but in IaMD, we can use very high acceleration parameters and avoid high collection frequency without considering the noise problem. More detailed discussions and presentations are given in Section 4 of the Supporting Information. IaMD vs ITS. We also compared the performances of IaMD and ITS in free energy calculations for Alanine Dipeptide. The subterms with high-temperature in ITS can be viewed as a scaling of the potential energy function by a constant less than 1.0.56 It is essentially a scaled MD similar to the method that was presented in a previous report.28 So the comparison between ITS and IaMD can be seen as a comparison of scaled MD and aMD. The essential difference between aMD and scaled MD is that aMD can control the range of boost potential

energy through the threshold E. Boost potential works only for the potential energies below the parameter E, and the potential energy is mainly distributed in regions lower than E in an aMD simulation. Hence, compared with ITS, IaMD can better control the range of the sampled potential energy and focus more on the important energy region. Take Alanine Dipeptide as an example. Five levels of ITS parameters were selected to accelerate conformational changes of Alanine Dipeptide. These parameters are generated using a procedure consistent with that used for IaMD, listed in the Table S4. For ITS and IaMD protocols with similar acceleration, such as ITS-A5 (with 96.4% coverage) and IaMD-A7 (with 96.0% coverage), the range of torsional potential energy in the simulation with former is broader than that with latter. At the same time, the number of sampling for low torsional potential conformations with former is smaller than that with latter, as shown in the middle and bottom columns in Figure 1A. As a consequence, the number of effective sampling points for ITS-A5 and IaMD-A7 is different (Table 1). In the region of low free energy, such as β and αL, Table 1. Average Effective Sampling Numbers of IaMD-A7 and ITS-A5 Simulations at Different Regions of the Free Energy Profile region

IaMD-A7

ITS-A5

β αL TS1 TS2

2941.14 246.95 1.47 1.13

1885.31 165.01 1.29 1.44

the Ne of IaMD-A7 is much more than that of ITS-A5. In the region of high free energy, such as TS1, the Ne of IaMD-A7 and ITS-A5 are significantly reduced. The former is still more than the latter. In the region with higher free energy such as TS2 (its location is marked in Figure 2B), the Ne of IaMD-A7 is less than that of ITS-A5. The decrease in the number of low potential states increases the statistical noise in reweighting for free energy when lacking enough effective sampling points in specific positions of the selected reaction coordinate. However, for the case of Alanine Dipeptide, the difference in Ne did not significantly affect the accuracy of free energy calculations. As shown in Figure 1B, due to the still enough effective sampling points, the fluctuations of ΔFαL−β from ITS simulations are approximately equal to that in IaMD simulations but smaller than that in aMD simulations. The standard deviations of ΔFTS1−β in ITS simulations with the high level of acceleration are slightly larger than the results in IaMD simulations with similar sampling efficiency, although the standard deviations are still better than that in aMD simulations (Figure 1C). Overall, there is no significant difference in the free energy profiles between ITS-A5 (Figure S8A) and IaMDA7. Since the main goal of this paper is to analyze and solve the errors in reweighting, we do not give a more systematic comparison of IaMD and ITS for other cases. Fast-Folded Proteins. Fast-folded proteins serve as an ideal model to investigate protein folding or to test sampling algorithms.57 These proteins possess the small size and fold on the microsecond time scale usually. From an unstructured state to a well-defined 3-D conformation, fast-folded proteins undergo substantial conformational changes, posing a big challenge for conformational sampling. We select three fastfolded proteins, Chignolin, Trp-cage, and Villin headpiece, to 1221

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation

Figure 4. Folding of Chignolin. (A) The native structure. (B) The structure of the intermediate state. (C) 2D free energy profile of aMD-C2. (D) 2D free energy profile of IaMD-C3. (E) RMSD of five simulations with aMD-C2 as a function of simulation length (left) and its total distribution (right). (F) RMSD of five simulations with IaMD-C3 as a function of simulation length (left) and its overall distribution (right).

100 thousand of Ne per 1000 thousand samples, while only 0.75 thousand of Ne are yielded under the same conditions with IaMD-C3. For all three fast-folded proteins, the number of effective sampling points Ne of aMD and IaMD simulations along rmsdCα and Rg are shown in Figure S10. The Ne of IaMD simulations is about 3 orders of magnitude higher than that of aMD. Similar to the case of Alanine Dipeptide, the rough ΔVmax(rmsdCα, Rg) surface of aMD simulations for three proteins affect the folding FEPs significantly, but the ΔVmax(rmsdCα, Rg) surface of IaMD simulations has little contribution to the free energy change of these fast-folded proteins (Figure S11). Regardless of the free energy range, the shapes of FEPs of aMD-C2 and IaMD-C3 look similar to each other. On each FEP, the global free energy minimum is located at (1.3 Å, 5.5 Å) corresponding to the native state (denoted as state A). There is another minimum at (2.15 Å, 5.7 Å) which relates to a different beta-hairpin structure (denote it as state B) on the FEP of IaMD but not on the FEP of aMD. The structures of states A and B are shown in Figures 4A and 4B, respectively. In state A, three backbone hydrogen bonds are formed by carbonyl oxygen of residue ASP3 with N−H groups of THR8, carbonyl oxygen of THR8 with N−H groups of ASP3, and carbonyl oxygen of GLY1 with N−H groups of GLY10. Whereas in state B, the backbone hydrogen bonds are established by carbonyl oxygen of ASP3 with N−H groups of GLY7, carbonyl oxygen of GLY7 with N−H groups of ASP3, and carbonyl oxygen of GLY1 with N−H groups of TRP9. To check the correctness of the FEPs, we performed 10 × 2 μs cMD simulations. In all ten simulations, the protein folds into state B. Then, in form state B, the beta-hairpin further folds into state A in five out of ten simulations (Figure S12A). The 2-D (rmsdCα, Rg) folding FEP was calculated using these trajectories. Since there is no repeated folding−unfolding process, the convergence of the FEP cannot be evaluated. Thus, we just make a qualitative comparison of it with the results of aMD-C2 and IaMD-C3. The FEP of IaMD-C3 is more consistent qualitatively with FEP of cMD (Figure S12B) than

test the proposed method. For all three proteins, folding events are observed in long-time scale simulations with aMD and IaMD. The RMSDs of the Cα atoms (rmsdCα), which are relative to the experimentally determined native structures, are found to reach values within 0.2−0.4 Å. In five independent 1 μs simulations with aMD (α = 7.2 kcal/mol and E = 122.8 kcal/mol) for Chignolin, it takes averaging 35 ns to fold the structure. The lowest rmsdCα value observed is 0.21 Å. A total of about 25 obviously folding− unfolding transitions can be found in these simulations, as shown in the left panel of Figure 4E. The parameters α and E were calculated using the method recommended in another work,20 and we denote this aMD protocol as “aMD-C2”. An IaMD protocol (called “IaMD-C3″, see Table S6 for the parameters) with higher acceleration was selected to compare with aMD-C2. In five independent IaMD-C3 simulations with 1 μs, the structure folds into a native state in averaging 25 ns from the unfolded state, and we reproducibly find more than 60 obviously folding−unfolding transitions during total 5 μs simulations (left panel in Figure 4F). The lowest rmsdCα value observed is 0.25 Å. Comparing the total distributions of rmsdCα which are presented in the right panel of Figures 4E and 4F, we can see that IaMD-C3 gives more sampling to nonnative states than aMD-C2. Along rmsdCα and the radius of gyration of backbone Rg, two-dimensional folding FEPs are calculated by reweighting all frames in total 5 μs simulations with aMD-C2 or IaMD-C3. As shown in Figures 4C and 4D, the FEP of IaMD-C3 is much smoother than the result of aMD-C2, even though the former used higher acceleration parameters with data collected every 1.0 ps, while the latter used lower acceleration parameters with more intensive frequency, collecting data every 0.1 ps. The smoothness of FEP of IaMD-C3 is mainly attributed to the distribution characteristics of boost potentials. As shown in the right column of Figure S9, the distributions of ΔV near its maximum value are adequate in IaMD simulations but are sparse in aMD simulations for all three proteins. As a consequence, for Chignolin, the IaMD-C3 simulation produces 1222

DOI: 10.1021/acs.jctc.7b01211 J. Chem. Theory Comput. 2018, 14, 1216−1227

Article

Journal of Chemical Theory and Computation

Figure 5. Folding of Trp-cage. (A) The native structure. (B) The structure of the intermediate state. (C) 2D free energy profile of aMD-T2. (D) 2D free energy profile of IaMD-T3. (E) RMSD of five simulations with aMD-T2 as a function of simulation length (left) and its total distribution (right). (F) RMSD of five simulations with IaMD-T3 as a function of simulation length (left) and its overall distribution (right).

mol and E = 274.86 kcal/mol) for Trp-cage, the structure is folded to the native state in averaging 155 ns, and the rmsdCα relative to the native structure (PDB: 2JOF) decreases to