Integrating Multiple Accelerated Molecular Dynamics to Improve

propose an Integrated accelerated Molecule Dynamics (IaMD) method by integrating a series of. aMD sub-terms with different ..... To efficiently sample...
3 downloads 16 Views 2MB Size
Subscriber access provided by MT ROYAL COLLEGE

Article

Integrating Multiple Accelerated Molecular Dynamics to Improve Accuracy of Free Energy Calculation Xiangda Peng, Yuebin Zhang, Yan Li, Qinglong Liu, Huiying Chu, Dinglin Zhang, and Guohui Li J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b01211 • Publication Date (Web): 02 Feb 2018 Downloaded from http://pubs.acs.org on February 3, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Integrating Multiple Accelerated Molecular Dynamics to Improve Accuracy of Free Energy Calculation Xiangda Peng†,‡, Yuebin Zhang†, Yan Li†, QingLong Liu†, Huiying Chu†, Dinglin Zhang† and Guohui Li*,† †Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China ‡Chinese Academy of Science, University of Chinese Academy Sciences, Beijing 100049, P. R. China

ABSTRACT: Accelerated Molecular Dynamics (aMD) is a promising enhanced sampling method to explore the conformational space of biomolecules. However, the large statistical noise in reweighting limits its accuracy to recover the original free energy profile. In this work, we propose an Integrated accelerated Molecule Dynamics (IaMD) method by integrating a series of aMD sub-terms with different acceleration parameters to improve the sampling efficiency and maintain the reweighting accuracy simultaneously. We use Alanine Dipeptide and three fastfolded proteins (Chignolin, Trp-cage, and Villin Headpiece) as the test objects to compare our IaMD method with aMD systematically. These case studies indicate that the statistical noise of

ACS Paragon Plus Environment

1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 32

IaMD in reweighting for free energy profiles is much smaller than that of aMD at the same level of acceleration and simulation time. To achieve the same accuracy as IaMD, aMD requires one to three orders of magnitude longer simulation time, depending on the complexity of the simulated system and the level of acceleration. Our method also outperforms aMD in controlling systematic error caused by the disappearance of the low-energy conformations when high acceleration parameters are used in aMD simulations for fast-folded proteins. Furthermore, the performance comparison between IaMD and the Integrated Tempering Sampling (ITS) in the case of Alanine Dipeptide demonstrates that IaMD possesses a better ability to control the potential energy region of sampling.

INTRODUCTION In molecular dynamics (MD) simulations for the biomolecules, sampling for the substantial conformational changes over a long time scale (usually in milliseconds or even longer time scale1) is typically necessary because these movements may be functionally related. However, the present MD simulation time only achieves the scale from hundreds-of-nanoseconds to tensof-microseconds,2-4 posing a challenge for accurate estimation of statistical properties, such as free energy. To improve the sampling efficiency, much effort has been conducted to develop or improve various enhanced sampling algorithms.5-18 Accelerated Molecular Dynamics (aMD) is one of “unconstrained” enhanced sampling that does not depend on the pre-defined collective variable (CV). It can boost the exploration of biomolecular conformational space without the need for a priori guidance about the characteristics of structural change.14 By adding a boost potential, aMD flattens the potential energy surface and thus improves the probability of crossing between different potential energy minimums. Hundreds-of-nanosecond aMD simulations match

ACS Paragon Plus Environment

2

Page 3 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the millisecond time scale sampling of conventional MD (cMD) for a variety of biomolecules.1922

However, aMD suffers from large statistical noise generated by the “exponential average” reweighting. In this reweighting method, the estimated statistics (such as free energy) are recovered based on the average of reweighting factors 〈exp∆ 〉 ,23 where ∆ is the

boost potential of configuration x which belong to position  of selected CV and  = 1.0⁄ ,

 is the Boltzmann constant and  is the temperature in simulation. The average of reweighting

factors is mainly dominated by the states with high boost ∆. These states, which also have low potential energy, are rare in aMD simulations, and thus cause the fluctuating of the average reweighting factor. Moreover, if some of states with high boost potential exist in the cMD

simulation but “disappear” in aMD simulations, systematic errors in reweighting may also occur. To improve reweighting accuracy, a variety of improvements have been proposed, such as reducing the degree of freedom associated with modified potential energy,21, 24-25 adjusting the form of bias,26-27 using the Maclaurin series or cumulant expansion to replace exponential reweighting,4, 24 replacing exponential reweighting by population-based reweighting in scaled MD,28 and combining Replica Exchange method with aMD.29-31 A critical factor in solving the reweighting problem is to yield sufficient conformations with low potential energy to estimate the average of reweighting factor accurately. In the Integrated Tempering Sampling (ITS) algorithm of Gao,15 the Boltzmann functions at different temperatures are integrated into one expression to facilitate the conformation sampling in a broad range from low to high potential energy. ITS and its variants32-37 have been applied to study protein folding,38-44 molecular clusters,45 path sampling,46 and chemical reactions.47 Thus, in this work, we utilize a similar framework that couple multiple aMDs with varying levels of

ACS Paragon Plus Environment

3

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 32

acceleration together to ensure accurate reweighting. We name this new improvement aMD-type algorithm as the Integrated accelerated Molecule Dynamics (IaMD). It inherits the advantages of both aMD and ITS and avoids the defects of aMD in reweighting. Four systems, Alanine Dipeptide and three fast-folded proteins Chignolin, Trp-cage and Villin Headpiece are exploited to demonstrate parameter determination procedure and to systematically test the accuracy, efficiency, and robustness of IaMD. THEORY AND METHOD Accelerated Molecular Dynamics and its reweighting The boost potential ∆ of aMD is a continuous non-negative function as shown in eq (1).  



, ! >  ∆ =   0, ! ≤ 

(1)

where  is the potential energy of state , E and α are preset tuning parameters which

determine the level of acceleration of ∆. ! serves as threshold energy. When the original

potential  is less than E, the simulation is performed with the effective potential ′ =

 + ∆ . Otherwise, the simulation is performed with the original potential. α is a tuning

parameter that controls the flatness of the effective potential energy. After the simulation biased by aMD, the correct canonical probability distribution is obtained via multiplying the probability of states by a reweighting factor &. & = ' (∆ 

(2)

According to eq (3), the free energies along selected CVs can also be recovered. ) s = −, -ln01  + ln 〈& 〉1 2 + 3

(3)

where  is a point on the CVs and 01  is the number of sampling points at . As described

above, states with different boost potentials have different weights for the calculation of 〈w x 〉.

ACS Paragon Plus Environment

4

Page 5 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

States, whose boost potentials are much smaller than the highest value, give negligible contributions to the average reweighting factor. Here, we use the effective sampling number 06 which proposed by Shen and Hamelberg48 to measure the “equivalent number” in the calculation of 〈w x 〉. It is written as 06  = ∑∈1

8  89:; 1

=

=> 1 〈8  〉> 89:; 1

(4)

where &?@  is the maximum & at . Based on the eq (3) and (4), we can also represent ) s as a function of 06 

) s = −, ln06  − ∆?@  + 3

(5)

According to the theory of Hamelberg, the small 06  reduces the accuracy of

−, ln06  .48 It also leads to an unreliable estimation of ∆?@  .

In a well-equilibrated aMD simulation under NVT or NPT ensemble, the boost potential weakens the contribution of potential energy to the free energy change and thus enhances the contribution of conformational entropy A to free energy change (in the form of – A). As a consequence, states tend to be distributed in high entropy regions in aMD simulation. The conformational entropy of the high potential region is larger than that of the low potential region since the latter usually consists of a few states with specific structural features, especially for biomolecule. Hence, the boost potential of aMD also reduces the sampling for low potential states when the number of low entropy states decreases. As the molecular complexity or acceleration level increasing, the number of low potential states and the effective sampling number 06 decrease, resulting in more statistical noise. If some of states with low potential appear in the cMD simulation but disappear in aMD simulations, systematic errors in reweighting may occur. Integrated accelerated Molecule Dynamics (IaMD)

ACS Paragon Plus Environment

5

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 32

Inspired by ITS,15 we combined multiple aMDs with varying levels of acceleration to improve the sampling efficiency while maintaining a sufficient number of low potential conformations. In the original of ITS, a series of Boltzmann function at different temperature (e.g., C to = ) were integrated to a generalized distribution function to avoid multiple parallel tempering calculation and kinetic energy equilibration problem after each exchange event in Replica Exchange Molecular Dynamics (REMD),15 as shown in eq (6) . ' (

D 

= ∑F EF ' (G  

(6)

where EF is a weight factor which determines the contribution of term H to the total distribution,

F = 1⁄  F , ′ is the modified effective potential. Replace the original potential  with ′, ITS samples the conformations over a wide range of potential energy. Define the ∆F as F ⁄ − 1 × , eq (6) can be rewritten as C

 J =  + ∆ = − ln∑F EF ' (  ∆G  (

(7)

The total boost potential ∆ K is simply C

∆ = − ln∑F EF ' (∆G  (

(8)

eq (8) is a general form equation to combine the boost potentials with different level of accelerations. For example, if we define ∆F as LF C − M where LF is the coupling parameter,

M and C are the two “end-state” Hamiltonians, the eq (8) is actually equivalent to the Integrated

Hamiltonian Sampling (IHS) protocol, which is another flexible and versatile ITS-type method for free energy estimating and conformational sampling.49 In fact, any form of ∆F can be selected for special purposes. In the proposed IaMD method, we select the aMD-type boost potential (eq (1)) as the boost sub-term ∆F to improve the accuracy of reweighting for aMD simulation and share the advantages of ITS and aMD. The original force should be modified by the following equation in IaMD simulations,

ACS Paragon Plus Environment

6

Page 7 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

)

J

= ) ×

∑GNOG  ∙QG ∙6 ∑GNQG ∙6

RST∆UG ; V W

RST∆UG ; V W

(9)

where F K is the scale factor of force in aMD sub-terms, as shown in eq (10). F = 



X

G V , !F >  T    G

G

1.0, !F ≤ 

(10)

Similar to aMD and ITS, IaMD has little extra cost to the computational efficiency. Determine acceleration parameters To efficiently sample conformations with from low to high potential energy, the parameters !F , YF and EF should be determined. The parameter determination process consists of two main

parts. First, select a series of aMD sub-items with different !F , YF . Then, balance the contribution of each aMD sub-item on total boost potential through the parameter EF .

We set the first aMD sub-term as a protocol without any acceleration so that enough lowenergy conformations are generated in IaMD simulations. To this end, a short cMD simulation was carried out, and then !C was set as a value smaller than the minimum potential energy of the

cMD. αC can be set to an arbitrary non-zero value. ∆C is always equal to 0.0 for all frames of IaMD simulations. Then, for simplicity, we set !F of all other aMD sub-terms as the same value, and use != to

refer to this value (N is the number of sub-terms). Since the potential energy beyond != is not flattened, it determines the upper bound of the potential energy range that can be sampled in IaMD simulations. In aMD, the threshold energy ! is set to a value above the average potential

energy of the cMD simulation, based on the number of atoms or residues.50-51 In IaMD, != can also be set to this value or a higher value to achieve higher acceleration.

ACS Paragon Plus Environment

7

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 32

The parameter YF controls the acceleration level of each aMD sub-terms. In separate aMD

simulation, as the parameter Y decrease, the location of potential distribution peak move from low to high potential energy region, as shown in Figure S1A and S1C. It is difficult for the distribution peak to exceed threshold E because the boost potential does not work when the potential energy is greater than E. We find a value of Y that leads the potential distribution peak just approach the location of ! and assign it to Y= of IaMD. Supposing the lowest potential

energy in the IaMD simulation approach the !?FQ of cMD, the largest ∆F of aMD sub-term H is expressed as a function of YF ,   

∆F,?@ = G 9G[     G

G



9G[

(11)

As mentioned earlier, ∆C,?@ = 0 and ∆=,?@ can be calculated using eq (11) and the

determined != , Y= . We set ∆X,?@ , ∆\,?@ , …, and ∆=C,?@ as an arithmetic sequence from 0 to ∆=,?@ and hope to increase the acceleration level of sub-terms evenly. Then YX ,

Y\ , …, and Y=C can be determined through the eq (12). YF =

G  9G[  ∆G,9:;

− !F + !?FQ (12)

Hereafter, a series of short aMD simulations with the determined parameters (YF , !F ) are

implemented respectively to check the potential energy distributions and to obtain the EF value of each sub-term. It is preferable that the potential energies from the short aMD simulations with a series of !F , YF are evenly distributed over a broad interval (as schematically illustrated in the Figure S1A and S1C in the Supporting Information) to yield a homogeneous total distribution (the solid green line in Figure S1B and S1D). However, for the system in low dimensions, such as the torsional potential term of Alanine Dipeptide, these distributions of potential energies have a lot of overlaps in the low energy area. To solve this problem, we use the cubic spline

ACS Paragon Plus Environment

8

Page 9 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

interpolation algorithm to recalculate the values of YX , Y\ , …, and Y=C , based on the results of performed aMD simulations. The details of the implementation are described in Section 2 (step 7) of Supporting Information. The weight parameter EF in IaMD controls the contribution of the aMD sub-terms to the total

boost potential ∆ in IaMD. In ITS,15, 32 these parameters are determined by an iterative

procedure. In IHS49 and the work of Yang52, the weighted histogram analysis method (WHAM) is employed to facilitate the optimization of these parameters. In the approach of Lu and collaborators 53, the weight parameters can be obtained directly according to the intersections of potential distribution curves (or the midpoint of potential canonical averages) of adjacent two separate aMD simulations, avoiding the iterative process (the method is referred to as "direct method" in the following). All these methods are used to calculate the weight parameters of IaMD for Alanine Dipeptide. The results obtained by different methods are consistent with each other. (Section 1 of Supporting Information) Hence, the direct method using the intersections of potential distribution curves is selected for all cases in this work. Here is a brief introduction. For convenience, define ]F as − ln EF , and the eq (8) can be derived as C

∆ = − ln∑F ' (^∆G  _G ` (13) (

Denote the potential energy at the intersection of two adjacent potential distribution curves from  aMD simulations with (!F , YF ) and (!F , YFC ) as F,FC (the locations of black dot shown in Figure

S1A and S1C). To balance the contributions to the total boost potential of IaMD, ∆F and ∆FC  can be set to the same value when the potential energy is equal to F,FC . Accordingly, the

relationship between ]F and ]FC can be established, ]F +

; G G,Gab



; G G G,Gab

= ]FC + 

; Gab G,Gab



; Gab Gab G,Gab

(14)

ACS Paragon Plus Environment

9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 32

or G  ;

G,Gab ]FC = ]F +    ; G

G





G,Gab



; Gab G,Gab



; Gab Gab G,Gab

(15)

Set ]C as 0.0, and then, all other ]F can be determined recursively by eq (15). Since ∆C and ]C

are always equal to 0.0 and exp− ∆C + ]C ≡ 1.0, the sum of exp− ∆F + ]F

must be greater than 1.0. According to the eq (13), ∆ is a negative value. For Alanine Dipeptide

and Chignolin, the energy distributions of IaMD with the generated parameters are shown as the red lines of Figure S1B and Figure S1D respectively. As expected, they are similar to the total distribution of all separated aMD simulations. The parameters of IaMD for Alanine Dipeptide and Chignolin are listed in Table S2 and S6 respectively. The entire parameter determination process is summed up in eight steps and listed in Section 2 of Supporting Information to facilitate the implementation. We also present a simplified version to reduce the subjective component of the parameter determination process. Simulation For all simulations with aMD, IaMD or ITS in this work, the boost potential was applied to all dihedrals of protein. In the case of Alanine Dipeptide, we respectively set up seven aMD protocols and seven IaMD protocols with different accelerations. These aMD parameters that refer to aMD-A1, aMD-A2, …, aMD-A7 respectively are listed in Table S1. The parameters of IaMD that denoted as IaMD-A1, IaMD-A2,…, IaMD-A7 respectively are listed in Table S2. Eight independent 300 ns simulations were performed with each level aMD and IaMD. For Chignolin and Trp-cage, three levels of aMD or IaMD protocols (the parameters are listed Table S5 and S6) were employed respectively to accelerate the folding process of them. For the folding of Villin Headpiece, only one aMD or IaMD protocol was selected. Five independent 1000 ns simulations were carried out with each aMD or IaMD protocol for these fast-folded proteins. The

ACS Paragon Plus Environment

10

Page 11 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

total simulation time of all system exceeds 100 µs. The details of the simulation are described in Section 3 of Supporting Information. All Figures in this paper are generated using 2D graphics package Matplotlib54 of Python. The images of the protein structures in Figure 4, 5, and 6 are prepared using the VMD program.55 RESULTS AND DISCUSSION Alanine Dipeptide

Figure 1. Comparing the results of simulations with aMD (top), IaMD (middle), and ITS (bottom) for Alanine Dipeptide. (A) The distributions of torsional potential energies from seven levels aMD, IaMD, or ITS simulations; (B) The standard deviation of free energy differences between αL-helical and extended β-sheet regions as a function of the average coverage; (C) The standard deviation of free energy differences between TS1 and extended β-sheet state as a function of the mean coverage. The marked points in (B) and (C), from left to right, represent the protocols from aMD-A1 to aMD-A7, IaMD-A1 to IaMD-A7 or ITS-A1 to ITS-A7 respectively. The standard deviation and mean coverage of each protocol comes from eight independent 300ns simulations with data collected every 0.1 ps. To count the coverage, φ-ψ space was divided into 20×20 blocks with the same size. The coverage can be calculated by dividing the number of sampled blocks by the total number of blocks.

Alanine Dipeptide is a classic model system to evaluate the performance of enhanced sampling algorithms. Its Free Energy Profile (FEP) can be appropriately described by the dimensions dihedral φ and ψ. We first use this system to compare the efficiency and accuracy of aMD and IaMD. Seven levels of aMD or IaMD parameters were selected to accelerate conformational changes of Alanine Dipeptide. As Y of aMD or Y= of IaMD decreases, the conformations with ACS Paragon Plus Environment

11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 32

higher torsional potential energy can be sampled (Figure 1A). However, the shapes of torsional potential distributions obtained by IaMD and aMD are significantly different. In IaMD simulations, the torsional potential energies are evenly distributed over a wide range from low to high energy. However in aMD simulations, the torsional potential distributions are more concentrated in some regions, and as the distribution moves toward the high-energy region, the number of conformations with low torsional potential energy is significantly reduced. The difference of potential distributions leads to the difference of sampling efficiency and reweighting accuracy. We count the average coverage over ϕ−ψ space of eight simulations to measure the sampling efficiency and calculate the standard deviations of two free energy differences (∆)e ( and ∆)fgC( ) of eight parallel simulations to check the accuracy for each

aMD or IaMD protocol. ∆)e( is the free energy difference between αL-helical region (simplify denoted as αL) and global minimum (the extended β-sheet region, simplify denoted as β) on 2-D ϕ−ψ FEP of Alanine Dipeptide. It is used to check the free energy calculation performance of

minimum states. ∆)fgC( , the free energy difference between TS1 and β, is used to examine the

free energy calculation for transition states. The locations of β, αL, and TS1 are marked in the Figure 2B. Figure 1B and 1C respectively show the standard deviations of ∆)e( and ∆)fgC( as a function of the average coverage at each acceleration level with aMD or IaMD. From Figure 1B, we can see that the standard deviation of ∆)e ( obtained from IaMD simulations is significantly less than that of aMD simulations under similar average coverage. As the average coverage exceeds 90%, the standard deviation of ∆)e( increases rapidly in aMD simulation, but maintains a small value (below 0.1 kcal/mol) in IaMD simulation. The standard deviations of ∆)fgC( are larger than that of ∆)e( at same coverage in the case of aMD and IaMD, as

ACS Paragon Plus Environment

12

Page 13 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

shown in Figure 1C. The performance of IaMD is still super than aMD. When the average coverage exceeds 95%, the standard deviation which is about 0.8 kcal/mol in simulations with IaMD and exceeds 1.25 kcal/mol in simulations with aMD. Using the eq (3), the 2-D ϕ−ψ FEPs of Alanine Dipeptide are estimated from single 300 ns simulation with aMD-A6 (Figure 2A) and IaMD-A7 (Figure 2B) respectively. The level of acceleration of aMD-A6 (with 95.6% coverage) and IaMD-A7 (with 96.0% coverage) are high and are similar to each other. Contour lines of FEP from aMD-A6 simulation are significantly rough than the results of IaMD-A7. In spite of the noise, the two FEPs are consistent with the result of 5us cMD (Figure S6A) or previous simulation results.24 In total 5 µs cMD simulations, the coverage was up to 75%. While in 300 ns simulation with IaMD-A7 75% ϕ-ψ space was sampled in merely 8 ns. The acceleration factor is about 500 times (Figure S6B).

Figure 2. Free energy profiles of Alanine Dipeptide. (A) the result obtained from single aMD-A6 simulation; (B) the result obtained from single IaMD-A7 simulation. The FEPs are calculated from 300 ns simulations with data collected every 0.1 ps. To give a clear distinction of free energy for the low-energy regions, the range of free energy represented by different color is 0 to 7.2 kcal/mol and the regions with free energy greater than 7.2 kcal/mol are represented as red. Contour lines are every 0.36 kcal/mol. All ϕ−ψ plots are sorted in a bin size of 7.2°×7.2°.

The outperformance of IaMD is mainly attributed to the characteristics of boost potential distributions. Figure 3A and 3B show the distributions of boost potential from the states at β the region. The number of conformations whose boost potential near the highest value is the most in IaMD simulations, but the least in aMD simulations. According to the eq (5), The accuracy of

ACS Paragon Plus Environment

13

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 32

the free energy ) s is controlled by 06  and ∆?@  . The boost potential distribution of

IaMD helps to get large 06 . Figure 3C and 3D show the effective sampling numbers along ϕ−ψ

space in single 300 ns aMD-A6 and IaMD-A7 simulation respectively. The 06 h, i in IaMDA7 is about three orders of magnitude larger than that in aMD-A6 at same locations. At the bin of β, the 06 h, i of aMD-A6 simulation is about 17 (06 ⁄01 is about 0.013%), much smaller than the value at same location in IaMD-A7 simulation (06 ≈ 24597 and 06 ⁄01 ≈ 15%). In

other words, to obtain the same effective sampling number 06 , the sampling number 01 in aMDA6 simulation need about 1000 times that in IaMD-A7 simulation. Since aMD simulation yields very few states with high boost potential, the fluctuations of the estimated ∆?@ is also much

larger than that in IaMD. As shown in Figure 3E and 3F, the ∆?@ h, i surface of aMD-A6 is

significantly rougher than that of IaMD-A7. Furthermore, the ∆?@ h, i of aMD-A6 changes significantly with the coordinate h or i change, thus, its roughness has a great effect on the

smoothness of the free energy profile. However, the free energy of IaMD-A7 is mainly affected by 06 h, i because the ∆?@ h, i variation of IaMD within the range of 0.5 kcal/mol at most of the low free energy regions. It means that the regions with lower free energy can get more effective sampling points in IaMD simulation.

ACS Paragon Plus Environment

14

Page 15 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 3. Comparison of aMD (top) and IaMD (bottom) for Alanine Dipeptide. (A) Boost potentials in the simulations with seven level aMD for Alanine Dipeptide; (B) boost potentials in the simulations with seven level IaMD for Alanine Dipeptide; (C) the number of effective sampling points in single 300 ns simulation with aMD-A6 as a function of ϕ and ψ; (D) the number of effective sampling points in single simulation with IaMD-A7 as a function of ϕ and ψ; (E) the maximum boost potential in aMDA6 simulation as a function of ϕ and ψ; (F) the maximum boost potential in IaMD-A7 simulation as a function of ϕ and ψ. Color bar of (C) and (D) represents the order of magnitude of the effective sampling point numbers. The unit for the color bar of (E) and (F) is kcal/mol. All ϕ−ψ plots are sorted in a bin size of 7.2°×7.2°.

The large 06 and the small fluctuation of ∆?@ in IaMD simulations are beneficial to the accuracy of free energy. These properties are mainly derived from the sufficiently low potential states. Apart from this factor, lots of ∆ are close to the maximum value 0.0 in a considerable low torsional energy region that is dominated by the first aMD sub-term of IaMD, as illustrated in Figure S3. It is because that ∆C is always equal to 0 and the contributions from other sub-

items are very small in this region. That is to say, ∆ is not sensitive to potential energy change in the low potential energy region. To get more 06 in aMD simulations, a parameter with low degree of acceleration should be

selected, and the data used in reweighting should be frequently collected. But in IaMD, we can use very high acceleration parameters and avoid high collection frequency without the

ACS Paragon Plus Environment

15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 32

considering the noise problem. More detailed discussions and presentations are given in Section 4 of Supporting Information. IaMD vs ITS We also compared the performance of IaMD and ITS in free energy calculation for Alanine Dipeptide. The sub-terms with high-temperature in ITS can be viewed as a scaling of the potential energy function by a constant less than 1.0.56 It is essentially a scaled MD similar to the method that presented in a previous report.28 So the comparison between ITS and IaMD can be seen as a comparison of scaled MD and aMD. The essential difference between aMD and scaled MD is that aMD can control the range of boost potential energy through the threshold !. Boost

potential works only for the potential energies below the parameter !, and the potential energy is mainly distributed in regions lower than E in aMD simulation. Hence, compared with ITS, IaMD can better control the range of the sampled potential energy and focus more on the important energy region. Take Alanine Dipeptide as an example. Five levels of ITS parameters were selected to accelerate conformational changes of Alanine Dipeptide. These parameters are generated using a procedure consistent with that used for IaMD, listed in the Table S4. For ITS and IaMD protocol with similar acceleration, such as ITS-A5 (with 96.4% coverage) and IaMD-A7 (with 96.0% coverage), the range of torsional potential energy in the simulation with former is broader than that with latter. At the same time, the number of sampling for low torsional potential conformations with former is smaller than that with latter, as shown in middle and bottom columns in Figure 1A. As a consequence, the number of effective sampling points for ITS-A5 and IaMD-A7 is different (Table 1). In the region of low free energy, such as β and αL, the 06 of IaMD-A7 is much more than that of ITS-A5. In the region of high free energy, such as TS1, the

ACS Paragon Plus Environment

16

Page 17 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

06 of IaMD-A7 and ITS-A5 are significantly reduced. The former is still more than the latter. In

the region with higher free energy such as TS2 (its location is marked in Figure 2B), the 06 of IaMD-A7 is less than that of ITS-A5. Table 1. The average effective sampling numbers of IaMD-A7 and ITS-A5 simulations at different regions of the free energy profile.

region

IaMD-A7

ITS-A5

β

2941.14

1885.31

αL

246.95

165.01

TS1

1.47

1.29

TS2

1.13

1.44

The decrease in the number of low potential states increases the statistical noise in reweighting for free energy when lacking enough effective sampling points in specific positions of the selected reaction coordinate. However, for the case of Alanine Dipeptide, the difference in 06 did not significantly affect the accuracy of free energy calculations. As shown in Figure 1B, due to the still enough effective sampling points, the fluctuations of ∆)e( from ITS simulations are approximately equal to that in IaMD simulations but smaller than that in aMD simulations. The standard deviations of ∆)fgC( in ITS simulations with the high level of acceleration are slightly larger than the results in IaMD simulations with similar sampling efficiency, although the standard deviations are still better than that in aMD simulations (Figure 1C). Overall, there is no significant difference in the free energy profiles between ITS-A5 (Figure S8A) and IaMD-A7. Since the main goal of this paper is to analyze and solve the errors in reweighting, we do not give a more systematic comparison of IaMD and ITS for other cases. Fast-Folded Proteins Fast-folded proteins serve as an ideal model to investigate protein folding or to test sampling algorithms.57 These proteins possess the small size and fold on the microsecond time scale

ACS Paragon Plus Environment

17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 32

usually. From an unstructured state to a well-defined 3-D conformation, fast-folded proteins undergo substantial conformational changes, posing a big challenge for conformational sampling. We select three fast-folded proteins, Chignolin, Trp-cage, and Villin headpiece, to test the proposed method. For all three proteins, folding events are observed in long-timescale simulations with aMD and IaMD. The RMSD of the Cα atoms (Kqrs ) which are relative to the experimentally determined native structures are found to reach values within 0.2-0.4 Å.

Figure 4. Folding of Chignolin. (A) The native structure; (B) The structure of intermediate state; (C) 2D free energy profile of aMD-C2; (D) 2D free energy profile of IaMD-C3; (E) RMSD of five simulations with aMD-C2 as a function of simulation length (left) and its total distribution (right); (F) RMSD of five simulations with IaMD-C3 as a function of simulation length (left) and its overall distribution (right).

In five independent 1 µs simulations with aMD (α=7.2 kcal/mol and E=122.8 kcal/mol) for Chignolin, it takes averaging 35 ns to fold the structure. The lowest Kqrs value observed is 0.21 Å. Total about 25 obviously folding-unfolding transitions can be found in these simulations, as shown in the left panel of Figure 4E. The parameters α and E were calculated using the method recommended in another work20, and we denote this aMD protocol as“aMD-C2”. An IaMD protocol (called “IaMD-C3”, see Table S6 for the parameters) with higher acceleration was selected to compare with aMD-C2. In five independent IaMD-C3 simulations with 1 µs, the

ACS Paragon Plus Environment

18

Page 19 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

structure folds into native state in averaging 25 ns from the unfolded state, and we reproducibly find more than 60 obviously folding-unfolding transitions during total 5 µs simulations (left panel in Figure 4F). The lowest Kqrs value observed is 0.25 Å. Comparing the total

distributions of Kqrs which are presented in the right panel of Figure 4E and 4F, we can see that IaMD-C3 give more sampling to nonnative states than aMD-C2. Along Kqrs and the radius of gyration of backbone tu , 2-dimensional folding FEPs are calculated by reweighting all frames in total 5 µs simulations with aMD-C2 or IaMD-C3. As shown in Figure 4C and 4D, the FEP of IaMD-C3 is much smoother than the result of aMD-C2, even though the former used higher acceleration parameters with data collected every 1.0 ps while the latter used lower acceleration parameters with more intensive frequency, collecting data every 0.1 ps. The smoothness of FEP of IaMD-C3 is mainly attributed to the distribution characteristics of boost potentials. As shown in the right column of Figure S9, the distributions of ∆ near its maximum value are adequate in IaMD simulations but are sparse in aMD simulations for all three proteins. As a consequence, for Chignolin, the IaMD-C3 simulation

produces 100 thousand of 06 per 1000 thousand samples, while only 0.75 thousand of 06 are yielded under the same conditions with IaMD-C3. For all three fast-folded proteins, the number of effective sampling points 06 of aMD and IaMD simulations along Kqrsv and tu are shown

in Figure S10. The 06 of IaMD simulations is about 3 orders of magnitude higher than that of

aMD. Similar to the case of Alanine Dipeptide, the rough ∆?@ Kqrs , tu surface of aMD

simulations for three proteins affect the folding FEPs significantly, but the ∆?@ Kqrs , tu surface of IaMD simulations has little contribution to the free energy change of these fast-folded

proteins (Figure S11).

ACS Paragon Plus Environment

19

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 32

Regardless of the free energy range, the shape of FEPs of aMD-C2 and IaMD-C3 look similar to each other. On each FEP, the global free energy minimum is located at (1.3Å, 5.5 Å) corresponding to the native state (denoted as state A). There is another minimum at (2.15Å, 5.7 Å) which relates to a different beta-hairpin structure (denote it as state B) on the FEP of IaMD but not on the FEP of aMD. The structures of state A and B are shown in the Figure 4A and 4B respectively. In state A, three backbone hydrogen bonds are formed by carbonyl oxygen of residue ASP3 with N−H groups of THR8, carbonyl oxygen of THR8 with N−H groups of ASP3, and carbonyl oxygen of GLY1 with N−H groups of GLY10. Whereas in state B, the backbone hydrogen bonds are established by carbonyl oxygen of ASP3 with N−H groups of GLY7, carbonyl oxygen of GLY7 with N−H groups of ASP3, and carbonyl oxygen of GLY1 with N−H groups of TRP9. To check the correctness of the FEPs, we performed 10×2µs cMD simulations. In all ten simulations, the protein folds into the state B. Then, form state B, the beta-hairpin further folds into the state A in five out of ten simulations (Figure S12A). The 2-D (Kqrs , tu ) folding FEP was calculated using these trajectories. Since there is no repeated folding-unfolding process, the convergence of the FEP cannot be evaluated. Thus we just make a qualitative comparison of it with the results of aMD-C2 and IaMD-C3. The FEP of IaMD-C3 is more consistent qualitatively with FEP of cMD (Figure S12B) than aMD-C2. The free energy range in cMD is roughly equivalent to that in IaMD. Two minimums which correspond to state A and B are also clearly present in the FEP of cMD. Both the FEP and the RMSD curves of cMD reveal that the B state is an indispensable intermediate for folding into native structure. As we reduce the parameter E from 122.8 kcal/mol of aMD-C2 to 98.9 kcal/mol of aMD-C1, the distribution of torsional potential shifts to lower energy region remarkably (the upper left

ACS Paragon Plus Environment

20

Page 21 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

corner of Figure S9). There are only seven conversions between the folded and unfolded states in five RMSD curves (Figure S13C), indicating that the acceleration level of aMD-C1 is lower than aMD-C2. On the FEP estimated from total 5 µs aMD-C1simulations, the contour of state B minimum is clearly presented, as shown in Figure S13A. In contrast, when the parameter E increases from 122.8 kcal/mol of aMD-C2 to 146.7 kcal/mol of aMD-C3, the torsional potential energy of sampled conformations becomes higher than that in aMD-C2 (Figure S9). The sampling of aMD-C3 is more efficient that aMD-C2 (Figure S13D). However, it is difficult to recognize the contour of state B minimum on the rough FEP of aMD-C3 (Figure S13B). Comparing the results of aMD-C1, aMD-C2, and aMD-C3, we can find that the range of free energies becomes larger as the level of acceleration increasing. Looking at the results of all aMD simulations for Chignolin, we can conclude that the estimated FEP is dependent on the boost parameters of aMD. It is caused by systematic errors in reweighting. We will discuss the topic in detail in the next section. The Kqrs curves (in Figure 4D, S14C, and S14D) and torsional potential energy distributions of simulations (in Figure S9) with three IaMD protocols (IaMD-C1, IaMD-C2, and IaMD-C3) indicate that the differences between acceleration levels of them are obvious. Unlike aMD, the obtained folding FEPs of these IaMD protocols are broadly in line with each other, the contour lines of these FEPs are smooth, and the minimums that correspond to the state A and B are apparently distinguished in Figure S14A and S14C. As shown in the left panel of Figure 5E, in the five independent 1 µs simulations with aMD-T2 (α=14.34 kcal/mol and E=274.86 kcal/mol) for Trp-cage, the structure is folded to native state in averaging 155 ns and the Kqrs relative to the native structure (PDB: 2JOF) decreases to < 1.8 Å total about 15 times from unfolding structure. The lowest RMSD value observed is 0.36 Å. In the five independent 1 µs simulations with IaMD-T3 whose acceleration is higher than that of

ACS Paragon Plus Environment

21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 32

aMD-T2, the protein is folded to native state from unfolded structure in averaging 115 ns and the lowest Kqrs value observed is 0.4 Å. More than 35 obviously folding−unfolding transitions were observed in five trajectories, as shown in the left panel of Figure 5F. Similar to the case of Chignolin, IaMD-T3 sample the nonnative states more adequately than aMD-T2 simulations (right panel of Figure 5E and 5F).

Figure 5. Folding of Trp-cage. (A) The native structure; (B) The structure of intermediate state; (C) 2D free energy profile of aMD-T2; (D) 2D free energy profile of IaMD-T3; (E) RMSD of five simulations with aMD-T2 as a function of simulation length (left) and its total distribution (right); (F) RMSD of five simulations with IaMD-T3 as a function of simulation length (left) and its overall distribution (right).

Using all frames that collected every 0.1 ps from five independent aMD-T2 simulations, we calculated the 2-D (Kqrs , tu ) FEP and compared it to the result of IaMD-T3 simulations with data collected every 1.0 ps (Figure 5C and 5D). As expected, the FEP of IaMD-T3 is much smoother than that of aMD-T2. The global minimum for the two FEPs is at coordinates (1.5Å, 7.0 Å), representing the native state of Trp-cage. Another free energy minimum is presented at the position about (5.2Å, 7.6 Å) of FEP of IaMD-T3 (Figure 5D and S17). This is consistent with the previous REMD simulation results.58 We cannot find clear contours of a minimum at the

ACS Paragon Plus Environment

22

Page 23 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

same position on the free energy profile of aMD-T2. The folded state and the representative structure of minimum (5.2Å, 7.6 Å) were shown in Figure 5A and 5B respectively. The FEPs of aMD simulations with different level of acceleration are sensitive to parameter changes (Figure 5 and S15) while the results of IaMD are not significantly affected by the changing of boost parameters (Figure 5 and S16).

Figure 6. Folding of Villin Headpiece. (A) The native structure; (B) 2D free energy profile of aMD-V (not yet convergent); (C) 2-D free energy profile of IaMD-V (not yet convergent); (D) RMSD of five simulations with aMD-V as a function of simulation length; (E) RMSD of five simulations with IaMD-V as a function of simulation length.

In five independent 1 µs simulations with aMD-V (listed in Table S5) for Villin Headpiece, the structure folds into the native state in only one trajectory with 0.46 Å of Kqrs relative to the PDB structure 2F4K. However, the protein folds into the native state successfully in two of five IaMD-V simulations with 1 µs. The lowest Kqrs value observed is 0.46 Å. The 2-D (Kqrs , tu ) FEP is obtained from the simulations that folded to the native structure with aMD-V or

IaMD-V (Figure 6D and 6E). Note that they are not convergent yet because there is no repeated folding-unfolding process in the simulations. Here, we just compare the smoothness between

ACS Paragon Plus Environment

23

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 32

these FEPs. As shown in Figure 6B and 6C, the performance of IaMD is still superior to aMD regarding statistical noise control. The Systematic error in reweighting of aMD In the previous report of Jing and Sun,59 the second order cumulant expansion reweighting method cannot restore the potential energy distributions correctly when an inappropriate boost potential is selected. One important reason is that the sampled potential energy region in aMD simulation may be different from that in cMD simulation. If the difference is significant, the “exponential average” reweighting method may also get the wrong results. We use a simple twodimension model to show the possible systematic error in “exponential average” reweighting.

Figure 7. Schematic plots of standard sampling and enhanced sampling for a simple 2-D model. (A) Good enhanced sampling; (B) inadequate enhanced sampling. The projections on the x-y plane represent the sampled conformational spaces. The height of the surface (and the heights of the projection curve in the x-z or y-z planes) represents the distribution density.

The red surface in Figure 7A represents the canonical distribution of model system from a cMD simulation. If an enhanced sampling technique is used to sample larger conformation space with the same amount of simulation, the resulting conformational distribution is biased, and distinct from the canonical distribution. Then the reweighting process can restore the unbiased probability of the sampled conformations, on the premise that the enhanced sampling space covers the sampling space of cMD, just like the green surface in Figure 7A. However, if the

ACS Paragon Plus Environment

24

Page 25 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

enhanced sampling space overlaps only with parts of the conformational space of cMD, as illustrated in Figure 7B, the relative probabilities between conformations are correctly estimated just in overlapping regions of projections in the x-y plane. We cannot calculate the probability for other conformations in cMD (red circular region without overlapping in the x-y plane), no matter what reweighting scheme is used. By the same token, if some low potential conformations that appear in the cMD simulation disappear in aMD simulations, a systematic error occurs in the estimation of free energy when the weight of these conformations are not negligible for reweighting.

Figure 8. Reweighting results of torsional potential energy distribution. (A) the results of simulations with aMD-A6 or IaMD-A7 for Alanine Dipeptide; (B) the results of simulations with aMD-C1 and IaMD-C3 for Chignolin; (C) the results of simulations with aMD at a lower level of acceleration for Chignolin.

A simple way to determine the existence of systematic errors is to check the reweighted potential energy distribution.59 For example, in the Alanine Dipeptide case, the conformations with low torsional potentials are sampled both in aMD and IaMD simulations. After reweighting, the distribution of torsional potential energy of aMD or IaMD simulations is not different from the torsional potential distribution of cMD (Figure 8A). However, in Chignolin simulation, a more complex case, the smallest torsional potential energy with aMD-C1 is about 78 kcal/mol. It

ACS Paragon Plus Environment

25

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 32

is bigger than the value in cMD simulation (69 kcal/mol). There are about 20% conformations of cMD whose torsional potential energy less than 78 Kcal/mol. These “missing” conformations have large reweighting factors and cause the inconsistencies between the reweighted torsional potential distribution and the cMD results, as shown in Figure 8B. In all the aMD simulations for the fast-folded proteins, the distributions of torsional potential cannot be correctly restored (Figure S18). As the level of acceleration or system complexity increases, more low-energy conformations disappear in simulation, and a more severe systematic error occurs. To avoid this, we need to select the aMD parameters with large Y or small ! to keep the sampling for the low energy region, such as the parameter represented by pink lines in the Figure 8C. From a practical point of view, the acceleration with this parameter is inefficient. Since IaMD keeps enough sampling in low-energy region, it is not plagued by this kind of errors. Whether for Alanine Dipeptide or for fast-folded proteins, the energy distributions are reweighted properly. CONCLUSION As exemplified in the four examples of this paper, the application of the Integrated accelerated Molecule Dynamics increases the efficiency of the sampling significantly, while maintaining the sampling of low energy conformations. The statistical noise in reweighting for free energy calculation can be eliminated or substantially reduced due to the large effective sampled point number in IaMD simulations. The proposed method can fold three small proteins Chignolin, Trpcage and Villin Headpiece into the native structures efficiently and estimated the reliable folding free energy profiles for Chignolin and Trp-cage. Then, we analyzed the possible systematic error caused by the disappearance of the low energy conformations in aMD. IaMD avoids this problem due to its uniform sampling in broad energy regions. These examples also demonstrate the robustness of IaMD parameters determined by a fast and simple procedure introduced in this

ACS Paragon Plus Environment

26

Page 27 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

paper. Compared to ITS, the selective acceleration ability of aMD allows IaMD to control the range of sampled potential energy more flexible and thus to focus on the important energy regions. Accordingly, IaMD inherits the advantages of both aMD and ITS. It can be added to the dynamics simulation programs easily and has little effect on the simulation efficiency. Although the dihedral term is selected as the boost potential in this work, the procedure can be easily extended to any form of boost potentials, such as dual boost or rotatable dihedral boost, for different purposes. The accuracy, efficiency, and robustness of IaMD make it very promising for simulating complex biomolecules. ASSOCIATED CONTENT Supporting Information The procedure of parameter determination of IaMD, simulation details, all boost parameters of aMD and IaMD, and additional discussion and presentation for the Alanine Dipeptide and three fast-folded proteins. AUTHOR INFORMATION Corresponding Author *E-mail:[email protected]. Author Contributions X.P. and Y.Z. contributed equally to this work. Notes The authors declare no competing financial interest. ACKNOWLEDGMENT This work is supported by grants from the National Natural Science Foundation of China under Contracts No. 21573217, No. 21625302, No. 31700647 and No. 91430110.

ACS Paragon Plus Environment

27

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 32

REFERENCES (1) Henzler-Wildman, K.; Kern, D., Dynamic personalities of proteins. Nature 2007, 450 (7172), 964-972. (2) Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y. B.; Wriggers, W., Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330 (6002), 341-346. (3) Lane, T. J.; Shukla, D.; Beauchamp, K. A.; Pande, V. S., To milliseconds and beyond: challenges in the simulation of protein folding. Curr. Opin. Struct. Biol. 2013, 23 (1), 58-65. (4) Miao, Y.; Feher, V. A.; McCammon, J. A., Gaussian Accelerated Molecular Dynamics: Unconstrained Enhanced Sampling and Free Energy Calculation. J. Chem. Theory Comput. 2015, 11 (8), 3584-3595. (5) Torrie, G. M.; Valleau, J. P., Non-physical sampling distributions in monte-carlo free-energy estimation - umbrella sampling. J. Comput. Phys. 1977, 23 (2), 187-199. (6) Laio, A.; Parrinello, M., Escaping free-energy minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99 (20), 12562-12566. (7) Darve, E.; Rodriguez-Gomez, D.; Pohorille, A., Adaptive biasing force method for scalar and vector free energy calculations. J. Chem. Phys. 2008, 128 (14). (8) Dellago, C.; Bolhuis, P. G.; Csajka, F. S.; Chandler, D., Transition path sampling and the calculation of rate constants. J. Chem. Phys. 1998, 108 (5), 1964-1977. (9) Cao, L. R.; Lv, C.; Yang, W., Hidden Conformation Events in DNA Base Extrusions: A Generalized-Ensemble Path Optimization and Equilibrium Simulation Study. J. Chem. Theory Comput. 2013, 9 (8), 3756-3768. (10) Gong, L. C.; Zhou, X., Structuring and sampling complex conformation space: Weighted ensemble dynamics simulations. Phys. Rev. E 2009, 80 (2). (11) Schlitter, J.; Engels, M.; Kruger, P.; Jacoby, E.; Wollmer, A., Targeted MolecularDynamics Simulation of Conformational Change - Application to the T[--]R Transition in Insulin. Mol. Simul. 1993, 10 (2-6), 291-&. (12) Grubmüller, H., Predicting slow structural transitions in macromolecular systems: Conformational flooding. Phys. Rev. E 1995, 52 (3), 2893-2906. (13) Hansmann, U. H. E., Parallel tempering algorithm for conformational studies of biological molecules. Chem. Phys. Lett. 1997, 281 (1-3), 140-150. (14) Hamelberg, D.; Mongan, J.; McCammon, J. A., Accelerated molecular dynamics: A promising and efficient simulation method for biomolecules. J. Chem. Phys. 2004, 120 (24), 11919-11929. (15) Gao, Y. Q., An integrate-over-temperature approach for enhanced sampling. J. Chem. Phys. 2008, 128 (6), 064105. (16) Miao, Y.; McCammon, J. A., Unconstrained enhanced sampling for free energy calculations of biomolecules: a review. Mol. Simul. 2016, 42 (13), 1046-1055. (17) Zheng, L. Q.; Chen, M. G.; Yang, W., Simultaneous escaping of explicit and hidden free energy barriers: Application of the orthogonal space random walk strategy in generalized ensemble based conformational sampling. J. Chem. Phys. 2009, 130 (23). (18) Zheng, L. Q.; Chen, M. G.; Yang, W., Random walk in orthogonal space to achieve efficient free-energy simulation of complex systems. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (51), 20227-20232.

ACS Paragon Plus Environment

28

Page 29 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(19) Singh, R.; Ahalawat, N.; Murarka, R. K., Activation of Corticotropin-Releasing Factor 1 Receptor: Insights from Molecular Dynamics Simulations. J. Phys. Chem. B 2015, 119 (7), 2806-2817. (20) Miao, Y.; Feixas, F.; Eun, C.; McCammon, J. A., Accelerated molecular dynamics simulations of protein folding. J. Comput. Chem. 2015, 36 (20), 1536-1549. (21) Doshi, U.; Hamelberg, D., Achieving Rigorous Accelerated Conformational Sampling in Explicit Solvent. J. Phys. Chem. Lett. 2014, 5 (7), 1217-1224. (22) Yang, C.-Y.; Delproposto, J.; Chinnaswamy, K.; Brown, W. C.; Wang, S.; Stuckey, J. A.; Wang, X., Conformational Sampling and Binding Site Assessment of Suppression of Tumorigenicity 2 Ectodomain. PLoS One 2016, 11 (1), e0146522. (23) Markwick, P. R. L.; McCammon, J. A., Studying functional dynamics in bio-molecules using accelerated molecular dynamics. Phys. Chem. Chem. Phys. 2011, 13 (45), 20053-20065. (24) Miao, Y.; Sinko, W.; Pierce, L.; Bucher, D.; Walker, R. C.; McCammon, J. A., Improved Reweighting of Accelerated Molecular Dynamics Simulations for Free Energy Calculation. J. Chem. Theory Comput. 2014, 10 (7), 2677-2689. (25) Miao, Y. L.; Nichols, S. E.; Gasper, P. M.; Metzger, V. T.; McCammon, J. A., Activation and dynamic network of the M2 muscarinic receptor. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (27), 10982-10987. (26) de Oliveira, C. A. F.; Hamelberg, D.; McCammon, J. A., Coupling Accelerated Molecular Dynamics Methods with Thermodynamic Integration Simulations. J. Chem. Theory Comput. 2008, 4 (9), 1516-1525. (27) Sinko, W.; de Oliveira, C. A. F.; Pierce, L. C. T.; McCammon, J. A., Protecting High Energy Barriers: A New Equation to Regulate Boost Energy in Accelerated Molecular Dynamics Simulations. J. Chem. Theory Comput. 2012, 8 (1), 17-23. (28) Sinko, W.; Miao, Y.; de Oliveira, C. A. F.; McCammon, J. A., Population Based Reweighting of Scaled Molecular Dynamics. J. Phys. Chem. B 2013, 117 (42), 12759-12768. (29) Xu, C.; Wang, J.; Liu, H., A Hamiltonian Replica Exchange Approach and Its Application to the Study of Side-Chain Type and Neighbor Effects on Peptide Backbone Conformations. J. Chem. Theory Comput. 2008, 4 (8), 1348-1359. (30) Fajer, M.; Hamelberg, D.; McCammon, J. A., Replica-Exchange Accelerated Molecular Dynamics (REXAMD) Applied to Thermodynamic Integration. J. Chem. Theory Comput. 2008, 4 (10), 1565-1569. (31) Arrar, M.; de Oliveira, C. A. F.; Fajer, M.; Sinko, W.; McCammon, J. A., w-REXAMD: A Hamiltonian Replica Exchange Approach to Improve Free Energy Calculations for Systems with Kinetically Trapped Conformations. J. Chem. Theory Comput. 2013, 9 (1), 18-23. (32) Yang, L.; Shao, Q.; Gao, Y. Q., Comparison between integrated and parallel tempering methods in enhanced sampling simulations. J. Chem. Phys. 2009, 130 (12), 124111. (33) Shao, Q.; Shi, J.; Zhu, W., Determining Protein Folding Pathway and Associated Energetics through Partitioned Integrated-Tempering-Sampling Simulation. J. Chem. Theory Comput. 2017, 13 (3), 1229-1243. (34) Xie, L.; Shen, L.; Chen, Z.-N.; Yang, M., Efficient free energy calculations by combining two complementary tempering sampling methods. J. Chem. Phys. 2017, 146 (2), 024103. (35) Yang, Y. I.; Zhang, J.; Che, X.; Yang, L. J.; Gao, Y. Q., Efficient sampling over rough energy landscapes with high barriers: A combination of metadynamics with integrated tempering sampling. J. Chem. Phys. 2016, 144 (9).

ACS Paragon Plus Environment

29

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 32

(36) Shao, Q., Enhanced conformational sampling technique provides an energy landscape view of large-scale protein conformational transitions. Phys. Chem. Chem. Phys. 2016, 18 (42), 29170-29182. (37) Yang, L.; Qin Gao, Y., A selective integrated tempering method. J. Chem. Phys. 2009, 131 (21), 214109. (38) Shao, Q.; Shi, J.; Zhu, W., Enhanced sampling molecular dynamics simulation captures experimentally suggested intermediate and unfolded states in the folding pathway of Trp-cage miniprotein. J. Chem. Phys. 2012, 137 (12), 125103. (39) Wu, T.; Yang, L.; Zhang, R.; Shao, Q.; Zhuang, W., Modeling the Thermal Unfolding 2DIR Spectra of a β-Hairpin Peptide Based on the Implicit Solvent MD Simulation. J. Phys. Chem. A 2013, 117 (29), 6256-6263. (40) Chen, L.; Shao, Q.; Gao, Y.-Q.; Russell, D. H., Molecular Dynamics and Ion Mobility Spectrometry Study of Model β-Hairpin Peptide, Trpzip1. J. Phys. Chem. A 2011, 115 (17), 4427-4435. (41) Shao, Q.; Shi, J.; Zhu, W., Molecular dynamics simulation indicating cold denaturation of β-hairpins. J. Chem. Phys. 2013, 138 (8), 085102. (42) Shao, Q., Probing Sequence Dependence of Folding Pathway of α-Helix Bundle Proteins through Free Energy Landscape Analysis. J. Phys. Chem. B 2014, 118 (22), 5891-5900. (43) Shao, Q.; Qin Gao, Y., The relative helix and hydrogen bond stability in the B domain of protein A as revealed by integrated tempering sampling molecular dynamics simulation. J. Chem. Phys. 2011, 135 (13), 135102. (44) Shao, Q.; Yang, L.; Gao, Y. Q., Structure change of β-hairpin induced by turn optimization: An enhanced sampling molecular dynamics simulation study. J. Chem. Phys. 2011, 135 (23), 235104. (45) Liu, C. W.; Wang, F.; Yang, L. J.; Li, X. Z.; Zheng, W. J.; Gao, Y. Q., Stable Salt-Water Cluster Structures Reflect the Delicate Competition between Ion-Water and Water-Water Interactions. J. Phys. Chem. B 2014, 118 (3), 743-751. (46) Fu, X. B.; Yang, L. J.; Gao, Y. Q., Selective sampling of transition paths. J. Chem. Phys. 2007, 127 (15). (47) Yang, L. J.; Liu, C. W.; Shao, Q.; Zhang, J.; Gao, Y. Q., From Thermodynamics to Kinetics: Enhanced Sampling of Rare Events. Acc. Chem. Res. 2015, 48 (4), 947-955. (48) Shen, T.; Hamelberg, D., A statistical analysis of the precision of reweighting-based simulations. J. Chem. Phys. 2008, 129 (3), 034103. (49) Mori, T.; Hamers, R. J.; Pedersen, J. A.; Cui, Q., Integrated Hamiltonian Sampling: A Simple and Versatile Method for Free Energy Simulations and Conformational Sampling. J. Phys. Chem. B 2014, 118 (28), 8210-8220. (50) Hamelberg, D.; de Oliveira, C. A. F.; McCammon, J. A., Sampling of slow diffusive conformational transitions with accelerated molecular dynamics. J. Chem. Phys. 2007, 127 (15), 155102. (51) Pierce, L. C. T.; Salomon-Ferrer, R.; de Oliveira, C. A. F.; McCammon, J. A.; Walker, R. C., Routine Access to Millisecond Time Scale Events with Accelerated Molecular Dynamics. J. Chem. Theory Comput. 2012, 8 (9), 2997-3002. (52) Shen, L.; Xie, L.; Yang, M., Thermodynamic properties of solvated peptides from selective integrated tempering sampling with a new weighting factor estimation algorithm. Mol. Phys. 2017, 115 (7), 885-894.

ACS Paragon Plus Environment

30

Page 31 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(53) Zhao, P.; Yang, L. J.; Gao, Y. Q.; Lu, Z.-Y., Facile implementation of integrated tempering sampling method to enhance the sampling over a broad range of temperatures. Chem. Phys. 2013, 415, 98-105. (54) Hunter, J. D., Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9 (3), 90-95. (55) Humphrey, W.; Dalke, A.; Schulten, K., VMD: Visual molecular dynamics. J. Mol. Graphics Modell. 1996, 14 (1), 33-38. (56) Gao, Y. Q.; Yang, L., On the enhanced sampling over energy barriers in molecular dynamics simulations. J. Chem. Phys. 2006, 125 (11), 114103. (57) Perez, A.; Morrone, J. A.; Simmerling, C.; Dill, K. A., Advances in free-energy-based simulations of protein folding and ligand binding. Curr. Opin. Struct. Biol. 2016, 36, 25-31. (58) Paschek, D.; Day, R.; García, A. E., Influence of water-protein hydrogen bonding on the stability of Trp-cage miniprotein. A comparison between the TIP3P and TIP4P-Ew water models. Phys. Chem. Chem. Phys. 2011, 13 (44), 19840. (59) Jing, Z.; Sun, H., A Comment on the Reweighting Method for Accelerated Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11 (6), 2395-2397.

ACS Paragon Plus Environment

31

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 32

for Table of Contents use only

Integrating Multiple Accelerated Molecular Dynamics to Improve Accuracy of Free Energy Calculation Xiangda Peng, Yuebin Zhang, Yan Li, QingLong Liu, Huiying Chu, Dinglin Zhang and Guohui Li

ACS Paragon Plus Environment

32