Hybrid Cascade-Type Molecular Dynamics with a ... - ACS Publications

Nov 23, 2018 - state model (MSM). The former rapidly generates approximated transition paths directly connecting reactants with products, and the...
0 downloads 0 Views 747KB Size
Subscriber access provided by University of South Dakota

Biomolecular Systems

Hybrid Cascade-Type Molecular Dynamics with Markov State Model for Efficient Free Energy Calculations Ryuhei Harada, and Yasuteru Shigeta J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00802 • Publication Date (Web): 23 Nov 2018 Downloaded from http://pubs.acs.org on November 24, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Hybrid Cascade-Type Molecular Dynamics with Markov State Model for Efficient Free Energy Calculations Ryuhei Harada*,† and Yasuteru Shigeta*,† †Center

for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan

Corresponding author: Ryuhei Harada, Yasuteru Shigeta E-mail: [email protected], [email protected] Abstract A protocol for calculating free energy landscape (FEL) is proposed based on a combination of two cascade-type molecular dynamics (MD) methods, parallel cascade selection MD (PaCS-MD) and outlier flooding method (OFLOOD), with a help of Markov state model (MSM). The former rapidly generate approximated transition paths directly connecting reactants with products, and the latter complementary resampled marginal conformational subspaces. Trajectories obtained by them give reliable microstates in MSM providing accurate FEL with low computational costs. As a demonstration, the present method was applied to a mini-protein (Chignolin and Trp-cage) in explicit water and successfully elucidated multiple folding paths on their free energy landscapes. Our method could be applicable to a wide variety of biological systems to estimate their free energy profiles.

1 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

2 ACS Paragon Plus Environment

Page 2 of 30

Page 3 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

I. INTRORUCTION Molecular dynamics (MD) simulations can sample variable configurations of given systems as time series of dynamics with the time/special resolutions (fs/Å). However, it might still be difficult to obtain reliable canonical ensembles to calculate a free energy landscape (FEL) because the accessible time scales of conventional MD (CMD) simulation are often shorter than those of actual physical phenomena meaning lack of enough conformational sampling. To overcome the sampling problem, several efficient sampling methods that adopt external biases have been proposed such as metadynamics,1-4 multicanonical MD (McMD),5 and replica-exchange MD (REMD)6 and implemented in well-established MD programs. In addition to the above well-established strategies, a large variety of alternative methods have already been recently proposed to reproduce/predict the conformational transitions of proteins.7-10 In particular, well assessed enhanced sampling methods to calculate reliable free energies have also been implemented and refined in the last decade.11-13 In addition to these methods, we have proposed enhanced conformational sampling methods called cascade-type MD, which promote essential conformational transitions of a given system without any external biases by repeating cycles of (1) selections of initial structures relevant to transitions of the proteins and (2) restarting short-time MD from the selected initial structures (initial velocities are regenerated based on Maxwell ― Boltzmann distribution at a given temperature). Despite the simple procedure, we have numerically proven that the cascade-type MD observes rare events of a wide variety of systems such as protein folding,7 domain motions,8 and polymerizations of a nanocube14 quite

3 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 30

efficiently, improving the insufficient conformational sampling of CMD (see our review paper for their computational details).15 A fatal drawback of the cascade-type MD is that it cannot provide FEL for conformational transitions of a given system, while it can rapidly generate a set of transition paths. Previously, the umbrella sampling method (US)16,

17

with the weighted histogram analysis method (WHAM)18 has been used to

obtain FEL for the transition paths after cascade-type MD.19 However, this protocol needs an additional computational cost as a post-processing treatment. Thus, a robust free energy calculation protocol should be established without increasing the computational costs. In the present study, we propose an alternative free energy calculation method based on a combination of the cascade-type MD with a help of Markov state model (MSM), which needs a less computational cost than the US/WHAM-based approach. In the past decades, there have been several preceding studies of FEL calculations based on MSM, and they succeeded in estimating FELs efficiently.20-27 First, the MSM-based free energy calculation is introduced, then demonstrations for simple mini peptides are explained to confirm the effectiveness of the present method.

II. METHODS The cascade-type MD includes several kinds of conformational sampling methods.15 Among them, we propose a hybrid conformational sampling based on the following two cascade-type MDs, i.e. (i) parallel cascade selection MD (PaCS-MD)28 and (ii) outlier flooding method (OFLOOD).29 In selecting the initial

4 ACS Paragon Plus Environment

Page 5 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

structures in (i), the former refers the similarity to the product (for instance root-mean-square deviation with respect to the product, i.e. RMSDproduct), while the latter does the dissimilarity to the previously sampled region characterized as outliers. For their strategies of conformational sampling, see Figs. S1 and S2 in Supporting Information (SI). Therefore, these methods often select different initial structures, meaning that their sampled regions are complementary to each other. In the hybrid conformational sampling, PaCS-MD is first utilized to generate a set of direct transition paths from a given reactant to a product, and then OFLOOD expand surrounding conformational subspaces (boundaries) along the transition paths. Owing to the combination of PaCS-MD and OFLOOD, a broad conformational subspace is searched rapidly and efficiently. Note here that although OFLOOD can of course detect the transition paths if one repeats the cycles, it is rather effective to adopt PaCS-MD when a set of end-point structures is known a priori. One of the advantages of adding the OFLOOD technique is the intensive search of transitional regions that might be not sampled enough in PaCS-MD. To calculate accurate FELs, the transitional regions should be sampled enough. Therefore, the hybrid-type strategy adding the OFLOOD technique combined with the PaCS-MD search can improve the insufficient conformational sampling for the transitional regions, enabling us to estimate FELs efficiently. After the hybrid conformational sampling, the generated trajectories are quantitatively evaluated with MSM. By specifying appropriate reaction coordinates (RCs), trajectories of the hybrid conformational sampling are projected onto a conformational subspace spanned by them. The construction of a reliable

5 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 30

MSM is done with the two steps, i.e. defining microstates and choosing a reasonable lag time. First, trajectories of the hybrid conformational sampling are clustered by the k-means algorithm to yield N clusters, which are defined as the microstates, and then each trajectory is assign to the closet microstates. To determine the appropriate lag time 𝜏, the ith slowest implied time, ti, estimated from the ith largest eigenvalue, 𝜆𝑖(𝜏), of the N × N trial transition matrix T defined as 𝜏

𝑡𝑖(𝜏) = ― ln 𝜆𝑖(𝜏),

(1)

is monitored by changing 𝜏 until 𝑡𝑖(𝜏) reaches an approximately constant value, meaning that a given system satisfies the Markov assumption. After determining the reasonable lag time, the transitions among the microstates are again counted to estimate the maximum likelihood transition matrix T under the constrained detailed balance. To estimate the FEL, one has to obtain a normalized equilibrium distribution  = {𝜋𝑖} (∑ 𝜋𝑖 = 1), from the transition matrix T, which fulfills  = T , meaning that  is obtained from 𝑖 one of the eigenvectors of T. Finally, the FEL is calculated as follows: 𝜋𝑖

𝐹𝑖 = ― 𝑘𝐵𝑇lnmax 𝜋𝑗 (i = 1, 2, … , N),

(2)

𝑗

where the origin of FEL is selected as the maximum value of . After constructing MSM, the resulting model is validated based on the Chapman-Kolmogorov test using the Chapman-Kolmogorov equation: 𝑻(𝑛𝜏) ≈ 𝑻(𝜏)𝑛,

(3)

where n is an integer number of steps. Theoretically, the Chapman-Kolmogorov test is usually conducted by

6 ACS Paragon Plus Environment

Page 7 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

comparing the probability of remaining in a selected state at increasing time steps of the MD trajectories with that of MSM predictions. In the present test, the probability decay of major metastable states predicted by MSM is compared to those estimated from the original simulations. As demonstrations of the present methods, folding free energy landscapes (FELs) of the well-known peptides (Chignolin and Trp-cage) were estimated. To construct full-atomic systems, the target systems were solvated with the TIP3P water model30 in rectangular simulation boxes. Each system included ions for neutralization. For Chignolin, a completely extended structure was modeled based on the amino-acid sequence,31 which was regarded as a given reactant. For Trp-cage, an unfolding MD simulation for 100 ns under NPT (T = 500 K and P = 1 bar) was started from the native structure (PDBid: 1L2Y)32 to generate a denatured configuration as a given reactant. After the 100-ns MD simulation at 500 K, the final denatured snapshot was adopted as a reactant of PaCS-MD. During the conformational resampling based on short-time MD simulations, the chemical bonds of the peptides were treated as rigid bodies with the LINCS algorithm.33 The water molecules were treated as rigid bodies with the SETTLE algorithm.34 The temperatures of systems were controlled with the modified Berendsen thermostat.35 The pressures of system were controlled with the Parrinello–Rahman method.36, 37 The electrostatic interactions were calculated with the particle mesh Ewald method38 using a real space cutoff of 10 Å. The cutoff value for van der Waals interactions was set to 10 Å. All the MD simulations of the present demonstrations were performed with the GPU version of the GROMACS 2018 package39 using the AMBER ff03 force field.40

7 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 30

In the present demonstrations, the PaCS-MD cycle was repeated until C RMSD measured from each native structure (C RMSDnative) was less than 1.0 Å, i.e. C RMSDnative < 1.0 Å. After terminating the PaCS-MD cycle, OFLOOD was performed to expand the boundaries of distribution functions. Finally, MSM was constructed from the trajectories of hybrid conformational sampling to estimate the folding FELs of these peptides.

III. RESULTS AND DISCUSSIONS For the first demonstration, the folding process of Chignolin was elucidated with the hybrid cascade-type MDs with MSM. First, a set of initial guesses of folding paths was prepared with PaCS-MD. At every cycle in PaCS-MD, eight initial structures were selected by referring to the RMSDproduct (C RMSDnative) values, i.e. the highly ranked top eight structures were always resampled by restarting short-time (100-ps) MD simulations from them. The green line in Fig. 2(a) shows a profile of RMSDproduct as a function of cycle number. Judging from the profile, PaCS-MD successfully reproduced the protein folding of Chignolin at the 10th cycle (RMSDproduct < 1.0 Å). To further investigate the folding process of Chignolin by OFLOOD, the trajectories of PaCS-MD for all the 25 cycles were projected onto a subspace spanned by a set of hydrogen bonds between the backbone atoms, i.e. the first one: the distance between D3N and G7O, the second one: the distance between D3N and G7O (see the blue and red spheres in Fig. 1(a) for their correspondences). Herein, these hydrogen bond distances have been widely utilized to characterize the

8 ACS Paragon Plus Environment

Page 9 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

folding process of Chigonolin.41-42 In terms of the distributions on the subspace, clusters and outliers were detected at the beginning of each OFLOOD cycle. In OFLOOD, a hundred outliers were detected and resampled with relatively long-time (500-ps) MD simulations at each cycle. Finally, the OFLOOD cycle was repeated until the searched conformational area spanned by the hydrogen bonds was converged enough. The progress of conformational sampling was monitored by counting the total number of a grid (ngrid) on the subspace searched by the hybrid cascade-type MD simulation. Herein, ngrid was defined as a unit cell on the subspace spanned by a set of the hydrogen bond distances (0.25 Å × 0.25 Å), and it was counted to monitor how the hybrid cascade-type MD simulation searched the subspace for the OFLOOD cycles. Ideally, the total number of searched unit cells increases as cycles went by and converges to a constant value. The green line in Fig. 2(b) shows a profile of ngrid sampled by OFLOOD for the 25 cycles, indicating an enough convergence at the 25th cycle, i.e. the present cycle might be valid to exploring the subspace. To construct a reliable MSM, the implied times were addressed. The resulting MSM contains 50 microstates obtained by the k-means clustering algorithm. To estimate the lag times, a MSM builder EMMA43 was utilized. Figure 3(a) shows the implied times as a function of the lag time. Judging from Fig. 3(a), the implied time scale curves reach constant when the lag time goes beyond around 200 ps. Therefore, 200 ps was selected as a reasonable lag time to construct MSM in the present system. Next, to justify the MSM constructed with the hybrid cascade-type MD simulation, we considered Chapman-Kolmogorov test as discussed in the preceding study.23 This test was also performed using a tool of EMMA.43 Figure 3(c)

9 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 30

shows a set of profiles of probability decays. Judging from these profiles, the estimated curve of the probabilities decay matched well with the predicted one, indicating that the resulting model is Markovian. The folding FEL obtained by the present protocol (Fig. 4(a)) was compared with an approximated FEL estimated from the minus logarithm of distribution obtained by PaCS-MD for the 25 cycles (Fig. 4(c)). Comparing these results, PaCS-MD might generate direct (end-to-end) folding paths to the native state on the folding FEL. In addition, the mis-folded state of Chignolin was not sampled yet for the 25 PaCS-MD cycle. This means that MSM constructed only with PaCS-MD does not give a correct FEL because of its insufficient conformational sampling. In contrast, the hybrid-type protocol successfully identified both the native and mis-folded states as the global and local minima on the folding FEL, respectively (see Fig. 4(a)). Because of the additional boundary search with OFLOOD, the insufficient conformational sampling of PaCS-MD might be drastically improved. For a reference conformational search, an extremely long-time (10-s) CMD simulation was started from the native structure of Chignolin. Figure S3 in SI shows a FEL estimated from the 10-s CMD simulation. Judging from the FEL, the native and mis-folded states were only identified. Furthermore, the conformational area for the unfolded states were not sampled at all, indicating that the quite long-time MD simulation failed to search the broad conformational subspace even for the present peptides. On the other hand, for the computational cost to search the conformational area, the hybrid-type conformational sampling

10 ACS Paragon Plus Environment

Page 11 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

can be fully parallelized and the total required time for our protocol for the Chignolin was 1.275 s. These results clearly show the high conformational sampling efficiency of the present protocol. For a comparison to the preceding study,44-46 the searched conformational areas (the two-dimensional subspaces spanned by the hydrogen bond distances) were almost the same between the hybrid cascade-type MD simulation and the multiscale enhance sampling (MSES). In both the methods, the edges of the subspace with high fee energy values were broadly sampled, indicating that rarely occurring states were sampled enough. Additionally, the locations of metastable states (the native, misfolded, and intermediate states) on the folding FEL showed a good correspondence between the hybrid cascade-type conformational sampling and MSES. Comparison with the preceding study based on MSM,47 a free energy barrier upon the transition from the misfolded to native states was ~ 6 kBT on the folding FEL projected onto a set of relaxation modes, showing a good agreement with the present estimation based the hybrid cascade-type conformational search. These comparisons with the past studies might validate our folding FEL calculation of Chignolin. Next, to generate initial guesses of folding paths of Trp-cage, the PaCS-MD cycle was repeated. A profile of RMSDproduct (C RMSDnative) as a function of the cycle number (red line in Fig. 2(a)) tells one that PaCS-MD successfully generated a set of folding paths between the unfolded to native states since the RMSDproduct value was less than 1.0 Å at the 25th cycle. To characterize the folding of Trp-cage, a conformational subspace spanned by a set of partial C RMSDs measured for two segments (SegA and 11 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 30

SegB) of Trp-cage were considered. Herein, Trp-cage consists of three regions: (A) the  helix region, (B) the loop region, and (C) the other region (see blue, green and red parts in Fig. 1(b)). SegA contains the  helix and the loop regions and SegB does the loop and the other regions. The overlapping loop region as a common part ensures the correct folding structure of Trp-cage when both the segments fold. By monitoring the partial RMSDs (RMSDSegA, RMSDSegB), the folding processes of Trp-cage are characterized with formations of the secondary structure and the other region. After generating the initial guesses of folding paths, the conformational resampling based on OFLOOD was repeated until the 25th cycle. At every cycle, a hundred outliers were again selected from the sampled trajectories after clustering, and they were intensively resampled with 500-ps MD simulations. To end the OFLOOD cycle, the searched conformational area was monitored by counting the total number of grids (ngrid) searched based on OFLOOD. Herein, the grid was defined as a unit cell (0.25 Å × 0.25 Å) on the subspace spanned by the partial RMSDs. The red line in Fig. 2(b) shows that ngrid well converged at the 25th cycle, indicating the present cycle might be enough to estimate the folding FEL on the subspace. To confirm a reliable MSM, the implied times were monitored as a function of lag time (Fig. 3(b)) and reached constant values after 200 ps, i.e. the reasonable lag time of the Trp-cage system was set to 200 ps. Also, we considered Chapman-Kolmogorov test to validate the building MSM from the trajectories of hybrid cascade-type MD simulation. Figure 3(d) shows a set of profiles of probability decays. Judging from these profiles, the estimated curve of the probabilities decay matched well with the predicted one, indicating

12 ACS Paragon Plus Environment

Page 13 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

that the resulting model is Markovian. Based on the constructed MSM, the folding FEL was estimated on the subspace spanned by a set of partial RMSDs (see Fig. 4(b)). In the FEL, two local minimum states (LM1 and LM2) were identified in addition to the native (N) and unfolded (UF) states. There are two different folding paths from the unfolded to native states, i.e. (UF→LM1→N: path 1), where the  helix is formed at first, and (UF→LM2→N: path 2), where the  helix is formed at the end. The former is dominant since the free energy value of LM2 was much higher than that of LM1. This evidence implies that path 1 will be observable as the major folding path with a higher possibility than path 2 as the minor folding path. In other words, the formation of  helix at the early folding stage might regulate the rapid folding of Trp-cage, i.e. the formation of secondary structure might be rate-limiting. Judging from the approximate FEL (Fig. 4(d)), the initial guess of the transition path obtained by PaCS-MD slightly deviated from the major folding path (path 1) and the minor folding path (path 2) is missing (compare Figs. 4(c) and 4(d)), suggesting these multiple folding paths were identified neither only with PaCS-MD nor CMD simulation. Therefore, the present protocol might be efficient for quantitatively evaluating conformational transition paths of a given system. Herein, we mention the choice of RMSD as an RC. As the previous studies reported,21, 26 RMSD might be problematic as an RC in constructing MSMs. In more detail, RMSD might be an inappropriate RC due to the tendency to group kinetically separated states of protein folding. In the present demonstration, we

13 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 30

specified a set of partial RMSDs for the secondary structure ( helical and non-helical regions) and reproduced the multiple folding pathways of Trp-cage, showing a good correspondence with the previous studies,48,

49

indicating that these partial RMSDs worked well to characterize the folding of Trp-cage.

However, in actual applications, it is not ensured that appropriate RCs are non-trivial a priori. Therefore, one has to carefully specify appropriates RCs that can characterize conformational transitions of biological targets to construct reliable MSMs. Now we would like to discuss several issues with using RMSDs as RCs in constructing MSMs. Firstly, there might be many structurally diverse unfolded states that will have differing free energies but similar RMSD values. Next, there might be many other unfolded structures with differing RMSD values. Therefore, the MSM constructions using RMSD may not converge well. To validate the folding FEL of Trp-cage projected onto the partial RMSDs (Fig. 4(b)), a folding FEL was newly calculated using another set of RCs, i.e. characteristic pair distances relevant to the folding of Trp-cage have been reported in the preceding studies.50-52 The 1st RC, d(D9-R16), is a distance between C atoms of Asp9 and Arg16, characterizing a salt bridge of the native state. The 2nd RC, d(W6-P18), is a distance between C atoms of Trp6 and Pro18, characterizing a hydrophobic stacking of the central core of the native state. To recalculate the folding FEL, all the trajectories of the hybrid-type conformational sampling were projected onto a conformational subspace spanned by d(D9-R16) and d(W9-P18). Then a new transition matrix was estimated by counting transitions among states defined in the new subspace spanned by the pair distances. Figure S4(a) in SI shows implied times as a function of lag time for the MSM constructed from the updated projections onto the subspace spanned by the pair distances. Finally, the implied times reached constant values after 200 ps, i.e. the reasonable lag time was set to 200 ps. In addition, we considered the 14 ACS Paragon Plus Environment

Page 15 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Chapman-Kolmogorov test to validate the new MSM. Figure S4(b) shows a set of profiles of probability decays. Judging from these profiles, the estimated curve of the probabilities decay matched well with the predicted one, indicating that the updated MSM is also Markovian.

Figure S5 in SI shows the new folding FEL projected onto the subspace spanned by the pair distances. Judging from the updated folding FEL, only one folding path was identified, corresponding to the major folding path (UF→LM1→N) on the previous FEL (Fig. 4(b)). Herein, the minor folding path (UF→ LM2→N) was not identified on the updated FEL. Therefore, the set of partial RMSDs might be reasonable RCs because they could split the metastable states (LM1 and LM2) and identified the major/minor folding paths, while the set of pair distances failed to identify one of the folding paths. The validity of the partial RMSDs for segments has been confirmed in the several preceding studies of Trp-cage. 48, 51 As shown in this additional MSM construction, the reliability of constructed MSMs might depend on initial specifications of RCs.

For a computational cost to estimate the folding FEL of Trp-cage, we compare our results with the previous study by REMD,53 where the RMED simulation was performed in terms of 36 replicas ranging from 273 to 460 K. To achieve the folding-unfolding equilibrium conditions, each replica was simulated for ~1.2 μs, i.e. the total simulation time of the REMD simulation was ~ 43.2 s, providing the broad folding FEL. On the FEL obtained by the REMD simulation, two folding paths were detected, which correspond to the major/minor folding paths detected in the present study, showing a good agreement with each other. Our hybrid conformational sampling only took 1.275 s to estimate the folding FEL including the multiple (major/minor) paths, suggesting that the present strategy is one of the most efficient FEL calculation methods. 15 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 30

For a comparison to the previous study, the integrated-tempering-sampling molecular dynamics (ITS-MD) simulation49 using the same force field (AMBER ff03)40 identified a major folding path via the intermediate state with a helix partially formed (LM1). This intermediate state has also been identified in a previous study using bias-exchange metadynamics simulations,48 where Trp6 interacts strongly with Pro12 from the central helix in the intermediate state. Also, the ITS-MD simulation estimated a free energy barrier upon the transition from the intermediate to native states was ~ 3kBT, showing a good agreement with our estimation based on MSM using the trajectories of hybrid cascade-type MD simulation. In addition, accelerated MD (aMD) simulations54 estimated the free energy barrier upon the transition from the intermediate to native states as ~ 3kBT. These comparisons with the previous studies might quantitatively validate the folding path of Trp-cage reproduced with the hybrid-type conformational sampling and MSM.

As a comparison to the pre-existing methods,1, 3, 5, 6 our methods are easy to implement because a set of simple scripts is required to repeat the cycles of conformational sampling without modifying any source codes of MD programs. Generally, the pre-existing methods need to find optimal parameters before their applications. In contrast, our methods only need to specify the total number of initial structures to be selected at each cycle, which might be one of the advantages of our methods. As a disadvantage, the current strategy cannot generate rigorous conformational ensembles, while the pre-existing methods do. Therefore, our methods need to construct MSMs to generate reliable ensembles using the obtained short-time (ps-order)

16 ACS Paragon Plus Environment

Page 17 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

trajectories. However, the constructions of MSMs can be automated, and their computational costs are almost negligible compared to the conformational sampling process.

For comparisons to to the free energy calculations based on the WHAM/MD simulations, we refer to our previous studies.42,

55-57

In our previous studies, the folding FELs of these kinds of peptides

(Chignolin, Trp-cage, and Villin) have been calculated based on the WHAM/MD simulations combined with the umbrella samplings. To prepare references for the umbrella samplings, we performed MD simulations using coarse-grained models (C-atom based models or implicit solvent models) at the first step. At the second step, all-atom reference structures for the umbrella samplings were reconstructed from the coarse-grained configurations generated at the first step. Finally, the umbrella samplings were performed using each reference, and the WHAM weighted all the configurations sampled by the umbrella samplings, providing the folding FELs of these kinds of peptides. For instance, one hundred representative all-atom structures were reconstructed and utilized as references for the Chignolin, Trp-cage, and Villin systems.42, 56, 57

In more detail, the umbrella samplings were performed by imposing a set of harmonic restraints with

respect to each reference for 1 ns. Therefore, we needed 100 ns (1 ns × 100 references) computational cost to estimate the folding FELs projected onto the same subspaces. Actually, the folding FELs of Chignolin/Trp-cage showed good correspondences between our previous56, 57 and the present studies, i.e. the location of metastable state (the location of mis-folded state of Chignolin on the same subspace) and the folding processes of Trp-cage (the major/minor folding paths on the same subspace). These computational evidences validate that both the strategies can calculate their folding FELs.

In conclusion, we need an additional conformational sampling when the hybrid-type conformational sampling is combined with the WHAM/MD simulations. To calculate FELs, the umbrella sampling should be additionally performed after finishing the hybrid-type conformational sampling. In contrast, the hybrid-type conformational sampling combined with MSM can calculate FELs by constructing 17 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 30

reliable MSM without the additional umbrella sampling. Therefore, the present strategy might have an advantage with less computational costs compared to the WHAM/MD simulations based on the umbrella sampling.

IV. SUMMARY

In summary, we have proposed a protocol to efficiently evaluate FEL using combined PaCS-MD and OFLOOD with MSM method. The former serves to rapidly generate approximated transition paths, and the latter additionally resampled their marginal conformational subspaces to construct a reliable MSM. The demonstrations of folding processes of Chignolin and Trp-cage clearly show the effectiveness and accuracy of the present hybrid conformational sampling. As one of the advantages of the present strategy, the conformational insufficiency of PaCS-MD will be improved by the combination with OFLOOD, and one can estimate free energy profiles by constructing reliable MSMs using the sampled trajectories. Because of the simple protocol, one might easily implement our method without modifying source codes of MD programs.

Upon applications of the present method, users make use of a priori assigned RCs. However, it might be generally difficult to specify a unique or well-defined RC suitable for characterizing complex scenarios of rare events. Therefore, to address the issue of ill-specified RCs, the conventional PaCS-MD have been extended.58 In the extended PaCS-MD, candidates of RCs are preliminarily prepared, and an 18 ACS Paragon Plus Environment

Page 19 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

optimal RC is dynamically searched from them. For RCs, their gradients are monitored per cycle to find an RC with the steepest gradient between the adjacent cycles, i.e. the (i-1)th and the ith cycles. Then, the conformational resampling proceeds along the steepest direction of the optimal RC, i.e. a reasonable RC is specified on-the-fly manner. To treat complex processes, the extended PaCS-MD might be replaced with the conventional PaCS-MD in the hybrid conformational sampling. In the present study, we limited our implementation to a combination of just two of our methods, i.e. PaCS-MD and OFLOOD. However, the hybrid-type conformational sampling method might be generalized by considering possible combinations of all the methods. Depending on given problems, some of hybrid conformational sampling methods might further accelerate the FEL calculations, while a reasonable set of RCs should be searched and specified a priori.

Acknowledgement The present work was supported by a Japan Society for the Promotion of Science (JSPS) KAKENHI Grant-in-Aid for Young Scientists (A) (JP16H06164). This research was partially supported by the Initiative on Promotion of Supercomputing for Young or Women Researchers, Information Technology Center, The University of Tokyo. This work was supported by the TSUBAME Encouragement Program for Young/Female Users of the Global Scientific Information and Computing Center at Tokyo Institute of Technology.

19 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 30

Supporting Information Available Details of PaCS-MD, details of OFLOOD, FEL of Chignolin obtained from the 10-s MD simulation, convergences of characteristics of MSM constructed for the Chignolin and Trp-cage systems, and FEL of Trp-cage. This information is available free of charge via the Internet at http://pubs.acs.org

Author Information Corresponding Authors E-mail: [email protected] E-mail: [email protected]

20 ACS Paragon Plus Environment

Page 21 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1

(a) Native structure of Chignolin. Chignolin has a characteristic hydrogen bond between the nitrogen (D3N) and oxygen (G7O) atoms of backbone at the native state. In contrast, an alternative hydrogen bond is formed between the nitrogen (D3N) and oxygen (T8O) atoms of backbone at the mis-folded state. (b) Native structure of Trp-cage. Each color represents each region of Trp-cage. (blue) Secondary structure (residues 2-8). (green) Loop region (residues 8-11). (red) The other region (residues 11-19).

21 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 30

Figure 2

(a) Profiles of RMSDprodcut (C RMSDnative) as a function of cycle number for (green) Chignolin and (red) Trp-cage. The dashed line represents a threshold for judging whether the PaCS-MD cycle identified the native structures or not. (b) Profiles of the number of grids on each conformational subspace identified with the OFLOOD cycle as a function of cycle number for (green) Chignolin and (red) Trp-cage. The dashed lines represent converged values of number of grids after the 25th cycle.

22 ACS Paragon Plus Environment

Page 23 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 3

(left) Implied relaxation time scales of MSM in terms of the k-means algorithm with 50 cluster centers as a function of lag time for (a) Chignolin and (b) Trp-cage. (right) Chapman-Kolmogorov test of the MSM conducted at  = 200 ps with 50 microstates defined by the k-means clustering for (two) major metastable states, corresponding to the probability decay and the predicted one: the pair of the red/green and magenta/blue lines for (c) Chignolin and (d) Trp-cage.

23 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 30

Figure 4

FELs estimated with MSM using the trajectories of hybrid conformational sampling for (a) Chignoliln and (b) Trp-cage. In Figs. 4(a-d), N and UF represent the folded (the products) and the unfolded (the reactants) configurations of Chignolin and Trp-cage. IM and MF in Fig. 4(a) represent mis-folded and intermediate configurations of Chignolin. LM1 and LM2 in Fig. 4(b) represent metastable states during the major and minor folding paths of Trp-cage. All the free energy values are scaled by kBT. Minus logarithm of populations of states sampled with PaCS-MD (25 cycles) for (c) Chignolin and (d) Trp-cage. For Chignolin,

24 ACS Paragon Plus Environment

Page 25 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

a subspace spanned by a set of hydrogen bond distances between the backbone atoms (D3N-G7O and D3N-T8O). For Trp-cage, a subspace spanned by a set of partial C RMSD for each segment defined in Fig. 1, i.e. SegA and SegB. Herein, SegA contains the  helix and the loop regions (the blue/green parts in Fig. 1(b)). SegB contains the loop and the other regions (the green/red parts in Fig. 1(b)). The overlapping loop region (the green part in Fig. 1(b)) as a common part ensures the correct folding structure of Trp-cage when both the segments fold.

25 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 30

References 1.

Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562-12566.

2.

Raiteri, P.; Laio, A.; Gervasio, F. L.; Micheletti, C.; Parrinello, M. Efficient Reconstruction of Complex Free Energy Landscapes by Multiple Walkers Metadynamics. J. Phys. Chem. B 2006, 110, 3533-3539.

3.

Piana, S.; Laio, A. A Bias-Exchange Approach to Protein Folding. J. Phys. Chem. B 2007, 111, 4553-4559.

4.

Laio, A.; Gervasio, F. L. Metadynamics: A Method to Simulate Rare Events and Reconstruct the Free Energy in Biophysics, Chemistry and Material Science. Rep. Prog. Phys. 2008, 71, 126601-1-22.

5.

Nakajima, N.; Nakamura, H.; Kidera, A. Multicanonical Ensemble Generated by Molecular Dynamics Simulation for Enhanced Conformational Sampling of Peptides. J. Phys. Chem. B 1997, 101, 817-824.

6.

Sugita, Y.; Okamoto, Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314, 141-151.

7.

Valsson, O.; Parrinello, M. Variational Approach to Enhanced Sampling and Free Energy Calculations. Phys. Rev. Lett. 2014, 113, 090601-1-5.

8.

Bubnis, G.; Risselada, H. J.; Grubmuller, H. Exploiting Lipid Permutation Symmetry to Compute Membrane Remodeling Free Energies. Phys. Rev. Lett. 2016, 117, 188102-1-6.

9.

Miao, Y. L.; McCammon, J. A. Unconstrained Enhanced Sampling for Free Energy Calculations of Biomolecules: A Review. Mol. Simul. 2016, 42, 1046-1055.

10.

Awasthi, S.; Nair, N. N. Exploring High Dimensional Free Energy Landscapes: Temperature Accelerated Sliced Sampling. J. Chem. Phys. 2017, 146, 094108-1-8.

11.

Maragliano, L.; Fischer, A.; Vanden-Eijnden, E.; Ciccotti, G. String Method in Collective Variables: Minimum Free Energy Paths and Isocommittor Surfaces. J. Chem. Phys. 2006, 125, 024106-1-15.

12.

Barducci, A.; Bonomi, M.; Parrinello, M. Metadynamics. WIREs Computational Molecular Science 2011, 1, 826-843.

13.

Frushicheva, M. P.; Mills, M. J. L.; Schopf, P.; Singh, M. K.; Prasad, R. B.; Warshel, A. Computer Aided Enzyme Design and Catalytic Concepts. Curr. Opin. Chem. Biol. 2014, 21, 56-62.

14.

Harada, R.; Mashiko, T.; Tachikawa, M.; Hiraoka, S.; Shigeta, Y. Programed Dynamical Ordering in Self-Organization Processes of a Nanocube: A Molecular Dynamics Study. Phys. Chem. Chem. Phys. 2018, 20, 9115-9122. 26 ACS Paragon Plus Environment

Page 27 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

15.

Journal of Chemical Theory and Computation

Harada, R.; Takano, Y.; Baba, T.; Shigeta, Y. Simple, yet Powerful Methodologies for Conformational Sampling of Proteins. Phys. Chem. Chem. Phys. 2015, 17, 6155-6173.

16.

Torrie, G. M.; Valleau, J. P. Monte-Carlo Study of a Phase-Separating Liquid-Mixture by Umbrella Sampling. J. Chem. Phys. 1977, 66, 1402-1408.

17.

Torrie, G. M.; Valleau, J. P. Non-Physical Sampling Distributions in Monte-Carlo Free-Energy Estimation - Umbrella Sampling. J. Comput. Phys. 1977, 23, 187-199.

18.

Souaille, M.; Roux, B. Extension to the Weighted Histogram Analysis Method: Combining Umbrella Sampling with Free Energy Calculations. Comput. Phys. Commun. 2001, 135, 40-57.

19.

Fujita, J.; Harada, R.; Maeda, Y.; Saito, Y.; Mizohata, E.; Inoue, T.; Shigeta, Y.; Matsumura, H. Identification of the Key Interactions in Structural Transition Pathway of Ftsz from Staphylococcus Aureus. J. Struct. Biol. 2017, 198, 65-73.

20.

Bowman, G. R.; Huang, X. H.; Pande, V. S. Using Generalized Ensemble Simulations and Markov State Models to Identify Conformational States. Methods 2009, 49, 197-201.

21.

Pande, V. S.; Beauchamp, K.; Bowman, G. R. Everything You Wanted to Know About Markov State Models but Were Afraid to Ask. Methods 2010, 52, 99-105.

22.

Lane, T. J.; Bowman, G. R.; Beauchamp, K.; Voelz, V. A.; Pande, V. S. Markov State Model Reveals Folding and Functional Dynamics in Ultra-Long Md Trajectories. J. Am. Chem. Soc. 2011, 133, 18413-18419.

23.

Prinz, J. H.; Wu, H.; Sarich, M.; Keller, B.; Senne, M.; Held, M.; Chodera, J. D.; Schutte, C.; Noe, F. Markov Models of Molecular Kinetics: Generation and Validation. J. Chem. Phys. 2011, 134.

24.

Voelz, V. A.; Jager, M.; Zhu, L.; Yao, S. H.; Bakajin, O.; Weiss, S.; Lapidus, L. J.; Pande, V. S. Markov State Models of Millisecond Folder Acbp Reveals New Views of the Folding Reaction. Biophys. J. 2011, 100, 515-515.

25.

Weber, J. K.; Pande, V. S. Characterization and Rapid Sampling of Protein Folding Markov State Model Topologies. J. Chem. Theory Comput. 2011, 7, 3405-3411.

26.

Chodera, J. D.; Noe, F. Markov State Models of Biomolecular Conformational Dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135-144.

27.

Husic, B. E.; Pande, V. S. Markov State Models: From an Art to a Science. J. Am. Chem. Soc. 2018, 140, 2386-2396.

28.

Harada, R.; Kitao, A. Parallel Cascade Selection Molecular Dynamics (Pacs-Md) to Generate Conformational Transition Pathway. J. Chem. Phys. 2013, 139, 035103-1-10.

29.

Harada, R.; Nakamura, T.; Takano, Y.; Shigeta, Y. Protein Folding Pathways Extracted by Oflood: Outlier Flooding Method. J. Comput. Chem. 2015, 36, 97-102.

30.

Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926-935.

31.

Honda, S.; Yamasaki, K.; Sawada, Y.; Morii, H. 10 Residue Folded Peptide Designed by Segment 27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 30

Statistics. Structure 2004, 12, 1507-1518. 32.

Neidigh, J. W.; Fesinmeyer, R. M.; Andersen, N. H. Designing a 20-Residue Protein. Nat. Struct. Biol. 2002, 9, 425-430.

33.

Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. Lincs: A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18, 1463-1472.

34.

Miyamoto, S.; Kollman, P. A. Settle - an Analytical Version of the Shake and Rattle Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952-962.

35.

Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101-1-7.

36.

Parrinello, M.; Rahman, A. Crystal-Structure and Pair Potentials - a Molecular-Dynamics Study. Phys. Rev. Lett. 1980, 45, 1196-1199.

37.

Parrinello,

M.;

Rahman,

A.

Polymorphic

Transitions

in

Single-Crystals

-

a

New

Molecular-Dynamics Method. J. Appl. Phys. 1981, 52, 7182-7190. 38.

Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577-8593.

39.

Abraham, M. J.; van der Spoel, D.; Lindahl, E.; Hess, B. The Gromacs Development Team, Gromacs User Manual Version 2018, http://www.gromacs.org/. (2018).

40.

Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G. M.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J. M.; Kollman, P. A Point-Charge Force Field for Molecular Mechanics Simulations of Proteins Based on Condensed-Phase Quantum Mechanical Calculations. J. Comput. Chem. 2003, 24, 1999-2012.

41.

Satoh, D.; Shimizu, K.; Nakamura, S.; Terada, T. Folding Free-Energy Landscape of a 10-Residue Mini-Protein, Chignolin. FEBS Lett. 2006, 580, 3422-3426.

42.

Harada, R.; Kitao, A. Exploring the Folding Free Energy Landscape of a Beta-Hairpin Miniprotein, Chignolin, Using Multiscale Free Energy Landscape Calculation Method. J. Phys. Chem. B 2011, 115, 8806-8812.

43.

Senne, M.; Trendelkamp-Schroer, B.; Mey, A. S. J. S.; Schutte, C.; Noe, F. Emma: A Software Package for Markov Model Building and Analysis. J. Chem. Theory Comput. 2012, 8, 2223-2238.

44.

Moritsugu, K.; Terada, T.; Kidera, A. Scalable Free Energy Calculation of Proteins Via Multiscale Essential Sampling. J. Chem. Phys. 2010, 133, 224105-1-6.

45.

Moritsugu, K.; Terada, T.; Kidera, A. Multiscale Enhanced Sampling Driven by Multiple Coarse-Grained Models. Chem. Phys. Lett. 2014, 616, 20-24.

46.

Moritsugu, K.; Terada, T.; Kidera, A. Multiscale Enhanced Sampling for Protein Systems: An Extension Via Adiabatic Separation. Chem. Phys. Lett. 2016, 661, 279-283.

47.

Mitsutake, A.; Takano, H. Relaxation Mode Analysis and Markov State Relaxation Mode Analysis for Chignolin in Aqueous Solution near a Transition Temperature. J. Chem. Phys. 2015, 143, 28 ACS Paragon Plus Environment

Page 29 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

124111-1-15. 48.

Marinelli, F.; Pietrucci, F.; Laio, A.; Piana, S. A Kinetic Model of Trp-Cage Folding from Multiple Biased Molecular Dynamics Simulations. PLoS Comput. Biol. 2009, 5, e1000452.

49.

Shao, Q.; Shi, J. Y.; Zhu, W. L. Enhanced Sampling Molecular Dynamics Simulation Captures Experimentally Suggested Intermediate and Unfolded States in the Folding Pathway of Trp-Cage Miniprotein. J. Chem. Phys. 2012, 137, 125103-1-10.

50.

Bunagan, M. R.; Yang, X.; Saven, J. G.; Gai, F. Ultrafast Folding of a Computationally Designed Trp-Cage Mutant: Trp(2)-Cage. J. Phys. Chem. B 2006, 110, 3759-3763.

51.

Juraszek, J.; Bolhuis, P. G. Sampling the Multiple Folding Mechanisms of Trp-Cage in Explicit Solvent. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 15859-15864.

52.

Halabis, A.; Zmudzinska, W.; Liwo, A.; Oldziej, S. Conformational Dynamics of the Trp-Cage Miniprotein at Its Folding Temperature. J. Phys. Chem. B 2012, 116, 6898-6907.

53.

Zhou, C. Y.; Jiang, F.; Wu, Y. D. Folding Thermodynamics and Mechanism of Five Trp-Cage Variants from Replica-Exchange Md Simulations with Rsff2 Force Field. J. Chem. Theory Comput. 2015, 11, 5473-5480.

54.

Miao, Y. L.; Feixas, F.; Eun, C. S.; McCammon, J. A. Accelerated Molecular Dynamics Simulations of Protein Folding. J. Comput. Chem. 2015, 36, 1536-1549.

55.

Harada, R.; Kitao, A. Multi-Scale Free Energy Landscape Calculation Method by Combination of Coarse-Grained and All-Atom Models (Vol 503, Pg 145, 2011). Chem. Phys. Lett. 2011, 516, 145-152.

56.

Harada, R.; Kitao, A. The Fast-Folding Mechanism of Villin Headpiece Subdomain Studied by Multiscale Distributed Computing. J. Chem. Theory Comput. 2012, 8, 290-299.

57.

Harada, R.; Nakamura, T.; Shigeta, Y. Sparsity-Weighted Outlier Flooding (Oflood) Mehtod: Efficent Rare Evnet Sampling Method Uing Sparsity of Distribution. J. Comput. Chem. 2016, 37, 724-738.

58.

Harada, R.; Shigeta, A. On-the-Fly Specifications of Reaction Coordinates in Parallel Cascade Selection Molecular Dynamics Accelerate Conformational Transitions of Proteins. J. Chem. Theory Comput. 2018, 14, 3332-3341.

29 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table of Content (TOC)

30 ACS Paragon Plus Environment

Page 30 of 30