Imaging Metastable States and Transitions in Proteins by Trajectory Map

Apr 20, 2017 - ABSTRACT: It has been a long-standing and intriguing issue to develop robust methods to identify metastable states and interstate trans...
2 downloads 10 Views 2MB Size
Subscriber access provided by ORTA DOGU TEKNIK UNIVERSITESI KUTUPHANESI

Article

Imaging Metastable States and Transitions in Proteins by Trajectory Map Chuanbiao Zhang, Jin Yu, and Xin Zhou J. Phys. Chem. B, Just Accepted Manuscript • Publication Date (Web): 20 Apr 2017 Downloaded from http://pubs.acs.org on April 21, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

The Journal of Physical Chemistry B is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Imaging Metastable States and Transitions in Proteins by Trajectory Map Chuanbiao Zhang,† Jin Yu,‡ and Xin Zhou∗,† †School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049 ‡Beijing Computer Science Research Center, Beijing 100193 E-mail: [email protected]

1

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract It has been a long-standing and intriguing issue to develop robust methods to identify metastable states and interstate transitions from simulations or experimental data in order to understand the functional conformational changes of proteins. Most of the existing methods are usually hard to define the complicated boundaries of the states in the conformational space, and thus often lead to parameter-sensitive results. Here we present a new approach, named as visualized Trajectory Map (vTM), to identify the metastable states and the rare interstate transitions, by considering both the conformation similarity and the temporal-successiveness of conformations. Then the vTM is able to give a non-ambiguous description of slow dynamics. The case study of a β−hairpin peptide shows that the vTM can reveal the states and transitions from allatom MD trajectory data even when a single observable (i.e, one-dimensional reaction coordinate) is used. We also use the vTM to refine the folding/unfolding mechanism of HP35 in explicit water by analyzing a 125 µs all-atom MD trajectory and obtain folding/unfolding rates about 1/µs which are in good agreement with experiments.

2

ACS Paragon Plus Environment

Page 2 of 30

Page 3 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Introduction Molecular dynamics (MD) simulation is one of important tools in studying structures, dynamics and functions of proteins. Recently, all-atom MD simulations have been well developed to approach the experimental interesting timescales at least in small proteins 1 for one millisecond. 2–6 While more endeavors are required to further extend the capability of MD simulations to larger systems, the efficient and unbiased analysis of the extensive simulation data is also a key aspect to understand the slow dynamics of protein, i.e, the function-relevant conformational changes. So far, many methods have been developed to analyze the slow motions of protein from simulation data. Most of the earlier attempts are based on the static conformation similarity, such as dimension reduction, data clustering. 7–19 Some current approaches, such as the Markov state model based on root-mean-square displacement (RMSD-MSM), 20–27 and its improvement, using the time Independent Correlation Analysis

28,29

to build MSM

(tICA-MSM), focus on the kinetic transition rates among conformation groups, and thus incorporate more dynamic information in the analysis. However, the RMSD-MSM usually requires huge data to form a great number of well defined micro-states to well describe the complicate boundaries of metastable states in the all-atom conformational space, thus it is difficult to be applied to large systems. 30 While the tICA-MSM identifies metastable states in a low-dimensional slow-variable space and thus does not require too much data, it may mis-identify some conformations located in the boundary regions between metastable states in the low-dimensional space and thus lead to biases in the obtained results. Actually, there is a more natural way to assign conformations into metastable states in terms of dynamics. A metastable state should correspond to a conformational region in which a MD trajectory will stay for a long enough time for local equilibrating. Thus, a metastable state should be identified on the basis of such a MD trajectory segment rather than a set of temporally-uncorrelated individual conformations. This idea has been introduced in our previous works, named as trajectory mapping (TM). 31,32 In the TM, we map the 3

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

averaged conformation of each MD trajectory segment as a vector, and calculate the principle components (PCs) of the trajectory-mapped vectors by the principle component analysis (PCA). Since fast conformational fluctuations are suppressed after the segment averaging, the PCs mainly involve slow motions which correspond to slow variables. Then, we cluster the trajectory-mapped vectors in the slow-variable space, and treat the highly-concentrated vectors as metastable states. The conformations along the original MD trajectories are assigned into these metastable states, and the kinetic transition network are consequently constructed. However, in complicate systems, due to possible hierarchical structure of metastable states, it is a bit complicated to implement the TM 31,32 and is usually not easy to evaluate the dependence of the results on controlling parameters. In this paper, we improve the TM in order to identify the metastable states and transitions in an easier and more robust way. The new approach is named as visualized Trajectory Map (vTM) which can directly visualize the obtained slow dynamics and provide a transparent way to evaluate the dependence of the results on controlling parameters. The first step of the vTM is the same as that of the TM, which constructs slow variables by calculating the PCs of trajectory-mapped vectors. After that, instead of hierarchically grouping trajectory segments into states by complicated clustering algorithms in the TM, the vTM projects each original MD trajectory in the slow-variable space, and smooths the trajectory by shorttime-window averaging to further filter the fast motions. Then the vTM calculates the time-ordered similarity matrix to directly visualize metastable states and transitions along the MD trajectory. We illustrate the application of the vTM in two systems, one is a short peptide in a vacuum, and the other is villin headpiece (HP35) in water. In both cases, the obtained results by the vTM are very robust (i.e., insensitive to controlling parameters). Especially, in the short peptide system, even using a single variable, i.e, the end-to-end distance trajectory which is the observable in single molecular fluorescence experiments, 1,33 the vTM can discover all the metastable states and transitions in all-atom conformational space. As a comparison,

4

ACS Paragon Plus Environment

Page 4 of 30

Page 5 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

the tICA-MSM identifies some artificial fast transitions and get a biased estimate in transition rates, due to the mis-identification of some individual conformations. We also apply the vTM to analyse a 125 µs all-atom HP35 MD trajectory which is generated by D. E. Shaw Research 34 at temperature T = 360 K, and our results verify the two-state folding/unfolding mechanism. .

Theory and Methods The code of the vTM can be freely downloaded from our homepage: http://tcb.ucas.ac.cn. Below we give a brief description of its implementation. (1) Construct slow variables from MD trajectories. This step is the same as that in the TM, 32 and is also similar to that in the tICA. 28 The vTM constructs a few slow variables, denoted as {Bα (q)}, α = 1, · · · , m0 < m, by linearly combining a great number of preselect collective variables, denoted as basis functions {A˜µ (q)}, µ = 1, · · · , m. The slow variables are the principle components (PCs) of trajectory-averaged vectors in the basis function space which are given by the PCA in the TM, 32 or are the main eigenfunctions by diagonalizing the time-correlation matrix of these basis functions in the tICA. 28 The two methods were proved to be similar to each other, and the variance-covariance matrix in PCA is a kind of time average of the time-correlation matrix. 32 Here we follow the procedure in the TM, (i) transforming MD trajectories q(t) into the basis function space, and cutting these trajectories into segments with length τ ; (ii) calculating the average conformation of each trajectory segment in basis function space; (iii) obtaining PCs of the trajectory-averaged conformations as slow variables which are linear combinations of the basis functions. Then we can describe the MD trajectory in the slow-variable space, {Bα (t)}, which already filters out most fast motions. Here τ is a free parameter, which represents the timescale of the slow dynamics under consideration. The obtained {Bα (q)} describes the slow dynamics at the τ (or larger)

5

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 30

timescales. Usually, we first set τ large enough, e.g, a fraction of a single MD trajectory , to get a simpler picture of slow dynamics at larger timescales. Then, the more detailed dynamics at smaller timescales can be obtained using smaller τ . We can use any conformational functions, e.g., the Cartesian coordinates of all atoms, torsion angles, pair distances of non-neighboring atoms along protein chain, as the basis functions. Usually, collective variables which are more relevant to large (slow) conformational motions are more efficient as basis functions, while the local-motion-relevant variables, such as bond lengths, bond angles, are less useful. After being orthonormalized, these collective variables are applied as basis functions, A˜µ (q), which satisfy the condition hA˜µ (q)A˜ν (q)i = δµν . Here h. . .i means the average on the whole set of simulation conformations. More details about the construction of slow variables by TM can be found in the references. 32 (2) Calculate the time-ordered similarity matrix from the slow variables. Due to the limited number of basis functions applied in the vTM, although the PCs (B1 , B2 , . . .) mainly correspond to slow motions, they may still involve more or less fast modes, which may hinder the exact identification of slow transition events. We smooth slow-variable trajectories by R ¯α (t) = 1 t+δ/2 Bα (q(t′ ))dt′ . Here δ is the length of time a short-time-window averaging. B δ t−δ/2 window in the smoothing. This smoothing efficiently filters out the motions faster than δ. Usually, we set δ two to three orders of magnitudes smaller than τ , almost not affecting the identification of metastable states at τ timescale. In the rest of this paper, we will not ¯ distinguish between B(t) and B(t) unless explicitly noted, and simply denote the smoothed trajectory as B(t). It is possible to directly identify transitions from the slow-variable trajectory. While the number of slow variables may not be small, we construct the time-ordered similarity matrix, ˆ 1 ) · B(t ˆ 2 ), C(t1 , t2 ) = B(t

(1)

to provide a intuitive measurement of the slow-dynamical transitions and metastable states.

6

ACS Paragon Plus Environment

Page 7 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

ˆ is the unit vector. We added a constant component B 0 (q) ≡ 1 in the definition of Here B(t) B(t) to ensure that the length of the vector will not approach to zero. By this way, C(t1 , t2 ) is limited in [−1, 1] and the large values (close to 1) mean the (δ-time-window averaged) conformations at t1 and t2 are very similar in the slow variable space. (3) Divide trajectories into long transition-free segments and group similar segments as states. From the time-ordered similarity matrix C(t1 , t2 ), we can divide each MD trajectory into some segments along time. Conformations inside the same time segment are very similar to each other, but largely different from those in its time-neighbouring segments. Each time segment corresponds to a state, and interstate transitions happen at the time points dividing these segments. For multiple MD trajectories, we repeat the above procedures for each trajectory to get the state-corresponding segments. These segments (i.e., their averaged conformations) are further compared by normal static clustering algorithms to determine whether some of them belong to the same metastable states. The transition rates among the metastable states can be estimated subsequently by counting the transition events along these MD trajectories. The fast inter-converted states might be further grouped to provide an approximate but simpler description of slow dynamics, especially when many small metastable states are identified. In the vTM, the time successiveness of conformations along trajectories are important in identifying states, it significantly improves the robustness of analysis. The visualized similarity matrix C(t1 , t2 ) provides a clear way to check the reliability of found states and transitions. At the final step, i.e., clustering the segment-averaged conformations, although the time information can not be used anymore, the average of multiple conformations inside a segment effectively suppresses the intrastate fluctuations and emphasizes the interstate differences, and thus efficiently distinguish different states even when the intrastate fluctuations are comparable with interstate differences. Therefore, as a trajectory-based method, the vTM is more promising than the conformation-based methods, such as the tICA, to be applied to more complicated systems.

7

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

In simple cases, such as the peptide and short proteins as shown below, the short-timeneighboring smoothing is sufficient for identification of transitions. We can even simply cluster the image points of the time-ordered similarity matrix to obtain metastable states by normal clustering algorithms, such as PCCA+. 35 It means that first average conformations within a δ-length time interval, and then cluster the averaged conformations to obtain metastable states. Thus, if δ is set to zero (there is only a single conformation in each time window), the method will be reduced to the tICA. 28

Results Short Peptide In this section, we study a short peptide, Ace-(Gly)2 -Pro-(Gly)3 -Nme, as shown in Figure 1, to illustrate the application of the vTM. The peptide was widely studied in previous works, 36,37 where the radius of gyration of all heavy atoms in the peptide (Rg ) was chosen as a good collective variable to describe the kinetics at large timescales. The free energy profile along Rg showed a “double-well" structure, corresponding to a globular and a β−hairpin conformation, respectively. Here we simulate the system and apply the vTM to systematically construct the kinetic transition network without a priori assuming any slow variables. Simulation details are shown in Supporting Information (SI). We choose the dihedral angles in the peptide backbone, φi and ψi with i = 1, · · · , 6, as the basis functions to describe the large-scale motions. Due to the periodicity of dihedral angles, we transform the angles into their cosine and sine functions, to get 24 basis functions in total . 38 The free parameter τ is set as 500 ns, half of a 1 µs MD trajectory, and a 10 ns time slip is applied to get 51 (partially time-overlapped) trajectory pieces with each 500 ns. PCs of the mapped vectors of the trajectory pieces give the slow variables. Here δ is set as 1 ns to smooth the obtained slow-variable trajectory, and to get the time-ordered similarity matrix. As shown in Figure 2(A), the time evolution of the dihedral angles indicates that a 8

ACS Paragon Plus Environment

Page 8 of 30

Page 9 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

few transitions occur during the 1 µs interval. We find two PCs as slow variables in Figure 2(B), denoted as B1 and B2 . The fast motions in the dihedral angles are effectively filtered out in B1 and B2 , thus the metastable states and transitions during the 1 µs interval can be easily identified. In Figure 3(A), we show the time-ordered similarity matrix of the MD trajectory. The matrix is obviously divided into some blocks. Between t = 0 and about 180 ns, the element of similarity matrix is almost equal to unit, which indicates that all the conformations are almost identical in the B space and thus belonging to the same metastable state, denoted as C. Before and after t = 180 ns, the conformational similarity shows an obvious change, then reaches almost unity until t = 250 ns. It implies that the transition occurs at about 180 ns, and the trajectory enters another state denoted as H and stays there until t = 250 ns. Then the system jumps back to state C and stays there until t = 420 ns, and jumps to a new state I at t = 480 ns, then goes back to H. To find more states and collect more transition events, we perform much longer simulations of 20 µs in total, and find that the trajectories always stay inside one of these three states and interstate transitions occur hundreds of times. Thus, we can well estimate the lifetime of each state, the transition rates and populations of these states. The equilibrium free energy landscape in the (B1 , B2 ) space is also obtained, shown in Figure 2(D). By characterizing the typical conformations within these states, we find that the state C looks like a coil structure with large intrastate fluctuation, H is well-defined β−hairpin structure, while I is an intermediate between the C and H. The main folding path is C −→ I −→ H, with transition rates about a few times per µs. The direct transitions from C to H and the reverse are rarer than that along the main pathway, about 2/µs and 1/µs, respectively (Figure 4). We give the time evolution of three native hydrogen bonds, shown in Figure 2(C). The result indicates that the transitions among the three states correspond to the change of native hydrogen bonds. All three native hydrogen bonds completely form in H, but some of them break while transiting to I (the hydrogen bond between the two chain ends breaks and the middle hydrogen bond partially breaks ). While the middle hydrogen bond completely

9

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

breaks, the peptide enters in the coil state C. From the simulation, we found that H is the most stable state with the life time about 150 ns, while the lifetimes of I and C states are about 90 ns and 80 ns, respectively. It is worthy to note that, different from the folded state H and the intermediate state I, the unfolded state C does not have a specific structure, i.e, it is a random coil. Actually, from the time-ordered similarity matrix, we found that conformations belonging to C state include a detailed inner-state structures of timescale shorter than τ = 500 ns. The result is confirmed in the following analysis with smaller τ . Robustness of vTM We choose different basis functions, control parameter τ and the smoothing parameter δ to test the robustness of the vTM. As shown in Figure 3, we use the dihedral angles of backbone, the Cartesian coordinates of main chain atoms (after alignment to remove the global translation and rotation of peptide), the 15 pair distances of the six Cα atoms, as basis functions in the vTM, respectively. τ is set as 10 ns. It is clearly shown that the time-ordered similarity matrix based on different basis functions and τ are very similar to each other. The detected transition events among the C, I and H states are also identical in each other. In addition, the inner substates inside the coil state C are also clarified by applying smaller τ . As shown in Figure S1, we use different τ and δ to calculate the time-ordered similarity matrix. All results are well consistent to each other. The short-timewindow average smoothing gives clearer identification of slow transitions than that without smoothing, but the obtained results do not change while δ is much smaller than τ . It is worthy to mention that the consistence of similarity matrix of various parameters provides a much more transparent criterion to evaluate the obtained slow dynamics, in comparison with those methods based on clustering individual conformations, where the dependence of clustering results on parameters usually leads to ambiguous slow-dynamics picture.

10

ACS Paragon Plus Environment

Page 10 of 30

Page 11 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

vTM on experiment-like data In Figure 3, we have already shown that the vTM can give highly consistent results even for highly coarse-grained data. Here we further study a more coarse-grained case by considering only the end-to-end distance d(t) (Figure 1), and the data is from the 1µs all-atom MD trajectory. Such data can mimic the single molecular fluorescence energy transformation (smFRET) experiments. The d(t) and its distribution P (d) are shown in Figure 5(A) and (B), revealing only two well-separated states. Obviously, the states C and I are largely overlapped, thus they can not be well distinguished by only the end-to-end distance. For the one-dimensional data, we can use multiple functions of d as basis functions. For example, we can use the bin functions of d, Θµ (d), which equals to 1 if d is within the µ-th bin or zero otherwise. µ = 1, · · · , m, correspond to the different range of d value. We uniformly divide the whole d range into m = 40 bins, thus take the 40 bin functions as basis functions. As shown in Figure 3(E), the vTM based on the one-dimensional d trajectory gives almost the same time-ordered similarity matrix as that given by high-dimensional trajectory. The highly overlapped states C and I and the interstate transitions are well identified, thus we correctly recover the kinetics of the all-atom system from the one-dimensional data. As shown in Figure 5(C), the obtained slow variable (reaction coordinate, RC) is helpful to distinguish the state C and I , but small overlapping still exists. The RC is a nonlinear transformation of d, as shown in Figure 5(D). It transforms the d range (0.4, 0.65 nm) to RC about −0.2, the range 0.65 < d < 1.0 nm to RC in [−0.5, 1.5], thus separates most conformations of the state C and I (both locate inside the range 0.4 < d < 1 nm, but mainly on (0.4, 0.65nm) and (0.65, 1nm), respectively). The short-time smoothing of the RC trajectory further removes the remaining overlapping, thus the vTM correctly assigns conformations into states based on the average RC of time-neighboring conformations. As shown in SI, the tICA-MSM can also construct RC and finds the three states based on the one-dimensional data d(t), but it directly assigns each individual conformation without averaging on its time neighbourhood, thus it identifies all conformations with RC larger than 11

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

about zero in the state H, about −1 in the state I, and between the two values in the state C. Due to the overlapping of states even in the RC, the tICA-MSM mis-identifies some conformations close to the boundaries, and thus brings some artificial fast transitions among states.

Villin Headpiece (HP35) We also apply the vTM to analyse the C-terminal fragment of villin headpiece (HP35) with double norleucine mutant (Nle/Nle) in water, denoted as HP35-NLE. 39 Because of its small size and fast folding, HP35 and its double norleucine mutant have been extensively studied by experiments 39–42 and molecule dynamics simulations. 6,34,43–49 Recently, Lindorff-Larsen et al. 34 performed a 125 µs equilibrium MD simulation of HP35-NLE at 360 K. They applied the CHARMM22* force field 47 and the modified TIP3P water model compatible with the CHARMM force field. 50,51 The generated MD trajectory involves about 625000 snapshots with a time interval of 200 ps. 34 We downloaded this trajectory data from DE Shaw Research. Metastable States and Transitions We take all of 33 pairs dihedral angles (their trigonometric functions) of the peptide backbone as basis functions. τ = 20 µs. The time-ordered similarity matrix is shown in Figure 6(A) which divides whole the trajectory into segments. The similarity between many different segments indicates that the trajectory repeatedly visits only a few metastable states. We use the PCCA+ algorithm 35 to cluster the similar segments together. The obtained results are shown in Figure 6(B). The rearranged similarity matrix is shown in the inset. Three metastable states, the unfolded state (U), the folded state (F) and the third state named as T, are found, corresponding to the two obvious nonzero eigenvalues of the similarity matrix. The time evolution of the first two slow variables B1 and B2 are shown in Figure 6(C), and the three metastable states can be found along the trajectories of the two slow variables. As a comparison, we also show here the evolution of ψ of LEU61 and the rooted mean square 12

ACS Paragon Plus Environment

Page 12 of 30

Page 13 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

deviation (RMSD) from the native structure, which are found to be highly correlated to B1 and B2 , respectively. In Figure 6(D), we show the free energy landscape (in units of kB T ) in the (B1 , B2 ) space, which has three wells corresponding to the three states, U , F and T . The states F and U slightly overlap in the slow-variable space, but the short-time-neighboring averaged conformations are well separated in the space. Slow variables and folding mechanism The slow variables obtained by the vTM are linear combinations of the preset basis functions, here the dihedral angles in the peptide backbone. We show the coefficients of linear combinations in Figure 7(A). We find the B1 mainly come from the ψ of the residual LEU61 which connects the helix II to helix III along the chain. While the ψ of LEU61 is positive, the protein could fold to the native structure by forming the hydrophobic core. 52 If the LEU61 ψ is negative, the residual may become an additional part of the helix III, then the protein enters into the T state, cannot directly fold to the native structure anymore (Figure 7(B)). This mechanism is consistent with previous findings in literatures, 34,53 where the proline 62 (next to LEU61) was found to form its native structure at the beginning of folding. As shown in Figure 7(C), the average probability of residues being in native is given as the normalized time along transition paths, which provides the folding order of residues. We also found that the B2 mainly come from the contribution of dihedral angles of MET53, LEU61 and HIP68, as shown in Figure 7(A). Since MET53 and LEU61 locate at the connecting regions between helix I and II, helix II and III, respectively, it is easy to understand that they contribute significantly to slow variables. It is surprising, however, that the HIP68 which locates in the middle of the helix III also makes great contributes to the slow motion. It indicates that the helix III is divided into two parts by HIP68 during folding/unfolding. To clarify this, we calculated the helical propensity of each residue in different metastable states. As shown in Figure 7(B), the helix III is indeed divided into two parts, the helix III-1 (TRP64-NLE-GLN-GLN67) almost keeps the helical structure even in

13

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the unfolded state, while the helix III-2 (LEU69-NLE-LYS-GLU-LYS73) is usually unfolded (except in the native state). The result is also verified by the formation probability of native structure along the normalized folding time (Figure 7(C)). The helix III-1 achieves the native structure from the beginning of the folding, whereas the helix III-2 does not until the end of the unfolding. 34,53 Some evidences can be found in existing literatures 48,49 which regard the partial unraveling of the C terminal helix as near-native state, and in the reference 54 the four kinds of residues (TRP, NLE, GLN, and GLN) in the helix III-1 were found statistically to have larger probability to form helical structure than the five kinds of residues (LEU, NLE, LYS, GLU, and LYS) in the helix III-2. Since there is only one folding pathway, T⇀ ↽ U ⇀ ↽ F , it is possible to define a onedimensional reaction coordinate to describe the folding/unfolding process. This reaction coordinate can be a curve in the (B1 , B2 ) space through the centres of the three states. While these states almost do not overlap with each other along the curve, the free energy profile (FEP) along the curve will correctly describe the kinetics of the folding/unfolding process. For simplicity, we try to use a straight line, B1 + αB2 , as an approximate reaction coordinate. We optimize the direction of the straight line (the coefficient α) to minimize the overlap of these states along the line, and find B1 − 0.4B2 is the best one. As shown in Figure7(D), the overlaps among the three states along the optimalized reaction coordinate are very small, thus the reaction coordinate gives a proper and simple picture to describe the kinetics of the system. We find that the folding time of HP35 at T = 360 K is in good agreement with the experimental value, about 1 µs, 39 and the folding free energy barrier is about 1.07 kcal/mol, also consistent with the value of 1 kcal/mol estimated from calorimetry experiments. 39,55 It is worthy to note that the state T can be negligible, since it only connects to the unfolded state U and slightly changes its population while insignificantly affecting the folding/unfolding process. Thus the two-state model can be applied to understand the folding/unfolding of HP35. The result is consistent with that in the reference, 56 but very differ-

14

ACS Paragon Plus Environment

Page 14 of 30

Page 15 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

ent from the picture provided by the reference 34 based on an empirical reaction coordinate, where the state T was projected between the state F and U and then was mis-interpreted as the intermediate state of folding/unfolding. From the 125 µs MD trajectory, we do not find any U − F transitions passing through the state T , and all five trajectory segments corresponding to the T state are connected with segments of U . No direct transitions between T and F are found. According to the above results, we could divide HP35 into four parts, helix I (residues 45-51), helix II (residues 56-60), helix III-1 (residues 64-67), and helix III-2 (residues 69-73), and the three connecting regions between them. The helix III-1 and helix I are the most stable parts during the simulation, while helix II, helix III-2 and the connecting regions can change flexibly during the folding/unfolding process. The result should be helpful in constructing coarse-grained models of the protein. As shown in Figure 8, we provide a two-state folding/unfolding kinetic description of the protein (T is negligible), include the characteristic conformations, lifetimes (in µs), populations (in %) of these states and transition rates (in 1/µs) between them.

Discussions The visualised trajectory map (vTM) method provides a promising way to robustly analyse slow motions from simulation or experimental time series. The key point of the vTM is that it takes advantage of the temporal successiveness of conformations to identify metastable states and interstate transitions, rather than only considering the geometric similarity between conformations. Two techniques guarantee its applicability to complicated systems. The first is that the vTM constructs slow variables automatically from data without a priori experience and projects the original data into the low-dimensional space. While the low-dimensional projection much simplifies the description of conformational similarity (especially for large systems), it also brings trouble to the identification of the boundaries between metastable

15

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

states. This problem can be solved by the second technique, which analyses the averaged conformations within each trajectory segment rather than each individual conformation. Thus, the vTM is expected to be more powerful than current methods based on individual conformations, to get slow dynamics for large protein systems. Nevertheless, there are also some limitations in the application of the vTM to large protein systems. The first difficulty is the generation of sufficient time series data. The selection of proper basis functions in these complicated systems is also a nontrivial task. Coarse-graining simulations, or ensemble dynamics simulations which generate many short MD trajectories, or experimental time series, could be applied by the vTM to investigate the dynamics of the system from large timescales to smaller ones. However, the number of required basis functions in the vTM usually increases as the system get larger or more complicated, and the identification of metastable states and interstate transitions become more difficult as the number of states and/or slow modes of motions get increased. In addition, not all slow motions occur like transitions, where the system stays inside a conformational regions a long time to reach local equilibrium before suddenly and fast jumping to another region. Some systems might slowly diffuse inside a very large conformational region without the waiting and transition processes. In these cases, the vTM still provides slow variables to describe the slow dynamics, but the time-ordered similarity matrix may not show a clear state-and-transition structure for constructing the kinetic transition network.

Supporting Information Available Simulation details of the short peptide; Using different τ and δ to calculate the similarity matrix(Figure S1); Comparison of the vTM and tICA-MSM in identifying metastable states based on the end-to-end distance d of short peptide(Figure S2). This material is available free of charge via the Internet at http://pubs.acs.org/.

16

ACS Paragon Plus Environment

Page 16 of 30

Page 17 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Acknowledgement The work is supported by NSFC under Grant No.11574310.

References (1) Schuler, B.; Eaton, W. A. Protein Folding Studied by Single-Molecule FRET. Curr. Opin. Struct. Biol. 2008, 18, 16–26. (2) Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330, 341–346. (3) Voelz, V. A.; Bowman, G. R.; Beauchamp, K.; Pande, V. S. Molecular Simulation of Ab Initio Protein Folding for a Millisecond Folder NTL9 (1- 39). J. Am. Chem. Soc. 2010, 132, 1526–1528. (4) Noé, F.; Schütte, C.; Vanden-Eijnden, E.; Reich, L.; Weikl, T. R. Constructing the Equilibrium Ensemble of Folding Pathways from Short Off-Equilibrium Simulations. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 19011–19016. (5) Best, R. B.; Hummer, G.; Eaton, W. A. Native Contacts Determine Protein Folding Mechanisms in Atomistic Simulations. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 17874– 17879. (6) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. Protein Folding Kinetics and Thermodynamics from Atomistic Simulation. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 17845– 17850. (7) Shea, J.-E.; Brooks III, C. L. From Folding Theories to Folding Proteins: A Review and Assessment of Simulation Studies of Protein Folding and Unfolding. Annu. Rev. Phys. Chem. 2001, 52, 499–535. 17

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(8) Mu, Y.; Nguyen, P. H.; Stock, G. Energy Landscape of a Small Peptide Revealed by Dihedral Angle Principal Component Analysis. Proteins 2005, 58, 45–52. (9) Sims, G. E.; Choi, I.-G.; Kim, S.-H. Protein Conformational Space in Higher Order φ-ψ Maps. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 618–621. (10) Rao, F.; Karplus, M. Protein Dynamics Investigated by Inherent Structure Analysis. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 9152–9157. (11) Das, P.; Moll, M.; Stamati, H.; Kavraki, L. E.; Clementi, C. Low-Dimensional, FreeEnergy Landscapes of Protein-Folding Reactions by Nonlinear Dimensionality Reduction. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 9885–9890. (12) Coifman, R. R.; Lafon, S. Diffusion Maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30. (13) Nadler, B.; Lafon, S.; Coifman, R. R.; Kevrekidis, I. G. Diffusion Maps, Spectral Clustering and Reaction Coordinates of Dynamical Systems. Appl. Comput. Harmon. Anal. 2006, 21, 113–127. (14) Krivov, S. V.; Karplus, M. Hidden Complexity of Free Energy Surfaces for Peptide (Protein) Folding. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 14766–14770. (15) Maisuradze, G. G.; Liwo, A.; Scheraga, H. A. How Adequate are One-and TwoDimensional Free Energy Landscapes for Protein Folding Dynamics? Phys. Rev. Lett. 2009, 102, 238102. (16) Torda, A. E.; van Gunsteren, W. F. Algorithms for Clustering Molecular Dynamics Configurations. J. Comput. Chem. 1994, 15, 1331–1340. (17) Shao, J.; Tanner, S. W.; Thompson, N.; Cheatham, T. E. Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms. J. Chem. Theory. Comput. 2007, 3, 2312–2334. 18

ACS Paragon Plus Environment

Page 18 of 30

Page 19 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(18) Deuflhard, P.; Huisinga, W.; Fischer, A.; Schütte, C. Identification of Almost Invariant Aggregates in Reversible Nearly Uncoupled Markov Chains. Linear Algebra Appl. 2000, 315, 39–59. (19) Weber, M. Improved Perron Cluster Analysis; ZIB Berlin, Germany, 2003. (20) Gfeller, D.; De Los Rios, P.; Caflisch, A.; Rao, F. Complex Network Analysis of FreeEnergy Landscapes. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 1817–1822. (21) Noé, F.; Horenko, I.; Schütte, C.; Smith, J. C. Hierarchical Analysis of Conformational Dynamics in Biomolecules: Transition Networks of Metastable States. J. Chem. Phys. 2007, 126, 155102. (22) Chodera, J. D.; Singhal, N.; Pande, V. S.; Dill, K. A.; Swope, W. C. Automatic Discovery of Metastable States for the Construction of Markov Models of Macromolecular Conformational Dynamics. J. Chem. Phys. 2007, 126, 155101. (23) Bowman, G. R.; Pande, V. S. Protein Folded States are Kinetic Hubs. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 10890–10895. (24) Bowman, G. R.; Meng, L.; Huang, X. Quantitative Comparison of Alternative Methods for Coarse-Graining Biological Networks. J. Chem. Phys. 2013, 139, 121905. (25) Weber, J. K.; Jack, R. L.; Pande, V. S. Emergence of Glass-Like Behavior in Markov State Models of Protein Folding Dynamics. J. Am. Chem. Soc. 2013, 135, 5501–5504. (26) Pande, V. S.; Beauchamp, K.; Bowman, G. R. Everything You Wanted to Know about Markov State Models but were Afraid to Ask. Methods 2010, 52, 99–105. (27) Deng, N.-j.; Dai, W.; Levy, R. M. How kinetics Within The Unfolded State Affects Protein Folding: An Analysis Based on Markov State Models and an Ultra-Long MD Trajectory. J. Phys. Chem. B 2013, 117, 12787–12799.

19

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(28) Naritomi, Y.; Fuchigami, S. Slow Dynamics in Protein Fluctuations Revealed by TimeStructure Based Independent Component Analysis: The Case of Domain Motions. J. Chem. Phys. 2011, 134, 065101. (29) NuÌĹske, F.; Keller, B. G.; PeÌĄrez-HernaÌĄndez, G.; Mey, A. S.; NoeÌĄ, F. Variational Approach to Molecular Kinetics. J. Chem. Theory. Comput. 2014, 10, 1739–1752. (30) Schwantes, C. R.; Pande, V. S. Improvements in Markov State Model Construction Reveal Many Non-Native Interactions in the Folding of NTL9. J. Chem. Theory. Comput. 2013, 9, 2000–2009. (31) Gong, L.; Zhou, X. Kinetic Transition Network Based on Trajectory Mapping. J. Phys. Chem. B 2010, 114, 10266–10276. (32) Gong, L.; Zhou, X.; Ouyang, Z. Systematically Constructing Kinetic Transition Network in Polypeptide from Top to Down: Trajectory Mapping. PloS one 2015, 10, e0125932. (33) Bartlett, A. I.; Radford, S. E. An Expanding Arsenal of Experimental Methods Yields an Explosion of Insights into Protein Folding Mechanisms. Nat. Struct. Mol. Biol. 2009, 16, 582–588. (34) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517–520. (35) Deuflhard, P.; Weber, M. Robust Perron Cluster Analysis in Conformation Dynamics. Linear Algebra Appl. 2005, 398, 161–184. (36) Babin, V.; Roland, C.; Darden, T. A.; Sagui, C. The Free Energy Landscape of Small Peptides as Obtained from Metadynamics with Umbrella Sampling Corrections. J. Chem. Phys. 2006, 125, 204909.

20

ACS Paragon Plus Environment

Page 20 of 30

Page 21 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(37) Babin, V.; Roland, C.; Sagui, C. Adaptively Biased Molecular Dynamics for Free Energy Calculations. J. Chem. Phys. 2008, 128, 134101. (38) Altis, A.; Otten, M.; Nguyen, P. H.; Hegger, R.; Stock, G. Construction of the Free Energy Landscape of Biomolecules via Dihedral Angle Principal Component Analysis. J. Chem. Phys. 2008, 128, 245102. (39) Kubelka, J.; Chiu, T. K.; Davies, D. R.; Eaton, W. A.; Hofrichter, J. Sub-Microsecond Protein Folding. J. Mol. Biol. 2006, 359, 546–553. (40) Reiner, A.; Henklein, P.; Kiefhaber, T. An Unlocking/Relocking Barrier in Conformational Fluctuations of Villin Headpiece Subdomain. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 4955–4960. (41) Kubelka, J.; Eaton, W. A.; Hofrichter, J. Experimental Tests of Villin Subdomain Folding Simulations. J. Mol. Biol. 2003, 329, 625–630. (42) Kubelka, J.; Henry, E. R.; Cellmer, T.; Hofrichter, J.; Eaton, W. A. Chemical, Physical, and Theoretical Kinetics of an Ultrafast Folding Protein. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 18655–18662. (43) Zagrovic, B.; Snow, C. D.; Shirts, M. R.; Pande, V. S. Simulation of Folding of a Small Alpha-Helical Protein in Atomistic Detail Using Worldwide-Distributed Computing. J. Mol. Biol. 2002, 323, 927–937. (44) Freddolino, P. L.; Schulten, K. Common Structural Transitions in Explicit-Solvent Simulations of Villin Headpiece Folding. Biophys. J. 2009, 97, 2338–2347. (45) Yang, J. S.; Wallin, S.; Shakhnovich, E. I. Universality and Diversity of Folding Mechanics for Three-Helix Bundle Proteins. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 895–900.

21

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(46) Lei, H.; Deng, X.; Wang, Z.; Duan, Y. The Fast-Folding HP35 Double Mutant has a Substantially Reduced Primary Folding Free Energy Barrier. J. Chem. Phys. 2008, 129, 155104. (47) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. How Robust are Protein Folding Simulations with Respect to Force Field Parameterization? Biophys. J. 2011, 100, L47–L49. (48) Jain, A.; Stock, G. Hierarchical Folding Free Energy Landscape of HP35 Revealed by Most Probable Path Clustering. J. Phys. Chem. B 2014, 118, 7750–7760. (49) Beauchamp, K. A.; McGibbon, R.; Lin, Y.-S.; Pande, V. S. Simple Few-State Models Reveal Hidden Complexity in Protein Folding. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 17807–17813. (50) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. (51) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R.; Evanseck, J.; Field, M. J.; Fischer, S.; Gao, J.; Guo, All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586–3616. (52) Ensign, D. L.; Kasson, P. M.; Pande, V. S. Heterogeneity Even at the Speed Limit of Folding: Large-Scale Molecular Dynamics Study of a Fast-Folding Variant of the Villin Headpiece. J. Mol. Biol. 2007, 374, 806–816. (53) Henry, E. R.; Best, R. B.; Eaton, W. A. Comparing a Simple Theoretical Model for Protein Folding with All-Atom Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 17880–17885. (54) Pace, C. N.; Scholtz, J. M. A Helix Propensity Scale Based on Experimental Studies of Peptides and Proteins. Biophys. J. 1998, 75, 422–427. 22

ACS Paragon Plus Environment

Page 22 of 30

Page 23 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(55) Godoy-Ruiz, R.; Henry, E. R.; Kubelka, J.; Hofrichter, J.; Muñoz, V.; SanchezRuiz, J. M.; Eaton, W. A. Estimating Free-Energy Barrier Heights for an Ultrafast Folding Protein from Calorimetric and Kinetic Data. J. Phys. Chem. B 2008, 112, 5938–5949. (56) Banushkina, P. V.; Krivov, S. V. High-Resolution Free-Energy Landscape Analysis of α-Helical Protein Folding: HP35 and Its Double Mutant. J. Chem. Theory. Comput. 2013, 9, 5257–5266.

ACE1

NME8 d

Figure 1: The atomic structure of short peptide.

23

ACS Paragon Plus Environment

(B)

B1

(A)

0

0

(C)

Page 24 of 30

H bonds

3 2 1 0

200

400 600 Time (ns)

800

1000

2

10

(D) 8

1

C 6

B2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

B2

The Journal of Physical Chemistry

0

H

4

1

2

I 2 1

0

1

0

B1

Figure 2: (A) The time evolution of torsion angles in backbone during a 1 µs simulation. (B) The time evolution of obtained slow variables B1 and B2 with/without the time-window smoothing (black/red, respectively). (C) The number of native hydrogen bonds (H-bond) is shown as the time evolves. The H-bond is thought to be formed if the distance between the donor and acceptor atoms is within 0.35 nm and the angle of donor-hydrogen-acceptor is less than 30◦ . (D) The free energy landscape in the slow-variable space. B1 and B2 are slow variables, C, I, and H are the found metastable states.

24

ACS Paragon Plus Environment

Page 25 of 30

0

1

(B)

0.5 600 800

(D)

0

200

400 600 Time (ns)

800

1000

0

400 0.5 600

1

1000

(E)

400 0.5 600 800

0

200

400 600 Time (ns)

800

1000

0

0

1

400 0.5 600

1000

(F)

200

400 600 Time (ns)

800

1000

0

400 0.5 600

1000

0

200

400 600 Time (ns)

800

1000

0

0

1

200

800

0

1

800

200 Time (ns)

200

0 200

800

0

1000

(C) Time (ns)

400

1000

1

200 Time (ns)

Time (ns)

200

0

Time (ns)

(A)

Time (ns)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

400 0.5 600 800

0

200

400 600 Time (ns)

800

1000

0

1000

0

200

400 600 Time (ns)

800

1000

0

Figure 3: The time-ordered similarity matrix based on different basis functions, (A) dihedral angles; (B) dihedral angles; (C)Cartesian coordinates of main chain atoms (after alignment of translation and rotation); (D) 1830 pair distances of all 61 atoms; (E) 15 pairs distances of 6 Cα atoms; and (F) bin functions of the end-to-end distance d. τ = 500 ns in (A) but 10 ns in others.

Figure 4: The kinetic transition network. Numbers near the arrows are the corresponding transition rates. The life time and population are also listed near the typical conformations of each state. 25

ACS Paragon Plus Environment

The Journal of Physical Chemistry

(B) d (nm)

1 0.5 0

(C)

0

200

400 600 Time (ns)

800

1 0.5 0

1000

(D)

2

0

2 4 6 8 10 Distribution

2 1

RC

d (nm)

(A)

RC

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 30

0

0

✁1

✁2 0

200

400 600 Time (ns)

800

1000

✁2

0.4

0.6 0.8 d (nm)

Figure 5: (A) The time evolution of end-to-end distance d, and (B) its distribution. (C) The time evolution of the reaction coordinate. The red line is the result after time-neighbouring average. (D) The obtained reaction coordinate as a nonlinear function of d.

26

ACS Paragon Plus Environment

Page 27 of 30

(A) 0

1

(B)

20

0.15

T

60 0

80

0.09

U

0.06 0.03

100 120

0 0

(C)

F Frame

0.5

Eigenvalue

0.12

40 Time (µs)

20

40

60 80 Time (µs)

100 120

Frame

5

10

15

Index

(D)

4

T

2 F B2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

0

U

✁2 ✁2

0

2

4

B1

Figure 6: (A): The time-ordered similarity matrix of the HP35. (B):Eigenvalue of the similarity matrix. There are two value significantly greater than 0. The inset is the time-rearranged similarity matrix, suggesting three metastable states. (C): Time evolution of B1 , ψ(LEU61), B2 and RMSD to native structure. (D): Two-dimensional representation of energy landscape, constructed from B1 and B2 . The different colour points are short-time-neighbouring averaged conformations inside the three states.

27

ACS Paragon Plus Environment

The Journal of Physical Chemistry

(A)

1

B

Helical propensity

B value

(B)

B 0.4

2

0.3 0.2

F

U

T

1 0.8 0.6 0.4 0.2

0.1 0 20

50 60 70 Residue number

40 60 80 100 120 Basis Index

(D)

(C) 45 50 55 60 65 70

0.8 0.6 0.4 0.2 0.2 0.4 0.6 0.8 fraction of transition path

1

Free energy (kT)

0

residue number

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 30

4

trap

2 folded

0

✁2

unfolded

0 2 Reaction coordinate

4

Figure 7: (A): The contribution of each basis function to the slow variables B1 and B2 . (B): Helical propensity of different residues inside each metastable state. (C): The average probability of residues being in native conformation as the rescaled time along folding paths. (D): One-dimensional free energy profile on an approximate reaction coordinate(RC = B1 − 0.4B2 ). The dashed line is the distribution of each metastable state on the reaction coordinate.

28

ACS Paragon Plus Environment

Page 29 of 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 8: The transition network among the F (folded), U (unfolded), and T states (from left to right). Here shown the representative structures of the three states including their lifetimes (in µs), populations (in %), and the transition rates (in 1/µs).

29

ACS Paragon Plus Environment

The Journal of Physical Chemistry

TOC graphic 0

1

Time (ns)

200 400 0.5 600 800 1000

0

200

400 600 Time (ns)

800

1000

0

0

1

20 0.5

40 Time (µs)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

60 0

80 100 120 0

20

40

60 80 Time (µs)

100 120

Table of Contents Image

30

ACS Paragon Plus Environment

Page 30 of 30