Molecular Mechanism Behind the Fast Folding ... - ACS Publications

Oct 21, 2016 - of Villin Headpiece Subdomain: Hierarchy and Heterogeneity. Toshifumi Mori* and Shinji Saito*. Institute for Molecular Science, Myodaij...
1 downloads 0 Views 4MB Size
Article pubs.acs.org/JPCB

Molecular Mechanism Behind the Fast Folding/Unfolding Transitions of Villin Headpiece Subdomain: Hierarchy and Heterogeneity Toshifumi Mori* and Shinji Saito* Institute for Molecular Science, Myodaiji, Okazaki, Aichi 444-8585, Japan School of Physical Sciences, The Graduate University for Advanced Studies, Okazaki, Aichi 444-8585, Japan S Supporting Information *

ABSTRACT: Proteins involve motions over a wide range of spatial and temporal scales. While the large conformational changes, such as folding and functioning, are slow and appear to occur in a highly cooperative manner, how the hierarchical dynamics over different time scales play a role during these slow transitions has been of great interest over the decades. Here we study the folding mechanism of the villin headpiece subdomain (HP35) to understand the molecular mechanism behind this prototypical fast-folding protein. The ∼400 μs molecular dynamics (MD) trajectories obtained by Piana et al. [Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 17845] are analyzed in detail. By extracting the slowest mode from the trajectories, which is responsible for the folding/unfolding transitions, and by analyzing the transition events along this mode, we find that the transitions occur in a heterogeneous manner. Detailed analysis of the individual transition events shows that the folding/ unfolding transitions occur via two qualitatively different pathways, i.e., the unfolding triggered from the C-terminal (α3 helix) and from the N-terminal (α1-α2 loop). Non-native contacts are also found to contribute in slowing down the transitions. The folding of HP35 thus proceeds in a segmental manner rather than cooperatively at the submicrosecond time scale. The Lys→Nle mutation is found to speed up the transitions by rigidifying the α3 helix, i.e., suppressing one transition pathway. The analysis of the microsecond dynamics in the single-molecule Förster resonance energy transfer efficiency trajectories, which are calculated from the MD data, reveals that the folding/unfolding transitions in the NleNle mutant can be fitted with a two-state model, whereas those in WT appear to be more complex and involves multiple time scales. This is due to the coupling between the folding/unfolding transitions and conformational transitions within the unfolded and intermediate states. The present study demonstrates that a protein as small as HP35 already involves heterogeneous characters during folding/unfolding transitions when the hierarchical dynamics at the molecular level is considered, thus heterogeneity can be a general characteristic in protein folding.



INTRODUCTION Dynamics of proteins involve motions over a wide range of time scales from picoseconds to milliseconds, whereas conformational transitions involved in folding and functioning occur in a highly cooperative manner.1,2 The mechanism of how the atomistic fluctuations are regulated and the correlated motions are organized have thus been of interest over the last few decades.3−7 Single-molecule Förster resonance energy transfer (smFRET) and nuclear magnetic resonance (NMR) relaxation dispersion measurements have shown that conformational fluctuations and structural transitions occur even in the native state,8−10 indicating that dynamics is of fundamental importance in maintaining the native state conformational ensemble as well as in regulating protein function. Furthermore, experimental and theoretical studies have also revealed the hierarchical dynamics undergoing behind the seemingly simple two-state transitions in allosteric regulations11−16 in prototypical proteins. These studies have highlighted that while the © 2016 American Chemical Society

conformational transitions of proteins appear to occur cooperatively, the local dynamics occurring at faster time scales can be heterogeneous, e.g., involve multiple states and do not proceed simultaneously. Furthermore, these fast dynamics have even been proposed to direct the slow dynamics and thus the large scale conformational transitions,14−16 although the connection between the hierarchical dynamics still remains elusive. Thus, it is indispensable to understand the molecular mechanism behind the protein dynamics over wide spatial and temporal ranges in order to thoroughly appreciate and manipulate the structures and functions of proteins. Molecular dynamics (MD) simulations have become a powerful tool in obtaining the molecular insights of protein dynamics. Recent advances in computational hardwares and Received: August 10, 2016 Revised: October 14, 2016 Published: October 21, 2016 11683

DOI: 10.1021/acs.jpcb.6b08066 J. Phys. Chem. B 2016, 120, 11683−11691

Article

The Journal of Physical Chemistry B

the diffusion rate about the transition state are, and moreover, whether an intermediate state exists along the folding/ unfolding transition pathway(s). Furthermore, how the Lys→ Nle mutations affect the folding mechanism and dynamics remain unclear. Here we focus on understanding how the transition dynamics occur and investigate the hierarchical dynamics behind the folding/unfolding transitions by looking at the individual transition events. The slowest mode, representing the folding/unfolding transitions, is determined by applying the recently developed dynamic component analysis (DCA) approach,23 which determines the modes based on the relaxation times. The transition dynamics along the slowest mode is analyzed to understand how the folding/unfolding transitions occur, and the role of faster motions during the transitions are discussed. In addition, we study how the Lys→ Nle mutations and temperature changes affect the folding/ unfolding transition dynamics as well as the native and unfolded state conformation ensembles. Finally, the transition dynamics in terms of the smFRET efficiencies calculated from the trajectories are examined to discuss how the differences between WT and NleNle appear in the transition dynamics at the microsecond time scale.

softwares have enabled MD simulations to afford the dynamics on the microsecond time scale,17−19 or reach even up to milliseconds when specialized hardware is used.20,21 Thus, MD simulations are becoming feasible to analyze the dynamics behind the experimental measurements and provide a link between the dynamics at multiple temporal and spatial resolutions from atomistic details. However, due to the highdimensional nature of proteins, analyzing the long trajectory data is not straightforward; for instance, analyses of the same 100 μs-long trajectory have led to controversial folding mechanisms21,22 owing to the difficulty in determining a single best coordinate that describes the overall folding/unfolding transition events.23−25 Moreover, while proteins with multiple domains are considered to involve intermediate states and multiple transition pathways during folding,26 to what extent such heterogeneous dynamics play a role in small proteins, e.g., fast-folding proteins, remain largely elusive. In order to reveal the molecular mechanism of folding for the small proteins, and furthermore to understand the correlation between hierarchical dynamics over different time scales behind the folding/ unfolding transitions, it is important to examine the individual transition events rather than obtaining the averaged picture from the ensemble over many events, i.e., analyze the transition dynamics at the single molecule level from both experimental and theoretical perspectives.16,23,27−32 In this regard, here we investigate the folding mechanism of a fast folding protein, villin headpiece subdomain (HP35), by analyzing a series of ∼400 μs-long MD trajectories obtained by Piana et al.33 HP35 has been a prototypical protein for studying the folding mechanism from theoretical and experimental perspectives.33−42 In particular, while the wild type (WT) already folds within a few microseconds, the designed NleNle mutant, in which lysine (Lys) 65 and 70 is mutated to the uncharged norleucine (Nle) residues (Figure 1), have been one



COMPUTATIONAL DETAILS

The trajectories of HP35 for the WT at 345 K and the NleNle mutant at 360, 370, and 380 K by Piana et al.33 are analzyzed. The lengths of the trajectories are 397.5, 305.2, 395.5, and 301.9 μs, respectively, and the snapshots are recorded every 0.2 ns. Note that in their work another WT trajectory at 360 K was calculated. However, we omit the trajectory from the current study since it is largely unfolded, and moreover involves a slow trans to cis isomerization of a polypeptide bond. In the dynamic component analysis (DCA), the dynamic component (DC) modes q(τ)(t) are obtained by diagonalizing the time correlation function matrix D(τ), q(τ)(t )T = v(̃ t )T Ũ (τ )

(1)

D(τ ) = ⟨v(̃ t0)v(̃ t0 + τ )T ⟩

(2)

where τ is the delay time, ⟨···⟩ denotes the time average, v(̃ t) are the coordinates which satisfy D(0) = I (identity matrix), and Ũ (τ) are the eigenvectors obtained by diagonalizing D(τ). In the current work v(̃ t) is constructed by normalizing the principal component (PC) modes, which is obtained from the principal component analysis43 using the distances between the Cα atom pairs following our previous procedure.23 The superscript (τ) in eq 1 shows that the coordinate is dependent on the delay time τ. We find however that the slow modes, particularly the slowest mode (DC1), are largely invariant with respect to τ.23 Thus, here we set τ to 10 ns throughout the analysis and hereafter omit the superscript (τ). We also note that the eigenvalues and eigenvectors from eqs 1 and 2 constructed using the PC modes are equivalent to those obtained by an independent component analysis method.44,45 While the DC modes are constructed using the distances between Cα atoms, the contact map between residues i and j, Qij, is used to characterize the structures. Qij is defined as

Figure 1. (a) Native structure of the WT HP35 (PDB: 1YRF) and (b) the contact map representation for the crystal structure (lower-right) and native state basin (upper-left). See text for the definition of the native state basin. The helices are shown in purple and loops and tails are in white. The tryptophan (Trp) and phenylalanine (Phe) residues as well as the lysine residues (orange) which are mutated to norleucine are shown in licorice representation. Note that only one of the Trp and Phe residue locations are shown when multiple locations are in the PDB structure.

of the fastest folding protein known up to date.40 Recently, Piana et al.33 have performed multiple ∼400 μs-long equilibrium MD simulations of WT and NleNle at different temperatures. Their trajectories show multiple folding/ unfolding transition events, and various experimental observances calculated from their trajectories are in good agreement with existing experiments. On the contrary, the folding mechanisms analyzed from the trajectories remain controversial,33,37,38 e.g., what the height of the free energy barrier and 11684

DOI: 10.1021/acs.jpcb.6b08066 J. Phys. Chem. B 2016, 120, 11683−11691

Article

The Journal of Physical Chemistry B ⎧ 1 for |i − j| > 4 ⎪ 1 + exp[ r ij(t ) − r0] Q ij(t ) = ⎨ ⎪ ⎩ 0 for |i − j| ≤ 3

multiple conformational substates or proceed via multiple transition pathways where the location of the transition states differ.23 To reveal the transition dynamics in more detail, the folding/ unfolding transition events between the native and unfolded states are extracted directly from the trajectories. The native and unfolded states are chosen to be within kBT from each free energy minimum in the 1D-FES. The distributions of the transition path times (TPTs) in the WT and NleNle results are summarized in Figure 3. The histograms, given in log-scale,

(3)

were rij is the distance between the Cα atoms in residues i and j, and r0 is set to 8 Å. As an example, the crystal structure of the WT (PDB: 1YRF)46 and its contact map representation are shown in Figure 1.



RESULTS AND DISCUSSION Characters of States and Transition Path Time Distributions. In order to characterize the slowest conformational dynamics and define the native and unfolded states, we first apply the DCA method to each trajectory. The onedimensional free energy surface (1D-FES) as a function of the first DC mode (DC1) is given in Figure 2. The contact map of

Figure 3. Distribution of the transition path times for the four trajectories. The mean and standard deviation of the transition path times are given in each figure. Note that x-axis is in log-scale.

span a wide range of time scales which vary by more than 2 orders of magnitude within each trajectory. This strongly indicates that the transition dynamics is heterogeneous. In addition, the current TPT distribution is significantly different from that by the two-state transition model on the high-barrier and overdamped assumption,47,48 i.e., the free energy barrier, ΔG is estimated to be only 0), whereas the C-terminal of the α1 helix (residues 49 to 51) and the α1-α2 loop (residues 52 to 54) deform first in the lower path (DC4 < 0). In either case, the changes of the contacts on the other end are rarely found, i.e., the two paths are qualitatively distinct. Hereafter we denote the upper and lower transition paths as “α3 melting” and “α1-α2 deforming” pathways, respectively. We find that 18 and 9 transitions occur via the α3 melting and α1-α2 deforming pathways, respectively (Figure S6). The α3 melting pathway is thus the major folding/unfolding transition pathway in WT. Importantly, notable amount of non-native contacts, i.e., positive values in the difference contact map, are found during the folding/unfolding transitions. On the other hand, we do not see clear difference between the TPT lengths along the two pathways (Figure S6). Next we examine the effect of Lys→Nle mutations on the transition mechanisms in terms of the 2D-FESs for the NleNle trajectories at 360, 370, and 380 K (Figure S7). To compare the transition mechanisms between WT and NleNle directly, DC1 and DC4 optimized for WT is used to construct the 2D-FESs, i.e., the mean and transition matrices in DCA are determined using the WT trajectory. The FESs show that the α3 melting pathway is less favored in NleNle, e.g., population at DC4 > 2 is very low. The α1-α2 loop formation/destruction is thus

dynamics behind the transitions along the slowest coordinate (DC1). In particular, how the hierarchical dynamics, i.e., the faster dynamics described by other DC modes, play a role during folding/unfolding transitions is of great interest. To this end, we look into the two-dimensional (2D)-FESs in terms of DC1 and other DC modes to distinguish the transition events. We find that DC4 from the WT trajectory characterizes the differences between the transitions, whereas DC2 and DC3 separate the substates within the unfolded state (Figure S3). The mean and standard deviation of the DC modes along DC1, shown in Figure S4, also confirms the importance of DC4 about the transition region. First we focus on the WT trajectory; the 2D-FES as a function of DC1 and DC4 is given in Figure 4. Note that the correlation time along DC4 for WT is ∼400 ns, which is estimated from the time correlation function of DC4 (not shown), whereas that of DC1 is approximately 5.6 μs (see below). Figure 4 clearly shows that at least two major transition pathways exist in WT, and multiple conformational substates appear along the pathways. As an example, two characteristic transition trajectories that proceed via the two pathways are given in Figure S5; the trajectory segments indeed involve clusters of substates and kinetic traps during the transitions. To characterize the transition pathways, the differences between the contact maps at the substates and the native well (Figure 1) are also provided in Figure 4. The difference contact 11686

DOI: 10.1021/acs.jpcb.6b08066 J. Phys. Chem. B 2016, 120, 11683−11691

Article

The Journal of Physical Chemistry B

unfolding transition has been initiated, thus the transitions become heterogeneous at the molecular level. Conformational States from smFRET Efficiency Distributions. While the current results strongly indicate heterogeneous folding/unfolding transitions and different mechanisms between WT and NleNle, to what extent such heterogeneity can be measured experimentally, e.g., in a single molecule observable, is not straightforward. To this end, here we examine the smFRET trajectories from the trajectory data and discuss how the protein conformational transitions alter the distributions of smFRET observables. The time-resolved FRET signals between the cyanophenylalanine (PheCN) and Trp residues have been measured in the engineered HP35,49 and were found to reflect the conformational heterogeneity of HP35. The effect of Phe→PheCN mutation on the structures and dynamics of HP35 has been examined experimentally and is found to have little effect.50 Thus, here we calculate the smFRET efficiencies for the Phe-Trp pairs as a mimic of PheCNTrp. Two pairs, Phe47-Trp64 and Phe76-Trp64, are chosen which are expected to represent the conformational changes of the α1-α2 loop and α3 helix, respectively. By simply assuming the orientational averaging of the FRET chromophores, the FRET efficiency is approximately calculated from the trajectory by 1 E (t ) = 1 + (R(t )/R 0)6 (4)

indicated to regulate the folding/unfolding transitions. In addition, a transition path along DC4 ∼ 0 is found in the NleNle mutant; this pathway involves simultaneous changes in the α1-α2 loop and the C-terminal of the α3 helix, and contributes to the fast transitions found in Figure 3. Figure S7 also shows that the unfolded state minimum for NleNle is shifted from DC1 ∼ −0.8 and DC4 ∼ 0.4 to DC1 ∼ 0 and DC4 ∼ −2; the original minima for the WT becomes a conformational substate within the unfolded state basin (see also the substate within the unfolded state of NleNle in Figure 2 and discussion above). This change in the unfolded state distribution also implies that α3 helix becomes more rigid in NleNle than in WT, even in the unfolded state, which is consistent with previous studies.37,49 Finally, we discuss the molecular origin of the heterogeneity in the TPT lengths, particularly when the transition becomes slow. The difference contact maps, given in Figure 4, show that notable amount of non-native contacts appear along the folding/unfolding transition pathways. Indeed, non-native contacts are frequently found in the slow transitions, which suggests that such contacts play a role in slowing down the transitions. To test this idea, we examine the correlation between the formation of the non-native contact(s) and TPT lengths. Here, the non-native contacts in each transition event are quantified by the mean of the difference contact map over each transition trajectory, using the native state contact map (Figure 1) as the origin. We find that the non-native contact Ala49-Arg55 most strongly appears in the slow transitions. Figure 5 summarizes the correlation between this contact and the TPT length for the transitions in the four trajectories. The result shows that when the transition occurs via the α1-α2 loop deforming pathway (i.e., mean of DC4 over the transition trajectories is negative), the Ala49-Arg55 contact indeed appears to correlate strongly with the slowness of the transition. Interestingly, this pair is not in contact via a characteristic hydrogen bond. Instead, the residues 50 to 54, which are neutral and mostly hydrophobic, form a loop and a hydrophobic core together with the residues in the α2 and α3 helices, e.g., Phe58 and Leu69, leading to a somewhat stable misfolded structure (Figure 5). We note that non-native contacts have been found to also slow down the folding/unfolding transitions of a designed α-helical protein α3D, in which the non-native contacts are maintained by salt bridges. The transitions via the α3 melting pathway, on the other hand, do not seem to have any strong characteristic non-native contact. Instead, in these transitions the α1 and α2 helices as well as the α1-α2 loop segments are preserved while the C terminal of the α3 helix is unfolded, i.e., the protein is partially folded. In this case, the unfolded α3 helix takes different conformations, thus becomes entropically stable. Since the neutral Nle residues become more unstable than the Lys residues when exposed to water in solution, this likely reduces the chance of finding these partially unfolded configurations in NleNle than in WT, which also explains why the α3 melting pathway is less observed in NleNle. Overall, these results indicate that the folding/unfolding transitions of HP35 proceed in a segmental manner, which can be described in terms of DC4, rather than simultaneously. These segmental motions are an order of magnitude faster than the overall folding/unfolding transitions, but are nevertheless found to trigger the slow transitions. Importantly, the transition pathways and potential kinetic traps differ qualitatively depending on which segment of the protein the folding/

where R(t) is the distance between the centers of the heavy atoms forming the rings in Trp and Phe residues, and R0 is set to 18 Å.49,51 We note that the effect of photon shot noise, which can be incorporated using a Markov state model with some ambiguity,52 is omitted in the current analysis due to the limited time resolution of the trajectory. Figure 6 shows the

Figure 6. FRET efficiency distributions for Phe47-Trp64 (left) and Trp64-Phe76 (right). Top to bottom rows are the results for WT at 345 K and NleNle at 360, 370, and 380 K, respectively.

FRET efficiency distributions for the Phe47-Trp64 (E47) and Phe76-Trp64 (E76) pairs from the four trajectories. Three peaks are found in each plot, i.e., ∼0.1, 0.5, and 1.0 for E47 and 0.1, 0.9, and 1.0 for E76. The positions of the peaks are largely invariant with respect to mutation and temperature changes. On the other hand, the relative heights of the peaks vary, e.g., the peak with intermediate efficiency (E47 ∼0.5) decreases with temperature while those with low (E47 ∼0.1) and high (E47 ∼1.0) efficiency increase. This indicates that the intermediate efficiency peak represents the native state while the low and high efficiency peaks arise from the unfolded state. Note that 11687

DOI: 10.1021/acs.jpcb.6b08066 J. Phys. Chem. B 2016, 120, 11683−11691

Article

The Journal of Physical Chemistry B E47 and E76 in the crystal structure (PDB: 1YRF) are 0.62 or 0.48 and 0.64 or 0.66 depending on the two conformations in the PDB, respectively, thus the FRET efficiencies are somewhat different from the MD ensemble possibly due to thermal fluctuations of the terminal residues and multiple configurations within the native state. It is noted that 2D-IR and time-resolved FRET experiments have indicated that HP35 takes multiple configurations within the native state at room temperature.49,50,53 The 2D-FES as a function of DC1 and E47 or E76, given in Figure 7, further shows how the efficiencies change during the

Figure 8. Joint probability distribution for the FRET efficiency of Phe47-Trp64 calculated at different delay times t from the WT trajectory.

between all peaks are found, though it requires more time to converge to the uncorrelated distribution seen at t = ∞. To quantify the amount of transitions, E47 is divided into three groups, 0 ≤ E47 < 0.3, 0.3 ≤ E47 < 0.7, and 0.7 ≤ E47 ≤ 1, denoted as state 1, 2, and 3, respectively. Note that state 2 is mainly the native state, whereas states 1 and 3 belong to the unfolded and intermediate states (see Figure 7). The conditional transition probability as a function of delay time, given by Pi → j(t ) =

∫i dE0 ∫j dEtP(E0 , Et) ∫i dE0P(E0)

(5)

describes how much population transfer occurs from states i to j over t. Here E0 = E47(0), Et = E47(t) and ∫ idE denotes the integral over state i. The results for the four trajectories are summarized in Figure 9. All trajectories are found to roughly reach their equilibrium values by 10 μs, although the relaxation of WT seems to be the slowest. The ratios of the states for NleNle vary at different temperatures. However, the transition mechanism does not seem to change with temperature; for instance, the population of states 1 and 3 are roughly the same. On the other hand, while the 1→2 and 3→2 transitions behave similarly in NleNle, the two curves differ notably in WT at