15134
J. Phys. Chem. B 2008, 112, 15134–15139
Caching of a Chameleon Segment Facilitates Folding of a Protein with End-to-End β-Sheet Sandipan Mohanty John Von Neumann-Institut fu¨r Computing, Forschungszentrum Ju¨lich, D-52425 Ju¨lich, Germany
Ulrich H. E. Hansmann* John Von Neumann-Institut fu¨r Computing, Forschungszentrum Ju¨lich, D-52425 Ju¨lich, Germany, and Department of Physics, Michigan Technological UniVersity, Houghton, Michigan 49931 ReceiVed: May 27, 2008; ReVised Manuscript ReceiVed: August 24, 2008
We report results from all-atom simulations of a 49-residue C-terminal fragment of TOP7 in implicit solvent. Using parallel tempering simulations with high statistics, we probe the thermodynamic properties of the protein over a large range of temperatures and evaluate its free energy landscape at room temperature. Our results confirm that the protein folds by a caching mechanism that relies on a chameleon segment. This mechanism differs from the one seen in high-temperature unfolding simulations. Finally, we discuss a possible mechanism for dimerization of the protein. 1. Introduction With the growth of the protein data bank (PDB), knowledgebased approaches have become increasingly more successful in the prediction of the three-dimensional structure and function of proteins. On the other hand, energy landscape theory and the funnel picture have led to an understanding of the general aspects of the folding process. While these successes of computational protein science over the past decade are impressive, they have raised at least as many questions as they have answered. Perhaps the most important is the occurrence of misfolding and subsequent aggregation, as these phenomena are often connected with the pathology of various diseases. Their understanding requires a detailed insight into the folding mechanisms of the associated proteins. In silico investigations of the folding process are hampered by huge computational costs. The energy landscape of a protein is rough, and the number of local minima separated by large free energy barriers increases exponentially with its size. This renders all-atom simulations of all but the smallest proteins (with ≈30 residues) extremely difficult. The advent of modern search techniques such as generalized ensemble sampling1 and parallel tempering2 (also known as replica exchange method) has alleviated this sampling problem. While atomistic folding simulations are still out of reach for most proteins, they have become possible for designed proteins where complications such as disulfide bridges and proline isomerization can be avoided. An example is the C-terminal fragment (CFr) of the designed 93-residue protein TOP7.3 This fast-folding protein (PDB id: 2GJH) forms a stable homodimer, whose secondary structure remains nearly unchanged up to 98 °C and high concentrations of denaturant.4 Along the sequence from N- to C-terminus the secondary structure profile of the molecule CFr is strandhelix-strand-strand (see Figure 1). The two strands at the C-terminus form a β-hairpin. The strands at the N- and C-termini are also adjacent in the 3-stranded β-sheet. In an ongoing research project we use simulations of CFr to study folding of proteins with end-to-end β-sheets.5 Such proteins pose a special challenge. The N-terminal β-strand is * Corresponding author. E-mail:
[email protected].
Figure 1. PDB structure of the CFR dimer.
synthesized early on, but it cannot bind to the C-terminus before the chain is fully synthesized. The question arises how during this time the protein avoids interaction of the N-terminal β-strand with other parts of the chain or nearby molecules that may lead to potentially harmful aggregates of incompletely folded proteins. For slow-folding proteins, the entropic suppression of sequence distant contacts and a “backtracking” mechanism,6 where full folding only occurs after breaking of prematurely formed long distance contacts, will be sufficient. However, for fast-folding proteins with sequence nonlocal contacts, it is likely that a fold accelerating mechanism is involved. In a recent letter5 we have proposed a “caching” mechanism that relies on chameleon behavior of one of the terminal β-strands to facilitate folding. Our conjecture was deduced from a careful analysis of all 15 observed folding events in a replica exchange simulation of the structured 49-residue part of TOP7CFr (residues 2-50 of one chain of 2GJH.pdb). However, as replica exchange sampling relies on an artificial dynamics, it does not lead to physical trajectories. Hence, information on the kinetics of folding can only be deduced indirectly. For this reason, and because the caching mechanism raises fundamental questions on the design of proteins, we feel that this process needs to be explored in greater detail. This is the purpose of the present paper that extends the statistics of our simulations and complements the “kinetic” arguments of ref 5 with an independent analysis of equilibrium thermodynamic quantities. Our question is whether the caching mechanism can be also
10.1021/jp804661t CCC: $40.75 2008 American Chemical Society Published on Web 10/29/2008
Protein with End-to-End β-Sheet
J. Phys. Chem. B, Vol. 112, No. 47, 2008 15135
deduced from “static” quantities such as specific heat or secondary structure content, which, for fast folding proteins, may be easier to measure in experiments than a direct observation of the folding trajectory. In this context, we aim to obtain a thorough overview of the free energy landscape of the protein and measure the barrier between the native state and competing configurations. Besides describing the particulars of the folding mechanism, we probe the often assumed conjecture that the folding pathway can be deduced from time-reversing unfolding pathways.7,8 Finally, we suggest a picture that relates folding and dimerization of CFr. 2. Methods 2.1. Model. We use an all-atom representation of the protein molecule that ignores fluctuations in covalent bond lengths and bond angles and assumes a constant peptide-bond torsional angle of 180°. As a consequence, the degrees of freedom are the two backbone Ramachandran torsional angles and the side chain torsional angles for each residue. Unlike the protein itself, the solvent is treated as a continuous medium rather than in atomistic detail. This commonly used strategy drastically reduces the computational costs. The geometrical parameters of the model (the “Lund force field”) are taken from ref 9. The force field consists of four terms:
Etot ) Eev + Eloc + Ehb + Ehp where
Eev ) κev
∑ i 90° V(R, β) ) (cos R cos β) 0 otherwise
Ehp ) CIJ )
1 NI + NJ
[∑ (
∑ MIJCIJ I 0.8), and specific heat as functions of temperature.
3. Results and Discussion 3.1. Thermodynamics and Folding Mechanism. A parallel tempering simulation explores the behavior of a protein over a large range of temperatures. It therefore allows an analysis of melting and other transitions in a molecule. Such changes usually lead to a signal in the specific heat defined as
CV )
1 (〈E2〉 - 〈E〉2) NreskBT 2
(3)
and displayed in Figure 2. The peak at T ) 335 ( 10 K separates a disordered and extended high temperature phase from a compact phase with large secondary structure content but does not coincide with the folding transition. This can be seen from the fraction of native configurations (defined as fraction of structures with Q > 0.8) that is also displayed as function of temperature in Figure 2. At the melting transition this fraction is zero but increases rapidly around Tf ≈ 300 K. However, the peak in specific heat is correlated with helix formation as indicated by the fraction of configurations with the native helix (Lys 13-Leu 30) formed. At a slightly lower temperature, T ≈ 320 K, the fraction of nativelike strands starts to increase, but this temperature is still above Tf. On first sight, the curves in Figure 2 therefore suggest a folding scenario where the helix forms first, followed by the strands before finally the secondary structure elements are assembled into the final structure. In order to characterize the folding transition in greater detail, we show in Figure 3 four different views of the configurational state of each residue as functions of temperature. The top panel displays the residue-wise mean-square deviations of the backbone angles. We plot this quantity exponentiated so that the larger this quantity the smaller is the average difference in the Ramachandran angles for a residue between a given configuration and the native state. The helix region, residues 13-30,
Figure 3. Temperature dependence of four residue-wise defined variables. Top: exp(-δA2(i)/2π2), where δA2(i) is the mean-square deviation of the Ramachandran angles of residue i. Second: helix propensity of residue i. Third: strand propensity of residue i. Bottom: local nativeness parameter qi.
has a poor Ramachandran rmsd for high temperatures but improves as the temperature decreases. On the other hand, the three strand regionssresidues 2-9, 35-40, and 43-49sappear “nativelike” in this plot, even at high temperatures. This is an artifact caused by the fact that amino acids in extended conformations have their backbone angles not very far from the β-strand region in the Ramachandran plot. Both the local nativeness parameter qi (bottom) and the strand propensity of the residues (second from bottom) reveal that at high temperatures these residues are not in the native state. The most
Protein with End-to-End β-Sheet
Figure 4. Snapshots of a folding trajectory of CFr.
important aspect of the top plot is the behavior of the N-terminal residues 2-12, containing the N-terminal strand and a turn region connecting the strand to the native helix. These residues lose their superficial similarity to the strands at an intermediate temperature. This temperature is roughly at the point where the helix (residues 13-30) has nearly formed. The second plot shows the local helix propensity and indicates that N-terminal residues 2-12 also show a pronounced tendency to be helical after the native helix is formed between residues 13-30. On the other hand, the third panel depicting the local β-strand propensities indicates a similar pronounced strandlike tendency for the N-terminal residues 2-9. This suggests that with decreasing temperatures this part does not stay helical with 100% probability. The behavior of the C-terminal β-hairpin (residues 32-49) is much less dramatic. It forms at a temperature below the onset of helix-formation, but above where the N-terminal region starts to show significant strand content. Hence, the formation of the helix (13-30) is followed by increased helicity in the N-terminal region (2-12), and formation of the hairpin (32-49) is followed by an increase in strandness in the same region. This picture is supported by the local nativeness parameter qi in the bottom panel. At high temperatures no residue is in a nativelike state. As the temperature decreases, first the helix and later the hairpin become nativelike. Only after both helix and hairpin have formed does the N-terminal region become nativelike. The above-described thermal ordering supports a folding mechanism that we have derived more heuristically in ref 5 and that we sketch in Figure 4. As in the corresponding Figure 3 of ref 5 (derived from a different set of runs), the native helix (Lys 13-Leu 30) and the C-terminal hairpin (Tyr 32-Leu 50) are strong secondary structure elements and fold independently. On the average, the native helix folds before the hairpin and induces the N-terminal residues to grow into a non-native extension of the native helix. After the C-terminal hairpin forms and makes the correct tertiary contacts with the native helix, the non-native helix extension dissolves and N-terminal residues form β-sheet contacts with the hairpin to complete the native structure. In ref 5 this picture was obtained through a careful analysis of the observed folding events and detailed simulations of individual secondary structure elements. The described scenario is not only consistent with the thermal ordering observed in Figures 2 and 3 but also with the well-established funnel picture of folding. The free energy landscape at T ) 284 K (where about 32% of the configurations are native-like, i.e. Q > 0.8) is drawn in Figure 5 as a function of the Q-measure (see section 2.3). Its funnel-like form is obvious: the mean free energy (and energy) at a certain Q decreases with Q. There is a pronounced local minimum in free energy at Q ≈ 0.15 which is absent in the corresponding curve for energy (cf. Figure 5). The confor-
J. Phys. Chem. B, Vol. 112, No. 47, 2008 15137
Figure 5. Energy and free energy landscapes as a function of nativeness parameter Q.
mations in this minimum have low secondary structure and high energies, but their high entropy makes it a local minimum in free energy. Formation of secondary structure is energetically favorable but is also accompanied by a large loss of chain entropy. This leads to the free energy barrier seen in the range (0.2 < Q < 0.4). With increasing Q, the free energy decreases till it reaches a minimum at Q ≈ 0.5-0.65. This minimum has a slightly lower free energy than the native basin Q > 0.8 and is separated from it by a free energy barrier of ≈2.5 kcal/mol (≈4.4kBT). The corresponding structures are also shown in the figure. The nativelike configurations at Q ≈ 0.9 have a backbone root-mean-square deviation (rmsd) of 1.7 Å, with all the hydrogen bonds of the experimental native state present. Besides these configurations, representing the global energy minimum, the native basin (Q > 0.8) also includes low-lying minima with ≈10 kcal/mol higher energy than the global energy minimum. Structures contributing to such local minima differ from the native structure by a backbone rmsd of 2.5 Å to about 5 Å that results from deformation of one end of the helix or the offregister attachment with the rest of the β-sheet. The final stage of the folding event is characterized by a transition between structures that have all the helix Lys 13-Leu 30 and the C-terminal β-hairpin Tyr 32-Leu 50 formed and in contact but differ in the state of the N-terminal residues Glu 2-Thr 12. The minimum at Q ≈ 0.6 contains structures in which the N-terminal residues are partly or entirely included in the native helix, but never as a strand in the β-sheet. The structures in the minimum at Q > 0.8 always feature these residues as a β-strand in the 3-stranded β-sheet of the native state. Both configurations have the same number of hydrogen bonds but forming a β-strand is more advantageous in terms of hydrophobic contacts with both the hairpin and the helix. This again indicates that in the presence of both helix and hairpin the long helix partially unfolds and releases the N-terminal residues to join the C-terminal hairpin in a three-stranded β-sheet. As this requires various hydrogen bonds first to dissolve before forming again in the final structure, the two local minima are separated by a free energy barrier of ≈2.5 kcal/mol as shown in Figure 5. While the idea that non-native contacts may facilitate folding is not new,20,21 the folding mechanism in CFr involves also a change of secondary structure. The caching of the N-terminal strand accelerates folding of CFr by avoiding many misfolded states and therefore might be a common mechanism in molecules where adjacent strands in a β-sheet have large sequence separation. Experimental22-24 as well as computational studies with simplified models18,25 have indicated the presence of non-native R-helical structures early in the folding process of predominantly β-sheet proteins. For instance, in Langevin
15138 J. Phys. Chem. B, Vol. 112, No. 47, 2008
Mohanty and Hansmann
Figure 6. Average fraction of configurations with (left) native helix, (center) C-terminal hairpin, and (right) as non-native helix extension. All three propensities are shown as functions of the similarity measure Q for both folding events from parallel tempering runs and thermal unfolding from high-temperature runs.
dynamics simulations with the united-residue force field, two β-strands at the opposite ends of the protein 1E0G were initially seen to fold as extensions of native R-helices, before turning into β-strands in the last stages of the folding process.26 The simulations of CFr provide a detailed picture of how such nonnative interactions20 might arise spontaneously and channel the folding pathway. They indicate that by utilizing an accelerating mechanism such as caching some proteins with large contact order can fold faster than others of similar complexity. This might contribute to the large fluctuations observed for midcontact order proteins in the contact order vs fold rate plot in ref 27. It is also possible for the same mechanism to work differently for proteins with other topologies and produce the opposite effect. For instance (as we already conjectured in ref 5), the caching of the N-terminal strand of CFr, which corresponds to the middle strand in a 5-stranded β-sheet in the full length TOP7 molecule, could promote metastable non-native contacts between the two terminal β-hairpins. This would lead to deep local minima in the energy landscape of TOP7 (PDB id: 1QYS), adversely affecting its ability to fold fast. Interestingly, experiments have indicated that TOP7 exhibits a slow and multiphase folding behavior.29 The caching mechanism requires that at least one strand exhibits a chameleon behavior,28 i.e., adopts different secondary structures in different tertiary environments. Mutations to that key strand should therefore have significant effects on the folding rate of the protein. This could be used as one way to probe the caching mechanism experimentally. Another possibility would be the measurement of the helix content as a function of either time or temperature. As mentioned above, for several proteins with primarily β-sheet structures, increased helix contents at early stages of folding has been reported in experimental studies.22-24 A similar increased helix content at intermediate temperatures could be a signal for the for the caching mechanism (see also Figures 2 and 3). 3.2. Unfolding versus Folding Simulations. The poor sampling observed in all-atom protein simulations at physiologically relevant temperatures is due to their rough energy landscape. Obviously, the problem is much simpler at higher temperatures where the protein’s thermal energy allows crossing of barriers. It is therefore reasonable to expect that current day models capture the physics of thermal or mechanical unfolding of a protein. However, simulations of high temperature unfolding are sometimes interpreted as reversed-in-time folding.7,8 While such an approach has been used in the past with great success,7,8 it is not clear whether its application to protein simulations is universally justified. As our simulations indicate that CFr folds via a nontrivial pathway, it seems to be a good model to test this question. For this reason, we have examined whether its folding mechanism relying on the caching of a chameleon segment can be also inferred from unfolding simu-
lations at high temperatures. We use 25 unfolding runs, starting from the native state, at a temperature just above the specific heat peak of the molecule, 350 K. In order to avoid the large conformational changes that a Monte Carlo simulation might allow at high temperatures, we use only local backbone updates and side chain rotations. Results from these runs are contrasted in Figure 6 with the folding events obtained in our regular parallel tempering runs. The plots show the probability for groups of contacts representing the native helix, the C-terminal β-hairpin, and the N-terminal segment cached as a non-native extension of the native helix. Both the formation/dissolution order of native helix and hairpin are consistent between the folding and unfolding trajectories. However, the cached segment of N-terminal residues forms in the folding trajectories along with the native helix and dissolves synchronously with the formation of the end-toend contacts, after the C-terminal hairpin is formed. On the other hand, in the unfolding trajectories, there is no indication for caching of the N-terminal residues as a non-native extension of the native helix. The observed occurrence probability are consistent with random behavior; i.e., caching does not contribute to the dynamics. Hence, an interpretation of unfolding data in reversed time would miss the caching mechanism that governs folding of this protein. Our results therefore suggest that for nontrivial folding mechanisms (as observed for CFr) unfolding simulations cannot be interpreted as time-reversed folding simulations. Given the previous successful applications of unfolding simulations to study the folding mechanism of proteins, our result raises the question whether one can characterize these classes of proteins for which an interpretation of unfolding as time-reversed folding is a valid strategy. We conjecture that such an interpretation is restricted to simple twostate folder and associated with a nucleation mechanism in folding, as observed, for instance for CI2.7,8 3.3. Folding and Dimerization. CFr is only observed as a dimer in experiments. The monomeric folded state observed in our simulations contains several exposed hydrophobic residues, suggesting that the folded monomers should dimerize readily. In ref 5 we reported preliminary results from unconstrained simulations that support this conjecture. Our new and extended simulations that started with 2-folded monomers having random initial positions and orientations confirm our previous results but at the same time lead to a more detailed picture. The topology of the CFR monomer suggests six possible dimeric states, illustrated in Figure 7. Dimerization occurs quickly, but we observe not only the experimentally reported state (A in the figure) but all six states, albeit with different probabilities. Of the six states shown in Figure 7, the three binding modes at the bottom, which have the helices on opposite sides of the β-sheet, occur with a lower probability than those on the top row. The three binding modes designated A, B, and C, however, occur with roughly the same probability to the accuracy of the
Protein with End-to-End β-Sheet
J. Phys. Chem. B, Vol. 112, No. 47, 2008 15139 Centre Ju¨lich, Germany. This work is supported, in part, by research grant CHE-08090002 of the National Science Foundation. References and Notes
Figure 7. Schematic illustration of different dimer binding modes seen in our simulations. Only the N-terminal strand and C-terminal β-hairpin are shown for each monomer for clarity. In A, B, and C, the helices of the two monomers are on the same side of the β-sheet as the viewer. In D, E, and F, they are on opposite sides of the β-sheet.
collected statistics. The energy difference between a state with two isolated monomers and a dimer in the native binding mode is roughly 30 kcal/mol, whereas the energy difference between alternative binding modes A, B, and C is within 5 kcal/mol. In the model, it is possible to satisfy the β-sheet hydrogen bonds as well as the exposed hydrophobic contacts by attaching the molecules in any of the ways A, B, and C, shown in Figure 7. This energetic degeneracy might be a consequence of our simple force field, which ignores many higher order effects. It is also possible that the binding modes B and C benefit from our omition of the unstructured part of the CFr molecule, residues 51-58. In the binding mode C, the unstructured parts of the monomer would be close to each other, which could have a destabilizing effect. 4. Conclusions We have presented results of high statistics parallel tempering simulations of the TOP7-CFr using an implicit solvent all-atom model. While we do not know of any other unbiased all-atom simulations that studied successfully the thermodynamics of a protein of this size (49 residues) and complexity (both helix and strand secondary structure, sequence-distant contacts between β-strands), its importance goes beyond providing a new milestone. Instead, our present simulations and the preliminary ones reported in ref 5 have for the first time revealed in detail a possible new folding mechanism for a protein. They suggest mutation experiments that can test this mechanism. At the same time they caution against naı¨ve interpretations of unfolding simulations as time-reversed folding simulations and relate folding and dimerization for our protein. In the near future, we plan to extend this work to studies of the full TOP7 protein with its 92 residues. Acknowledgment. All calculations were done on computers of the John von Neumann Institute for Computing, Research
(1) Hansmann, U. H. E.; Okamoto, Y. Phys. ReV. E 1997, 56, 2228– 2233. (2) Hukushima, K.; Nemoto, K. J. Phys. Soc. Jpn. 1996, 65, 1604– 1608. (3) Kuhlman, B.; Dantas, G.; Ireton, G. C.; Varani, G.; Stoddard, B. L.; Baker, D. Science 2003, 302, 1364–1368. (4) Dantas, G.; Watters, A. L.; Lunde, B. M.; Eletr, Z. M.; Isern, N. G.; Roseman, T.; Lipfert, J.; Doniach, S.; Tompa, M.; Kuhlman, B.; Stoddard, B. L.; Varani, G.; Baker, D. J. Mol. Biol. 2006, 362, 1004–1024. (5) Mohanty, S.; Meinke, J. H.; Zimmermann, O.; Hansmann, U. H. E. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 8004–8007. (6) Gosavi, S.; Chavez, L.; Jennings, P.; Onuchic, J. J. Mol. Biol. 2006, 357, 986–996. (7) Daggett, V.; Fersht, A. Trends Biochem. Sci. 2003, 28, 18–25. (8) Daggett, V. Acc. Chem. Res. 2002, 35, 422–429. (9) Irba¨ck, A.; Samuelsson, B.; Sjunnesson, F.; Wallin, S. Biophys. J. 2003, 85, 1466–1473. (10) Irba¨ck, A.; Mohanty, S. Biophys. J. 2005, 88, 1560–1569. (11) Mohanty, S.; Hansmann, U. H. E. Phys. ReV. E 2007, 76, 012901. (12) Hansmann, U. H. E. Chem. Phys. Lett. 1997, 281, 140–150. (13) Trebst, S.; Troyer, M.; Hansmann, U. H. E. J. Chem. Phys. 2006, 124, 174903. (14) Nadler, W.; Hansmann, U. H. E. Phys. ReV. E 2007, 76, 65701. (15) Nadler, W.; Hansmann, U. H. E. Phys. ReV. E 2007, 75, 26109. (16) Favrin, G.; Irba¨ck, A.; Sjunnesson, F. J. Comput. Chem. 2001, 114, 8154–8158. (17) Takada, S.; Luthey-Schulten, Z.; Wolynes, P. G. J. Chem. Phys. 1999, 110, 11616–11629. (18) Chikenji, G.; Fujitsuka, Y.; Takada, S. Chem. Phys. 2004, 307, 157–162. (19) Irba¨ck, A.; Mohanty, S. J. Comput. Chem. 2006, 27, 1548–1555. (20) Plotkin, S. Proteins: Struct., Funct. Genet. 2001, 45, 337–345. (21) Plotkin, S.; Onuchic, J. Q. ReV. Biophys. 2002, 35, 111–167. (22) Hamada, D.; Segawa, S.; Goto, Y. Nat. Struct. Biol. 1996, 3, 868– 873. (23) Kuwata, K.; Hoshino, M.; Era, S.; Batt, C.; Goto, Y. J. Mol. Biol. 1998, 283, 731–739. (24) Kuwata, K.; Shastry, R.; Cheng, H.; Hoshino, M.; Batt, C.; Goto, Y.; Roder, H. Nat. Struct. Biol. 2001, 8, 151–155. (25) Chikenji, G.; Kikuchi, M. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 14273–14277. (26) Liwo, A.; Khalili, M.; Scheraga, H. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 2362–2367. (27) Plaxco, K.; Simons, K.; Baker, D. J. Mol. Biol. 1998, 277, 985– 994. (28) Minor, D.; Kim, P. Nature (London) 1996, 380, 730–734. (29) Watters, A.; Deka, P.; Corrent, C.; Callender, D.; Varani, G.; Sosnick, T.; Baker, D. Cell 2007, 128, 613–624.
JP804661T