8288
J. Phys. Chem. B 2009, 113, 8288–8295
Energy Landscape of the Trpzip2 Peptide Hugh Nymeyer* The Institute of Molecular Biophysics and the Department of Scientific Computing, Florida State UniVersity, Tallahassee, Florida 32306-4380 ReceiVed: July 29, 2008; ReVised Manuscript ReceiVed: December 18, 2008
Replica exchange simulations are used to study the energy landscape of trpzip2, a model beta-hairpin system, using the AMBER99sb force field and explicit solvent. The total simulation time is 300 ns per replica (approximately 10 µs total). The trp side chains are observed to adopt multiple packing arrangements with a freezing temperature below 273 K in the simulated system. The secondary structure and native hydrogen bonds melt out cooperatively around 273 K. The residual beta-strand structure and antiparallel bonding persist at high temperature. These results provide a model for the three apparent melting transitions observed experimentally in this system. The dominant folding mechanism of trpzip2 in this model appears to be zipping, which is consistent with recent measurements on similar hairpins. Structures for which the turn is native-like and the termini are non-native-like collectively form a metastable intermediate. Most of the stabilizing enthalpy is gained after the formation of the turn. Equilibrium thermodynamic quantities are compared against experiment. Although the AMBER99sb force field reproduces the native structure with good fidelity, the stability of the native state is significantly underpredicted with a melting temperature near 273 K, and the relative heat capacity is only about one tenth of its experimental value. Introduction In spite of the significant progress made in understanding the thermodynamics and kinetics of globular proteins, many of the details remain unclear. To clarify the mechanisms by which globular proteins stabilize their native structures and fold, a number of small, fast-folding peptides and proteins have been discovered or designed and studied as model systems. Trpzip2 is a small (12 residue) model peptide that was produced by modifying the terminal hairpin of GB1.1 It forms a beta-hairpin at lower temperatures. The thermal and denaturant induced melting of this hairpin has been studied by circular dichroism (CD),1,2 tryptophan fluorescence,2-4 and various forms of 1D and 2D infrared spectroscopy (IR).2,4,5 These studies found indications of noncooperativity in the melting of trpzip2 at zero denaturant concentration. In particular, CD indicates some partial melting or loosening of the native tryptophan packing occurring from 15-40 °C;2 however, IR and CD both show that the secondary structure melts from 60 to 70 °C.1,2 Residual tryptophan contacts2 and antiparallel betastrand hydrogen bonds5 appear to persist at high temperatures in the absence of denaturant, although it is not clear whether these are primarily native or non-native contacts. Trpzip2 has been a popular system to study via simulation. The earliest simulations2 used the replica exchange molecular dynamics (REMD) method with the AMBER966 force field and an implicit generalized Born/surface area model7 to study the equilibrium ensemble of trpzip2. These simulations suggest that trpzip2 has a fairly rough surface at low temperature with multiple stable conformations and diverse non-native-like backbone conformations. A REMD simulation with the AMBER96 force field was also carried out with explicit solvent.8 Potential of mean force (PMF) calculations suggested that the folding/unfolding transition was cooperative, but a partially * Corresponding author. E-mail:
[email protected]. Phone: 850-5908160. Fax: 850-644-0098.
folded intermediate was observed that had a natively structured turn and frayed termini. Unfolding simulations at 500 K in the same model8 observed unzipping (melting from the termini to the turn), suggesting that the dominant folding mechanism is zipping.9 Other studies10,11 with the same force field and explicit solvent confirmed that folding occurs by some combination of hydrophobic collapse and turn formation, although the exact ordering of events was not clear. In contrast to the simulations with explicit solvent, room temperature folding simulations12 using the AMBER96 force field and a generalized Born implicit solvent model observed folding to be heterogeneous; i.e., the folding did not appear to occur mostly by one mechanism. Similarly, room temperature folding simulations4 using the OPLS all-atom force field13 and a generalized Born model showed heterogeneous folding behavior with only a slight preference for zipping. These results are supported by another group14 that used a REMD-like method to study the equilibrium properties using the same force field and solvent model. Their results suggested that although the turn is formed first (including the first hydrogen bond and the trp-trp pair closest to the turn) the hydrogen bonds do not zipup, but rather form out of sequence. Thus, there is a fundamental disagreement between the implicit and explicit solvent simulationssthe implicit simulations observe a rugged landscape without a strong preference for zipping, but explicit solvent simulations show a relatively cooperative transition with a preference for zipping. This is not completely unexpected since implicit solvent models have shown substantial differences with explicit solvent simulations in other contexts15-19 as well. Also, although the implicit simulations provided a model for understanding the multiple melting transitions observed experimentally, it was not clear to this author how these results could be explained with the explicit solvent simulations. For these reasons, a new simulation was performed in explicit solvent. This simulation used a different force field (AMBER99sb20) with a well-balanced secondary
10.1021/jp806749b CCC: $40.75 2009 American Chemical Society Published on Web 05/26/2009
Energy Landscape of the Trpzip2 Peptide structure preference. Special attention in the analysis is made to examine convergence and discrepancies between this simulation and experiments, which might impact the reliability of these simulation results and the limitations of current force fields.
J. Phys. Chem. B, Vol. 113, No. 24, 2009 8289
kfδt exp[-kfτ] and for each stretch of time τ for which the trajectory is unfolded, the probability of the trajectory gains a factor of
Methods Simulation Protocol. Trpzip2 has the sequence SWTWENGK-WTWK-NH2. The N-terminus was protonated. This was placed in a periodic cubic box containing 1008 TIP3P waters,21 two Na+ ions, and 4 Cl- ions to produce a neutral system. The initial conditions of the simulation were taken to be the final state of an REMD simulation that had been run for approximately 100 ns per replica using the AMBER03 force field22 starting from the native state. This initial state had three folded replicas and 29 unfolded replicas. This initial state was simulated using REMD23 with 32 replicas exponentially spaced from 273 to 474.6 K. This was done at constant volume. The volume was fixed at the average volume taken by the system in the folded state at 300 K and under 1 bar of pressure in the initial simulation. The time step was 2 fs with constraints on all bonds involving hydrogens, enforced using the LINCS algorithm.24 Waters were kept rigid with the SETTLE algorithm.25 Replica exchanges between replicas 0 and 1, 2 and 3, etc., were attempted at time 0 and at multiples of 100 fs. Replica exchanges between replicas 1 and 2, 3 and 4, etc., were attempted at time 50 fs and additional multiples of 100 fs. On average, each attempted replica exchange was accepted between 31% and 43% of the time. The lowest acceptance probability (31%) occurred between the lowest two temperatures; the highest acceptance probability (43%) occurred between the highest two temperatures. Temperature was maintained using the Nose-Hoover algorithm.26,27 Particle mesh Ewald28 was used with fourth-order charge interpolation, a grid of size 32 × 32 × 32, and a real-space cutoff of 14 Å. LeonardJones interactions were smoothly switched off from 10 to 12 Å. The pairlist was reconstructed every 10 integration steps for all atoms within a 14 Å radius. All simulations were performed with GROMACS v3.3.1,29 modified to correctly scale the Nose-Hoover momentum during replica exchanges. The AMBER99sb20 force field, which had been previously ported to GROMACS,30 was used for the peptide. The water model used was TIP3P.21 Each replica was simulated for 300 nssa total time of 9.6 µs. The first 100 ns was discarded as equilibration. The final 200 ns was used as production and for all the analysis except for the calculations in Figure 1, which used all 300 ns data. Bayesian Analysis. To estimate the probable values of the folding and unfolding rates of a replica, we use Bayes’ rule to write the probable values for the single-replica folding and unfolding rates kf and ku in terms of N trajectories x1(t) through xN(t).
p(kf, ku |x1(t)...xN(t)) )
kuδt exp[-kuτ] The contribution of each stretch of trajectory that runs to the end of the simulated time without transitioning can be found by integrating over all possible trajectory histories beyond that point. The probability distribution of relaxation times and fraction of folded replicas can be found from the probability distribution of values of kf and ku. Clustering. Clustering was performed using the method described in Daura et al.31 Nearest neighbors were taken to be
p(x1(t)|kf, ku)...p(xN(t)|kf, ku)p(kf)p(ku) ∫∫p(x1(t)|kf, ku)p(kf)p(ku)dkfdku... ∫∫p(x1(t)|kf, ku)p(kf)p(ku)dkfdku
The priors p(kf) and p(ku) are assumed to be constant from 1 to 10-6 ns-1. The trajectories are functions x(t) ∈ {0,1} indicating whether the replica is folded (0) or unfolded (1). These trajectories are discretized into steps of time δt. For each stretch of time τ for which the trajectory is folded, the probability of the trajectory gains a factor of
Figure 1. (A) Bayesian analysis of the probable folding and unfolding times of a replica. Details of the calculations are provided in the Methods section. (B) The same analysis applied to determine the probable values of the relaxation time of the folded population. This time determines the rate at which the folded/unfolded population relaxes to its equilibrium value. The relaxation time is greater than 250 ns with 95% probability and less than 1036 ns with 95% probability. (C) The same analysis applied to determine the expected number of folded replicas that would be observed in a long simulation.
8290
J. Phys. Chem. B, Vol. 113, No. 24, 2009
Nymeyer
any two structures with a heavy atom root mean squared deviation less than 2 Å. The structures for clustering were taken from the last 200 ns of the simulation at all temperatures from 273 through 350.5 K. For clustering, structures were sampled at 100 ps intervals in each replica. 5078 clusters were found in total. 1486 clusters were found containing two or more structures. Figure 9 was produced by plotting only clusters with three or more structures. In Figure 9, each cluster was connected to the three other clusters that were closest to it, and it was connected to any other cluster which was closer than 3.5 Å. The distance between clusters was chosen to be the heavy atom root mean squared distance between the representative structures of the two clusters. The resulting graph was embedded into a two-dimensional representation by minimizing the sum of the squared deviations among all the cluster connections. This minimization was done using the Graphviz program (www. graphviz.org). Thermodynamic Fitting. One advantage of REMD simulations is that the temperature dependence of various populations can be used to infer the relative enthalpies, entropies, and heat capacities. This type of analysis is done in Figures 4, 6, and 8. In each case, each configuration of the system is categorized as one class of a disjoint set. For example, in Figure 4, the states are classified as either folded or unfolded. Similarly, in Figure 6, each state is placed into a bin of the potential of mean force based on its heavy atom root mean squared deviation (rmsd) to the native structure and the rmsd of the turn regions. In all cases, there are by construction a discrete, finite set of states into which each configuration of the system can be placed. The population of state i in equilibrium at temperature T is pi(T). The following equations are assumed
1 exp[-Ai(T)/RT] ΩT Ai(T) ) Ei(T) - TSi(T) Ei(T) = Ei(T0) + CV,i(T0)(T - T0) Si(T) = Si(T0) + CV,i(T0) ln[T/T0]
pi(T) )
Ai(T) is the Helmholtz free energy of state i at temperature T. Ei(T) is the energy of state i at temperature T. Si(T) is the entropy of state i at temperature T. CV,i(T) is the constant volume heat capacity of state i at temperature T. The temperature dependence of CV,i(T) is assumed to be weak. ΩT is a (temperaturedependent) normalization. The temperature dependence of pi(T) is fitted using three adjustable parameters: Ei(T0), Si(T0), and CV,i(T0). (In certain circumstances, CV,i(T) may be assumed to be zero, in which case this is a two-parameter fit). It should be noted that the fitting procedure cannot determine the absolute value of any of these quantities. For example, the following transformation leaves all the temperature-dependent probabilities invariant
Ei(T0) ⇒ Ei(T0) + ∆E Si(T0) ⇒ Si(T0) + ∆S CV,i(T0) ⇒ CV,i(T0) + ∆CV
[
ΩT ⇒ ΩT exp -
∆E + ∆CV(T - T0) - T∆S T∆CV ln[T/T0] RT
]
However, this transformation does not alter the relative values of the fitted quantities. In practice, ΩT is first fixed at a value
Figure 2. Native structure from the simulation (right) compared with one of the NMR structures (structure 1 from 1LE1.pdb). The structures are aligned and shown from the top (top) and side (bottom). Trp side chains are shown in a space-filling format. Hydrogen atoms are removed for clarity. The native structure of the simulation is taken to be the representative structure from the largest cluster. The most significant difference between the simulated and NMR-determined native structures is the different packing arrangement of the two terminal Trp residues (edge on edge versus edge on face), which allows a partial stacking interaction between the two central Trp residues. The Trp closest to the turn also favors a different χ1 than this NMR structure.
of unity, and the other quantities are subsequently fitted. This procedure exactly mirrors the experimental procedure to determine thermodynamic estimates from measured state populations. Figure 4 and Figure 6 are fitted using all three parameters. Figure 8 is fitted assuming that ∆CV is zero. Results and Discussion It is important in any simulation of equilibrium properties to carefully assess convergence. This is especially important in a REMD simulation because temperature sorting can provide the appearance of convergence even though no convergence has occurred. A practical approach to assessing convergence in most situations is to measure the autocorrelation of the slowest relaxing structural quantity, which is usually a measure of the amount of native structure.32 Although it is common to measure temperature cycling of the replicas as a proxy for structural relaxation, this often underestimates the relaxation time. The REMD simulation reported here was started with 3 folded replicas (heavy atom root mean squared deviation (rmsd) less
Energy Landscape of the Trpzip2 Peptide
Figure 3. χ1 and χ2 angles of the four tryptophan residues at 273 K in native-like structures (all structures with a heavy atom rmsd from the first NMR structure less than 3 Å). Populations of the dominant conformations are listed. This system shows many alternate packing arrangements of the tryptophan residues with nearly equivalent free energies. Nearly all of these alternatively packed conformations have native-like hydrogen bonding patterns.
Figure 4. Free energy of unfolding as a function of temperature from the simulation (triangles) and fit to a thermodynamic model (solid curve) as described in the Methods section. This is compared with the actual thermodynamic data (also model fit) of Trpzip2 (dashed curve). The simulated trpzip2 not only is less stable than the actual trpzip2 but also shows a more gradual folding transition and an absence of cold denaturation at low temperatures. The changes in enthalpy and entropy of the computational model upon unfolding are roughly half of their experimental values (32.6 kJ/mol versus 70.2 kJ/mol and 116 J/mol/K versus 203 J/mol/K); the change in the heat capacity is roughly one tenth of its experimental value (102 J/mol/K versus 1176 J/mol/K). All quantities are reported at 345 K.
than 3 Å from structure 1 of 1LE1.pdb) and 29 unfolded replicas. Over the next 300 ns, three folding events and two unfolding events were observed. A folding event was a transition from the unfolded state to a state with a heavy atom rmsd 5 Å. This definition prevents one from counting fluctuations around the folded and unfolded states as actual barrier crossing events. No replica that underwent unfolding was observed to refold, and no replica that underwent folding was observed to unfold, which strongly suggests that these events were not fluctuations mistakenly classified as transition events. A Bayesian analysis was used to estimate the probable values of folding/unfolding times for a single replica that are consistent with the distribution of waiting times from the simulation (Figure 1A). The same calculation was used to -1 estimate the probable values of the relaxation time τ-1 relax ) τfolding -1 + τunfolding consistent with the same data (Figure 1B) and the probable values for the average number of folded replicas expected over a long simulation (Figure 1C). The expected
J. Phys. Chem. B, Vol. 113, No. 24, 2009 8291
Figure 5. Potential of mean force (PMF) for the trpzip2 peptide as a function of the root mean squared deviation (rmsd) of all heavy atoms and the rmsd of just the heavy atoms of the native turn plus flanking residues (Trp4 through Trp9). Data is shown for 273 K with contours drawn every 2 kJ/mol. This PMF suggests that the primary folding mechanism of the model is zipping, which starts from the beta turn and proceeds toward the N- and C-termini. Metastable minima occur when the turn is formed and the N- and C-termini are unstructured. The gray background indicates regions not sampled at 273 K. Data were binned using square bins of 0.1 Å by 0.1 Å.
Figure 6. Relative enthalpy as a function of the same coordinates as Figure 5. Relative enthalpy is determined by fitting the temperature dependence of the PMF to a thermodynamic model as described in the Methods section. Contours are drawn every 10 kJ/mol (with alternating line weights for clarity). From this projection, the activation enthalpy for folding appears small. This is caused by the enthalpic well with a total rmsd near 6 Å. An additional contribution to the enthalpy of folding may arise from the temperature dependence of the configurational diffusion constant. The system must undergo a significant entropic search to locate the correct turn structure. Most of the stabilizing enthalpy occurs after the formation of the native turn conformation.
relaxation time of 285 ns (250 to 1036 ns) is significantly longer than most replica simulations are currently run. This slow relaxation is consistent with predictions from semianalytic models of REMD simulations32-35 and with previous simulations36 of a ten-residue beta-hairpin in explicit solvent that suggested total simulation lengths of 400 ns per replica are necessary to equilibrate this sytem fully using REMD. However, the semianalytic models of REMD suggest that the observed relaxation times are a strong function of the activation enthalpies for folding and unfolding, so it is very possible that the longest relaxation time is much shorter in other computational models, e.g, models with different potentials or implicit solvent. Because our simulations are only equilibrated for 100 ns (and run for 300 ns total), and the estimated relaxation time is anywhere from 250 to 1036 ns, the results presented here are unlikely to be quantitatively precise. For example, Figure 1
8292
J. Phys. Chem. B, Vol. 113, No. 24, 2009
Figure 7. Relative free energy of states with various native backbone hydrogen bond patterns at 273 K. Listed from the turn to the termini, a 0 indicates no native hydrogen bond, and a 1 indicates the presence of a native hydrogen bond. A permissive definition of hydrogen bonding (