Estimating Free-Energy Barrier Heights for an Ultrafast Folding Protein

16 Feb 2008 - ... Paolo V. Pedone , Carla Isernia , Danilo Milardi , and Roberto Fattorusso ... Athi N. Naganathan , Peng Li , Raúl Perez-Jimenez , Jo...
0 downloads 0 Views 385KB Size
5938

J. Phys. Chem. B 2008, 112, 5938-5949

Estimating Free-Energy Barrier Heights for an Ultrafast Folding Protein from Calorimetric and Kinetic Data† Raquel Godoy-Ruiz,‡,§,# Eric R. Henry,‡,| Jan Kubelka,|,⊥ James Hofrichter,| Victor Mun˜ oz,#,+ Jose M. Sanchez-Ruiz,*,§ and William A. Eaton*,| Departamento de Quimica Fisica Facultad de Ciencias, UniVersidad de Granada, 18071 Granada, Spain, Laboratory of Chemical Physics, National Institute of Diabetes and DigestiVe and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, Department of Chemistry, UniVersity of Wyoming, Laramie, Wyoming 82071, Department of Chemistry and Biochemistry and Center for Biomolecular Structure and Organization, UniVersity of Maryland, College Park, Maryland 20742, and Centro de InVestigaciones Biolo´ gicas, Consejo Superior de InVestigaciones Cientı´ficas, Ramiro de Maeztu 9, 28040 Madrid, Spain ReceiVed: July 23, 2007

Differential scanning calorimetry was used to measure the temperature dependence of the absolute heat capacity of the 35-residue subdomain of the villin headpiece, a protein that folds in 5 µs and is therefore assumed to have a small free-energy barrier separating folded and unfolded states. To obtain an estimate of the barrier height from the calorimetric data, two models, a variable-barrier model and an Ising-like model, were used to fit the heat capacity in excess of the folded state over the temperature range 15-125 °C. The variablebarrier model is based on an empirical mathematical form for the density of states, with four adjustable parameters and the enthalpy (H) as a reaction coordinate. The Ising-like model is based on the inter-residue contact map of the X-ray structure with exact enumeration of ∼105 possible conformations, with two adjustable parameters in the partition function, and either the fraction of native contacts (Q) or the number of ordered residues (P) as reaction coordinates. The variable-barrier model provides an excellent fit to the data and yields a barrier height at the folding temperature ranging from 0.4 to 1.1 kcal mol-1, while the Ising-like model provides a less good fit and yields barrier heights of 2.3 ( 0.1 kcal mol-1 and 2.1 ( 0.1 kcal mol-1 for the Q and P reaction coordinates, respectively. In both models, the barrier to folding increases with increasing temperature. Assuming a sufficiently large activation energy for diffusion on the free-energy surfaces, both models are consistent with the observation of a temperature-independent folding rate in previously published laser temperature-jump experiments. Analysis of this kinetic data, using an approximate form for the preexponential factor of Kramers theory and the 70 ns relaxation time for the fast phase that precedes the unfolding/ refolding relaxation to determine the diffusion coefficient, results in a barrier height of 1.6 ( 0.3 kcal mol-1 for an unspecified reaction coordinate. Although no independent test of the validity of the H, Q, or P reaction coordinates is given, the barrier-height estimates obtained with the three reaction coordinates are in quite good agreement with the value derived from a Kramers analysis of the kinetics that makes no assumptions about the reaction coordinate. However, the higher estimates obtained using Q or P appear more consistent with the finding of barrier-crossing kinetics of a villin mutant that folds in 700 ns, corresponding to a 1.3 kcal mol-1 reduction in the folding barrier relative to wild-type. All of the results suggest that the free-energy barrier to folding is sufficiently low that it should be possible to engineer this protein or find solution conditions that would eliminate the barrier to create the “downhill” folding scenario of Wolynes and Onuchic.

Introduction Construction of free-energy surfaces has proven to be an important method for investigating the thermodynamics, kinetics, and mechanisms of protein folding.1-12 The free-energy surface immediately reveals the number of distinct thermodynamic states, the free-energy differences between states from which †

Part of the “Attila Szabo Festschrift”. * Corresponding authors. E-mail: [email protected] (W.A.E.), [email protected] (J.M.S.-R.). ‡ Contributed equally to this work. § Universidad de Granada. | National Institutes of Health. ⊥ University of Wyoming. # University of Maryland. + Consejo Superior de Investigaciones Cientı´ficas.

equilibrium populations can be calculated, and the positions and heights of the free-energy barriers separating these states, the major factor in determining rates. Determining the heights of folding free-energy barriers is important for several reasons. One is to test the accuracy of theoretical free-energy surfaces, such as those obtained from statistical mechanical models,9,13-17 from molecular dynamics simulations that exhaustively sample conformational space4,11 and from Langevin simulations of simplified representations of proteins.10,18 More recently, the use of distributed computing has made it possible to calculate multiple folding trajectories for the fastest folding proteins from atomistic simulations.19,20 This is an important development because, if accurate, everything that one would want to know about a folding mechanism is contained in the simulations. However, many long trajectories

10.1021/jp0757715 CCC: $40.75 © 2008 American Chemical Society Published on Web 02/16/2008

Estimating Free-Energy Barrier Heights

J. Phys. Chem. B, Vol. 112, No. 19, 2008 5939

must be calculated to obtain sufficient statistical sampling to describe kinetics. Prime candidates for such studies, therefore, are small proteins folding in the shortest possible time. Since the folding rate cannot be increased very much once the freeenergy barrier disappears, knowing the free-energy barrier height suggests how much protein engineering or changing solvent conditions can potentially increase the rate for a given protein. Moreover, in barrierless folding (the so-called “downhill” folding scenario of Wolynes and Onuchic3,21), it may be possible to observe all intermediate structures progressing from the unfolded to the folded state.22 This prospect has motivated a number of studies to find and characterize proteins that fold with low or no barriers.23-30 Finally, the use of equilibrium and kinetic data on mutants to infer transition-state structures (φvalue analysis)31-34 implicitly assumes high barriers and would require modification when the barriers are low. While free-energy differences between states can be obtained from equilibrium experiments, free-energy barrier heights to folding ∆G/F cannot be directly obtained from kinetic experiments because the value of the pre-exponential factor (τ0) in the usual relation for the folding time (τF),

( )

∆G/F τF ) τ0 exp RT

(1)

is not known and can only be estimated from experiments on conformational dynamics of the unfolded and folded states using a simplified Kramers relation for τ0.35-38 An alternative approach to obtain barrier heights is based on the realization by Mun˜oz and Sanchez-Ruiz that for proteins with low barriers, conformations at and near the top of the free-energy barrier contribute to the equilibrium properties of the system. They introduced a “variable-barrier model” to estimate barrier heights from measurements of the temperature dependence of the excess heat capacity.39,40 Their model assumes a form for the density of states that produces a double-well free energy-versus-enthalpy profile. Since several theoretical and simulation studies have suggested that the energy (and therefore the enthalpy) acts as a valid reaction coordinate5,10,16,17,41 (see, however, ref 42), their calculated free-energy barrier should be relevant to describing folding kinetics. Although the variable-barrier model yields freeenergy profiles similar to those calculated from simulations (see, e.g., ref 10), it is also important to generate profiles from other plausible models to have an independent estimate of barrier heights consistent with calorimetric data. Here, we present calorimetric data for the ultrafast folding 35-residue subdomain from the villin headpiece, the smallest naturally occurring polypeptide that autonomously folds into a globular structure.43,44 It consists of three helices surrounding a compact hydrophobic core (Figure 1).45,46 Because of its small size and fast folding, the villin subdomain has been the subject of numerous folding simulation studies, including all-atom molecular dynamics simulations in explicit solvent.47-67 We have assumed from its folding rate of 1/(5 µs)68-72 that this protein has a small free-energy barrier and is therefore amenable to estimation of its barrier height from calorimetric data. We fit the data with both the variable-barrier model and an Isinglike theoretical model that has been remarkably successful in explaining relative protein folding rates and has also had some success in explaining the relative effects of mutations on rates and equilibrium constants (i.e., φ values).9,16,73-75 We shall see that both methods produce low barriers, as does an estimate based on an analysis of the laser temperature-jump kinetics using Kramers theory to obtain the pre-exponential factor.38,68

Figure 1. (a) Structure of villin subdomain (PDB 1WY4) solved by X-ray diffraction at 0.15 nm resolution.46 (b) Contact map. A contact between residues exists if the R carbons are separated by 0.8 nm or less.

Experimental Section Peptide Synthesis and Purification. The 35-residue synthetic peptide was synthesized on an Applied Biosystems AB 433A synthesizer using standard solid-phase (fluorenylmethoxy)carbonyl (FMOC) chemistry. The samples were purified by HPLC to >95% purity as evidenced by mass spectrometry. The sequence is (LSDED FKAVF GMTRS AFANL PLWKQ QHLKK EKGLF), which differs from the natural sequence by a replacement of asparagine 27 with histidine. Calorimetry. Differential scanning calorimetry experiments were carried out with a VP-DSC (Valerian-Plotnikov differential scanning calorimeter), capillary-cell microcalorimeter from MicroCal (Northampton, MA) at a scan rate of 200°/h. Protein solutions for the calorimetric experiments were prepared by exhaustive dialysis against the buffer (20 mM sodium acetate, pH 4.9), and the buffer from the last dialysis step was used in the reference cell of the calorimeter. Calorimetric cells (operating

5940 J. Phys. Chem. B, Vol. 112, No. 19, 2008

Godoy-Ruiz et al.

Figure 2. Illustrative plots of apparent heat capacity versus protein concentration. The continuous lines are the best fits of zero-intercept straight lines.

volume 0.135 mL) were kept under an excess pressure of 4 bar to prevent degassing during the scan and also to permit the scans to be extended to a temperature of 125 °C without boiling. Several buffer-buffer baselines were obtained before each run with protein solution in order to ascertain proper equilibration of the instrument. Reheating runs were carried out to determine the calorimetric reversibility of the denaturation process: refolding of the protein was essentially complete with no significant decrease in transition height observed after four successive scans of the same sample. Finally, an additional buffer-buffer baseline was obtained immediately after the last protein run to check that no significant change in instrumental baseline had occurred. Baseline reproducibility was excellent and similar to that previously described.76,77 This protocol was used to obtain several thermograms corresponding to different protein concentrations in the range 0.2-0.8 mg/mL. Protein concentrations were determined from the absorbance at 280 nm using an extinction coefficient of 5500 M-1cm-1. Absolute heat-capacity values and their associated uncertainties were determined from the protein-concentration dependence of the apparent heat capacities, as previously described.77 See Figure 2 for illustrative examples of this calculation. Empirical Estimates of the Heat Capacities for the Folded and Unfolded States. The absolute heat capacity of the folded state (CpF; i.e., completely folded protein) was calculated using the empirical equation proposed by Freire,78 in which the heat capacity as a function of temperature (T) depends only on the molecular weight (Mr) of the protein (in grams per mole):

CFp ) (b + c(T - 273.15))Mr

cal K-1 mol-1

(2)

with b ) 0.316 ( 0.013 cal K-1 g-1 and c ) 1.6 (( 0.3) × 10-3 cal K-2 g-1, where the uncertainties arising from the deviations between data and correlation line are reported in ref 78. The parameters of the upshifted and downshifted baselines corresponding to one standard deviation above or below the correlation line are therefore b ) 0.329 cal K-1 g-1 and c ) 1.9 × 10-3 cal K-2 g-1, and b ) 0.303 cal K-1 g-1 and c ) 1.3 × 10-3 cal K-2 g-1. For DSC experiments reporting absolute heat capacities, eq 2 should provide a good approximation to the native baselines, especially to their temperature dependence. However, the database from which this equation was derived came from absolute heat-capacity measurements on proteins that are much larger than the villin subdomain. The villin subdomain with only 35 residues has a larger fraction of its residues exposed to solvent and could therefore be expected to have increased

Figure 3. Absolute heat capacity vs temperature and fit with twostate thermodynamic model. The absolute heat capacities and associated uncertainties were calculated from linear fits to the protein-concentration dependencies of the apparent heat capacities (Figure 2). The continuous green line through the data points corresponds to the best fit using a two-state thermodynamic model (eqs 4-7) for the excess heat capacity, defined as the observed absolute heat capacity minus the absolute heat capacity of the folded state. The heat capacity of the folded state (green continuous straight line) was calculated from eq 2. The red long-dashed curves and blue short-dashed curves were calculated using the extremes for the absolute folded-state heat capacity based on the uncertainties given by Freire.78 The least-squares best fit parameters are: Tf ) 337 K, ∆HTf ) 28.2 kcal mol-1, and ∆Cp,Tf ) 0.636 kcal mol-1 K-1, and R ) -0.00591 kcal mol-1 K-2 for the green continuous curves with b ) 0.316 cal K-1 g-1 and c ) 1.6 × 10-3 cal K-2 g-1 in eq 2, Tf ) 339.7 K, ∆HTf ) 28.9 kcal mol-1, and ∆Cp,Tf ) 0.457 kcal mol-1 K-1, and R ) -0.0069 kcal mol-1 K-2 for the red long-dashed curve with b ) 0.329 cal K-1 g-1 and c ) 1.9 × 10-3 cal K-2 g-1, and Tf ) 335.6 K, ∆HTf ) 28.0 kcal mol-1, and ∆Cp,Tf ) 0.764 kcal mol-1 K-1, and R ) -0.0040 kcal mol-1 K-2 for the blue short-dashed curves with b ) 0.303 cal K-1 g-1 and c ) 1.3 × 10-3 cal K-2 g-1. The labels MP (for Makahtadze and Privalov80) and HHH (for Ha¨ckel, Hinz, and Hedwig79) to the nearly horizontal black lines correspond to the absolute heat capacity of the unfolded state calculated using the empirical eq 3 and the parameters given by these authors.

conformational flexibility and a larger heat capacity per residue than larger proteins. We would expect, then, that the extreme upshifted baseline might be more appropriate for our analysis of the villin subdomain data. The heat capacities expected for fully solvated unfolded states were calculated from the amino acid composition using: 20

CUp )

ni‚Cp,i + (N - 1)‚Cp,NH-CO + Cp,NH ∑ i)1

2

+ Cp,COOH (3)

where N is the number of amino acid residues, the index i refers to the 20 amino acids, ni is the number of amino acids of type i, Cp,i is the molar heat capacity of the side chain of amino acid i, and Cp,NH-CO, Cp,NH2, and Cp,COOH stand for the molar heat capacities of the peptide unit and the terminal amino and the terminal carboxyl groups, respectively. Both the group contributions given by Ha¨ckel et al.79 and those reported by Makhatadze and Privalov80 were used in the calculation; in the latter case, we used for the temperature-dependent Cp,i, Cp,NH-CO, Cp,NH2, and Cp,COOH values the polynomial representation of the Makhatadze and Privalov data given by Freire.78 Results Calorimetric Data. The absolute heat capacity-versustemperature profile for the thermal unfolding of the villin

Estimating Free-Energy Barrier Heights

J. Phys. Chem. B, Vol. 112, No. 19, 2008 5941

headpiece subdomain shows a completely reversible, broad transition, indicating a low value for the enthalpy of denaturation as expected for a protein of only 35 residues (Figure 3). The folded and unfolded baselines are, therefore, not unambiguously defined in the experimental thermograms. Nevertheless, at the lower temperatures, the data do approach Freire’s prediction for the heat capacity of the folded state, which accounts for the contributions from internal degrees of freedom arising from bond and angle vibrations, as well as from hydration (eq 2).78 The experimental heat capacities at the higher temperatures are reasonably close to the prediction for the unfolded state based on the model-compound values given by Makhatadze and Privalov (Figure 3),80 although they are lower than the prediction based on the model-compound values of Hackel et al.79 In the following analysis, we shall only consider the heat capacity in excess of the folded-state heat capacity calculated from the Freire empirical eq 2. Two-State Thermodynamic Model. In order to characterize the data in terms of familiar parameters and facilitate comparison, we analyze the DSC profile using a classic thermodynamic two-state model. In this model of thermal unfolding, there are two well-defined populations of molecules, folded and unfolded, corresponding to two thermodynamic states. We fit the heatcapacity data using the standard relations,81,82 with the assumption that the absolute heat capacity of the folded state is linear in temperature, as in eq 2, and that of the unfolded state is independent of temperature (corresponding to a ∆Cp that decreases linearly in absolute magnitude with increasing temperature). For a two-state system, the excess heat capacity (the observed heat capacity minus the heat capacity of the fully folded protein) is 2

Cex p ) ∆Cp

(∆H) K K + 2 1+K RT (1 + K)2

(4)

where ∆Cp (t CUp - CFp ) is given by:

∆Cp ) ∆Cp,Tf + R(T - Tf)

(5)

Here, Tf is the temperature at which the populations of unfolded and folded states are equal and ∆Cp,Tf is the heat capacity difference (t CUp (Tf) - CFp (Tf)) at Tf. K is the unfolding equilibrium constant (≡ [U ]/[F])

K ) exp (∆S/R - ∆H/RT)

(6)

with ∆H (≡ HU - HF) and ∆S (≡ SU - SF) the enthalpy and entropy differences, respectively, between the unfolded and folded states given by:

1 ∆H ) ∆HTf + ∆Cp,Tf(T - Tf) + R(T - Tf)2 2 ∆S )

∆HTf Tf

()

+ (∆Cp,Tf - RTf) ln

T + R(T - Tf) Tf

Figure 4. Fits to heat capacity-versus-temperature profile with variablebarrier model. The continuous green line through the data points correspond to the best fit using eqs 8-12 for the excess heat capacity, defined as the observed absolute heat capacity minus the absolute heat capacity of the folded state. The heat capacity of the folded state (green continuous straight line) was calculated from eq 2. The red long-dashed curves and blue short-dashed curves were calculated using the extremes for the absolute folded-state heat capacity based on the uncertainties given by Freire. The least-squares best fit parameters are: T0 ) 73.5 °C, RN ) 8.8 kcal mol-1, RP ) 32.0 kcal mol-1, β ) 0.50 kcal mol-1 for the green continuous curves with b ) 0.316 cal K-1 g-1 and c ) 1.6 × 10-3 cal K-2 g-1 in eq 2, T0 ) 74.1 °C, RN ) 13.5 kcal mol-1, RP ) 25.6 kcal mol-1, β ) 1.08 kcal mol-1 for the red long-dashed curves with b ) 0.329 cal K-1 g-1 and c ) 1.9 × 10-3 cal K-2 g-1, and T0 ) 72.7 °C, RN ) 4.6 kcal mol-1, RP ) 38.9 kcal mol-1, β ) 0.43 kcal mol-1 for the blue short-dashed curves with b ) 0.303 cal K-1 g-1 and c ) 1.3 × 10-3 cal K-2 g-1. The standard deviation of the fit was 0.0021 kcal mol-1 K-1. Evaluation of the integral of eq 11 was facilitated by representing G0(H) in terms of piecewise second-order polynomials (splines) and expressing the integral in terms of the error function and the imaginary error function. In addition, fitting was carried out using a simplex algorithm, instead of the time-consuming, exhaustive grid search used in the proof-of-principle analysis reported in the original publication.39 The fitting procedure will be described in more detail elsewhere.

the residuals increase as the assumed absolute heat capacity of the folded state decreases. At the lowest temperatures, the increase in heat capacity arising from the two diverging baselines indicates cold denaturation, which would occur at an even lower temperature if the unfolded state heat capacity is assumed to have a curved temperature dependence, as proposed by Makhatadze and Privalov.80 Calculating Free-Energy Barrier Heights from VariableBarrier Model. The two major assumptions of this semiempirical method of calculating barrier heights are that the enthalpy is a relevant reaction coordinate for the one-dimensional freeenergy surface and that the surface contains no more than two minima, corresponding to folded and unfolded states. The partition function for the variable-barrier model (ZVB) is (ignoring the p(∂V/∂T) term)

ZVB ) (7)

Tf, R, ∆Cp,Tf, and ∆HTf t (HTUf - HTFf) are the four adjustable parameters of the two-state model. As shown in Figure 3, varying the four parameters produces a fit with very low residuals using the upshifted baseline, but

∫ F(H) exp(-H/RT) dH

(8)

where F(H) is the density of enthalpy microstates. The excess heat capacity at constant pressure, Cex p , is calculated from the partition function using the relation:

Cex p )

d〈H〉 〈H2〉 - 〈H〉2 ) dT RT2

(9)

5942 J. Phys. Chem. B, Vol. 112, No. 19, 2008

Godoy-Ruiz et al.

where

〈Hn〉 )

1 ZVB

∫HnF(H) exp(-H/RT) dH

(10)

with the density of microstates given by39

F(H) ) exp(H/RT0 - G0(H)/RT0)

(11)

where T0 is a reference temperature. We use the functional form for G0(H) introduced by Mun˜oz and Sanchez-Ruiz:39

G0(H) ) -2β

()

()

H 2 H + |β| Ri Ri

i ) N for H e0

4

i ) P for H > 0

(12)

The free-energy profile is then calculated from the relation:

G(H,T) t -RT ln Π(H,T) G0(H)T T0 - T - RT ln ZVB (13) )H T T0

(

)

where Π(H,T) is the fractional population of microstates with enthalpy H at temperature T. Equation 13 produces a doublewell free-energy profile, with equal free energies at T0 for the two minima at negative and positive values of H, but different curvatures determined by Ri and therefore different heat capacities. The height of the barrier separating the two wells at T0 is determined by β. The excess heat capacity as a function of temperature was fit using eqs 10, 11, and 12, varying the four fitting parameters T0, β, RN, and RP. An excellent fit to the data was obtained (Figure 4) with all three parameters sets of eq 2 for the folded state and yields free-energy profiles (Figure 5a) characterized by very small barriers, ranging from essentially zero below 70 °C to about 3.3 kcal mol-1 at 100 °C. At the T0 ∼ 74 °C where the freeenergy barrier heights for folding and unfolding are equal (Figure 5c; T0 closely approximates the folding temperature, the difference being due to the difference in curvature of the folded and unfolded wells), the barrier obtained using Freire’s baseline is only 0.5 kcal mol-1, close to the thermal energy (RT ) 0.6 kcal mol-1). Although the probability distribution of enthalpy microstates is roughly bimodal, a significant population of intermediate-enthalpy microstates is observed (see Figure 5b). The fit using the upshifted baseline has slightly higher residuals and produces a higher barrier of 1.1 kcal mol-1. It must be noted that the analysis is very sensitive to the value of β and reasonable fits cannot be obtained with high barriers that produce sharply defined two-state behavior. Calculating Free-Energy Barrier Heights from Ising-Like Theoretical Model. The Ising-like model is based on the interresidue contact map of the native structure. The key postulate of the model is that each residue of the polypeptide chain can exist in one of two possible states, native (n) or non-native (c).9,16 To greatly reduce the number of possible configurations, the double-sequence approximation was employed, in which no more than two continuous stretches of native residues are allowed in each molecule (e.g., ...cnnncccnnnccc...). The free energy and thermodynamic weight of a stretch of native residues of length j beginning at residue i are, respectively,

Gji ) q - jT∆sconf

wji ) exp(-Gji/RT)

(14)

where q is the number of native inter-residue contacts,  is the energy of a contact, and ∆sconf is the entropy cost of fix-

Figure 5. Results of fit with variable-barrier model. (a) Free energy vs enthalpy, G(H,T) (eq 13) at different temperatures calculated from parameters corresponding to the fit with continuous green curves in Figure 4, (b) populations vs enthalpy, Π(H,T) using continuous green curves in Figure 4, (c) barrier heights vs temperature. In panel c, the line types correspond to those of the fits in Figure 4 for the three sets of parameters in eq 2 for the absolute heat capacity of the folded state.

ing a residue in its native conformation. The number of configurations for an N-residue protein containing at most M continuous stretches of native residues is given by M (N + 1)!/[(2m)!(N + 1 - 2m)!]; for the villin headpiece ∑m)0 subdomain in the double-sequence approximation (N ) 35, M ) 2), this yields 59 536 configurations or “microstates”.

Estimating Free-Energy Barrier Heights

J. Phys. Chem. B, Vol. 112, No. 19, 2008 5943

However, because the model allows contacts between residues in two different native segments separated by a disordered loop (as well as between residues within a single native segment), any configuration in which native segments are linked by native contacts is represented by two distinct microstates, one in which these inter-segment contacts are not formed and one in which the contacts are formed.16 This increases the total number of microstates to 92 696. Moreover, the microstate in which the inter-segment contacts are formed (with the concomitant fixing of the relative positions of the two segments) is further destabilized by the entropy loss arising from fixing the ends of the disordered loop linking the two segments in the sequence. The partition function for this Ising-like model (ZIL) is:

ZIL )

n

n-j(1)+1

j(1))1

i(1))1

∑ exp(-Gl/RT) ) 1 + ∑ ∑ microstates l n

n-j(1)+1 n-(i(1)+j(1))

∑ ∑

j(1))1

i(1))1



j(2))1

wj(1)i(1) +

n-j(2)+1



where N is a normalization constant, lc is the length of the fully extended chain, and lp is the persistence length. The entropy loss (or, strictly speaking, the free-energy change divided by T) due to loop closure is computed as ∆sloop ) R ln Kloop, where Kloop is the equilibrium constant for forming the encounter complex in the closed-loop state. For a given loop, the constraints on the end-to-end distance in the closed state are imposed by the conformation of the native residues anchoring both ends of the loop. We identify the closed-loop state with all chain conformations in which the end-to-end distance lies in the range a < r < a + δa, where a, the minimum contact distance for the loop, is taken to be the CR-CR distance (computed from the native structure of the molecule) between the two native residues that anchor the ends of the loop, and a + δa is the maximum distance for which the encounter complex between the two ends of the loop is formed. With this specification of the geometry of loop closure, the equilibrium constant is given by

wj(1)i(1)wj(2)i(2)(1 +

i(2))i(1)+j(1)+1

Vj(1)i(1);j(2)i(2)) (15)

σ Kloop ) 1-σ

∫aa+δaP(r) dr σ) ∫al P(r) dr

(18)

c

0

where Vj(1)i(1);j(2)i(2) is the weighting factor reflecting the change in free energy arising from the coupling between the native segment j(1)i(1) and the native segment j(2)i(2). If there are native contacts between residues in the two segments, this weighting factor is given by

Vj(1)i(1);j(2)i(2) ) exp (-∆Gj(1)i(1);j(2)i(2)/RT) ∆Gj(1)i(1);j(2)i(2) )

(

∑ contacts

4πNr2 l2c

2 9/2

(1 - (r/lc) )

(

exp -

t2m(7/2) + 2tm(5/2) + m(3/2) t2M(7/2) + 2tM(5/2) + M(3/2)

)

M(n) ≡ Γ(n) - γ(n,q(a0))

(16)

(19)

x

Otherwise, the inter-segment weighting factor is 0. Here, ∆sloop is the entropy loss due to constraining the endpoints of the disordered loop incorporating residues i(1) + j(1) to i(2) - 1, inclusive. To evaluate the partition function, the simplest description of interactions was employed, in which both non-native interactions and amino acid type are ignored. Maps of native interresidue contacts were derived from the X-ray crystallographic coordinates for the villin headpiece subdomain (PDB file 1YRF).46 Two residues were considered to be in contact if their R carbons are separated by 0.8 nm or less. (Contacts between adjacent residues and residues separated by one amino acid were ignored because of the high probability that their side chains are in contact in the unfolded state.) There are 68 such contacts (Figure 1). All contacts were assigned the same energy. The same conformational entropy decrease (∆sconf in eq 14) for the non-native to native transition was assigned to every residue of the protein. The entropy loss associated with forming a disordered loop between two native segments was calculated using the radial distribution function of end-to-end distances for a wormlike (semiflexible) polymer chain. A convenient approximate function that closely corresponds to computer simulations of a wormlike chain for persistence lengths both shorter and longer than the contour lengths is:83

P(r) )

σ)

m(n) ≡ γ(n,q(a + δa)) - γ(n,q(a))

 - T∆sloop

j(1)i(1)Tj(2)i(2)

which can be integrated to give:

3lc 4lp(1 - (r/lc)2)

)

(17)

where γ(n,x) ) ∫ zn-1 e-z dz is the incomplete gamma 0 function, Γ(n) ) γ(n, x f ∞) is the familiar gamma function, and the parameter t and function q(x) are defined by

t)

4lp 3lc

q(x) )

( )

1 x2 t l2 - x 2 c

(20)

The additional parameter a0 is the minimum sterically allowed contact distance between the ends of a loop, taken to be 0.4 nm. The persistence length lp was assigned the value 0.6 nm, the wormlike-chain persistence length derived from an analysis of rates of end-to-end contact formation in helix-forming peptides;84,85 δa was taken as 0.1 nm; the contour length lc for a loop was computed as n × (0.36 nm), where 0.36 nm is the projection of the CR-CR vector on the direction of an extended strand and n is the number of residues in the loop. Neglecting the p(∂V/∂T) term, the heat capacity at constant pressure can be calculated from the model partition function as:

Cex p )

( ) (

)

d ln ZIL d2 ln ZIL dE + RT2 ) 2RT dT dT dT2

(21)

We calculate the excess heat capacity from the derivatives of the partition function instead of from the energy variance, as in eq 9, in order to allow for the possibility of a temperaturedependent contact energy . For specified values of the only variable parameters of the model, ∆sconf and , the heat capacity of the protein was

5944 J. Phys. Chem. B, Vol. 112, No. 19, 2008

Godoy-Ruiz et al.

Figure 6. Fits to heat capacity-versus-temperature profile with Isinglike model. The continuous green line through the data points corresponds to the best fit using eqs 14, 15, and 21 for the excess heat capacity, defined as the observed absolute heat capacity minus the absolute heat capacity of the folded state. The heat capacity of the folded state (green continuous straight line) was calculated from eq 2. The red long-dashed curves and blue short-dashed curves were calculated using the extremes for the absolute folded state heat capacity based on the uncertainties given by Freire.78 The least-squares best fit parameters are: ∆sconf ) -4.03 cal mol-1 K-1 and  ) -0.68 kcal mol-1 for the green continuous curves with b ) 0.316 cal K-1 g-1 and c ) 1.6 × 10-3 cal K-2 g-1 in eq 2, ∆sconf ) -3.80 cal mol-1K-1 and  ) -0.647 kcal mol-1 for the red long-dashed curves with b ) 0.329 cal K-1 g-1 and c ) 1.9 × 10-3 cal K-2 g-1, and ∆sconf ) -4.18 cal mol-1 K-1 and  ) -0.707 kcal mol-1 for the blue short-dashed curves with b ) 0.303 cal K-1 g-1 and c ) 1.3 × 10-3 cal K-2 g-1.

computed at each experimental temperature from eq 21. Bestfit values of these two parameters were obtained by varying them to produce the optimal least-squares fit of this predicted heat capacity versus temperature profile to the measured excess heat capacity versus temperature (Figure 6). The fits converged to single minima for ∆sconf ) -4.0 ( 0.2 cal mol-1 K-1 and  ) -0.68 ( 0.04 kcal mol-1, where the uncertainties correspond to the extremes in the parameters of the absolute heat capacity of the folded state (eq 2). Introducing a temperature-dependent contact energy with the relation

(T) ) (Tref) + d(T - Tref)

(22)

and/or a temperature-dependent conformational entropy with the relation

∆sconf(T) ) ∆sconf(Tref) + e(T - Tref)

Figure 7. Results of fit with Ising-like model with fraction of native contacts (Q) as reaction coordinate. (a) Free energy (-RT ln ZIL(Q)) calculated from eq 15 vs reaction coordinate Q. There are 68 contacts, so populations at adjacent values of Q were summed to give a surface discretized close to that of Figure 8 for the P reaction coordinate. (b) Populations vs Q. (c) Barrier heights vs temperature. The green continuous curves correspond to the parameters b ) 0.316 cal K-1 g-1 and c ) 1.60 × 10-3 cal K-2 g-1 in eq 2, the red long-dashed curves correspond to the parameters b ) 0.329 cal K-1 g-1 and c ) 1.9 × 10-3 cal K-2 g-1, and the blue short-dashed curves to the parameters b ) 0.303 cal K-1 g-1 and c ) 1.3 × 10-3 cal K-2 g-1.

(23)

gave no significant improvement in the fit. Also, no significant improvement in the fit was obtained by varying the persistence length over the physically plausible range from 0.3 nm, estimated by Zhou from an analysis of loops in the PDB structure data base,86 to 0.8 nm obtained from all-atom molecular dynamics simulations on unfolded proteins (Robert Best, private communication). Figures 7 and 8 summarize the results from the fits using the fraction of native contacts (Q) or number of ordered residues (P) as reaction coordinates. When two barriers are present between the completely unfolded state and folded state, the barrier height to folding is taken as the larger of the two. For unfolding, the barrier height is always the difference between the free energy of the folded minimum and the top of the first barrier to unfolding. At the temperatures at which the folding

and unfolding barriers are equal (60-65 °C), the barrier height is 2.0-2.5 kcal mol-1 (Figure 7c). Calculating Free-Energy Barrier Height from Kinetics. Following Kubelka et al.,38 we can also estimate the barrier height from kinetic data using an approximate form of the preexponential factor in Kramers theory. In the laser temperaturejump experiments, using tryptophan fluorescence to monitor the folding reaction, the folding time was determined to be 4.6 ( 0.6 µs and essentially independent of temperature.68,71 A faster relaxation was also observed at 70 ( 20 ns, also with very little temperature dependence (J. Kubelka et al., in preparation). The temperature dependence of the amplitude of this relaxation and its time scale suggest that it represents partial unfolding/refolding of the protein possibly due to unzipping/zipping of the Cterminal helix of the folded protein.68 By treating the 70 ns

Estimating Free-Energy Barrier Heights

J. Phys. Chem. B, Vol. 112, No. 19, 2008 5945 barrier top, and ∆G*f is the free-energy barrier to folding. ∆G*f can be estimated by making the simplifying assumptions that (i) the unfolded well and inverted barrier have the same curvatures and that (ii) the diffusion coefficients for motion over the barrier top and motion in the folded well are equal (i.e., D ) D*). In this case, the free-energy barrier height can be simply calculated from

∆G*f ≈ RT ln

( )

τf ) 1.6 ( 0.3 kcal/mol 2πτ

(26)

and is essentially independent of temperature, since both τ and τF are temperature-independent. It is not clear whether 1.6 kcal mol-1 is an upper or lower limit. While the diffusion coefficient in the folded well is expected to be smaller than that at the barrier top, the frequency of the folded well would be expected to be higher than that of both the unfolded well and the inverted frequency at the barrier top. In other words, these effects should partially cancel out. Discussion

Figure 8. Results of fit with Ising-like model with number of ordered residues (P) as reaction coordinate. (a) Free energy (-RT ln ZIL(P)) calculated from eq 15 vs reaction coordinate P. (b) Populations vs P. (c) Barrier heights vs temperature. The green continuous curves correspond to the parameters b ) 0.316 cal K-1 g-1 and c ) 1.60 × 10-3 cal K-2 g-1 in eq 2, the red long-dashed curves correspond to the parameters b ) 0.329 cal K-1 g-1 and c ) 1.9 × 10-3 cal K-2 g-1, and the blue short-dashed curves to the parameters b ) 0.303 cal K-1 g-1 and c ) 1.3 × 10-3 cal K-2 g-1.

relaxation as conformational relaxation in a folded well described by a harmonic potential, U(x) ) 1/2ω2x2, the relaxation time (τ) is given by

τ)

RT ) 70 ( 20 ns ω2D

(24)

where ω2 is the curvature of the folded well, and D is the diffusion coefficient for motion in the well. The Kramers folding time (τF) for diffusion over a one-dimensional barrier is given by

τf )

( )

∆G*f 2πRT exp ) 4.6 ( 0.6 µs * * RT ω′ω D

(25)

where ω′2 is the curvature of the unfolded well, ω*2 the curvature of the inverted barrier, D* is the diffusion coefficient at the

We have used three different models to interpret the calorimetric data on the villin subdomain: a two-state thermodynamic model, a model that employs an empirical free-energy surface, and a simple statistical mechanical model. The quality of the fits to the excess heat capacity (the measured heat capacity minus the heat capacity of the fully folded protein) as a function of temperature with the different models largely depends on the assumed temperature dependence of the absolute heat capacity of the folded state (eq 2). We have considered three pairs of values for the parameters b and c in eq 2 for the absolute heat capacity of the completely folded protein, the most probable values given by Freire and those corresponding to one standard deviation above and below the most probable ones.78 The function corresponding to the highest of the three absolute heat capacities (which we call the upshifted baseline) might be expected to be more appropriate for such a small protein as the 35 residue villin subdomain. Assuming a completely solventaccessible chain in the unfolded state, as is customarily done, the increase in solvent-accessible surface area (SASA) upon unfolding villin calculated from the crystal structure is ∼1800 Å2 or ∼52 Å2/residue, a value which is significantly smaller than those reported for larger proteins,87-89suggesting a native heat capacity per residue higher than the average. The best fit to the data with a two-state thermodynamic model is obtained with the upshifted baseline for the folded state. This is expected because the upshifted baseline minimizes the folding-related energy fluctuations and thus the deviations from two-state behavior. As the assumed absolute heat capacity of the folded state decreases, the folding-related energy fluctuations increase, and the two-state fits significantly worsen. For the best two-state fit, the thermodynamic parameters are Tf ) 339.7 K, ∆HTf ) 28.9 kcal mol-1, and ∆Cp,Tf ) 0.457 kcal mol-1 K-1, R ) -0.007 kcal mol-1 K-2. This fit has very low residuals, but the fitted folded and unfolded baselines cross at high temperature indicating the existence of deviations from pure two-state behavior.28,39,40,90 However, the baselines cross at temperatures significantly higher than the midpoint, suggesting that the deviations from two-state behavior are not large.90 The obtained parameters, moreover, are consistent with the small size of the protein, as judged by an empirical correlation with size which predicts ∆Cp,Tf ) 0.5 kcal mol-1 K-1 and ∆HTf ) 26.4 kcal mol-1 for a 35 residue protein.88

5946 J. Phys. Chem. B, Vol. 112, No. 19, 2008 There is of course only qualitative information about freeenergy barrier heights in a thermodynamic analysis, which implicitly assumes a high barrier. Obtaining information on freeenergy barriers from the calorimetric data requires adopting a microscopic model. We have considered two models: the variable-barrier model,39,40 which assumes an empirical form for the free energy as a function of the enthalpy, and a simple statistical mechanical model, which exactly enumerates the ∼105 individual conformations in an Ising-like approach.9,16 The variable-barrier model (eqs 8-13) is based on an empirical freeenergy profile and assumes no more than two free-energy minima, an assumption that is supported by the thermodynamic two-state analysis. By varying the four adjustable parameters of the mathematical function (eqs 11 and 12) that generates the free-energy profile (eq 13), the model produces an essentially perfect fit to the calorimetric data (Figure 4) with the middle Freire baseline and excellent fits with both the upshifted and downshifted baselines. In the variable-barrier model, enthalpy is assumed to serve as a valid reaction coordinate. According to the results of fitting the calorimetric data with this model, folding is downhill below 68 °C (Figure 5a). Above this temperature, a barrier to folding appears which continuously increases with increasing temperature (Figure 5c). At the folding temperature of 74 °C, the barrier to folding (and unfolding) is only 0.4-1.1 kcal mol-1 (the uncertainty arising from the uncertainty in the absolute heat capacity of the folded protein described by eq 2). The analysis with the variable-barrier model relies on having good estimates of the heat capacity baseline for the folded state.39,40 This is particularly the case for marginal folding barriers. Thus, following the arguments outlined in the first paragraph of the discussion, we could perhaps conclude that the best estimate for the midpoint folding barrier is the 1.1 kcal mol-1 obtained with the upshifted baseline. The variable-barrier model produces much more temperature dependence to the folding barrier than to the unfolding barrier, in contrast to the experimental results of an essentially temperature-independent folding rate and a large positive activation energy for unfolding (the apparent activation energy for folding is 1 ( 1 kcal mol-1 while the unfolding activation energy is 26 ( 1 kcal mol-1).68 Explaining the temperature-independent folding rate of villin would require a very large activation energy of ∼50 kcal mol-1 for the diffusion coefficient at the barrier top, which seems unphysical for such a small protein. Naganathan et al. have recently proposed a mean-field onedimensional free-energy model that does account for the temperature dependence of the kinetics30 and is more amenable to direct comparison with structure-based microscopic models. The Ising-like model produces much poorer fits to the data (Figure 6). There are quite large deviations between the data and the fit above the unfolding temperature. However, the fit using the upshifted baseline for the folded state (long-dashed curves in Figure 6) is reasonable considering that there are only two adjustable parameters in the model. The two parameters of the Ising-like model are the destabilizing entropy loss associated with ordering a residue in its native conformation and the stabilizing energy decrease of forming an inter-residue contact (eq 14). The conformational entropy loss is 4.0 ( 0.2 cal mol-1 K-1. This value is much smaller than the value of 8-12 cal mol-1 K-1 estimated assuming that all residues in the unfolded state can access all possible conformations and that the conformational entropy in the folded state is zero (see review by Brady and Sharp, and references therein91), However, this is clearly not the case, since the unfolded state in water contains

Godoy-Ruiz et al. inter-residue interactions that reduce the accessible conformational space and residual conformational entropy remains for surface residues in the folded protein. A more realistic estimate of 4-5 cal mol-1 K-1 has been made that considers these factors.92 The large deviations between the fit and the data above the unfolding temperature (Figure 6) reflect the fact that the model underestimates the variance of the energy and therefore the heat capacity of the unfolded state (eq 9). This can best be seen by inspection of the free-energy profiles generated by the model (Figures 7 and 8). Two possible progress variables in the model can serve as reaction coordinates for a one-dimensional freeenergy surface: the fraction (Q) of native contacts and the number (P) of residues ordered in the native conformation. With Q as the reaction coordinate (Figure 7a) the completely unfolded state corresponds to the region near Q ) 0, where there is very little variance in the energy. Similarly with P as the reaction coordinate (Figure 8a), the free-energy minimum of the completely unfolded state corresponds to P ) 2. Since there are only two ordered residues, there is no energy for these states (only native segments containing two or more residues are considered and i, i + 1, and i, i + 2 interactions are ignored), so again the variance approaches zero. It is also important to note that the fits to the Ising-like model overestimate the heat capacity at low temperatures (Figure 6), indicating that the upshifted baseline should in fact be regarded as an upper limit for the heat capacity of villin’s folded state, consistent with the analysis using the variable-barrier model. The free-energy profiles produced by the fits to the Isinglike model are complex. At low temperatures, the free energy versus P profiles exhibit a small barrier to folding and a second more shallow minimum at about P ∼ 15 (Figure 7a). At the highest temperatures, unfolding encounters no barrier; that is, it is “downhill”. The results for the free energy versus Q profiles are very similar. The small folding barrier at low temperatures is located at very low values of the reaction coordinate and is reminiscent of the helix-coil nucleation barrier.93 The shallow minimum at intermediate values could be a byproduct of using a CR-CR definition for native contacts, which produces a very sparse protein contact map for villin and thus a highly discretized description for the interaction energy (only two long-range contacts, see Figure 1b). At the temperatures at which the folding and unfolding barriers are equal (60 °C for Q and 63 °C for P, Figures 7c and 8c), the barrier to folding is 2.3 ( 0.1 kcal mol-1 for the Q profile and 2.1 ( 0.1 kcal mol-1 for the P profile, higher than the 1.1 kcal mol-1 obtained from the variable-barrier model using the same baseline. As in the variable-barrier model, the free-energy barrier to folding increases with increasing temperature, while the barrier to unfolding decreases. However, a much smaller activation energy for the diffusion coefficient is required to produce a temperature-independent folding rate, with an average of 12 kcal mol-1 and 14 kcal mol-1 for the Q and P reaction coordinates, respectively. Interestingly, such activation energies for the diffusion coefficient are in very good agreement with the estimate of 0.3 kcal mol-1 per residue obtained by Naganathan et al. from the analysis of the temperature dependence of villin’s slow relaxation rate with a mean-field model (see Table 1 in ref 30). We have made yet another completely independent estimate of the barrier height by using Kramers theory for a onedimensional barrier crossing94,95 and an approximate form for the pre-exponential factor (eqs 24-26). The assumptions that produce this simple form are that the curvature in the unfolded well is the same as that of the inverted barrier top and that the

Estimating Free-Energy Barrier Heights diffusion coefficient at the barrier top is the same as that of the folded well, determined by interpreting the observed 70 ns relaxation as reconfiguration of the polypeptide chain in a harmonic potential. The free-energy barrier is then simply calculated from the measured folding time (τF) of 4.6 µs and the 70 ns relaxation time (τ) to be 1.6 ( 0.3 kcal mol-1 (eq 26). This kinetic estimate of the barrier height is in remarkable agreement with the value obtained independently by Naganathan et al. with a one-dimensional mean-field free-energy model.30 The agreement between these two kinetic estimates is particularly encouraging because the two methods are fundamentally different and involve different assumptions. Three different results suggest that the folding barrier of villin at the midpoint temperature is marginal but still higher than the 0.5 kcal mol-1 estimated from the variable-barrier model using Freire’s baseline for the folded state. First, the 1.1 kcal mol-1 estimated with the upshifted baseline and the Ising-like barriers of 2.1 and 2.3 kcal mol-1 at the folding temperature are closer to the 1.6 kcal mol-1 obtained from the Kramers analysis, which makes no assumptions about the nature of the reaction coordinate, except that it is one-dimensional. Second, the much smaller temperature dependence of the barrier height of the Ising-like model is more consistent with the temperatureindependent barrier height and simple linear temperature dependence of the diffusion coefficient of the Kramers analysis and also with independent estimates of the activation energy for the folding diffusion coefficient of villin.30 Finally, a double mutant folding in ∼700 ns at 70 °C still shows biphasic kinetics,38 indicating the presence of a barrier, even though it corresponds to a barrier height that has been reduced by 1.3 kcal mol-1 compared with wild-type (assuming that the diffusion coefficient of eq 25 is not increased). Interestingly, 700 ns is close to the speed limit of 350 ns predicted from the approximate relation of Kubelka et al. (speed limit ) (N/100 µs)-1, where N is the number of residues).37 In spite of the differences among the three methods of calculation, they all yield marginal freeenergy barriers to folding that are sufficiently low that it should be possible to engineer this protein or find solution conditions that would completely eliminate the barrier to create the “downhill” folding scenario of Wolynes and Onuchic.3 Finally, there is an interesting difference between the interpretations of the 70 ns phase in the three models. In the kinetic analysis using the Kramers relation (eq 26), it is interpreted as relaxation in the harmonic well of the folded state (eq 24). In the Ising-like model, it would appear to arise from the exchange of the population at intermediate values of the Q or P reaction coordinate with the unfolded and/or folded state. In the variable-barrier model, it might arise from changes at the barrier top where there is a significant population that could rapidly move downhill to the folded or unfolded well in response to a temperature jump, as discussed by Gruebele and co-workers for the λ repressor.26 Only a detailed kinetic analysis might be able to distinguish among these possibilities, and such an analysis is in progress (J. Kubelka, E. R. Henry, J. Hofrichter, and W. A. Eaton, manuscript in preparation). Acknowledgment. We thank Wai-Ming Yau for synthesis and purification of the villin peptide and Attila Szabo for many helpful discussions on theoretical issues. J.M.S.-R. was supported by Grant BIO2006-07332 from the Spanish Ministry of Education and Science and FEDER Funds and by Grant CVI771 from the Junta de Andalucia. R.G.-R. is a recipient of a postdoctoral fellowship from the Human Frontier Science

J. Phys. Chem. B, Vol. 112, No. 19, 2008 5947 Program. Work at NIH was supported by the Intramural Research Program of NIDDK. References and Notes (1) Bryngelson, J. D.; Wolynes, P. G. Intermediates and Barrier Crossing in a Random Energy-Model (with Applications to Protein Folding). J. Phys. Chem. 1989, 93, 6902. (2) Sali, A.; Shakhnovich, E.; Karplus, M. How does a protein fold. Nature 1994, 369, 248. (3) Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. Funnels, pathways, and the energy landscape of protein-folding - a synthesis. Proteins: Struct., Funct., Genet. 1995, 21, 167. (4) Boczko, E. M.; Brooks, C. L. First-principles calculation of the folding free-energy of a 3- helix bundle protein. Science 1995, 269, 393. (5) Socci, N. D.; Onuchic, J. N.; Wolynes, P. G. Diffusive dynamics of the reaction coordinate for protein folding funnels. J. Chem. Phys. 1996, 104, 5860. (6) Munoz, V.; Thompson, P. A.; Hofrichter, J.; Eaton, W. A. Folding dynamics and mechanism of beta-hairpin formation. Nature 1997, 390, 196. (7) Dobson, C. M.; Sali, A.; Karplus, M. Protein folding: A perspective from theory and experiment. Angew. Chem. 1998, 37, 868. (8) Chan, H. S.; Dill, K. A. Protein folding in the landscape perspective: Chevron plots and non-Arrhenius kinetics. Proteins: Struct., Funct., Genet. 1998, 30, 2. (9) Munoz, V.; Eaton, W. A. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11311. (10) Clementi, C.; Nymeyer, H.; Onuchic, J. N. Topological and energetic factors: What determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 2000, 298, 937. (11) Shea, J. E.; Brooks, C. L. From folding theories to folding proteins: A review and assessment of simulation studies of protein folding and unfolding. Ann. ReV. Phys. Chem. 2001, 52, 499. (12) Gruebele, M. Protein folding: the free energy surface. Curr. Opin. Struct. Biol. 2002, 12, 161. (13) Shoemaker, B. A.; Wang, J.; Wolynes, P. G. Exploring structures in protein folding funnels with free energy functionals: The transition state ensemble. J. Mol. Biol. 1999, 287, 675. (14) Portman, J. J.; Takada, S.; Wolynes, P. G. Microscopic theory of protein folding rates. I. Fine structure of the free energy profile and folding routes from a variational approach. J. Chem. Phys. 2001, 114, 5069. (15) Bruscolini, P.; Pelizzola, A. Exact solution of the Munoz-Eaton model for protein folding. Phys. ReV. Lett. 2002, 88. (16) Henry, E. R.; Eaton, W. A. Combinatorial modeling of protein folding kinetics: free energy profiles and rates. Chem. Phys. 2004, 307, 163. (17) Shen, T. Y.; Hofmann, C. P.; Oliveberg, M.; Wolynes, P. G. Scanning malleable transition state ensembles: Comparing theory and experiment for folding protein U1A. Biochemistry 2005, 44, 6433. (18) Honeycutt, J. D.; Thirumalai, D. Metastability of the folded states of globular-proteins. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 3526. (19) Snow, C. D.; Nguyen, N.; Pande, V. S.; Gruebele, M. Absolute comparison of simulated and experimental protein-folding dynamics. Nature 2002, 420, 102. (20) Snow, C. D.; Sorin, E. J.; Rhee, Y. M.; Pande, V. S. How well can simulation predict protein folding kinetics and thermodynamics? Annu. ReV. Biophys. Biomol. Struct. 2005, 34, 43. (21) Onuchic, J. N.; Luthey-Schulten, A.; Wolynes, P. G. Theory of protein foilding: the energy landscape perspective. Annu. ReV. Phys. Chem. 1997, 48, 545. (22) Eaton, W. A. Searching for “downhill scenarios” in protein folding. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 5897. (23) Sabelko, J.; Ervin, J.; Gruebele, M. Observation of strange kinetics in protein folding. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6031. (24) Garcia-Mira, M. M.; Sadqi, M.; Fischer, N.; Sanchez-Ruiz, J. M.; Munoz, V. Experimental identification of downhill protein folding. Science 2002, 298, 2191. (25) Yang, W. Y.; Gruebele, M. Folding at the speed limit. Nature 2003, 423, 193. (26) Ma, H. R.; Gruebele, M. Kinetics are probe-dependent during downhill folding of an engineered lambda(6-85) protein. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 2283. (27) Gruebele, M. Downhill protein folding: evolution meets physics. Comput. Rend. Biol. 2005, 328, 701. (28) Naganathan, A. N.; Perez-Jimenez, R.; Sanchez-Ruiz, J. M.; Munoz, V. Robustness of downhill folding: Guidelines for the analysis of equilibrium folding experiments on small proteins. Biochemistry 2005, 44, 7435. (29) Ma, H. R.; Gruebele, M. Low barrier kinetics: Dependence on observables and free energy surface. J. Comp. Chem. 2006, 27, 125.

5948 J. Phys. Chem. B, Vol. 112, No. 19, 2008 (30) Naganathan, A. N.; Doshi, U.; Munoz, V. Protein folding kinetics: barrier effects in chemical and thermal denaturation experiments. J. Am. Chem. Soc. 2007, 129, 5673. (31) Matouschek, A.; Kellis, J. T.; Serrano, L.; Fersht, A. R. Mapping the transition-state and pathway of protein folding by protein engineering. Nature 1989, 340, 122. (32) Fersht, A. R.; Matouschek, A.; Serrano, L. The folding of an enzyme .1. Theory of protein engineering analysis of stability and pathway of protein folding. J. Mol. Biol. 1992, 224, 771. (33) Fersht, A. R.; Daggett, V. Protein folding and unfolding at atomic resolution. Cell 2002, 108, 573. (34) Daggett, V.; Fersht, A. R. Is there a unifying mechanism for protein folding? Trends Biochem. Sci. 2003, 28, 18. (35) Chan, C. K.; Hu, Y.; Takahashi, S.; Rousseau, D. L.; Eaton, W. A.; Hofrichter, J. Submillisecond protein folding kinetics studied by ultrarapid mixing. Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 1779. (36) Schuler, B.; Lipman, E. A.; Eaton, W. A. Probing the free-energy surface for protein folding with single-molecule fluorescence spectroscopy. Nature 2002, 419, 743. (37) Kubelka, J.; Hofrichter, J.; Eaton, W. A. The protein folding ‘speed limit’. Curr. Opin. Struct. Biol. 2004, 14, 76. (38) Kubelka, J.; Chiu, T. K.; Davies, D. R.; Eaton, W. A.; Hofrichter, J. Sub-microsecond protein folding. J. Mol. Biol. 2006, 359, 546. (39) Munoz, V.; Sanchez-Ruiz, J. M. Exploring protein-folding ensembles: A variable-barrier model for the analysis of equilibrium unfolding experiments. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17646. (40) Naganathan, A. N.; Sanchez-Ruiz, J. M.; Munoz, V. Direct measurement of barrier heights in protein folding. J. Am. Chem. Soc. 2005, 127, 17970. (41) Best, R. B.; Hummer, G. Diffusive model of protein folding dynamics with Kramers turnover in rate. Phys. ReV. Lett. 2006, 96. (42) Shakhnovich, E. Protein folding thermodynamics and dynamics: Where physics, chemistry, and biology meet. Chem. ReV. 2006, 106, 1559. (43) McKnight, C. J.; Doering, D. S.; Matsudaira, P. T.; Kim, P. S. A thermostable 35-residue subdomain within villin headpiece. J. Mol. Biol. 1996, 260, 126. (44) Frank, B. S.; Vardar, D.; Buckley, D. A.; McKnight, C. J. The role of aromatic residues in the hydrophobic core of the villin headpiece subdomain. Protein Sci. 2002, 11, 680. (45) McKnight, C. J.; Matsudaira, P. T.; Kim, P. S. NMR structure of the 35-residue villin headpiece subdomain. Nat. Struct. Biol. 1997, 4, 180. (46) Chiu, T. K.; Kubelka, J.; Herbst-Irmer, R.; Eaton, W. A.; Hofrichter, J.; Davies, D. R. High-resolution x-ray crystal structures of the villin headpiece subdomain, an ultrafast folding protein. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 7517. (47) Duan, Y.; Kollman, P. A. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282, 740. (48) Lee, M. R.; Duan, Y.; Kollman, P. A. Use of MM-PB/SA in estimating the free energies of proteins: Application to native, intermediates, and unfolded villin headpiece. Proteins: Struct., Funct., Genet. 2000, 39, 309. (49) Sullivan, D. C.; Kuntz, I. D. Conformation spaces of proteins. Proteins: Struct., Funct., Genet. 2001, 42, 495. (50) Zagrovic, B.; Snow, C. D.; Shirts, M. R.; Pande, V. S. Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing (vol 323, pg 927, 2002). J. Mol. Biol. 2002, 324, 1051. (51) Hansmann, U. H. E. Protein-folding simulations in generalized ensembles. Int. J. Quant. Chem. 2002, 90, 1515. (52) Zagrovic, B.; Snow, C. D.; Khaliq, S.; Shirts, M. R.; Pande, V. S. Native-like mean structure in the unfolded ensemble of small proteins. J, Mol. Biol. 2002, 323, 153. (53) Sullivan, D. C.; Kuntz, I. D. Protein folding as biased conformational diffusion. J. Phys. Chem. B 2002, 106, 3255. (54) Shen, M. Y.; Freed, K. F. All-atom fast protein folding simulations: The villin headpiece. Proteins: Struct., Funct., Genet. 2002, 49, 439. (55) Mukherjee, A.; Bagchi, B. Correlation between rate of folding, energy landscape, and topology in the folding of a model protein HP-36. J. Chem. Phys. 2003, 118, 4733. (56) De Mori, G. M. S.; Micheletti, C.; Colombo, G. All-atom folding simulations of the villin headpiece from stochastically selected coarsegrained structures. J. Phys. Chem. B 2004, 108, 12267. (57) Fernandez, A.; Shen, M. Y.; Colubri, A.; Sosnick, T. R.; Berry, R. S.; Freed, K. F. Large-scale context in protein folding: Villin headpiece. Biochemistry 2003, 42, 664. (58) Kinnear, B. S.; Jarrold, M. F.; Hansmann, U. H. E. All-atom generalized-ensemble simulations of small proteins. J. Mol. Graph. Model. 2004, 22, 397. (59) Mukherjee, A.; Bagchi, B. Contact pair dynamics during folding of two small proteins: Chicken villin head piece and the Alzheimer protein beta-amyloid. J. Chem. Phys. 2004, 120, 1602.

Godoy-Ruiz et al. (60) Ripoll, D. R.; Vila, J. A.; Scheraga, H. A. Folding of the villin headpiece subdomain from random structures. analysis of the charge distribution as a function of pH. J. Mol. Biol. 2004, 339, 915. (61) Jayachandran, G.; Vishal, V.; Pande, V. S. Using massively parallel simulation and Markovian models to study protein folding: Examining the dynamics of the villin headpiece. J. Chem. Phys. 2006, 124. (62) Bandyopadhyay, S.; Chakraborty, S.; Bagchi, B. Exploration of the secondary structure specific differential solvation dynamics between the native and molten globule states of the protein HP-36. J. Phys. Chem. B 2006, 110, 20629. (63) Bandyopadhyay, S.; Chakraborty, S.; Bagchi, B. Coupling between hydration layer dynamics and unfolding kinetics of HP-36. J. Chem. Phys. 2006, 125. (64) Trebst, S.; Troyer, M.; Hansmann, U. H. E. Optimized parallel tempering simulations of proteins. J. Chem. Phys. 2006, 124. (65) Zagrovic, B.; van Gunsteren, W. F. Comparing atomistic simulation data with the NMR experiment: How much can NOEs actually tell us? Proteins: Struct., Funct., Bioinformat. 2006, 63, 210. (66) Wickstrom, L.; Bi, Y.; Hornak, V.; Raleigh, D. P.; Simmerling, C. Reconciling the solution and X-ray structures of the villin headpiece helical subdomain: Molecular dynamics simulations and double mutant cycles reveal a stabilizing cation-pi interaction. Biochemistry 2007, 46, 3624. (67) Jayachandran, G.; Vishal, V.; Garcia, A. E.; Pande, V. S. Local structure formation in simulations of two small proteins. J. Struct. Biol. 2007, 157, 491. (68) Kubelka, J.; Eaton, W. A.; Hofrichter, J. Experimental tests of villin subdomain folding simulations. J. Mol. Biol. 2003, 329, 625. (69) Wang, M. H.; Tang, Y. F.; Sato, S. S.; Vugmeyster, L.; McKnight, C. J.; Raleigh, D. P. Dynamic NMR line-shape analysis demonstrates that the villin headpiece subdomain folds on the microsecond time scale. J. Am. Chem. Soc. 2003, 125, 6032. (70) Brewer, S. H.; Vu, D. M.; Tang, Y. F.; Li, Y.; Franzen, S.; Raleigh, D. P.; Dyer, R. B. Effect of modulating unfolded state structure on the folding kinetics of the villin headpiece subdomain. Proc. Natl. Acad. Sci U.S.A. 2005, 102, 16662. (71) Buscaglia, M.; Kubelka, J.; Eaton, W. A.; Hofrichter, J. Determination of ultrafast protein folding rates from loop formation dynamics. J. Mol. Biol. 2005, 347, 657. (72) Brewer, S. H.; Song, B. B.; Raleigh, D. P.; Dyer, R. B. Residue specific resolution of protein folding dynamics using isotope-edited infrared temperature jump spectroscopy. Biochemistry 2007, 46, 3279. (73) Alm, E.; Baker, D. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11305. (74) Galzitskaya, O. V.; Finkelstein, A. V. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11299. (75) Garbuzynskiy, S. O.; Finkelstein, A. V.; Galzitskaya, O. V. Outlining folding nuclei in globular proteins. J. Mol. Biol. 2004, 336, 509. (76) Irun, M. P.; Garcia-Mira, M. M.; Sanchez-Ruiz, J. M.; Sancho, J. Native hydrogen bonds in a molten globule: The apoflavodoxin thermal intermediate. J. Mol. Biol. 2001, 306, 877. (77) Guzman-Casado, M.; Parody-Morreale, A.; Robic, S.; Marqusee, S.; Sanchez-Ruiz, J. M. Energetic evidence for formation of a pH-dependent hydrophobic cluster in the denatured state of Thermus thermophilus ribonuclease H. J. Mol. Biol. 2003, 329, 731. (78) Freire, E. Differential scanning calorimetry. In Protein Stability and Folding. Theory and Practice; Shirley, B. A., Ed.; Humana Press: Totowa, NJ, 1995; Vol. 40, p 191. (79) Hackel, N.; Hinz, H. J.; Hedwig, G. R. A new set of peptide-based group heat capacities for use in protein stability calculations. J. Mol. Biol. 1999, 291, 197. (80) Makhatadze, G. I.; Privalov, P. L. Heat-capacity of Proteins .1. Partial molar heat-capacity of individual amino-acid-residues in aqueoussolution - hydration effect. J. Mol. Biol. 1990, 213, 375. (81) Takahashi, K.; Sturtevant, J. M. Thermal-denaturation of streptomyces subtilisin inhibitor, subtilisin Bpn’, and the inhibitor-subtilisin complex. Biochemistry 1981, 20, 6185. (82) Martinez, J. C.; Filimonov, V. V.; Mateo, P. L.; Schreiber, G.; Fersht, A. R. A Calorimetric study of the thermal-stability of barstar and its interaction with barnase. Biochemistry 1995, 34, 5224. (83) Thirumalai, D.; Ha, B. Y. Statistical Mechanics of Semiflexible Chains: A Mean Field Variational Approach. In Theoretical and Mathematical Models in Polymer Research; Grosberg, A., Ed.; Academic Press: New York, 1998; p 1. (84) Lapidus, L. J.; Steinbach, P. J.; Eaton, W. A.; Szabo, A.; Hofrichter, J. Effects of chain stiffness on the dynamics of loop formation in polypeptides. Appendix: Testing a 1-dimensional diffusion model for peptide dynamics. J. Phys. Chem. B 2002, 106, 11628. (85) Buscaglia, M.; Lapidus, L. J.; Eaton, W. A.; Hofrichter, J. Effects of denaturants on the dynamics of loop formation in polypeptides. Biophys. J. 2006, 91, 276.

Estimating Free-Energy Barrier Heights (86) Zhou, H. X. Loops in proteins can be modeled as worm-like chains. J. Phys. Chem. B 2001, 105, 6763. (87) Privalov, P. L.; Makhatadze, G. I. Heat-Capacity of Proteins .2. Partial Molar Heat-Capacity of the Unfolded Polypeptide-Chain of Proteins - Protein Unfolding Effects. J. Mol. Biol. 1990, 213, 385. (88) Robertson, A. D.; Murphy, K. P. Protein structure and the energetics of protein stability. Chem. ReV. 1997, 97, 1251. (89) Murphy, K. P.; Bhakuni, V.; Xie, D.; Freire, E. Molecular-Basis of Cooperativity in Protein Folding .3. Structural Identification of Cooperative Folding Units and Folding Intermediates. J. Mol. Biol. 1992, 227, 293. (90) Naganathan, A. N.; Doshi, U.; Fung, A.; Sadqi, M.; Munoz, V. Dynamics, energetics, and structure in protein folding. Biochemistry 2006, 45, 8466.

J. Phys. Chem. B, Vol. 112, No. 19, 2008 5949 (91) Brady, G. P.; Sharp, K. A. Entropy in protein folding and in proteinprotein interactions. Curr. Opin. Struct. Biol. 1997, 7, 215. (92) Daquino, J. A.; Gomez, J.; Hilser, V. J.; Lee, K. H.; Amzel, L. M.; Freire, E. The magnitude of the backbone conformational entropy change in protein folding. Proteins: Struct., Funct., Genet. 1996, 25, 143. (93) Thompson, P. A.; Munoz, V.; Jas, G. S.; Henry, E. R.; Eaton, W. A.; Hofrichter, J. The helix-coil kinetics of a heteropeptide. J. Phys. Chem. B 2000, 104, 378. (94) Kramers, H. A. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica 1940, VII, 284. (95) Berezhkovskii, A.; Szabo, A. One-dimensional reaction coordinates for diffusive activated rate processes in many dimensions. J. Chem. Phys. 2005, 122, 014503.