Folding of a SH3 Domain: Standard and ... - American Chemical Society

Aug 27, 2009 - 630090 NoVosibirsk, Russia. ReceiVed: April 10, 2009; ... the native state stability region. A standard analysis of the folding process...
0 downloads 0 Views 5MB Size
J. Phys. Chem. B 2009, 113, 12759–12772

12759

Folding of a SH3 Domain: Standard and “Hydrodynamic” Analyses Igor V. Kalgin,† Martin Karplus,*,‡,§ and Sergei F. Chekmarev*,†,| Department of Physics, NoVosibirsk State UniVersity, 630090 NoVosibirsk, Russia, Laboratoire de Chimie Biophysique, ISIS UniVersite´ de Strasbourg, 67000 Strasbourg, France, Department of Chemistry and Chemical Biology, HarVard UniVersity, Cambridge, Massachusetts 02138, and Institute of Thermophysics, SB RAS, 630090 NoVosibirsk, Russia ReceiVed: April 10, 2009; ReVised Manuscript ReceiVed: June 26, 2009

Discrete molecular dynamics has been used to study the folding of a SH3 domain with a CR-based Goj-model at a temperature within the native state stability region. A standard analysis of the folding process, based on consideration of the mean-force (free energy) surfaces, contact maps and folding time distributions, is complemented by a “hydrodynamic” description of folding flows (Chekmarev et al., PRL, 2008, 018107) using two and three collective variables. Two types of folding trajectories (fast and slow) follow essentially different routes in the final stage of folding. The hydrodynamic description makes possible the calculation of folding flows corresponding to these routes. The results show that the probability flows do not correspond to the free energy surface and that vortex formation is involved in the slow trajectories. Comparison of the simulation results with the experimental data suggests that the two-state kinetics observed for Fyn and Src SH3 domain folding are associated with the slow trajectories, in which a partly formed N- and C-terminal β sheet hinders the RT-loop from attaching to the protein core; the fast trajectories are not observed because they are in the dead time (1 ms) of the experiments. I. Introduction The problem of protein folding has attracted the attention of biologists, chemists, and physicists for more than four decades. Although considerable progress has been made in recent years and the problem has been stated to be practically solved,1 many aspects of protein folding remain a challenge.2,3 One of them is the theoretical description of the folding process. In late 1960s, Levinthal4 showed that because of the astronomical number of conformations possible for a polypeptide chain, a random search for the unique native structure would require 1011 or so years. This result suggested that the folding of a protein must involve a biased search for the native structure, rather than a random exploration of the conformation space. Early models of protein folding, concerned primarily with the structural evolution of the polypeptide chain,5-7 have made clear that a “unified” view of protein folding could serve as a basis for the interpretation of the increasing number of experimental and computer simulation results. The efforts of many research groups8-22 have resulted in a “new view” of protein folding17 based on the socalled “landscape” description.11-15 Information about folding pathways is obtained from the potentials of mean-force, which are usually represented as the free energy surfaces (FESs) as a function of certain variables, for example, the radius of gyration and fraction of native contacts. The FESs are typically such that the energy is biased toward the native state, leading to folding on the experimental time scales. The new view has led to essential progress in the understanding of the principles of protein folding.23-25 However, because the FESs provide only a mean-force thermodynamic picture of the process, their utility * To whom correspondence should be addressed. E-mail: (S.F.C.) [email protected]; (M.K.) [email protected]. † Novosibirsk State University. ‡ ISIS Universite´ de Strasbourg. § Harvard University. | SB RAS.

for gaining insights into the folding kinetics is limited. The mean-force description in terms of the chosen variables can hide the complexity of the multidimensional conformation space, and the thermodynamic characterization can hide the kinetic phenomena. An understanding of these limitations has led to a growing number of studies in which the interstate transitions observed in folding trajectories are used to determine the folding kinetics.26-37 In such analyzes, using the potential energy26,27,32 and free energy28-30,32 disconnectivity graphs, the “minimumsaddle-minimum” sequences31 or kinetic networks33-37 protein conformations are discriminated/clustered on the basis of their kinetic connectivity. This offers the possibility of describing the folding reaction as a Markov process of transitions between a limited number of the essential states characterizing the folding pathways.26-37 One essential disadvantage of the FESs is that they determine only the probability for the system to be in a given state and do not show the direction of the motion; that is, the system can either fold or unfold when it visits this state. As a result, the same values of the free energy can correspond, for example, to conformations from which the system goes directly to the native state, and to conformations that are not on a direct pathway (that is, they may require partial unfolding). This agrees with the conclusion from studies of the isomerization transitions of the alanine dipeptide for which it was shown that the FES may not reflect the local flows of transitions in the reduced space of collective variables.38,39 Recently, we have employed a “hydrodynamic” description of the folding process,40 which offers the possibility to gain insight into folding kinetics in the framework of the mean-force description. Similar to the previously mentioned kinetic approaches,26-37 the hydrodynamic approach uses the calculated trajectories to determine the interstate transitions. However, in contrast to the previous approaches, the average local “flows” in the reduced configuration space are determined, and then, in analogy with hydrodynamics,41 the

10.1021/jp903325z CCC: $40.75  2009 American Chemical Society Published on Web 08/27/2009

12760

J. Phys. Chem. B, Vol. 113, No. 38, 2009

“vectors” and “streamlines” of the flows are calculated. Using a lattice R-helical hairpin as a model system, we have shown that the distributions of the flows from the unfolded state to the native state can be different from what is expected from the FES landscape. Specifically, we have found that the flows are concentrated in a narrow region of the FES, forming a “beam” of streamlines leading to the native state. Moreover, the beam does not follow the lowest contours of semicompact states on the FES but is displaced to the side of more compact conformations with considerably higher free energy. The rest of the low free energy basin is occupied by a flow “vortex”, which represents a new type of “dead-end” that does not have a fingerprint on the FES. Correspondingly, such a dead-end cannot be detected through the analysis of the FES landscape. It was shown that the vortex arises because the helices, which are readily formed, do not create the full turn characteristic of the native hairpin. Therefore, one of the helices has to unwind and then reform to reach the helical hairpin state. Numerous unsuccessful attempts to do this introduce the reciprocal flows in a local region of the conformation space, which lead to a vortex motion in this region. The analogy between the “hydrodynamic” description and “real” hydrodynamics is formal. However, this is an advantage of the hydrodynamic approach rather than its disadvantage, because it allows a wide choice of variables to describe the process. The parameters of primary interest can be chosen for such variables; moreover different combinations of the variables can be used to supplement each other, which offers the possibility of obtaining a detailed insight into kinetics of the given process. More generally, such a hydrodynamic description can be applied to any dynamic process for which the evolution of the system in the space of collective variables (order parameters) is known. A referee pointed us to a paper that describes the reaction pathways in terms of the probability flows and streamlines42 and illustrated the formulation with a two-dimensional model system. The authors numerically solved the backward Kolmagorov equation for the committor function in the full configuration (phase) space and then, assuming the system to be in equilibrium on the reaction pathways, calculated the probability flows between the reactant and product states. The authors noted that in the case of many dimensional systems the solution of the Kolmagorov equation “becomes prohibitively expensive”, and a reduced set of variables is required to parametrize the committor function. The present approach has the benefit of direct simulations to calculate the local flows of transitions in the reduced space, which does not require the system to be in equilibrium. In this paper, we study folding of a model SH3 domain, and extend the standard analysis of folding, based on the examination of the FESs, contacts maps and folding time distributions with the hydrodynamic analysis of folding flows. SH3 domains are known to fold to very similar native conformations (Src,43-46 Fyn,46-49 R-Spectrin50,51 and c-Crk52), even though the sequence identity can be as low as 33%. In the native state, they have a β-barrel fold which consists of five β-strands and a short 310 helix. The β-strands are arranged as two orthogonally packed antiparallel β sheets (Figure 1a); one sheet consists of the terminal strands and RT-loop (the β1-β5-RT sheet), and the other of the n-src loop and the distal hairpin (the β2-β3-β4 sheet). Extensive experimental studies of folding of Src43-46 and Fyn46-49 SH3 domains, which are most closely related (the sequence identity is ∼70%), have shown that the protein folds rapidly, on a time scale of 10 ms. Mutagenic (Φ-value) analysis

Kalgin et al.

Figure 1. Ribbon (a) and CR (b) representations of the Fyn SH3 domain, and the contact map for the native structure (c).

has revealed that the distal beta hairpin and the diverging turn are formed in the transition state, while the remainder of the protein is largely unstructured. Recently it has been found that kinetic intermediates are possible in the initial (submillisecond) stage of folding, either in the form of R-helices (Src and Fyn SH3)46 or low-populated partially structured β2-β3-β4 sheets (Fyn SH3 mutants).48,49 Simulation studies, using an all-atom representation of the protein25,53-56 or CR and Cβ based Gojtype models,57-62 are qualitatively in accord with the experimental results. These studies have shown that the correct packing of the diverging turn and the distal hairpin into the three-stranded hydrophobic β-sheet (β2-β3-β4) is a key element for fast folding, and that this β-sheet is the most structured element of the transition state. The association of the N- and C-terminal

Standard and Hydrodynamic Analyses of SH3 Domain strands and the packing of the RT-loop with the β-sheet complete the folding process. The equilibrium FESs for a Src SH3 domain were calculated in refs 25, 54, and 55 for an all-atom model of the protein in explicit solvent. To construct the FESs, the fraction of native contacts (F) and the radius of gyration (r) were used as the collective variables. At room temperature, the FES was found to be downhill. However, at higher temperatures, 323 to 363 K, three catchment basins were observed that were separated by two barriers at F ) 0.3 and F ) 0.8.54 The FESs varied with temperature so that at T ) 323 K only the barrier at F ) 0.8 was found, at T ) 343 K two barriers at F ) 0.8 and F ) 0.3 were found, and at T ) 363 K only the barrier at F ) 0.3 was found. The barrier at F ) 0.3 was associated with the transition between the unfolded and folded states, and the folding temperature was estimated to be between 323 and 343 K. Interestingly, use of the number of native hydrogen bonds as the collective variable (instead of the radius of gyration F) led to a slightly different FES landscape; in particular, at T ) 323K the barrier at F ) 0.3 was observed, which was not visible in the FES in the (F, r) space. A similar dependence of the FES landscape on the choice of collective variables was found in ref 56. Performing an unfolding simulation of a Src SH3 domain with use of a charge-neutralized all-atom model, the authors showed that a principal coordinate analysis (PCA) provides the barrier between folded and unfolded states in the PCA space, while in the space composed of the fraction of native contacts and radius of gyration the barrier is not evident. The paper is organized as follows. Section II describes a Gojtype bead model for the SH3 domain, the discrete molecular dynamics method used for simulations (IIA), the heat capacity calculations (IIB), and the hydrodynamics approach employed in the analysis of the folding process (IIC). Section III presents the results. It discusses the thermodynamics of the system (IIIA) and the folding process (IIIB). The latter subsection includes a standard analysis of folding (FESs, folding time distribution, and contact probability maps; IIIB1) and the hydrodynamic analysis of the folding process in two-dimensional (2D) and three-dimensional (3D) spaces of collective variables (IIIB2). Section IV summarizes the results and presents some concluding remarks. II. Methods A. System and Simulation Background. The model SH3 domain we study is Fyn SH3 (Figure 1a, PDB code is 1fyn63). It is one of the two SH3 domains, Fyn and Src, that have the highest sequence identity among SH3 domains (∼70%). Fyn SH3 is a nearly pure β sheet protein of 62 residues with the following structural elements: β1 (6-9), RT-loop (10-23), diverging turn (24-27), β2 (28-33), n-Src loop (34-38), β3 (39-44), distal hairpin (45-49), β4 (50-54), 310-helix (55-57), β5 (58-60) (the numbers in brackets indicate the numbers of the residues counted from the N-terminus). For the simulations, we use a CR based Goj-model,64,65 in which the residues are represented by monomers (beads) centered at the CR atoms (Figure 1b), and the interaction between the monomers is determined by the structure of the native state of the protein. The monomers are assumed to interact with square-well potentials to speed up the simulations. For bonded monomers, i and i + 1, the potential is

J. Phys. Chem. B, Vol. 113, No. 38, 2009 12761

{

∞, ri,i+1 < (1 - δ)σb ui,i+1 ) 0, (1 - δ)σb e ri,i+1 e (1 + δ)σb ∞, ri,i+1 > (1 + δ)σb and for nonbonded monomers, i and j, it is

{

∞, ri,j < σc ui,j ) -, σc e ri,j e σd 0, ri,j > σd Here ri,j is the distance between the CR atoms of residues i and j, σb ) 3.8 Å is the bond length, δ ) 0.1 is the bond-length flexibility, σc ) 4.27 Å is the hard-core diameter, and σd ) 6.405 Å is the attractive sphere diameter. According to a Gojtype model,9 ui,j is set equal to s if the monomers are in contact in the native structure and 0 otherwise. On the basis of the Miyazawa-Jernigan residue-residue contact potential,66,67 the attractive energy  is assumed to be equal 2.2 kcal/mol.68 The native contact between monomers i and j is considered to be formed if the distance between the monomers, ri,j, is less than the attractive sphere diameter σd. Figure 1c shows the contact map for the native state. The total number of native contacts Nnat is equal to 136, and the (potential) energy of the protein in the native state Unat ) -136. In what follows, σc,  and τ ) (Mσ2c /)1/2, where M ) 110 Da (τ ≈ 1.5 ps), are taken as the length, energy, and time scales, respectively; all corresponding values are given in terms of these units. For the simulations, a discrete molecular dynamics method64,65,69,70 was employed. The monomers move freely until a collision occurs either between two (or more) monomers or between a monomer and an imaginary heat-bath ghost particle (ghost collision). In the latter case, the velocity of the monomer is reset from the Maxwell distribution at the simulated temperature. The time interval between two successive ghost collisions is randomly selected from a Poisson distribution with the mean frequency of the collisions corresponding to the density of the ghost particles ng ) 0.1/σ3c .65 In the neighborhood of the native state, the ghost collisions typically occur approximately 102 times more rarely than the collisions between the monomers. In the calculation of folding time and the probability for the system to be in a certain state, it is assumed that during the time interval between two successive collisions the system resides in the state which it occupies after the preceding collision. B. Heat Capacity Calculation. To calculate the heat capacity, a multiple histogram method71 was used. The concept on which this method is based is as follows (for implementation details, see, for example, ref 72). In a canonical ensemble, the probability that the system has energy E is determined as p(E;T) ) Ω(E) exp (sE/kBT)/Z(T), where Ω(E) is the density of states, Z(T) is the partition function, and kB is the Boltzmann constant.75 The simulation at a temperature T gives the number of times the system has energy E at this temperature, that is, the energy histogram at T, N(E;T) ∼ Ω(E)exp(sE/kBT)/Z(T). Calculating the overlapping histograms for a set of close temperatures, it is possible to evaluate Ω(E) up to unknown multiplicative factor. This yields p(E) ∼ Ω(E)exp(sE/kBT) for a temperature of interest. Specifically, the temperatures were gradually increased from T ) 0.2 to T ) 1.5 by steps of ∆T ) 0.02. At each temperature, the system was equilibrated for the time interval, ∆tequil, corresponding to 4 × 107 collisions, including those between the monomers and between the monomers and the ghost particles. After equilibration, the simulation was continued for

12762

J. Phys. Chem. B, Vol. 113, No. 38, 2009

Kalgin et al.

∆t ) 10 × ∆tequil, and the data were stored as a histogram for the given T. Having the probabilities of distributions p(E;T), as a result of the multiple histogram analysis procedure, the heat capacity was then calculated as Cν )(〈E2〉 - 〈E〉2)/kBT2,75 where the angle brackets denote the ensemble averages. C. Hydrodynamic Description of Folding. The hydrodynamic description of folding40 is based on the calculation of the transitions in a space of collective variables (order parameters) g, which are chosen to characterize the folding process. These transitions are organized into the local transition probability flows j(g). In the case of three variables, g ) (g1, g2, g3), the g1-component of the flow at a point g is determined as g1′′-g1′>0

jg1(g) ) [



g1′′-g1′ C1), create a “stream tube”, which contains the C2 C1 fraction of the total flow. III. Results and Discussion A. Thermodynamics of Folding. Figure 2 presents the reduced heat capacity per bead as a function of temperature. There are two relatively well-defined peaks in CV, one is at T ≈ 0.7 and the other is at T ≈ 0.94. The heat capacity is rather broad, corresponding to a physical temperature range of 200°

Figure 2. The reduced heat capacity of the Fyn SH3 domain (per bead) as a function of temperature.

for each peak. References 58 and 60 found peaks for SH3 domains with the width in the 10 and 100° range, respectively. A recent study74 of six small proteins with a simplified physical potential function finds broad peaks in the range of 50 to 175°. The origin of the variation in the heat capacity of the various computational models is not clear. Experimentally, heat capacity peaks for Src SH343 and other small globular proteins73 range over 10 to 20°. To understand the origin of the CV variation with T, the contact probability maps at characteristic temperatures of Figure 2 are compared with that for the native structure (Figure 1c) (here and below, for convenience of comparison, the contacts for the native structure are shown in the lower triangles of the contact maps). Figure 3 shows such maps at two temperatures below the peaks (T ) 0.4 and T ) 0.54), at a temperature between the peaks (T ) 0.84), and at a temperature above both peaks (T ) 1.0). It can be seen that below the peaks (that is, at T ) 0.4 and T ) 0.54), all native contacts are formed with a probability close to unity. Between the peaks (T ) 0.84), the protein conformations differ from those below the peaks in that the contacts between the RT-loop and the β4 strand and between β1 and β5 strands are nearly absent. Finally, above the two peaks (T ) 1.0), all native contacts are formed with a low probability (less than 0.5), so that the protein can be regarded to be in an unfolded state. Thus, the peak in CV at T ≈ 0.7 can be associated with a partial (“surface”) melting of the native structure,65 and the peak at T ≈ 0.94 with overall protein denaturation. The first peak is in good correspondence with the peak in the heat capacity observed in ref 58 for a model c-Crk SH3 (T ) 0.63), and the second peak with that in ref 60 for a model Fyn SH3 (T ≈ 0.9). We performed simulations of protein folding at T ) 0.27, which is well within the native state stability region (Figure 2) and is slightly below the temperature at which the mean folding (first-passage) time has the minimum value (Tmin ≈ 0.33, data not shown). This temperature is close to the folding temperature Tf ≈ 0.26 found in ref 57 for a model Src SH3 domain (with Tmin ≈ 0.4). With the attractive energy  ) 2.2 kcal/mol (see Section IIA), the given value of T ) 0.27 corresponds to an absolute temperature of 300 K. We note that this implies that the melting temperature for the model protein, T ≈ 0.7, is ∼780 K, that is, it is approximately 2 times higher than a typical experimental temperature for protein heat denaturation (∼350 K for Src SH3 domain).43 This effect can be attributed to the simplicity of the protein model we use. First, it does not take into account the presence of the side chains, which make a dominant contribution to the increase of conformation entropy

Standard and Hydrodynamic Analyses of SH3 Domain

J. Phys. Chem. B, Vol. 113, No. 38, 2009 12763

Figure 3. Native contacts at T ) 0.4 (a), T ) 0.54 (b), T ) 0.84 (c), and T ) 1.0 (d) (upper triangles of the maps) in comparison with the contacts for the native structure (the lower triangles).

Figure 4. Equilibrium free energy surface at T ) 0.27. Radius of gyration is in angstroms, and the free energy is in kcal/mol.

when the protein unfolds,76 and second, the Goj-type potential is employed, which has an overly strong bias toward the native state. Figure 4 shows the “equilibrium” free energy surface (FES) for the protein at T ) 0.27. It uses the radius of gyration (r) and the fraction of native contacts (q) as the progress variables. To construct the surface, the probability for the system to be at a given point of the (q, r) space, P(q, r), was calculated with the help of the multiple histogram method71,72 in a manner similar to that used for calculation of the heat capacity. This probability was then converted into the free energy as

F(q, r) ) -kBT ln P(q, r) + C

(3)

The arbitrary constant C was determined by the condition that the entropy of the native state is equal to zero, because the native state presents a unique conformation, that is, F(qnat,rnat) ) Unat. The FES is downhill in agreement with the equilibrium FES at T ) 298 K, which was obtained with all-atom simulations25,54 and used the same progress variables as in Figure 4, except that the definition of the fraction of native contacts (F) was slightly different (a native contact was assumed to be formed if the distance between the centers of two side-chain groups was less than 6.5 Å with a tolerance 0.3 Å). However, the bias to the native state in the FES of Figure 4 is ∼5 times larger than that in the FES of refs 25 and 54; this effect presumably has the same origin as the previously discussed shift of the melting transition region to higher temperatures in Figure 2. B. The Folding Process. To follow the folding process, a set of 4.5 × 103 folding trajectories was calculated. Each trajectory started in a randomly generated fully extended state and terminated upon reaching the native state ensemble (no fewer than 98% of the native contacts were formed). The calculated probability for the system to be at the (q,r) point, P(q, r), was then transformed into the free energy F(q, r) according to eq 3. Such pseudo FESs characterize the kinetics of the first-passage to the native state rather than the stability of the native state and have been used in other studies.77,78 The resulting FES is shown in Figure 5a. It differs from the equilibrium FES (Figure 4) in that the contribution of the native state basin is negligible, because the trajectories were terminated upon reaching the native state, and two basins, centered at points 1 (0.77, 2.58) and 2 (0.87, 2.16) of the (q, r) plane, are observed. At the same time, the pseudo FES of Figure 5a, which is assumed to corresponds to T ) 300 K, is surprisingly similar

12764

J. Phys. Chem. B, Vol. 113, No. 38, 2009

Kalgin et al.

Figure 6. Folding time distribution.

Figure 5. Free energy surfaces based on (a) the complete set of folding trajectories and the subsets of (b) fast and (c) slow trajectories. The free energy is in kcal/mol.

to the equilibrium FES for Src SH3 domain at T ) 323 K, which was obtained in ref 54 with all-atom simulations. First, both the FESs have two basins centered approximately at the same values of the fraction of native contacts and the radius of gyration, and these basins are separated by barriers at the same value of the fraction of native contacts (∼0.8). Second, the variation of the free energy throughout the surfaces (from the unfolded state to the basin bottoms) is approximately the same (∼7 kcal/mol). It is of interest that the barrier at the fraction of native contacts F ) 0.8 in the all-atom simulations was associated with expulsion of water molecules between the

two hydrophobic sheets of the protein to close the hydrophobic core through the RT loop-β4 interaction, and a similar effect was observed early in ref 79, where a Goj-model augmented with a desolvation potential was used to characterize the folding of the SH3 domain. In the present simulations, which do not take into account the desolvation effects, the barrier at q ≈ 0.8 can be associated with a lost of conformation entropy that accompanies formation of the contacts between the RT-loop and β4-strand (see, e.g., Figure 9 below). 1. Standard Projection Analysis. The distribution of folding (first-passage) times, Figure 6, shows that the folding kinetics differ from two-state kinetics, which would be a simple straightline on this semilog plot. The simulation data are well fitted by two exponentials with the characteristic times for the fast and slow trajectories equal to τfast ≈ 5 × 102 and τslow ≈ 2.3 × 104 (Figure 6), respectively. A similar separation of folding trajectories into fast and slow trajectories was observed at T < Tf in refs 59 and 61. In particular, in ref 61 where the kinetic Monte Carlo simulations of folding of a CR Goj-type model of src SH3 domain with a more realistic potential (different residues interacted with different energies) was performed, the ratio of the characteristic times τfast/τslow was of the same order as found here (0.65 × 10-2 in ref 61 and 2.2 × 10-2 in Figure 6). The mean folding time tf is approximately equal to 2.8 × 103. With the theoretical value of the time scale τ ) 1.5 ps (Section IIA; but see below), tf is obtained to be ∼10 ns. It is approximately 106 times shorter than the mean folding time obtained in the stopped-flow experiments for Fyn47,46 and Src43,46 SH3 domains. This divergence between the simulations and experiment can be attributed in part to the simplifications in the protein model that have been mentioned above in the context of the thermodynamics behavior. One is the Goj-type potential, which has an overly strong bias toward the native state, and the other is the bead representation of the protein, which neglects the contribution of the side-chains to the friction of the residues against the surroundings. In addition, there is no solvent friction in the present model; this has been suggested to slow down folding in all-atom implicit solvent models by a factor of 102. All of these factors make it easier to perform the large number of trajectories required for obtaining statistically meaningful results. A more meaningful estimate for the time scale τ can be obtained by comparing of the times that characterize elementary steps of folding in simulations and experiments.65 The simulated time of formation of the RT-loop is ∼70 ps (data not shown), and the experimental time of formation of short loops (∼10

Standard and Hydrodynamic Analyses of SH3 Domain residues) is ∼1 µs.80 Equating these times gives τ ∼ 0.01 µs. The simulated and experimental times for formation of β-hairpins can also be compared. In the simulations, the hairpins (β1-β2, β2-β3 and β3-β4) are formed approximately within 102ps (data not shown), whereas the corresponding experimental time is estimated as ∼10 µs;81 this gives τ ∼ 0.1 µs. For comparison with the experimental results of refs 43, 46, and 47, the fast trajectories in Figure 6, which have the characteristic time τfast < 0.1 ms, should not be considered, because the deadtime in the stopped-flow experiments is typically >1 ms. Correspondingly, the mean folding time tf should be associated with the slow trajectories (τslow ) 2.3 × 104τ), which gives tf ∼(0.1-1) ms, a value in reasonable correspondence with the experimental times for Fyn SH3 (10 ms47 and 13 ms46) and Src SH3 (17 ms43 and 50 ms46). To determine if different trajectories can be associated with different basins, the free energy surface of Figure 5a was divided into two surfaces, Figure 5b,c, which are based on the fast (t e 3.5 × 103) and slow (t > 3.5 × 103) trajectories, respectively (the corresponding numbers of the trajectories are 4160 and 340). We note that the separation of the trajectories into fast and slow trajectories is somewhat arbitrary because of the presence of “fast” trajectories with t > 3.5 × 103 and “slow” trajectories with t e 3.5 × 103; the analysis based on the prolongation of the exponentials with the decay times τfast and τslow to longer and shorter times, respectively, indicates that up to 0.2% of the fast trajectories and 12% of the slow trajectories could be associated with a wrong set. Figure 5b,c shows that both basins of Figure 5a are present in each FES. The basin centered at 1 (0.77, 2.58) does not experience an essential transformation, while the basin centered at 2 (0.87, 2.16) is different for the fast and slow trajectories. For the slow trajectories, it retains its position and general shape approximately (Figure 5c), whereas for the fast trajectories (Figure 5b), it splits in two smaller and shallower basins, centered at 3 (0.84, 2.19) and 4 (0.91, 2.19). To establish the relation between the FES and the conformations of the protein, contact maps were built at the points of the (q, r) plane that represent the above-mentioned basins. Figure 7a,b shows the contact maps for the basins at 1 (0.77, 2.58) for the fast and slow trajectories, respectively. Specifically, the probabilities of the contacts were calculated at the point (0.77, 2.58), which corresponds approximately to the center of the basin. It is seen that the two maps are practically identical. Comparing these maps with the map for the native state, one can see that all native contacts have been formed except for those between the C- and N terminus β-strands (β1 and β5) and between the RT-loop and β4-strand (an example of the conformations in basin no. 1 is shown in Figure 7c). Thus, the structure is very similar to the equilibrium structure at T ) 0.84 (see Figure 3). Since the FES is downhill on going from the unfolded state to basin no. 1, it follows that all contacts present in Figure 7a,b form easily without occurrence of intermediates that would require partial unfolding of the protein. However, formation of the rest of the native structure requires crossing of barriers so that the system dwells for some time in basin no. 1, until it finds pathways to proceed to the native state. According to the FESs of Figure 5b,c, these pathways are different for the fast and slow trajectories. Figure 8 shows the contact maps for basins nos. 2, 3, and 4 of Figure 5b,c. In all cases, the structures are near native and primarily differ in the RT-β4 and β1-β5 contacts, the ones formed later in the folding process. Basins nos. 2 and 3, corresponding to the fast trajectories, differ in that in basin no.

J. Phys. Chem. B, Vol. 113, No. 38, 2009 12765

Figure 7. Contact maps for basin no. 1 of Figures 5b,c, panels a and b, respectively, and an example of conformations in basin no. 1 (panel c).

12766

J. Phys. Chem. B, Vol. 113, No. 38, 2009

Figure 8. Contact maps for basins nos. 2, 3, and 4 of Figure 5b,c (panels a, b and c, respectively).

2 the contacts between the β1 and β5 strands are absent or present but with a very low probability, while in basin no. 3 the contacts between the RT-loop and β4-strand are missing. This suggests that there are two essentially independent routes in the fast formation of the native structure; in one case, the RT-loop attaches to the β4-strand and then the β1-β5-sheet is formed, while in the other case these events occur in reverse order. In basin no. 4, which is present in the slow trajectories, the contact map (Figure 8c) shows that the β1-β5-sheet is partly formed while the RT-loop is not attached to the β4-strand. It follows that there exist two ways, very different in time, to pass

Kalgin et al.

Figure 9. Average number of native contacts between the structural elements of the chain vs the fraction of native contacts. (a) All trajectories, (b) fast trajectories, and (c) slow trajectories.

from nearly the same conformations, corresponding to basin no. 1, to the native state. To see in which order the native contacts are formed in the downhill region of the FES that precedes basin no. 1 (Figure 5), the average numbers of native contacts between different elements of the protein chain were determined as a function of the fraction of native contacts (q); they are shown in Figure 9a-c. The contacts were grouped according to the native contact map of Figure 1c. Figure 9a shows the numbers of the contacts

Standard and Hydrodynamic Analyses of SH3 Domain averaged over 103 folding trajectories, and Figure 9b,c shows the numbers of the contacts averaged over the fast and slow trajectories included in these 103 folding trajectories (their fractions are approximately 90 and 10%, respectively). To separate the trajectories, the same value of the folding time, t ) 3.5 × 103, was used as for the separation of the basins in Figure 5b,c. At q below 0.7, which corresponds to the downhill region of the FES, the behavior of the contacts for the fast and slow trajectories is very similar; initially, both the trajectories form the β2-β3 sheet, the RT-loop, and the contacts between the distal hairpin and β3 strand, a little later the contacts between the n-src loop and β4 strand, and finally the β1-β2 sheet. As a result, as the system reaches basin no. 1 (q ≈ 0.77), these elements of the secondary structure of the protein are organized into the native fold. The order of formation of the native contacts found here is in agreement with the results of experiments43-45 and previous simulation studies,25,53-59,61,62 for example, the β2-β3-β4 sheet, which presents the most structured element of the protein chain, is formed first. In contrast, the β1-β5-RT sheet, which completes the folding process, is formed differently in the fast and slow trajectories. In the fast trajectories, Figure 9b (together with the contact maps of Figure 8a,b) suggests the following two possible scenarios of folding: Scenario No. 1. The RT-loop attaches to the β4 strand, which moves the protein from basin no. 1 (q ≈ 0.77) to basin no. 2 (q ≈ 0.84). In basin no. 2, the protein undergos a rearrangement, such that the contacts between the RT-loop and β4 strand break and the β1-β5 sheet is partly formed. Finally, after partial rearrangement of the protein in basin no. 3 (q ≈ 0.91), where the formation of the β1-β5 sheet is completed, the RT-loop attaches to β4 strand again, which brings the protein to the native state. Scenario No. 2. The β1-β5 sheet is formed first, followed by the attachment of the RT-loop to β4 strand. According to Figures 9c and 8c, these scenarios of folding are also possible for the slow trajectories, except that the most of the β1-β5 sheet is formed immediately after the system leaves basin no. 1. As is seen from Figure 9, at q ≈ 0.8 a few contacts between β2 and β3 strands and the β4 stand and the n-src loop break, that is, a partial decomposition of the β2-β3-β4 sheet occurs. This value of q corresponds to the free energy barriers which separate basin no. 1 from basins nos. 2, 3, and 4 in the FESs for the fast and slow trajectories (Figure 5b,c). Although we have not done a pfold analysis,82 similar to that performed in refs 53, 59, and 62, these results suggest that the protein states at q ≈ 0.8 can be associated with the transition state ensemble. Its structure corresponds to that from experiment and other simulations (refs 45 and 53, 54, and 62 respectively); that is, the β2-β3-β4 sheet is mostly formed, while the rest of the protein is mostly unstructured. 2. Hydrodynamic Analysis. Further insight into the final stages of the folding process is obtained by application of the hydrodynamic description40 (Section IIC). We start with the analysis of the process in the space of two variables, g ) (g1, g2) with g1 and g2 corresponding to the fraction of native contacts q and the radius of gyration r, respectively. Figures 10 and 11 show the “streamlines” of the folding flows for the fast and slow trajectories, which are superimposed on the FESs of Figure 5b,c. To obtain the numbers of transitions n(q′′, r′′; q′, r′), which are required to calculate the flow vectors, eq 1, and the streamlines, eq 2, we used the same ensembles of the fast and slow trajectories as those employed to construct the FESs. The

J. Phys. Chem. B, Vol. 113, No. 38, 2009 12767

Figure 10. (a) Free energy surface and streamlines for the fast trajectories. Black and red lines show the streamlines, with the red line separating the flows for two different scenarios of folding (see the text). The figures at the streamlines denote the values of the stream function Ψ(q, r) (eq 2). The free energy is in kcal/mol. (b) Schematic illustration of formation of β1-β5 sheet (the upper arrow) and the attachment of the RT-loop to β4-strand (the lower arrow).

discretization intervals dq and dr were chosen to obtain good resolution along the r and q axes, specifically, dq ) 7.35 × 10-3 (which corresponds to the change of the native contacts by 1) and dr ) 0.03 (0.13 Å). Since every trajectory started at the unfolded state reaches the native state and is terminated at that point, the total flow toward the native state is the same through each q-cross-section. Thus, the process of folding can be considered as a steady flow of representative points of the system from the unfolded state to the native state. Figure 10a shows that most of the flow (∼80%) is concentrated in a narrow region of the FES, forming a “beam” of the streamlines from the unfolded state to the native state; a similar effect was observed for the R-helical hairpin.40 Since the number of transitions from a given (q, r) point to all other points of the (q, r) plane is proportional to the probability of visiting the (q, r) point, small flows in a region of conformation space where the free energy is of the same order as in the region where the flows are large, imply that the forward and backward transitions in that region essentially cancel each other. Because of this, the folding flows do not follow the contours of the FES; that is, the streamlines can cross the boundaries of the basins on the FES. On the basis of the above given consideration of the contact maps (Figure 8a,b) and the sequence of formation of native contacts (Figure 9b), the total flow in Figure 10a can be approximately separated into two flows. One flow, containing approximately two thirds of the total, goes through basin no. 2 and partly crosses basin no. 3; it is associated with folding scenario no. 1. The other flow, containing approximately one

12768

J. Phys. Chem. B, Vol. 113, No. 38, 2009

Kalgin et al.

Figure 11. (a) Free energy surface and streamlines for the slow trajectories. Black and red lines show the streamlines; the unclosed lines with the figures indicating the values of the stream function Ψ(q, r) (eq 2) show the streamlines leading to the native state, and the closed lines show the vortices. White dots indicate places where the contact maps were calculated. The free energy is in kcal/mol. (b) Schematic illustration of attachment of the RT-loop to the β4 strand (the arrow). The white dotted line along the protein chain shows a wrong position of the C-terminus, which hinders the attachment of the RT-loop to the β4 strand. (c) Typical contact map for basin no. 4.

third of the total, goes solely through basin no. 3 and is associated with scenario no. 2. Figure 10b illustrates the formation of the β1-β5 sheet and the attachment of the RTloop to β4 strand. Figure 11a presents the corresponding picture, that is, the superposition of the streamlines on the FES, for the slow folding trajectories. In contrast to the fast trajectories (Figure 10a), local regions with closed streamlines are observed in basin no. 4. In hydrodynamics,41 a closed streamline is associated with vortex motion generated by opposite directed local flows; in the present case this corresponds to local folding flows jq(q, r), determined by eq 1. Positive values of jq(q, r) increase the stream function Ψ(q, r), eq 2, whereas the negative values of jq(q, r) decrease it. Therefore, as the increase (decrease) of Ψ(q, r) on a certain interval of r is compensated by its decrease (increase) on the neighboring interval, Ψ(q,r) returns to the initial value and encloses the streamline.40 The vortex motion implies that the system dwells in the region occupied by the vortex. This is well demonstrated by comparing Figures 10a and 11a; the fast trajectories pass through basins no. 2 and no. 3 without vortices, whereas the slow trajectories, for which vortices are present, dwell in basin no. 4 for a relative long time (2.3 × 104 versus 5 × 102 for the fast trajectories, Figure 6). It should be

emphasized that a closed streamline does not mean that the system is trapped in the vortex region. Since the streamlines are associated with the net flows but not with the elementary transitions (Section IIC), elementary transitions across the streamline exist, allowing the system to leave one vortex and get into another. As has been shown in ref 40, vortex motion results from repeated attempts of the system to rearrange its structure when the system reaches a conformation from which it is difficult to proceed to the native state without a considerable realignment of the elements of secondary structure. To see why the rearrangement of the protein chain is difficult in the given case, we calculated contact maps at several randomly chosen points within basin no. 4; they are indicated in Figure 11a by white markers. In contrast to the previous contact maps, where only the native contacts were taken into consideration, we included both native and non-native contacts. The condition of formation of a non-native contact was the same as for the native contact (Section IIA). All contact maps were found very similar. Figure 11c presents one of them, which was calculated at the point (0.88, 2.28) of the (q, r) space by averaging over 250 slow trajectories. It can be seen that the contacts that have a high probability differ from the native contacts (the lower triangle

Standard and Hydrodynamic Analyses of SH3 Domain

J. Phys. Chem. B, Vol. 113, No. 38, 2009 12769

Figure 12. Three dimensional flow field for the fast trajectories. (a) The q-component of the flow, jq, and (b) the flow vectors, j. Color levels (panel a) correspond to different values of jq, blue (0.007), green (0.014), yellow (0.021), orange (0.029), and red (0.036).

of the map) in that the RT-β4 contacts and partly the β1-β5 contacts are not formed, and a few non-native contacts appear between the β5 strand and the RT-loop, diverging turn and β4 strand, and between the β1 and β3 strands. It follows that the conformation from which it is difficult to proceed to the native state presents the conformation in which β5 strand, and partly the β1 strand, shift toward the protein core and thus hinder the RT-loop to attach to the β4 strand (Figure 11b). The other nonnative contacts, which have low probabilities, can be associated with various attempts of the protein to rearrange its structure; the distributions of these contacts are slightly different at different points of basin no. 4. A clearer picture of the flows in the final stage of folding is obtained in a 3D space of collective variables, which is composed of the fraction of native contacts and two variables that represent the numbers of native contacts between the RTloop and β4 strand (NRT-β4) and between β1 and β5 strands (Nβ1-β5), that is, the point g of the 3D conformation space (Section IIC) is defined as g ) (q, NRT-β4, Nβ1-β5). In the native state NRT-β4 ) 9 and Nβ1-β5 ) 19. Figures 12 and 13 show the q-component of the flow jq(g) and the flow vectors j(g) for the fast and slow trajectories, respectively. The discretization interval along each axis corresponds to the change of the native contacts by 1. Figure 12 makes evident that the two previously discussed routes for the fast trajectories, corresponding to folding scenarios nos. 1 and 2, do exist, and they are essentially independent. At q < 0.78, approximately 90% of the total flow is directed along the coordinate box edge (q, 0, 0), with the small rest of the nat , 0) and (q, 0, flow distributed between the edges (q, NRT-β4 nat Nβ1-β5) (∼7 and ∼3%, respectively); that is, the native contacts between the RT-loop and β4 strand and between β1 and β5 strands are very rare. At q ≈ 0.78, which corresponds approximately to basin no. 1 (Figure 5b), the main flow bifurcates in two smaller flows. One flow proceeds initially along the

Nβ1-β5 ) 0 cube face, forming the RT-β4 contacts, and then nat face, forming the β1-β5 contacts; along the NRT-β4 ) NRT-β4 the formation of the β1-β5 contacts is accompanied by (partial) breaking and subsequent formation of the RT-β4 contacts. This flow corresponds to folding scenario no. 1. The other flow proceeds along the NRT-β4 ) 0 and Nβ1-β5 ) Nnat β1-β5 faces, forming, respectively, the β1-β5 and RT-β4 contacts, that is, it corresponds to the scenario no. 2. Separating the flows by the plane NRT-β4 ) 5 and counting the number of transition through the plane q ) 0.9, we find that the flows corresponding to scenarios nos. 1 and 2 contain, respectively, 37 and 63% of the total flow. These figures correct the previously given estimate for the fractions of these flows, which was based on the projection of the flows on the 2D FES (Figure 10) and suggested for them ∼67 and ∼33%, respectively. The corresponding 3D hydrodynamic picture for the slow trajectories is shown in Figure 13. In contrast to the fast trajectories (Figure 12), the “input” flow, directed along the box edge (q, 0, 0), initially proceeds along the NRT-β4 ) 0 face, forming the β1-β5 contacts. At q ≈ 0.87, which corresponds to basin no. 4 in Figure 11a, the flow is separated into two flows. One flow seems to follow folding scenario no. 2 for the fast trajectories, that is, the rest of the β1-β5 sheet and the RT-β4 contacts are successively formed. However, since Figure 13 represents slow trajectories, the system can not directly continue the motion along the NRT-β4 ) 0 face; it should spend additional time elsewhere. This happens in the attempts of the system to attach the RT-loop to the β4 strand with the β1-β5 sheet being partly formed (Nβ1-β5 ≈ 11 - 15). If the attempt(s) is(are) successful, that is, the native RT-β4 contacts are formed, the formation of the β1-β5 sheet is completed along the NRT-β4 ) Nnat RT-β4 face. If the attempt is unsuccessful, the system either tries again or proceeds along the NRT-β4 ) 0 face, forming the rest of the β1-β5 sheet and then the RT-β4 contacts. Thus, in addition to the previously discussed folding scenarios nos. 1

12770

J. Phys. Chem. B, Vol. 113, No. 38, 2009

Kalgin et al.

Figure 13. Three dimensional flow field for the slow trajectories. (a) The q-component of the flow, jq, and (b) the flow vectors, j. Color levels (panel a) correspond to different values of jq, blue (-0.03), green (0.03), yellow (0.15), orange (0.23), and red (0.3). The lengths of the flow vectors (panel b) are given in the decimal logarithm scale.

and 2, two scenarios specific of the slow trajectories are possible in the final stage of folding. Scenario No. 3. The β1-β5 sheet is partly formed, then the RT-β4 contacts (presumably in many attempts), and finally the rest of the β1-β5 sheet. Scenario No. 4. The β1-β5 sheet is partly formed, which is followed by unsuccessful attempts to form the RT-β4 contacts. After that, the rest of the β1-β5 sheet and the RT-β4 contacts are successively formed. The fractions of the flows corresponding to folding scenarios nos. 3 and 4 are ∼42 and ∼58%, respectively; they were calculated by separation of the flows by the plane NRT-β4 ) 1 at q ) 0.93. The attempts of formation of the RT-β4 contacts are well displayed by the interchange of positive and negative values of jq in Figure 13a (the green and blue colored levels, respectively). In particular, it is seen that the appearance of the RT-β4 contacts (green colored levels) is accompanied by the negative values of jq at Nβ1-β5 ≈ 11-15 in the NRT-β4 ) 0 face (blue colored levels). This interchange of positive and negative values of jq results in the vortex flows in basin no. 4 in Figure 11a. IV. Conclusion Using a CR based Goj-model to represent the protein and a discrete molecular dynamics method to perform the simulations, we have studied folding of a SH3 domain at a temperature within the native state stability region. With the MiyazawaJernigan potential,66,67 this temperature is close to T ) 300 K. We have found that the equilibrium free energy surface (FES), as a function of the fraction of native contacts (q) and the radius of gyration (r), is downhill in agreement with the results of allatom simulations at this temperature.25,54 However, the FES

constructed on the basis of folding trajectories, that is, the trajectories started in an unfolded state of the protein and terminated upon reaching the native state, has revealed two catchment basins in the final stage of folding, centered at q ≈ 0.77 and q ≈ 0.9, which are separated by a barrier at q ≈ 0.8. Interestingly, this pseudo FES is very similar to the equilibrium FES for Src SH3 domain at T ) 323 K, which was obtained in ref 54 with all-atom simulations. The folding time distribution has been found to correspond to a double-exponential with drastically different decay times. The results agree with some simulations,59,61 but contrasts with the experimental observations that SH3 domains have two-state folding kinetics.43-47 The fractions of the fast and slow trajectories are ∼92 and ∼8%, respectively. Thus, according to the model, even though the folding is dominated by the fast trajectories, only the slow ones were accessible in the stopped-flow experiments43,46,47 because of the large value of the dead time (>1 ms). We have found that the initial stage of the folding process follows the diffusion-collision mechanism5,83 and the “framework model”,8 that is, the secondary structure elements are formed initially and subsequently assemble to obtain the tertiary structure. The analysis of the contact maps and the variation of the numbers of native contacts with q has shown that in the region preceding the catchment basins, where the FES is downhill, the β2-β3 sheet, the RT-loop and the contacts between the distal hairpin and β3 strand are formed initially; then, a little later the contacts between the n-src loop and β4 strand appear, and finally the β1-β2 sheet. This order of formation of secondary structure elements is the same for the fast and slow trajectories and agrees well with the results of experiments43-45,47 and previous simulation studies,25,53-59,61,62

Standard and Hydrodynamic Analyses of SH3 Domain that is, in particular, the β2-β3-β4 sheet, which presents the most structured element of the protein chain, is formed first. The final stage of folding, in which the formation of the β1-β5-RT sheet completes the folding process, has been found to be complex. In this stage, the system passes through the catchment basins on the FES, which are partly different for the fast and slow trajectories. The states corresponding to the free energy barrier at q ≈ 0.8, which separates the basins centered at q ≈ 0.77 and q ≈ 0.9, have been found to be very similar to the transition state ensembles characterized in experiments and simulations (refs 45 and 53, 54, and 62 respectively); that is, the β2-β3-β4 sheet is mostly formed, while the rest of the protein is mostly unstructured. The 2D hydrodynamic picture of folding, constructed in the (q, r) space, has shown that the folding flows do not follow the contours of the FES. Also, the flows for the fast and slow trajectories differ in that the former are smooth and directed to the native state, whereas the latter have vortices in the catchment basin preceding the native basin. A deeper insight into the mechanism of folding has been obtained with a hydrodynamic analysis of folding in a 3D space of collective variables composed of the fraction of native contacts and two variables that characterize specifically the formation of the elements of the β1-β5-RT sheet; they are the number of native contacts between the RT-loop and β4 strand and between the terminal β1 and β5 strands. The 3D picture made possible the calculation of folding flows corresponding to the different folding scenarios. The fast trajectories involve two essentially independent routes, which are characterized by reverse order of formation of the β1-β5 and RT-β4 contacts. The route, in which the β1-β5 sheet is formed first, occurs in ∼63% of the folding flows for the fast trajectories, and the other, in which RT-β4 contacts are initially formed, in ∼37% of the flows. The slow trajectories also involve two routes, but they are not independent. With a few of the β1-β5 contacts having been formed, the system attempts to attach the RT-loop to the β4 strand. If the attempts are successful, the RT-β4 contacts are formed and then the formation of the rest of the β1-β5 sheet is completed; if not, the rest of the β1-β5 sheet is formed and then the RT-loop attaches to the β4 strand. These routes represent, respectively, ∼42 and ∼58% of the folding flows for the slow trajectories. Comparison of the characteristic times of formation of secondary structural elements in the present simulations and experiments80,81 (specifically, the loops and β-hairpins were examined) indicates that the time scale of the simulations corresponds to τ ∼(0.01-0.1) µs. With this value of τ, the decay time for the fast trajectories is less than ∼0.1 ms, which is, at least, ten times shorter than the dead-time in the stopped-flow experiments of refs 47, 43, and 46, where two-state kinetics were observed for SH3 domains. This suggests that only the slow folding trajectories were detected in these papers. With the given value of τ, these trajectories have the decay time ∼(0.1-1) ms, which is in reasonable agreement with the experimental times for Fyn (10 ms47 and 13 ms46) and Src (17 ms43 and 50 ms46) SH3 domains. Accordingly, this suggests that the observed folding rates of the SH3 domains are determined by the final stage of folding; more specifically, by the attempts to attach the RT-loop to the β4 strand with the β1-β5 sheet only partly formed. Acknowledgment. We thank A. Palyanov for useful discussions and a referee for bringing ref 42 to our attention. This work was supported in part by the grant from the U.S. Civilian Research and Development Foundation (CRDF) and the Russian Foundation for Basic Research (RFBR) (CRDF CGP-RFBR

J. Phys. Chem. B, Vol. 113, No. 38, 2009 12771 II, RUB2-2913-NO-07/08-04-91104). The research at Harvard was supported in part by a grant from the National Institutes of Health. References and Notes (1) Service, R. F. Science 2008, 321, 784–786. (2) Dinner, A. R.; Sˇali, A.; Smith, L. J.; Dobson, C. M.; Karplus, M. Trends Biochem. Sci. 2000, 25, 331–339. (3) Protein Dynamics and Folding (Thematic Issue). Chem. ReV. 2006, 106, 1543-1977. (4) Levinthal, C. In Mossbauer Spectroscopy in Biological Systems; Debrunner, P., Tsibris, J. C. M., Munck E., Eds.; University of Illinois Press: Urbana, IL, 1969; pp 22-24. (5) Karplus, M.; Weaver, D. L. Nature 1976, 260, 404–406. (6) Dill, K. A. Biochemistry 1985, 24, 1501–1509. (7) Kim, P. S.; Baldwin, R. L. Annu. ReV. Biochem. 1990, 59, 631– 660. (8) Ptitsyn, O. B. Dokl. Akad. Nauk SSSR 1973, 210, 1213–1215. (9) Goj, N. Annu. ReV. Biophys. Bioeng. 1983, 12, 183–210. (10) Harrison, C.; Durbin, R. Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 4028–4030. (11) Bryngelson, J. D.; Wolynes, P. G. J. Phys. Chem. 1989, 93, 6902– 6915. (12) Leopold, P. E.; Montal, M.; Onuchic, J. N. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 8721–8725. (13) Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D. Science 1995, 267, 1619–1620. (14) Lazaridis, T.; Karplus, M. Science 1997, 278, 1928–1931. (15) Dill, K. A.; Chan, H. S. Natl. Struct. Biol. 1997, 4, 10–19. (16) (a) Sˇali, A.; Shakhnovich, E.; Karplus, M. J. Mol. Biol. 1994, 235, 1614–1636. (b) Sˇali, A.; Shakhnovich, E.; Karplus, M. Nature (London) 1994, 369, 248-251. (17) Baldwin, R. L. Nature (London) 1994, 369, 183–184. (18) Pande, V. S.; Grosberg, A. Y.; Tanaka, T. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 12972–12975. (19) Guo, Z. Y.; Thirumalai, D. Biopolymers 1995, 36, 83–102. (20) Zwanzig, R. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 9801–9801. (21) Boczko, E. M.; Brooks, C. L., III Science 1995, 269, 393–396. (22) Fersht, A. R. Curr. Opin. Struct. Biol. 1997, 7, 3–9. (23) Onuchic, J. N.; Luthey-Schulten, Z.; Wolynes, P. G. Annu. ReV. Phys. Chem. 1997, 48, 545–600. (24) Dobson, C. M.; Sˇali, A.; Karplus, M. Angew. Chem., Int. Ed. 1998, 37, 869–893. (25) Shea, J.-E.; Brooks III, C. L. Annu. ReV. Phys. Chem. 2001, 52, 499–535. (26) Becker, O. M.; Karplus, M. J. Chem. Phys. 1997, 106, 1495–1517. (27) Krivov, S. V.; Chekmarev, S. F.; Karplus, M. Phys. ReV. Lett. 2002, 88, 038101. (28) Krivov, S. V.; Karplus, M. J. Chem. Phys. 2002, 117, 10894–10903. (29) Evans, D. A.; Wales, D. J. J. Chem. Phys. 2003, 118, 3891–3897. (30) Gavrilov, A. V.; Chekmarev, S. F. In Bioinformatics of Genome Regulation and Structure; Kolchanov, N., Hofestaedt, R., Eds.; Kluwer Academic Publishers: Boston, MA, 2004; pp 171-178. (31) Ball, K. D.; Berry, R. S.; Kunz, R. E.; Li, F.-Y.; Proykova, A.; Wales, D. J. Science 1996, 271, 963–966. (32) Wales, D. Energy Landscapes; Cambridge University Press: Cambridge, 2003. (33) Krivov, S. V.; Karplus, M. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 14766–14770. (34) Palyanov, A. Yu.; Krivov, S. V.; Karplus, M.; Chekmarev, S. F. J. Phys. Chem. B 2007, 111, 2675–2687. (35) Chodera, J. D.; Singhal, N.; Pande, V. J.; Dill, K. A.; Swope, W. C. J. Chem. Phys. 2007, 126, 155101. (36) Noe´, F.; Horenko, I.; Schu¨tte, C.; Smith, J. C. J. Chem. Phys. 2007, 126, 155102. (37) Muff, S.; Caflisch, A. J. Phys. Chem. B 2009, 113, 3218–3226. (38) Ma, A.; Nag, A.; Dinner, A. R. J. Chem. Phys. 2006, 124, 144911. (39) Maragliano, L.; Fischer, A.; Vanden-Eijnden, E. J. Chem. Phys. 2006, 125, 024106. (40) Chekmarev, S. F.; Palyanov, A. Yu.; Karplus, M. Phys. ReV. Lett. 2008, 100, 018107. (41) Landau, L. D.; Lifshitz, E. M. Fluid Mechanics; Pergamon: New York, 1987. (42) Metzner, P.; Schu¨tte, C.; Vanden-Eijnden, E. J. Chem. Phys. 2006, 125, 084110. (43) Grantcharova, V. P.; Baker, D. Biochemistry 1997, 36, 15685– 15692. (44) Grantcharova, V. P.; Riddle, D. S.; Santiago, J. V.; Baker, D. Nat. Struct. Biol. 1998, 5, 714–720. (45) Riddle, D. S.; Grantcharova, V. P.; Santiago, J. V.; Alm, E.; Ruczinski, I.; Baker, D. Nat. Struct. Biol. 1999, 6, 1016–1024.

12772

J. Phys. Chem. B, Vol. 113, No. 38, 2009

(46) Li, J.; Shinjo, M.; Matsumura, Y.; Morita, M.; Baker, D.; Ikeguchi, M.; Kihara, H. Biochemistry 2007, 46, 5072–5082. (47) Plaxco, K. W.; Guijarro, J. I.; Morton, C. J.; Pitkeathly, M.; Campbell, I. D.; Dobson, C. M. Biochemistry 1998, 37, 2529–2537. (48) Korzhnev, D. M.; Salvatella, X.; Vendruscolo, M.; Di Nardo, A. A.; Davidson, A. R.; Dobson, C. M.; Kay, L. E. Nature 2004, 430, 586–590. (49) Neudecker, P.; Zarrine-Afsar, A.; Choy, W.-Y.; Muhandiram, D. R.; Davidson, A. R.; Kay, L. E. J. Mol. Biol. 2006, 363, 958–976. (50) Viguera, A. R.; Serrano, L.; Wilmanns, M. Nat. Struct. Biol. 1996, 10, 874–880. (51) Martinez, J. C.; Viguera, A. R.; Berisio, R.; Wilmanns, M.; Mateo, P. L.; Filimonov, V. V.; Serrano, L. Biochemistry 1999, 38, 549–559. (52) Wu, X. D.; Knudsen, B.; Feller, S. M.; Zheng, J.; Sˇali, A.; Cowburn, D.; Hanafusa, H.; Kuriyan, J. Structure 1995, 3, 215–226. (53) Gsponer, J.; Caflisch, A. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 6719–6724. (54) Guo, W.; Lampoudi, S.; Shea, J.-E. Proteins: Struct., Funct., Bioinform. 2004, 55, 395–406. (55) Guo, W.; Lampoudi, S.; Shea, J.-E. Biophys. J. 2003, 85, 61–69. (56) Mitomo, D.; Nakamura, H.; Ikeda, K.; Yamagishi, A.; Higo, J. Proteins: Struct., Funct., Bioinform. 2006, 64, 883–894. (57) Hoang, T. X.; Cielpak, M. J. Chem. Phys. 2001, 113, 8319–8328. (58) Borreguero, J. M.; Dokholyan, N. V.; Buldyrev, S. V.; Shakhnovich, E. I.; Stanley, H. E. J. Mol. Biol. 2002, 318, 863–876. (59) Borreguero, J. M.; Ding, F.; Buldyrev, S. V.; Stanley, H. E.; Dokholyan, N. V. Biophys. J. 2004, 87, 521–533. (60) Ollerenshaw, J. E.; Kaya, H.; Chan, H. S.; Kay, L. E. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 14748–14753. (61) Sutto, L.; Tiana, G.; Broglia, R. A. Protein Sci. 2006, 15, 1638– 1652. (62) Lam, A. R.; Borreguero, J. M.; Ding, F.; Dokholyan, N. V.; Buldyrev, S. V.; Stanley, H. E.; Shakhnovich, E. I. J. Mol. Biol. 2007, 373, 1348–1360. (63) Musacchio, A.; Saraste, M.; Wilmanns, M. Nat. Struct. Biol. 1994, 1, 546–551. (64) Zhou, Y.; Karplus, M. Nature 1999, 401, 400–403. (65) Zhou, Y.; Karplus, M. J. Mol. Biol. 1999, 293, 917–951.

Kalgin et al. (66) Miyazawa, S.; Jernigan, R. L. Macromolecules 1985, 18, 534–552. (67) Miyazawa, S.; Jernigan, R. L. J. Mol. Biol. 1996, 256, 623–644. (68) In refs 66 and 67, the contact energies between the residues are given in RT units (R is the universal gas constant). To translate from these units to kcal/mol, the reference temperature should be specified. References 66 and 67 implied that the melting temperature and room temperature, respectively, were used as the reference temperature (see ref 67). Then, the average energy per contact from ref 66 is er ) 3.18RT ≈ 2.2 kcal/mol (the experimental melting temperature for src SH3 domain is T ≈ 350 K)43 and from ref 67 er ) 3.6RT ≈ 2.14 kcal/mol. (69) Rapaport, D. C. J. Phys. A: Math. Gen. 1978, 11, L213–L216. (70) Rapaport, D. C. J. Chem. Phys. 1979, 71, 3299–3303. (71) Labastie, P.; Whetten, R. L. Phys. ReV. Lett. 1990, 65, 1567–1570. (72) Weerasinghe, S.; Amar, F. G. J. Chem. Phys. 1993, 98, 4967– 4983. (73) Privalov, P.; Gill, S. J. AdV. Protein Chem. 1988, 39, 191–234. (74) Ding, F.; Tsao, D.; Nie, H.; Dokholyan, N. V. Structure 2008, 16, 1010–1018. (75) Landau, L. D.; Lifshitz, E. M. Statistical Physics, Part. 1; Pergamon: New York, 1980. (76) Creamer, T. P. Proteins: Struct., Funct., Genet. 2000, 40, 443– 450. (77) Chekmarev, S. F.; Krivov, S. V.; Karplus, M. J. Phys. Chem. B 2005, 109, 5312–5330. (78) Krivov, S. V.; Muff, S.; Caflisch, A.; Karplus, M. J. Phys. Chem. B 2008, 112, 8701–8714. (79) Cheung, M. S.; Garcia, A. E.; Onuchic, J. N. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 685–690. (80) Hagen, S. J.; Hofrichter, J.; Scabo, A.; Eaton, W. A. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 11615–11617. (81) Mun˜oz, V.; Thompson, P. A.; Hofrichter, J.; Eaton, W. A. Nature 1997, 390, 196–199. (82) Du, R.; Pande, V. S.; Grosberg, A. Y.; Tanaka, T.; Shakhnovich, E. I. J. Chem. Phys. 1998, 108, 334–350. (83) Karplus, M.; Weaver, D. L. Protein Sci. 1994, 3, 650–668.

JP903325Z