Protein Folding Dynamics: Application of the Diffusion-Collision Model

Computer simulations on an intermediate-resolution protein model. Anne Voegler Smith , Carol K. Hall. Proteins Structure Function and Genetics 200...
3 downloads 0 Views 586KB Size
2498

J. Phys. Chem. 1996, 100, 2498-2509

Protein Folding Dynamics: Application of the Diffusion-Collision Model to the Folding of a Four-Helix Bundle Kanthi K. Yapa† and David L. Weaver*,‡ Department of Physics, UniVersity of Ruhuna, Matara, Sri Lanka, and Molecular Modeling Laboratory, Department of Physics, Tufts UniVersity, Medford, Massachusetts 02155 ReceiVed: August 30, 1995; In Final Form: NoVember 9, 1995X

A de novo designed four-helix bundle polypeptide with a modeled three-dimensional structure has been studied by chemical kinetics and Brownian dynamics simulations as an application of the diffusion-collision model. The complementary studies give information on kinetic pathways in the folding of the peptide and on the statistical propensities and time histories of the positions and orientations of the four helices.

Introduction New techniques in molecular biology have made it possible to consider creating novel peptides of almost arbitrary length and sequence with properties that mimic or improve upon those occurring naturally. These techniques have great potential because even a relatively small polypeptide chain of 100 amino acids can be made in ≈10130 different ways from the 20 amino acids that occur naturally, and a significant number of these sequences may have a thermodynamically dominant fold.1,2 The biological activity of the newly created proteins will be mainly determined by their three-dimensional structures and internal dynamics. Because of the large numbers, it seems unlikely that experimental methods will be sufficient to establish the structure and dynamics for all of the natural and artificial proteins and peptides. Thus, one of the principle elements of a rational approach to protein engineering is the prediction of the properties of a protein from its amino acid sequence. This is expected to be possible because it has been shown3 for a number of proteins that the native structure can fold spontaneously under appropriate conditions without any information other than that contained in the linear sequence of amino acid residues. Our concern in this paper is with the dynamical aspect of the folding problem, that is, the understanding of the mechanism by which a denatured protein folds to the native conformation in solution or in vivo. To understand protein folding requires information on how the peptide elements that make up the protein interact dynamically to produce the native folding pattern. A search problem arises in folding because the correct fold requires long-range interactions among the amino acids. Simple statistical arguments, often termed the Levinthal paradox,4,5 show that randomly searching through all possible structures of even a moderately sized protein takes far too long. A possible mechanism by which a protein may make use of more sophisticated search procedures is the diffusion-collision model6,7 in which fluctuating quasiparticles, called microdomains, which may be incipient secondary structures (Rhelices, β-sheets) or hydrophobic clusters move via diffusion and collide with other microdomains. Coalescence may occur between transiently folded and correctly oriented microdomains, and folding then proceeds as a series of coalescence steps, the exact kinematical behavior being determined by the properties of the microdomains and the barriers involved in coalescence. †

University of Ruhuna. Tufts University, Medford, MA 02155. X Abstract published in AdVance ACS Abstracts, January 15, 1996. ‡

0022-3654/96/20100-2498$12.00/0

The available experimental and theoretical evidence for the diffusion-collision model has recently been reviewed7 and supports many aspects of the model. In what follows, we apply the diffusion-collision model to an artificial four-helix bundle protein. Four-helix bundles are a well-defined motif in several naturally occurring proteins, e.g., myohemerythrin,8 hemerythrin,9 and designed proteins.10,11 In previous work,12 we discussed a general folding algorithm and illustrated some aspects of the algorithm with a model-built fourhelix bundle of 73 residues based on the Regan-DeGrado sequence.10 In this paper, we extend discussion of the model with full details of the chemical kinetics simulation and with a Brownian dynamics simulation of the statistical aspects of the folding kinetics, using a simplified representation of the amino acids in order to extend the time scale of the simulation to 1 µs. The artificial four-helix bundle of Regan and DeGrado10 was chosen to represent this structural motif because it is small enough (73 residues) to be used in the 1 µs Brownian dynamics simulation and because it has no complications (e.g., disulfide bridges) in the amino acid sequence. The two methods of studying the kinetics of folding are complementary. Below, we outline pertinent aspects of the diffusion-collision model. The diffusion-collision model6,7,13 views the protein as composed of several parts (elementary microdomains), each short enough for all conformational alternatives to be searched through rapidly, as compared with the time scale of the entire folding process. Thus, the existence of microdomains in the folding polypeptide chain and their importance in the kinetics of folding provide a way for the folding protein to avoid examining the entire set of conformational alternatives, and the small size of the elementary microdomains implies that their structure, though accessible by random events, is only marginally stable. The microdomains move diffusively under the influence of internal and random external forces, and microdomain-microdomain collisions take place. Collisions sometimes lead to coalescence into microdomain pairs and higher aggregates (higher-order microdomains or subdomains). This can occur if both microdomains have formed at least part of their secondary structure (i.e., that in the contact region), and the collision involves an appropriate orientation.14 Overall folding would consist of a series of steps leading to a conformation with portions of the backbone close to the native conformation. It is possible that this stage of folding leads to the molten globule state,15-17 where some secondary structural elements are loosely associated. The final step of the folding process would then be the formation of the exact tertiary structure, including close © 1996 American Chemical Society

Protein Folding Dynamics of a Four-Helix Bundle

J. Phys. Chem., Vol. 100, No. 7, 1996 2499

packing of the side chains. The detailed dynamics and whether one, a few, or many pathways are important are expected to depend on the properties of the microdomains for the protein under consideration. Consequently, it is useful for a semiquantitative analysis of protein folding dynamics to concentrate on the microdomains as the basic entities. This has made it possible to develop a scheme for calculating the folding pathways and the time course of species formed during the folding reaction. In this respect, the diffusion-collision model differs from other descriptions of folding, which are generally limited to the pictorial aspects of the folding process (see ref 7 for a discussion of other descriptions of folding). Two methods have been used to calculate the diffusioncollision dynamics involved in the folding of peptides and proteins, the chemical kinetics method and the Brownian dynamics method. In the next section, the chemical kinetics method is described and applied to the folding of the four-helix bundle protein. This is followed by the details of the simplified representation used for the Brownian dynamics simulation and the results of that simulation. The results are discussed in the final section. Chemical Kinetics Method In the chemical kinetics approach to the diffusion-collision model, the dynamics of folding is simulated by a set of diffusion equations that describe the motion of the microdomains in aqueous solution and by coupled boundary conditions that provide for their collision and possible coalescence. Detailed analyses18 showed that the diffusion-collision dynamics of a multimicrodomain protein reduces to a network of twomicrodomain steps in which the calculable rate constants depend on physical properties of microdomains (size, shape, reactivity, orientation) and the equilibrium state of the system. The folding kinetics are approximated by solving the kinetic equations that couple the elementary steps; this corresponds to applying the chemical kinetics approximation to the diffusion-collision model. The individual rate constants for the coalescence of two entities (e.g., two elementary microdomains or microdomain complexes) can be written analytically in terms of the physical parameters of the system. The general form of the forward folding time τf for microdomain-microdomain coalescence in a spherical microdomain approximation (each microdomain is assigned a specific radius as described below) is

τf )

l2 L∆V(1 - β) + D βDA

(1)

Rmax2 (1 - 9/5 + 2 - 1/56) 3 (1 - 3)

helices

residues

accessible area loss by helix

total accessible area loss

A B B C C D A D B D A C

1-16 20-35 20-35 39-54 39-54 58-73 1-16 58-73 20-35 58-73 1-16 39-54

339 (A) 348 (B) 320 (B) 316 (C) 263 (C) 273 (D) 171 (A) 180 (D) 90 (B) 95 (D) 2 (A) 2 (C)

687

(2)

where  ) Rmin/Rmax. The diffusion volume (denoted by ∆V) is 4/3π(Rmax3 - Rmin3). The parameter L has units of length. Its

636 536 351 185 4

value is

RRmax tanh[R(Rmax - Rmin)] - 1 1 1 +R ) L Rmin RRmax - tanh[R(Rmax - Rmin)]

(3)

where

R≡

(

)

λ1 + λ2 D

1/2

(4)

The quantity λ1 is the rate for the switching of a nonassociated microdomain pair from a state in which both microdomains are correctly folded and oriented for coalescence to any other state, and λ2 is the rate for the reverse process. We have β ) λ2/(λ1 + λ2). We write R ) 1/[Dtc]1/2 where 1/tc ) λ1 + λ2 is the characteristic rate of helix-coil transitions and take tc ) 1 ns. For the backward, unfolding rate 1/τb, the accessible surface area AAB buried between microdomains A and B when they coalesce is used, as well as the free energy change per unit area f appropriate for this surface area measure.19,20 The parameter ν determines the degree to which the folding process goes to completion. The equation for τb is

τb ) ν-1ef(AAB/kBT)

(5)

The coupled first-order rate equations for the probabilities of each microdomain or multimicrodomain complex (corresponding to concentrations in unimolecular processes) pi(t) of the possible intermediate states are7,18

dpi dt

The inner boundary for diffusion is the spherical shell of closest approach (governed by the van der Waals envelopes of the microdomains). In this approximation it has a radius which is the sum of the radii of the microdomains, denoted by Rmin. The outer boundary is the maximum radial separation of the microdomains determined by the intervening microdomains and the polypeptide chain between microdomains, denoted by Rmax. The volume available for diffusion of each microdomain pair ∆V, their relative target surface area for collisions A, their relative diffusion coefficient D, and their relative geometry parameter G are calculated for diffusion in a spherical space. The parameter l2 has units of length squared and the value

l2 )

TABLE 1: Solvent accessible Area Loss (Å2) from Helix-Helix Packing

m

) ∑Kijpj

(6)

j)1

In eq 6, Kij are the elements of the rate matrix to be determined from the model (eqs 1-5) and m is the number of independent states of the folding protein. If there are n pairwise interactions between the microdomains, then there are 2n states (including the initial unfolded state and the final folded state) and n! pathways to the final state. The forward rates in eq 6 are given by eq 1 with the appropriate parameters and the backward rates by eq 5. This method has previously been applied to study the dynamics of two microdomain13,21-23 and three microdomain18 systems and to the overall folding kinetics of apomyoglobin24 and the λ-repressor,25 as well as the initial step in the folding of cytochrome c.26 Preliminary results for the four-helix bundle folding problem have also been described.12 Since its structure is not known, a model of the designed bundle10 was built,12 starting from the known X-ray structures of hemerythrin (2MHR)8 and myohemerythrin (1HMQ A chain),9 using the homologous-model-building program COM-

2500 J. Phys. Chem., Vol. 100, No. 7, 1996

Yapa and Weaver

TABLE 2: Pairwise Interactions in the Kinetic States

POSER.27 Schematically, the sequence is HA-T1-HB-T2-HC-T3HD, where Hi is [Gly-Glu-Leu-Glu-Glu-Leu-Leu-Lys-Lys-LeuLys-Glu-Leu-Leu-Lys-Gly] and Ti is [Pro-Arg-Arg]. The helices are labeled A, B, C, and D in order in the sequence. They are considered to be the microdomains that diffuse, collide, and coalesce into higher aggregates in the diffusion-collision model. The modeling method employed produces a structure which necessarily resembles the two naturally occurring proteins, including the packing of the helices. Lacking a detailed model of early folding intermediates, it is assumed that the hydrophobic interactions observed in the “native” structure of the ReganDeGrado sequence10 also characterize the same interactions in kinetic intermediates during folding. The solvent accessible area loss from helix-helix packing (difference between the solvent accessible area of the helix without and with the other helix in the folded conformation) is given in Table 1 (by helix) for all the possible pairings. For calculation, we take the four largest helix-helix interactions, as determined by area loss, namely, AB, BC, CD, and AD. This leads to 16 states of microdomain aggregates for the folding protein, including the initial, completely unfolded state (state 1) and the completely folded state (state 16), as shown in Table 2, which also shows the helixhelix contacts in each state and the binary number associated with the state. The pairings lead to 13 different microdomains. The radii of the 13 microdomains are calculated in a spherical approximation by summing the van der Waals volumes of the individual atoms (extended atom radii), multiplying by 3x2/π, the packing ratio for spheres in tetrahedral packing (as an approximation to the actual irregular packing), and calculating the radius of a sphere with the same volume. The diffusion constant D is defined as D ) kBT/6πηrx, where η is the viscosity at absolute temperature T and rx is the radius of the microdomain. The temperature is 293 K. The radii, whose sum gives the minimum distance between microdomain centers (Rmin), are given in Table 3 along with the microdomain masses, diffusion constants, and the accessible surface area28 using a 1.4 Å probe radius. The folding kinetics of the four-helix bundle is taken to be the transitions between the 16 states which occur with the formation or breaking of only one helix-helix contact. These transitions are given in the first column of Table 4 (e.g., 2 f 6 means the transition between state 2 and state 6). Table 4 also gives the microdomains in the initial state, the helix-helix interaction causing the transition, the final state microdomains, and the pair of microdomains that coalesce in the folding

TABLE 3: Properties of Microdomainsa microdomain

radius r (Å)

mass (amu)

D (Å2/ns)

accessible area (Å2)

A B C D AB AD BC CD ABC ABD ACD BCD ABCD

9.07 9.06 9.06 9.08 11.42 11.43 11.42 11.43 13.07 13.08 13.08 13.08 14.39

1702 1700 1700 1716 3402 3418 3400 3416 5102 5118 5118 5116 6818

36.19 36.22 36.22 36.15 28.73 28.71 28.75 28.72 25.10 25.09 25.09 25.10 22.80

1846 1864 1834 1858 3023 3353 3062 3156 4221 4373 4649 4254 5089

a

The calculation of the parameters is described in the text.

reaction (this pair would dissociate in the corresponding unfolding process). Also in Table 4 are the descriptions of the relative diffusion coefficient, the minimum distance between microdomains (Rmin), and the maximum distance between microdomains (Rmax). As illustrated in Table 4, Rmax between two microdomains is the fully extended length of the connecting chain between them taken as 3.5 Å times the number of intervening coil residues (CIJ ) 10.5 Å for all cases considered) plus the distances on each end to the two microdomain centers. If there are helical microdomains along the connecting chain, their contribution to the length is taken to be the diameter of the microdomain in the spherical approximation (Table 3). For transitions involving multimicrodomain clusters, the distance Rmax is taken as the shortest chain between the interacting microdomains. For example, in the transition from state 2 (BC-AD) to state 4 (B-ACD) in which the microdomains AD and C coalesce due to the CD pairing, the maximum separation of the AD and C microdomains (center-to-center distance) is the radius of AD (rAD) plus the radius of C (rC) plus the length of the chain between C and D (CCD, which is 10.5 Å). In the four-helix bundle folding, there are also “internal” transitions involving the association of microdomains that are in the same multimicrodomain complex. This means that the complex must retain some flexibility (molten globulelike) in order for the transition to take place. It also means that Rmax must be modified from the example above to apply to this class of folding transitions. The time scale for the internal transitions will depend on the geometry of the multimicrodomain complex. We take for an estimate of the folding time eq 1 where Rmin is

Protein Folding Dynamics of a Four-Helix Bundle

J. Phys. Chem., Vol. 100, No. 7, 1996 2501

TABLE 4: Helix Bundle Transitions. States, Bonds, and Parameters transition initial state MDs bond formed final state MDs MDs coalescence 1f2 1f3 1f5 1f9 2f4 3f4 2f6 5f6 3f7 5f7 2f10 9f10 3f11 9f11 5f13 9f13 4f8 6f8 7f8 4f12 10f12 11f12 6f14 10f14 13f14 7f15 11f15 13f15 8f16 12f16 14f16 15f16

A-B-C-D A-B-C-D A-B-C-D A-B-C-D B-C-AD A-B-CD B-C-AD A-D-BC A-B-CD A-D-BC B-C-AD C-D-AB A-B-CD C-D-AB A-D-BC C-D-AB B-ACD AD-BC A-BCD B-ACD C-ABD AB-CD AD-BC C-ABD D-ABC A-BCD AB-CD D-ABC ABCD1 ABCD2 ABCD3 ABCD4

AD CD BC AB CD AD BC AD BC CD AB AD AB CD AB BC BC CD AD AB CD AD AB BC AD AB BC CD AB BC CD AD

A+D C+D B+C A+B AD + C A + CD B+C A+D B + CD BC + D AD + B AB + D A+B C+D A + BC AB + C ACD + B AD + BC A + BCD ACD + B ABD + C AB + CD AD + BC ABD + C ABC + D A + BCD AB + CD ABC + D (A + B)int (B + C)int (C + D)int (A + D)int

B-C-AD A-B-CD A-D-BC C-D-AB B-ACD B-ACD AD-BC AD-BC A-BCD A-BCD C-ABD C-ABD AB-CD AB-CD D-ABC D-ABC ABCD1 ABCD1 ABCD1 ABCD2 ABCD2 ABCD2 ABCD3 ABCD3 ABCD3 ABCD4 ABCD4 ABCD4 ABCD5 ABCD5 ABCD5 ABCD5

TABLE 5: Four-Helix Bundle Transitions. Explicit Folding Parameters from Equation 1 transition

D (Å2/ns)

Rmin (Å)

Rmax (Å)

l2/D (ns)

LδV/DA (ns)

1f2 1f3 1f5 1f9 2f4 3f4 2f6 5f6 3f7 5f7 2f10 9f10 3f11 9f11 5f13 9f13 4f8 6f8 7f8 4f12 10f12 11f12 6f14 10f14 13f14 7f15 11f15 13f15 8f16 12f16 14-16 15f16

72.34 72.37 72.44 72.41 64.93 64.91 72.44 72.34 72.37 64.90 64.93 64.88 72.37 72.41 64.94 64.95 61.31 57.46 61.29 61.31 61.31 57.45 57.46 61.31 61.25 61.29 57.45 61.25 72.41 72.44 72.37 72.34

18.15 18.14 18.12 18.13 20.49 20.50 18.12 18.15 18.14 20.50 20.49 20.50 18.14 18.13 20.49 20.48 22.14 22.85 22.15 22.14 22.14 22.85 22.85 22.14 22.15 22.15 22.85 22.15 18.13 18.12 18.14 18.15

85.89 28.64 28.62 28.63 30.99 59.62 30.99 61.99 30.99 31.00 30.99 59.62 28.63 28.64 30.99 30.98 32.64 30.99 32.65 32.64 32.64 33.35 33.35 32.64 32.65 32.65 33.35 32.65 45.21 45.21 45.21 45.21

107.9 1.985 1.981 1.983 2.416 27.61 2.658 34.65 2.656 2.418 2.416 27.63 1.982 1.985 2.415 2.414 2.727 2.325 2.729 2.727 2.727 2.992 2.991 2.727 2.731 2.729 2.992 2.731 10.987 10.993 10.983 10.977

24.73 1.741 1.740 1.741 1.782 14.37 2.164 18.81 2.161 1.782 1.782 14.37 1.739 1.742 1.782 1.782 1.802 1.488 1.802 1.802 1.802 1.842 1.842 1.802 1.803 1.802 1.842 1.803 7.034 7.040 7.030 7.024

as defined for “external” transitions and Rmax is half the circumference of the multimicrodomain complex in the spherical approximation. D is the relative diffusion coefficient for the pair of microdomains involved in the internal transition. This gives an upper limit to the folding times in these cases. The internal transitions could readily take place in a molten

D (Å2/ns)

Rmin(Å)

Rmax (Å)

DA + DD DC + DD DB + DC DA + DB DAD + DC DA + DCD DB + DC DA + DD DB + DD DBC + DD DAD + DB DAB + DD DC + DD DA + DB DA + DBC DAB + DC DACD + DB DAD + DBC DA + DBCD DACD + DB DABD + DC DAB + DCD DAD + DBC DABD + DC DABC + DD DA + DBCD DAB + DCD DABC + DD DA + DB DB + DC DC + DD DA + DD

rA + rD rC + rD rB + rC rA + rB rAD + rC rA + rCD rB + rC rA + rD rB + rD rBC + rD rAD + rB rAB + rD rC + rD rA + rB rA + rBC rAB + rC rACD + rB rAD + rBC rA + rBCD rACD + rB rABD + rC rAB + rCD rAD + rBC rABD+rC rABC + rD rA + rBCD rAB + rCD rABC + rD rA + rB rB + rC rC + rD rA + rD

rA + cAB + 2rB + cBC + 2rC + cCD + rD rC + cCD + rD rB + cBC + rC rA + cAB + rB rAD + cCD + rC rA + cAB + 2rB + cBC + rCD rB + cBC + rCD rA + cAB + 2rBC + cCD + rD rB + cBC + rCD rBC + cCD + rD rAD + cAB + rB rAB + cBC + 2rC + cCD + rD rA + cAB + rB rC + cCD + rD rA + cAB + rBC rAB + cBC + rC rB + cBC + rACD rAD + cCD + rBC rA + cAB + rBCD rB + cAB + rACD rC + cCD + rABD rAB + cBC + rCD rAD + cAB + rBC rABD + cBC + rC rABC + cCD + rD rA + cAB + rBCD rAB + cBC + rCD rABC + cCD + rD πrABCD πrABCD πrABCD πrABCD

globulelike intermediate state in which the prior microdomainmicrodomain associations are loose enough for movements to occur rather than being rigidly fixed in a particular orientation by the other pairwise interactions. The parameter β in eq 1 determines the probability that, in a collision between microdomains, coalescence occurs. The β for each of the 32 transitions (see Table 6) is a product of four individual microdomain probabilities, two for each microdomain or multimicrodomain involved in the pairwise interaction. Each member of the interacting pair has an orientational probability and a folding probability. The orientation probabilities are parameters βai and βaj for microdomains i and j which are defined as the ratio of the lost accessible area (the individual helix area losses for each helix pair interaction are given in Table 1) to the total accessible area of the elementary microdomain or microdomain cluster (given in Table 3). For example, in the transition 11 f 12, which involves the coalescence AB + CD and the interaction of the pair AC, βai ) 171/3023 ) 0.0566 and βaj ) 180/3353 ) 0.0537. The orientation probabilities are given in Table 6 for the appropriate simulation runs. The folding probabilities βi and βj are the probabilities that the microdomains are correctly folded when they collide. For example, for helical microdomains, these probabilities could be helix-coil equilibrium constants which are generally small for isolated helical fragments in aqueous solution. The resulting β is a number between zero and one given by

β ) βai βaj βi βj

(7)

Since the orientational probabilities βai and βaj are inherently small (see Table 6, simulations 2 and 3), β will be much smaller than one in every transition except internal transitions. An alternative possibility and the one that we used previously12,24 is that all orientation probabilities are essentially one, because “surface” diffusion occurs rapidly after collision and loose coalescence by correctly folded microdomains (see Table 6,

2502 J. Phys. Chem., Vol. 100, No. 7, 1996

Yapa and Weaver

TABLE 6: Values of β for the Chemical Kinetics Simulations simulation 1

simulation 2

simulation 3

transition

βai βaj

βi

βj

βai

βaj

βi

βj

βai

βaj

βi

βj

1f2 1f3 1f5 1f9 2f4 3f4 2f6 5f6 3f7 5f7 2f10 9f10 3f11 9f11 5f13 9f13 4f8 6f8 7f8 4f12 10f12 11f12 6f14 10f14 13f14 7f15 11f15 13f15 8f16 12f16 14f16 15f16

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

0.093 0.143 0.172 0.184 0.081 0.093 0.172 0.093 0.172 0.086 0.101 0.057 0.184 0.143 0.184 0.106 0.068 0.081 0.093 0.073 0.062 0.057 0.101 0.073 0.041 0.184 0.106 0.062 1.000 1.000 1.000 1.000

0.097 0.147 0.172 0.187 0.143 0.057 0.172 0.097 0.100 0.147 0.187 0.097 0.187 0.147 0.114 0.172 0.172 0.086 0.043 0.187 0.143 0.057 0.114 0.172 0.097 0.082 0.100 0.147 1.000 1.000 1.000 1.000

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

0.093 0.143 0.172 0.184 0.081 0.093 0.172 0.093 0.172 0.086 0.101 0.057 0.184 0.143 0.184 0.106 0.068 0.081 0.093 0.073 0.062 0.057 0.101 0.073 0.041 0.184 0.106 0.062 1.000 1.000 1.000 1.000

0.097 0.147 0.172 0.187 0.143 0.057 0.172 0.097 0.100 0.147 0.187 0.097 0.187 0.147 0.114 0.172 0.172 0.086 0.042 0.187 0.143 0.057 0.114 0.172 0.097 0.082 0.100 0.147 1.000 1.000 1.000 1.000

0.01 0.01 0.01 0.01 0.1 0.01 0.01 0.01 0.01 0.1 0.1 0.1 0.01 0.01 0.01 0.1 0.5 0.1 0.01 0.5 0.5 0.1 0.1 0.5 0.5 0.01 0.1 0.5 1.0 1.0 1.0 1.0

0.01 0.01 0.01 0.01 0.01 0.1 0.01 0.01 0.1 0.01 0.01 0.01 0.01 0.01 0.1 0.01 0.01 0.1 0.5 0.01 0.01 0.1 0.1 0.01 0.01 0.5 0.1 0.01 1.0 1.0 1.0 1.0

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

simulation 1). Further study of surface diffusion on protein surfaces is required to clarify the latter possibility. To complete the calculation of β for each forward transition, we must choose values for the folding probabilities βi and βj in eq 7, e.g., 10-2 for elementary microdomains, 0.1 for two microdomain clusters, and 1 for three or more microdomain clusters, assuming that microdomain-microdomain interactions make the coalescing microdomains more stable. Clearly, the choices for β will have a substantial effect on the probabilities of occurrence of each of the kinetic states as a function of time. The kinetic calculations mainly solving eq 6 with the microdomain parameters given in Tables 1-6, have been carried out with MATLAB, a numerical analysis software product, wellsuited to the matrix calculations required, and having the advantage of running on most computer hardware platforms.29 The required MATLAB code is available from the authors. Chemical Kinetics Simulations Since the physical parameters of microdomain size, surface area, area loss upon coalescence, and connectivity are fixed, only the parameters βi, βj, βai, and βaj whose product determines β for each of the transitions in the 24 pathways leading from the initial, completely unfolded molecule (state 1) to the final, completely folded molecule (state 16) can be adjusted to show different types of kinetic behavior. State 16 will be molten globulelike until the detailed interdigitations take place that lead to the stable folded structure. With four microdomainmicrodomain interactions, there are 24 different possible pathways between the initial state (state 1), the final state (state 16), and the 14 kinetic intermediate states. In principle, each of the intermediate reactions is reversible, and this is included in the simulations. Simulation 1. The first simulation was carried out to show the effects of the geometry of the four-helix bundle and of the physical properties of the four helical microdomains. These parameters are listed in Table 4 for the general case, and the

numerical values are given in Table 5. In this simulation, each of the individual parameters βai, βaj, βi, and βj are set to unity (see Table 6). This would be the case if each helical microdomain were completely reactive at every collision (i.e., βi, βj ) 1) and if any local rearrangement required to have the matching patches of surface area be aligned takes place on a time scale that is small compared to the diffusion time scale (i.e., βai, βaj ) 1). The results of chemical kinetic simulation 1 are shown in Figure 1a-d. In this unimolecular process, the probability of a state as a function of time is equivalent to the time-dependent concentration of a bi- or multimolecular processes. Of the one-pair states 2, 3, 5, and 9, only state 2 does not accumulate significantly as a kinetic intermediate because the AD pair in state 2 is produced more slowly than the other pairs. As discussed also below for the Brownian dynamics simulation of the simplified four-helix bundle, the two end helices, A and D, tend to remain relatively far apart because their diffusion space is larger than that allowed for the other helix-helix pairs. Among the two-pair states 4, 6, 7, 10, 11, and 13, states 4, 6, and 10 (which are partially fed from state 2) do not accumulate as kinetic intermediates. All of the threepair intermediates accumulate as kinetic intermediates in this simulation. The final state (state 16) reaches a probability of 0.5 after about 11 ns. Assuming an exponential increase for state 16, the folding time is about 14.3 ns. The kinetic results for simulation 1 are shown schematically in Figure 1e, where those intermediates that do not accumulate kinetically are shown as faint dashed boxes. Figure 1e gives a useful pictorial impression of the important kinetic pathways. Simulation 2. The β parameters for simulation 2 are listed in Table 6. All other parameters are the same as for simulation 1 as listed in Table 5. In simulation 2, the effect of the individual areas of contact on each helix at which the hydrophobic interactions between the helices takes place is included in the kinetic parameters by the β parameters βai, βaj. These parameters are the ratio of the lost accessible surface area to the total surface area (see above) and are considered orientational probabilities. It is necessary to include orientational probabilities in the kinetics if the helices must be correctly oriented when a collision takes place for coalescence to occur. The results of simulation 2 are shown in Figure 2a-d. Because of the inclusion of the orientational probabilities, the time scale of folding has increased. In this simulation, state 16 accumulates to a 0.5 probability after 175 ns and the folding time, assuming an exponential increase, is 223 ns. Again, one-pair states 3, 5, and 9 accumulate as kinetic intermediates followed by twopair kinetic intermediates states 7, 11, and 13. In this simulation, there is no holdup at the three-pair intermediates, with no significant accumulation in states 8, 12, 14, and 15. The results of simulation 2 are further illustrated in Figure 2e, where again the nonaccumulating intermediates are shown in light dashed boxes. Simulation 3. The β parameters for simulation 3 are listed in Table 6. All other parameters are the same as for simulation 2 as listed in Table 5, including the orientational probabilities. In simulation 3, the effect of the individual areas of contact on each helix at which the hydrophobic interactions between the helices takes place is included in the kinetic parameters as in simulation 2. In addition, specific values have been chosen for the folding probabilities βi, βj. In this case, the chosen folding probability parameters are βi ) 1 × 10-2 for each of the elementary helical microdomains A-D. This means that the individual helices are helical on average 1% of the time when they are not participating in an interaction with another helix. For each interacting pair of microdomains AB, BC, CD, and AD, and βi are 0.1 because it is assumed that a favorable helix-

Protein Folding Dynamics of a Four-Helix Bundle

J. Phys. Chem., Vol. 100, No. 7, 1996 2503

Figure 1. (a-d) Diffusion-collision model calculation of the kinetic probabilities as a function of time of the 16 states (described in Table 2) in the folding kinetics of the four-helix bundle protein. The values for the individual components of β are listed in Table 6 under simulation 1. (e, bottom structure) Schematic view of the 16 states (described in Table 2) and the 24 native pathways in the diffusion-collision model folding kinetics of the four-helix bundle protein using the individual components of β listed in Table 6 under simulation 1. The states which accumulate kinetically to a probability of 0.05 or more are shown as dark outline boxes. The states which accumulate to less than 0.05 probability are shown as dashed outline boxes.

helix interaction will stabilize a correct microdomain pairing. Thus, the helix pairs are coalesced 10% of the time when they are not participating in another interaction. Microdomains with two pairs of interacting helices have βi ) 0.5 to reflect the greater stability of an additional correct pairing. In the kinetic results of simulation 3, the half-time of state 16 is 2.59 × 105 ns and the folding time is 3.36 × 105 ns. In this simulation, only the one-pair intermediates 3, 5, and 9 accumulate kinetically and even these states have substantiality lower probabilities than the corresponding states in simulations 1 and 2. The results of simulations 1-3 demonstrate the effect of different diffusion-collision parameters on the time scale of folding and on the accumulation of kinetic intermediates. In this work, since the helices are identical, it is sensible to have the folding probabilities for each helix be the same. In naturally occurring proteins, the individual microdomains are generally different and their folding probabilities would also be different. This would mean that some kinetic intermediates and kinetic pathways become relatively more important than others. The

multiple pathways shown in Figures 1 and 2 are a result of the similar physical properties of the four helical microdomains, the assumption of similar stability for each of them and the central role played by the microdomains and the inter-microdomain interactions in diffusion-collision folding dynamics. Multiple equivalent pathways suggest the “jigsaw puzzle” model of folding,30 a possible realization of the diffusion-collision model when microdomain stabilities and other physical properties are about the same for all microdomains.7 In most cases in real proteins, the properties of individual microdomains and microdomain cluster will be different and not all pathways will be equally probable. Brownian Dynamics Method The early steps in folding can also be studied using a stochastic simulation of the Brownian motion of the polypeptide chain in a solvent environment. We used a 73 residue polypeptide chain which has 4 helices, each with 16 residues,

2504 J. Phys. Chem., Vol. 100, No. 7, 1996

Yapa and Weaver

Figure 2. (a-d) Diffusion-collision model calculation of the kinetic probabilities as a function of time of the 16 states (described in Table 2) in the folding kinetics of the four-helix bundle protein. The values for the individual components of β are listed in Table 6 under simulation 2. (e, bottom structure) Schematic view of the 16 states (described in Table 2) and the 24 native pathways in the diffusion-collision model folding kinetics of the four-helix bundle protein using the individual components of β listed in Table 6 under simulation 2. The states which accumulate kinetically to a probability of 0.05 or more are shown as dark outline boxes. The states which accumulate to less than 0.05 probability are shown as dashed outline boxes.

and 3 connecting loops each of 3 residues. Each amino acid in the polypeptide chain is replaced with a soft sphere with the center at the Cβi position of the ith residue31 and a volume corresponding to valine, which determines the radius of 3.25 Å. The spheres are linked by virtual bonds at their centers.31-33 The choice of a single amino acid (valine) throughout the chain makes the simulation simpler and less time consuming. The motion of the residues is largely determined by the time variation of the nonbonded interactions (interactions between residues which are three or more residues apart). These interactions produce forces that act to speed up or slow down the motion of a residue in a given direction. The diffusion equation, which is used in the chemical kinetics simulations and which provides a continuum approximation to the motion of the microdomains, is replaced by the discrete, coupled Langevin equations of motion for the residue spheres which interact according to an effective potential energy function; the effect of solvent is included in the potential and in the frictional drag and random force terms of the Langevin equation. These equations can be

solved numerically34 to treat the full folding problem or to obtain information related to the chemical kinetics approximation. The simplified character of the model permits rapid calculation of the chain motion and allows the study of dynamical events in the nano- to microsecond time range, which is of great interest in the early stages of protein folding. While such a model cannot accurately represent the detailed atomic interactions, it can provide an approximate description of the overall structure and flexibility of polypeptides with less specific side chain interactions, e.g., steric and hydrophobic interactions.19,20,35 The method has been applied in simulating the helix-coil transition of an R-helical33,36 microdomain, the sheet-coil transitions of a β-hairpin37 microdomain, and the diffusion and collision of two R-helices (microdomains) connected by a coil segment.38 Polyvaline was chosen as the model for the four-R-helix bundle, both because valine represents a simple hydrophobic residue and because this provides consistency with the previous studies of R-helices using a simplified representation.33,38,39 This type of representation by a simplified model is based on the

Protein Folding Dynamics of a Four-Helix Bundle

J. Phys. Chem., Vol. 100, No. 7, 1996 2505

Figure 3. Probability distributions of the six helix pair center-to-center distances in the Brownian dynamics simulation of the four-helix bundle.

earlier work of Flory.32 The spheres are linked by virtual bonds and bond angles with harmonic restoring potentials. The virtual dihedral angle φi is defined in terms of the centers of residues i-3, i-2, i-1, and i (φi ) 0 for the eclipsed conformation) and is associated with an intrinsic torsional potential. Interactions between residues separated by three or more virtual bonds are given by central force potentials that represent excluded-volume and net attractive effects in aqueous surroundings. Additional torsional potentials are used to stabilize R-helical secondary structures, corresponding to backbone hydrogen bonds between residues in a full representation. The energy function is chosen to approximate the potential of mean force which corresponds to a thermal average over polypeptide degrees of freedom that are omitted in the model and over solvent degrees of freedom. The total energy E describing the polypeptide chain can be written as the sum of six terms33,36-38

E ) Eb + Eθ + Eφ + Eev + Esol. + ER

(8)

representing the bond (Eb), bond angle (Eθ), dihedral angle (Eφ), excluded volume (Eev), van der Waals and solvent (Esol.), and helix (hydrogen-bonding) torsional (ER) potential energies. The equilibrium bond lengths and bond angles were calculated by reference to a detailed atomic model with standard R-helix geometry.40 The equilibrium bond length and bond angle are b0 ) 5.14 Å and θ0 ) 87.2°. The dihedral angle for the R-helical residues is φi ) 38.3°. Further details of the potential function are given in ref 38. The motion of a residue is largely determined by the time variation of its nonbonded interactions with neighboring residues. These interactions produce forces that act to speed up or slow down the motion in a given direction. In the simulations, the hydrodynamic interactions among residues are not included; the diffusion tensor is diagonal and constant. The form of the Langevin equation which describes the motion of residues in

the diffusive limit is

db ri dt

D D ∂V + )B R kBT ∂b r i kBT i

(9)

D is the diffusion constant, b ri is the position vector of the ith residue, and B Ri represents the random force exerted on the ith residue by the surrounding water molecules. The Langevin equations are integrated using the algorithm of Ermak and McCammon.34 This yields the position of the ith residue at time t + ∆t as

r i(t) b r i(t + ∆t) ) b

D ∂ V(b r i(t))∆t + B χi kBT ∂b ri

(10)

where B χi is the displacement due to the stochastic force. The quantities on the right-hand side of eq 10 are evaluated at the beginning of the time step. The time step ∆t must be chosen to be small enough so that the systematic forces on the residues do not change appreciably during the time of a step. This requires a value for ∆t of no greater than 0.04 ps. It is adjusted automatically to a smaller value during a simulation if the potential energy exceeds a chosen value. Furthermore, in doing the integration to obtain eq 10, it is assumed that the momentum relaxation time is much shorter than the position time scale so that particle momenta are characterized by their equilibrium distribution function. The simulation was carried out for a total time of 1000 ns (1 µs). The parameters were chosen so that a stable conformation of the polypeptide chain does not form at any time during the simulation. Hence, the helices may collide many times. Centerto-center distances between helices and their relative orientations were recorded every 500th step. The chain began in an extended conformation with the loop sections fully extended. Although all dihedral angles were allowed to vary throughout the

2506 J. Phys. Chem., Vol. 100, No. 7, 1996 simulation, the four helical segments retained their shape due to the helix stabilization energy (the end residues of the helices showed some variations). This is consistent with the diffusioncollision model which suggests that the instantaneous structure in the microdomains is important to the formation of kinetic intermediates and allows the polypeptide chain to pack in only a limited number of ways. The bond and dihedral angles of the loop residues (17-19, 36-38, and 55-57) vary significantly over all the angular range throughout the simulation. The probability distributions of the helix-helix center-tocenter distances are shown in Figure 3 for the complete simulation. All the nearest neighbor pairs of helices (AB, BC, and CD) spend a significant part of their time in the coalescence range of center-to-center distances of Rij e 16 Å.38 Because of the flexibility of the loops, the helix pair AD also can be within coalescence range (Figure 3d) but with a smaller probability due to the many other available conformations in their larger relative motion diffusion space. The nearest neighbor pairs begin with a center-to-center distance of 36.1 Å and the AD pair at 104.7 Å separation. It is clear from Figure 3a,f that the length of the simulation is not sufficient for the chain to sample all conformational possibilities. The results for Rab and Rcd are qualitatively similar in shape but differ in detail. Since the residues are spherically symmetrical in the Brownian dynamics approximation, a sufficiently long simulation would show identical probability distributions for these distances, but such is not the case, as seen in Figure 3. The overall behavior of the chain can also be seen in the distance between the centers of the first and last residues (Rete). The probability distribution of Rete is similar to that of Rad (see Figure 3c) with peaks at about the same locations. The time history of Rete shows the wide-ranging behavior of the chain ends. The ends of the polypeptide chain remain fairly close together for significant periods of time, indicating a relatively collapsed chain, and at other periods of time the ends are far apart, indicating that the chain is quite extended. Figure 4 shows the distributions of center-to-center distances making up the prominent two pair states as observed in the chemical kinetics simulations (see Figures 1 and 2). The figures correspond to the two-pair states 7 (Figure 4a), 11 (Figure 4b), and 13 (Figure 4c). The horizontal and vertical dashed lines at 16 Å separate the correlated distance spaces into folded (lower left-hand corner) and unfolded (the rest of the space) two-pair states. As Figure 4 illustrates, some of the conformational possibilities correspond to two-pair kinetic intermediates in the chemical kinetics folding simulations with state 11 (the pairs AB and CD formed) being somewhat more probable. In the Brownian dynamics simulation, 33.6% of the observed conformations had Rab e 16 Å, 17.5% had Rbc e 16 Å, and 18.9% had Rcd e 16 Å; 4.7% of the conformations had both Rab and Rbc e 16 Å, 7.12% had both Rab and Rcd e 16 Å, 2.04% had both Rbc and Rcd e 16 Å (as shown in Figure 4), and 0.47% had Rab, Rbc, and Rcd e 16 Å. The most difficult pair to find within folding range is the AD pair. During the simulation 0.14% of the observed conformation had Rad e 16 Å. But, none of those states had all of the other three center-to-center distances within the assigned folding distance of 16 Å. From the conformations with the AB, BC, and CD nearest-neighbor pairs within folding distance, the closest pair A and D that came together was 22.3 Å at 434 ns. A schematic view of this conformation and the path leading to it are shown in Figure 5. In the path, the BC pair forms first at 408 ns, followed by the CD pair at 430 ns. The A helix folds onto the B helix at 434 ns and is fairly close to D at that time. In a real polypeptide, the stabilization of partially folded conformations relative to the fully unfolded chain would extend the time for D and A to

Yapa and Weaver

Figure 4. Joint distributions of the three most prominent helix-helix pairs in the chemical kinetics simulations and in the Brownian dynamics simulations. The dashed lines at 16 Å delimit the region of folded distances (the lower left-hand corner) for each of the dual pairs.

Figure 5. Schematic view using Molscript45 of the path leading to the structure at 434.58 ns for the four-helix Brownian dynamics simulation. The D helix is shown darkest. The arrows indicate collision of the indicated helix-helix pair.

come together. It is clear that the chain flexibility allows the two end helices to be close together, but the wide range of alternative conformations makes it unlikely. Thus, the observation of N-terminus-C-terminus interactions early in folding, as in cytochrome c41 and myoglobin,42,43 indicates that the physical and chemical properties of individual helices are of great importance in determining the folding pathways, since they can make more probable pathways through kinetic intermediates that are otherwise very unlikely. For example, if the AD pair were more stable than the other pairs (AB, BC, and CD), the

Protein Folding Dynamics of a Four-Helix Bundle

J. Phys. Chem., Vol. 100, No. 7, 1996 2507

Figure 6. Time histories of the four main helix-helix center-to-center distances in the Brownian dynamics simulation. In a, the solid line is Rab and the dotted line is Rad. In b, the solid line is Rbc and the dotted line is Rcd. The horizontal dashed line is at 16 Å, the center-to-center contact distance described in Lee et al.38

stability could overcome the statistical preference for conformations in which the end helices are not close in space and pathways through AD could dominate the folding of the fourhelix bundle. In naturally occurring four-helix bundles,8,9 the four individual helices are not identical and particular pairs will be more important than those found from conformational statistics alone. The time histories of the four main helix-helix center-tocenter distances are shown in Figure 6. This figure gives a good view of the times during which pairs of helices are around their folding contact distance of 16 Å. The higher mobility of the end helices A and D is apparent in Figure 6a (dotted line) with Rad exploring much of the available conformational space. The three other center-to-center distances are more restrained by their position in the chain, all having less mobility than Rad and all spending a significant amount of time in the Rij e 16 Å region, considered the upper limit of folding contact for this amino acid representation.38 We can further quantify the dynamical behavior of the residues by calculating the autocorrelation function CAA(τ) that shows how the value of some dynamical parameter A at time t is correlated with the value of A at time t + τ:

CAA(τ) ) 〈A(t) A(t + τ)〉

(11)

where the average is taken, in principle, over all times t. For a trajectory (time series) calculated at N discrete time steps ti, eq 11 is approximated by

1 N-k

∑ A(ti) A(ti + tk)

CAA(tk) )

1

CAB(τ) ) 〈A(t) B(t + τ)〉

(13)

Equation 13 is approximated by

CAB(tk) )

A(×i) B(ti + tk)

1 N-k



[∑

N i)1 1

N

N i)1

A(ti)2

1

N

∑B(ti)2

N i)1

]

1/2

(14)

Note that CAB(τ) is not necessarily equal to 1 at τ ) 0. In the discussion below, the dynamical parameters used will be taken to be the fluctuations relative to the mean values. The autocorrelation cross-correlation functions for the centerto-center distances of the three helix-helix pairs that contribute significantly to kinetic intermediates in the chemical kinetics simulations are shown in Figure 7. Figure 7a shows the autocorrelation of the three pairs Rab, Rbc, and Rcd, which relax on a 20-50 ns time scale, somewhat more than an order of magnitude longer than the autocorrelations of individual residues found in previous Brownian dynamics simulations.36,37 The cross correlations (Figures 7b-d) are generally small, indicating that the formation of a helix-helix pair does not contribute substantially to an incresaed probability of formation of another helix-helix pair, although in the two cases of pairs sharing a helix, namely, 〈Rbc,Rcd〉 (Figure 7b) and 〈Rab,Rbc〉 (Figure 7d), there is some indication of a small lagging correlation. Discussion

N i)1

N

A at time t is correlated with the value of B at time t + τ:

(12)

∑A(ti)2

N i)1

the denominator normalizes the autocorrelation function to unity at τ ) 0. The probability of two-pair state formation can be examined further by calculating the cross-correlation function CAB(τ) that shows how the value of some dynamical parameter

Two complementary methods of simulating aspects of the folding of a four-helix bundle peptide have been presented in this paper. Each method involves a simplification of the representation of the polypeptide chain in order to facilitate calculating the time course of folding events. The methods are complementary because the Brownian dynamics simulation gives insight into the relative positions and orientations of the four helices as they move in possible folding pathways and into and out of possible kinetic intermediate states, information which

2508 J. Phys. Chem., Vol. 100, No. 7, 1996

Yapa and Weaver

Figure 7. Autocorrelation and cross-correlation functions for the three most prominent helix-helix pairs in the chemical kinetics simulations and in the Brownian dynamics simulations. In a, the solid line is the autocorrelation of Rab, the dotted line is the autocorrelation of Rbc, and the dashed line is the autocorrelation of Rcd. In b, the solid line is the cross correlation 〈Rbc,Rcd〉, and the dashed line is the cross correlation 〈Rcd,Rbc〉. In c, the solid line is the cross correlation 〈Rab,Rcd〉, and the dashed line is the cross correlation 〈Rcd,Rab〉. In d, the solid line is the cross correlation 〈Rab,Rbc〉, and the dashed line is the cross correlation 〈Rbc,Rab〉.

has been suppressed by the spherical approximation employed in the chemical kinetics calculations. The two methods are related to the two distinct aspects to the folding together (coalescence) of a pair of helices. The first aspect is the statistical one mentioned above which determines the most likely relative positions and orientations of the folding pair. This information is contained in the Brownian dynamics simulations. The second aspect is the propensity of a pair of marginally stable helices to coalesce upon collision into a multimicrodomain kinetic intermediate, the likelihood of this event being measured by the parameter β in the chemical kinetics calculations. The chemical kinetics simulations have confirmed and expanded on previously reported work on the four-helix bundle,12 and the Brownian dynamics simulation gives insight into the statistical tendencies of the relative motion of the helices. The chemical kinetics simulations (Figures 1 and 2) show that, other factors being equal, the positions of microdomains in the chain, in this case the four helices, are important in determining the significance of pathways through particular states. For example, the one- and two-pair states with the AD pairing (states 2, 4, 6, and 10) have almost no accumulation as kinetic intermediates because the AD pair forms much more slowly than the other three pairs of helices (see Table 5). This result is also apparent in the Brownian dynamics simulation. Figures 3c and 6a show, respectively, the probability distribution and the time history of Rad, indicating that the two end helices come together rarely (compared to the other helices) on the time scale of 1 µs. Of course, “other factors” are often not equal in proteins and one would expect the other factors to dominate the statistical preferences in many cases of folding proteins, cytochrome c26,41 and myoglobin42,43 being examples, as mentioned above. This behavior was illustrated in a chemical kinetics diffusion-collision calculation on a hypothetical three microdomain peptide.7 In this work, the effect of an enhanced β for the coalescence of the two end microdomains relative to the βs for the two nearest neighbor pairs was shown to dominate the early kinetics and allowed the kinetic intermediate involving the two end microdomains to appear first.7 Calculations of this type in which the propensity for coalescence is altered show that the folding pathways in any particular protein will be determined, at least in part, by the properties of individual

microdomains. This means that modification by mutation or chemical methods of likely folding microdomains in globular proteins should be able to alter the time scale of some folding pathways and the probability of occurrence of the related kinetic intermediates. This would give useful information on the mechanism of folding for the particular protein. Other proteins, even in the same family, might have different important pathways and intermediates. This is an important consequence of a microdomain-dependent folding mechanism as in the diffusion-collision model. It will be important to investigate experimental systems with several microdomains to determine the kinetic pathways and their dependence on the microdomains of the system. Experiments using NMR and other physical techniques on various lysozymes and homologues are in progress to examine the dependence of the folding pathways in the helical domain on the amino acid sequence.46 The results of these experiments should provide the quantitative information necessary for applications of the diffusion-collision model. Another aspect of the diffusion-collision chemical kinetics simulations is the characteristic expansion and contraction of the folding complexity (as measured by available kinetic intermediates and possible pathways) as folding proceeds in time. This is well illustrated by Figure 1e, which shows the chain proceeding from a single state (state 1) through the four one-pair states (states 2, 3, 5, and 9), each of which has three paths to the two-pair states (states 4, 6, 7, 10, 11, and 13), each of which has two paths back to the one-pair states and two paths forward to the three-pair states. The three-pair states (states 8, 12, 14, and 15) each have one path forward to the final state (state 16) and three paths back to the two-pair states. By this measure of complexity, the folding possibilities appear as two funnels with their larger ends placed together and their smaller ends at states 1 and 16. This dual funnel complexity is merged with the reduction in diffusion space (corresponding to a more compact structure with fewer conformations) available for conformational adjustments as folding proceeds. Because proteins are quite large molecules, the folding kinetics can show the dual funnel behavior even as folding proceeds and the structure becomes more compact (smaller radius of gyration). The results of the Brownian dynamics simulation (Figures 3-7) show the statistical tendencies of the four helices as they

Protein Folding Dynamics of a Four-Helix Bundle move in response to the potential energy function (eq 8) and the random force of solvent. Because of the length of the simulation (1 µs), only the fully extended starting conformation was used. In a collection of polypeptides of the same type, many different conformations would be present when folding commences and the initial contacts between helices would, on average, occur more quickly than observed in the simulation. In the chemical kinetics simulations, an average over the available diffusion space is used to calculate the coalescence rates. An interesting feature of the Brownian dynamics simulation is the absence of correlation among the positions of the helix centers. This is well-illustrated in Figures 6 and 7. For example, in Figure 6b, one sees that Rbc and Rcd are just as often far apart in size as they are nearly the same in size and their values as often move in opposite directions as in the same direction. This is explicitly quantified in Figure 7 which shows the lack of cross correlation among the pair center-to-center distances. These results indicate that the microdomains tend to move independently, an implicit assumption in the chemical kinetics calculations. When the helices are in contact (Rij e 16 Å) in the Brownian dynamics simulation, one or more pairs of residues in the two helices will be touching (an interresidue separation of 6.5 Å or less). Strong contacts, as required in kinetic folding intermediates, will lead to a number of contacts. For example, at 252.8 ns in the simulation, helices A (residues 1-16) and B (residues 20-35) were within the 16 Å folding range. At that time, the interresidue contacts were 1-34, 5-27, 5,30, 8,27, 12-20, 1223, and 15-20, a total of seven contacts among five residues in each helix. The contacts correspond to a substantial solvent accessible surface area loss compared to the starting conformation of the chain, and thus, a large hydrophobic interaction, as required for folding stability. The methods described in this paper illustrate two stages in an algorithm for solving the protein structure prediction problem mentioned in the Introduction. The first stage is the modelbuilding of a possible structure from the sequence of the polypeptide10 and the homology of known four-helix bundles.8,9 This stage produces, in principle, the possible microdomains and allows the calculation of their physical properties. It is also necessary to have some idea of their hydrophobic interaction if the relative orientation of microdomains upon coalescence is important (i.e., βai and βaj in eq 7 are important; see discussion below eq 7). The next stage is the dynamics of folding of the microdomains to the loosely associated state at the end of the kinetic folding pathway (e.g., molten globule state). The last stage would be energy minimization and molecular dynamics on a full representation of the polypeptide chain. This stage represents the relatively small movements necessary for close association of the hydrophobic core. In this regard, Schneller et al.44 have recently developed an electrostatic multipole representation (EMR) of a polypeptide chain which allows for scaling up from a reduced to an all-atom representation at any stage in a calculation. The EMR could be used in stage 2 above in reduced form for computational speed and then scaled up for the third stage involving detailed interdigitation of residues. In addition, the increased speed of EMR simulations will allow the inclusion of effects of the marginal stability of the microdomains. Previous studies36,37 have shown that the characteristic times of sheet-strand and helix-coil transitions are in the subnanosecond range, allowing many transitions in a microsecond time scale. With increased computing power and new calcu-

J. Phys. Chem., Vol. 100, No. 7, 1996 2509 lational methods, improvements in the protein structure prediction problem are likely in the near future. Acknowledgment. The authors would like to thank Martin Karplus for providing the intellectual framework upon which this work was carried out. References and Notes (1) Chan, H. S.; Dill, K. A. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 6388-6392. (2) Shakhnovich, E. I.; Gutin, A. M. Nature 1990, 346, 773-775. (3) Anfinsen, C. B. Science 1973, 181, 223-230. (4) Levinthal, C. J. Chim. Phys. 1968, 65, 44-45. (5) Levinthal, C. Sci. Am. 1966, 214, 42-52. (6) Karplus, M.; Weaver, D. L. Nature 1976, 260, 404-406. (7) Karplus, M.; Weaver, D. L. Protein Sci. 1994, 3, 650-668. (8) Sheriff, S.; Hendrickson, W. A.; Smith, J. L. J. Mol. Biol. 1987, 197, 273-296. (9) Stenkamp, R. E.; Sieker, L. C.; Jensen, L. H. Acta Crystallogr. 1983, B39, 697. (10) Regan, L.; DeGrado. Science 1988, 241, 976-978. (11) Hecht, M. H.; Richardson, J. S.; Richardson, D. C.; Ogden, R. C. Science 1990, 249, 884-891. (12) Yapa, K.; Weaver, D. L. Biophys. J. 1992, 63, 296-299. (13) Karplus, M.; Weaver, D. L. Biopolymers 1979, 18, 1421-1437. (14) Bashford, D. J. Chem. Phys. 1986, 85, 6999-7010. (15) Ohgushi, M.; Wada, A. FEBS Lett. 1983, 164, 21-24. (16) Ptitsyn, O. B. J. Protein Chem. 1987, 6, 272-293. (17) Kuwajima, K. Proteins 1989, 6, 87-103. (18) Weaver, D. L. Biopolymers 1984, 23, 675-694. (19) Richmond, T. J.; Richards, F. M. J. Mol. Biol. 1978, 119, 537555. (20) Chothia, C. Nature 1974, 248, 338-339. (21) Weaver, D. L. J. Chem. Phys. 1980, 72, 3483-3485. (22) Weaver, D. L. Biopolymers 1982, 21, 1275-1300. (23) Bashford, D.; Weaver, D. L. J. Chem. Phys. 1986, 84, 2587-2592. (24) Bashford, D.; Cohen, F. E.; Karplus, M.; Kuntz, I. D.; Weaver, D. L. Proteins 1988, 4, 211-227. (25) Bashford, D.; Weaver, D. L.; Karplus, M. J. Biomol. Struct. Dynam. 1984, 1, 1243-1255. (26) Bashford, D.; Karplus, M.; Weaver, D. L. In Protein Folding: Deciphering the Second Half of the Genetic Code; Gierasch, L. M., King, J., Eds.; American Association for the Advancement of Science: Washington, D.C., 1990; pp 283-290. (27) Sutcliffe, M. J.; Haneef, I.; Carney, D.; Blundell, T. L. Protein Eng. 1987, 1, 377-384. (28) Lee, B.; Richards, F. M. J. Mol. Biol. 1971, 55, 379-400. (29) MATLAB Reference Guide; The Mathworks, Inc.: 1992. (30) Harrison, S. C.; Durban, R. Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 4028-4030. (31) Levitt, M. J. Mol. Biol. 1976, 104, 59-107. (32) Flory, P. J. Statistical Mechanics of Chain Molecules; Wiley: New York, 1969. (33) McCammon, J. A.; Northrup, S. H.; Karplus, M.; Levy, R. M. Biopolymers 1980, 19, 2033-2045. (34) Ermak, D. L.; McCammon, J. A. J. Chem. Phys. 1978, 69, 13521361. (35) Dill, K. A. Biochemistry 1990, 29, 713-715. (36) Schneller, W.; Weaver, D. L. Biopolymers 1993, 33, 1519-1535. (37) Yapa, K.; Weaver, D. L.; Karplus, M. Proteins 1992, 12, 237265. (38) Lee, S.; Karplus, M.; Bashford, D.; Weaver, D. L. Biopolymers 1987, 26, 481-506. (39) Pear, M. R.; Northrup, S. H.; McCammon, J. A.; Karplus, M.; Levy, R. M. Biopolymers 1981, 20, 629-632. (40) Pauling, L.; Corey, R. B.; Branson, H. R. Proc. Natl. Acad. Sci. U.S.A. 1951, 37, 205-211. (41) Roder, H.; Elove, G. A.; Englander, S. W. Nature 1988, 335, 700704. (42) Hughson, F. M.; Wright, P. E.; Baldwin, R. L. Science 1990, 249, 1544-1548. (43) Hughson, F. M.; Barrick, D.; Baldwin, R. L. Biochemistry 1991, 30, 4113-4118. (44) Schneller, W.; Pappu, R. V.; Weaver, D. L. J. Comput. Chem., to be published. (45) Kraulis, P. J. J. Appl. Crystallogr. 1991, 24, 946-950. (46) Radford, S. A.; Dobson, C. M. Philos. Trans. R. Soc. London, B 1995, 348, 17-25.

JP952543E