J. Phys. Chem. 1989, 93, 6902-6915
Intermediates and Barrier Crossing in a Random Energy Model (with Applications to Protein Folding)

Joseph D. Bryngelson,* Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, and Peter G. Wolynes,* Noyes Laboratory, University of Illinois, Urbana, Illinois 61801 (Received: February 7, 1989; In Final Form: May 26, 1989)
We show, by application to a simple protein folding model, how a stochastic Hamiltonian can be used to study a complex chemical reaction. This model is found to have many metastable states and the properties of these states are investigated. A simple generalization of transition-state theory is developed and used to estimate the folding time for the model. It is found to have a glass phase where the dynamics is very slow. The relevance of our results to protein folding and the general problem of complex chemical reactions is discussed.
I. Introduction

Many, if not all, condensed-phase chemical reactions are complex in that they involve crossing a large number of energy barriers. Examples of complex chemical reactions are isomerizations and diffusional transport in glasses,¹ ligand binding in heme proteins,² and the folding of globular proteins.³ However, most of our ideas about chemical reaction dynamics come from studies of gas-phase reactions or simple condensed-phase isomerizations in liquids.⁴ Even at a descriptive level, complex chemical reactions are not as well understood as these simple reactions. The free energy surface of a complex chemical reaction is generally high dimensional, not precisely known, and may even vary from sample to sample. It is widely believed that these free energy surfaces are very rough, having many saddle points and local extrema. The analysis of kinetics on a realistic rough free energy surface is a challenging problem. It appears that our understanding of complex chemical reactions would advance if we had a simple model. In this paper we investigate such a simple model, one that captures both the many-minima and the multidimensional nature of the problem. One way to do this is to change from a complex deterministic description of the problem to a probabilistic one. We can replace the complex, unknown potential function by a stochastic one that has the same statistical characteristics. (Then we use some of the powerful theoretical methods that have been developed for the study of amorphous solids to analyze the kinetics on the resulting free energy surface.) The use of random functions to model complex Hamiltonians dates back to Wigner's treatment of highly excited states of heavy nuclei.⁵ More recently, stochastically defined Hamiltonians have been used in many areas of condensed-matter physics. Quantum treatments of statistical Hamiltonians have also arisen in the area of radiationless transitions.
Most recently, in chemical physics, they have been used by Stein to study protein dynamics⁶ and by us in our analysis of the equilibrium statistical mechanics of protein folding.⁷ The Bryngelson/Wolynes model is equivalent to a random energy

(1) Blumen, A.; Klafter, J.; Zumofen, G. In Optical Spectroscopy of Glasses; Zschokke, I., Ed.; Reidel: Norwell, MA, 1986 (distributed by Kluwer in USA).
(2) Frauenfelder, H.; Wolynes, P. G. Science 1985, 229, 337.
(3) Ghélis, C.; Yon, J. Protein Folding; Academic Press: New York, 1982.
(4) Glasstone, S.; Laidler, K. J.; Eyring, H. The Theory of Rate Processes; McGraw-Hill: New York, 1941.
(5) Wigner, E. P. Proc. Cambridge Philos. Soc. 1951, 47, 790.
(6) Stein, D. Proc. Natl. Acad. Sci. USA 1985, 82, 3670.
(7) Bryngelson, J. D.; Wolynes, P. G. Proc. Natl. Acad. Sci. USA 1987, 84, 7524.
model spin glass with superimposed ferromagnetic order. It was the study of the dynamics of our folding model that led us to the present study. It is appropriate to point out in this Festschrift article that we were chagrined to discover (after we had done a lot of the work) that Bob Zwanzig had been led to similar considerations in studying complex chemical reactions.⁸ (Being "scooped" by Bob is not uncommon in our business.) In his usual elegant manner, Zwanzig considered a simple one-dimensional version of the problem that is essentially exactly solvable. Some of the results of our approximate theory of the many-dimensional model are anticipated in that calculation, and the interested reader will benefit greatly by reading Zwanzig's article first. Nevertheless, there are some extremely interesting twists in the many-dimensional problem that we hope will be brought out here. Although the considerations we describe should be rather generally applicable to complex chemical reactions, we discuss the problem using the language of our protein folding model. This model is formulated in terms of a discrete set of microstates, which in the general context could be taken as the minima in a continuously defined potential energy surface. The discreteness of the states greatly simplifies the mathematics of our statistical analysis. In this model, despite the discreteness, there is still a natural collective coordinate measuring the progress of the reaction. The equilibrium aspects of our model for folding are closely tied to the finite range Potts glass, which one of us has argued elsewhere may be a rather generic description of glassy systems.⁹ We have also chosen to use the language of protein folding because, in a certain limit, one ends up with a picturesque result that has been anticipated in that area. This result is that in some circumstances each configuration must be searched in order to successfully carry out the reaction.
We call this the Levinthal limit, after his famous argument about the difficulty of protein folding.¹⁰,¹¹ We should point out, however, that our assumed dynamics in this model may well be quite unrealistic for actual proteins (although not necessarily for a large class of simulations of proteins). Local correlations in the energy surface as well as collective movements almost certainly influence the real process. The organization of this paper is as follows. In section II, the folding model is introduced and the topography of the stochastic

(8) Zwanzig, R. Proc. Natl. Acad. Sci. USA 1988, 85, 2029.
(9) Wolynes, P. G. In Festschrift for Hans Frauenfelder; American Institute of Physics: New York, 1988.
(10) Levinthal, C. J. Chim. Phys. 1968, 65, 44.
(11) Levinthal, C. In Mössbauer Spectroscopy in Biological Systems (proceedings of a meeting held at Allerton House, Monticello, IL); University of Illinois Press: Urbana, IL, 1969; p 22.
© 1989 American Chemical Society
energy landscape is investigated. We argue that the model has many local minima. We calculate their number, their distribution, and the lifetimes of the corresponding metastable states. These metastable states may be related to the kinetic intermediates observed in folding experiments.¹² In section III, we move on to calculate the time scale of folding. We will argue that the lifetime distribution we have calculated can be used to model the folding process by a continuous time random walk (CTRW), then show that the CTRW is closely approximated by a generalization of the Fokker-Planck equation, and finally find the folding time scale by solving a first passage time problem. In section IV, we summarize our results and discuss their relevance to both folding and the general problem of complex chemical reactions.

II. Lifetimes of Kinetic Intermediates

The calculations of this paper are based on the model discussed in a note by the present authors.⁷ We shall assume that the reader has at least a loose understanding of the ideas and results of this earlier work. For completeness we remind the reader of the important features of this model. We model a protein as a linear chain of N amino acids, each with ν + 1 states. One of these states is present in the ideal, native, folded conformation, and we will call this the native state of the amino acid. We refer to an amino acid in its native state as a native amino acid. There are ν nonnative states for each amino acid, and we take ν to be of order 10. Next we divide the energies involved in protein folding into three groups. First there is an energy associated with the state in which an amino acid resides. We take this energy to be −ε if the amino acid is in its native state and distributed with mean zero and standard deviation Δε if the amino acid is not in its native state. Next, there is an energy which arises from the interaction between two amino acids that are close together on the chain.
We take this energy to be −J if both amino acids are in their native state and distributed with mean zero and standard deviation ΔJ otherwise. Finally, there is an energy that arises from the interaction between two amino acids that are far apart along the chain but close together in space. We take this energy to be −K if both amino acids are in their native state and distributed with mean zero and standard deviation ΔK otherwise. Typically each amino acid will be neighbored in space by z other amino acids that are distant from it in sequence. We take z to be a fixed parameter. In ref 7 we argued that the probability of a given state with pN native amino acids having energy between E and E + dE is given by
than the total number of protein states. This general picture will remain valid so long as the number of amino acids that change simultaneously is much smaller than the total number of amino acids in the protein. Since we have adopted a dynamics where one amino acid changes its state at a time, it is clearly sensible to connect a protein state with pN native amino acids to protein states with pN, pN − 1, or pN + 1 native amino acids. For a protein microstate with Np native amino acids, in our model there are N(1 − p) ways to change the state of a single amino acid which will add a native amino acid, Npν such ways to subtract a native amino acid, and N(1 − p)(ν − 1) ways to change the state of a single amino acid so that the number of native amino acids remains unchanged. Therefore we connect each state with Np native amino acids to N(1 − p) states with Np + 1 native amino acids, Npν states with Np − 1 native amino acids, and N(1 − p)(ν − 1) states with Np native amino acids. A change of state of a single amino acid can change the energy of the protein by a very large amount, because this change affects the spatial positions of many amino acids. Therefore we will not restrict the state-to-state changes in energy. Finally we must complete our dynamical scheme by writing an equation for the rate at which the protein travels from state to state. We use Metropolis dynamics, i.e., if a protein is in state A, the rate at which it leaves A to go to a connected state B (in the sense discussed above) is

R = R₀ exp[−(E_B − E_A)/T]   for E_B > E_A
  = R₀                       for E_B ≤ E_A   (6)

where R₀ is a constant which is on the order of inverse nanoseconds. The use of Metropolis dynamics is attractive for our model because it gives the appropriate Arrhenius factor for activated events. A state is a local minimum if all of the states connected to it have higher energy. When N(1 − p) ≫ 1 the distribution of energies g(E,p) changes very little when the number of native amino acids changes by one, i.e., when p changes by 1/N. Therefore, g(E,p) is an excellent approximation for the energy distribution of states connected to a particular state with pN native amino acids. We use this approximation throughout this paper. Thus the probability that a state with pN native amino acids and energy E₀ is a local minimum is

P_LM(E₀,p) = [∫_{E₀}^∞ dE g(E,p)]^{Nν}   (7)
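A minimal sketch of this dynamical scheme in code (the function names and the choice R₀ = 1 are our assumptions; T is measured in energy units, as in eq 6):

```python
import math
import random

def metropolis_rate(E_a, E_b, T, R0=1.0):
    """Eq 6: rate of leaving state A for a connected state B."""
    if E_b > E_a:
        return R0 * math.exp(-(E_b - E_a) / T)
    return R0  # downhill or isoenergetic moves proceed at the bare rate R0

def is_local_minimum(E0, connected_energies):
    """A state is a local minimum if every connected state lies higher."""
    return all(E > E0 for E in connected_energies)

# Example: a deep state surrounded by Nv = 30 connected states drawn from
# a (standard normal) stand-in for g(E,p)
random.seed(0)
neighbors = [random.gauss(0.0, 1.0) for _ in range(30)]
print(is_local_minimum(-5.0, neighbors))
print(metropolis_rate(0.0, 2.0, T=1.0))  # uphill move, suppressed by exp(-2)
```

The Arrhenius suppression of uphill moves is what turns deep local minima into long-lived kinetic intermediates.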
g(E,p) dE = [2πΔE(p)²]^{−1/2} exp{−[E − Ē(p)]²/2ΔE(p)²} dE   (1)

where

Ē(p) = N[−εp − Lp²]   (2)
ΔE(p)² = N[Δε²(1 − p) + ΔL²(1 − p²)]   (3)

where

L = J + zK   (4)
ΔL² = ΔJ² + zΔK²   (5)

so the average number of local minima is

⟨n_m⟩ = Σ_{Np=0}^{N} Ω(Np) ∫_{−∞}^{∞} dE₀ g(E₀,p) P_LM(E₀,p)   (8)

where

Ω(Np) = [N!/((Np)!(N − Np)!)] ν^{N(1−p)}   (9)
For the probability distribution to be well-defined in the thermodynamic limit we must require ΔE ∝ N^α, where 0 < α < 1.¹³ Equations 1-5 are the mathematical starting point of the present calculation. Next we discuss the way the protein moves from one state to another, i.e., the dynamical connectivity of the microstates. The simplest reasonable dynamical scheme for our model is one where the protein changes state by changing the state of one amino acid at a time. Then each protein state is connected to Nν other protein states by a single transformation. This approximation will be very good if the typical time between two changes of the protein state is much greater than the typical time of a conformational transition. More generally, each protein state is connected to n other protein states, where n is much larger than 1 and much smaller

(12) Kim, P. S.; Baldwin, R. L. Annu. Rev. Biochem. 1982, 51, 459.
(13) Derrida, B. Phys. Rev. B 1981, 24, 2613.
is the number of states with pN native amino acids. Now we wish to evaluate the integral

∫_{−∞}^{∞} dE₀ g(E₀,p) P_LM(E₀,p)   (10)

Integrating by parts gives

∫_{−∞}^{∞} dE₀ g(E₀,p) P_LM(E₀,p) = 1/(Nν + 1)   (11)

Substituting (11) into (8) gives the average number of local minima ⟨n_m⟩

⟨n_m⟩ = (ν + 1)^N/(Nν + 1)   (12)

This result has also been obtained by Kauffman and Levin.¹⁴ Notice that this estimate is independent of the distribution of

(14) Kauffman, S.; Levin, S. J. Theor. Biol. 1987, 128, 11-45.
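Eq 12 can be checked by brute force at toy sizes (the values N = 3, ν = 2 below are our assumptions, chosen only to keep (ν + 1)^N small). For iid continuous energies, a state is a local minimum exactly when it is the lowest of the Nν + 1 states in its neighborhood, so the average count should approach (ν + 1)^N/(Nν + 1) = 27/7 ≈ 3.86:

```python
import itertools
import random

def count_local_minima(N, nu, rng):
    """Enumerate all (nu+1)^N microstates, assign iid random energies,
    and count states that lie below all N*nu single-site moves."""
    states = list(itertools.product(range(nu + 1), repeat=N))
    energy = {s: rng.random() for s in states}
    n_min = 0
    for s in states:
        is_min = True
        for i in range(N):
            for a in range(nu + 1):
                if a != s[i]:
                    t = s[:i] + (a,) + s[i + 1:]
                    if energy[t] < energy[s]:
                        is_min = False
                        break
            if not is_min:
                break
        n_min += is_min
    return n_min

N, nu = 3, 2
rng = random.Random(1)
trials = 400
avg = sum(count_local_minima(N, nu, rng) for _ in range(trials)) / trials
predicted = (nu + 1) ** N / (N * nu + 1)   # eq 12: 27/7
print(avg, predicted)
```

The agreement is independent of the energy distribution used, just as the text notes; only continuity (no ties) matters.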
energy levels. Typically, ν is of order 10¹ and N is of order 10², so the number of local minima is enormous. Also notice that the time required to run a computer program which searches a nonnegligible fraction of these minima grows exponentially with the size of the protein. In the course of folding, a protein molecule will be temporarily trapped in some of these minima. We expect that some of the deeper minima would be observable as intermediate structures in protein folding. These folding intermediates are simply misfolded structures. Folding intermediates are observed in experiments. We now investigate the energy distribution of local energy minima. The probability that a given local minimum has energy E₀, which we call g_LM(E₀,p), is proportional to the probability that an energy level with energy E₀ is a local minimum times the probability that an energy level exists at E₀, so the desired distribution is g_LM(E₀,p) = (Nν + 1) g(E₀,p) P_LM(E₀,p). We have normalized g_LM(E₀,p) so that the integral of g_LM(E₀,p) over E₀ is one. For the vast majority of minima,
changes of protein state) away from the local minimum state. For our random energy model this means that the rate of leaving the local minimum state is of the same order of magnitude as the rate of leaving the basin of attraction of that local minimum state.¹⁴ Therefore, we calculate the rate of leaving a local minimum state with energy E₀. The distribution of energies of states connected to a local minimum with energy E₀ and pN native amino acids is

g_conn(E,p) = g(E,p)/∫_{E₀}^∞ dE' g(E',p)   for E ≥ E₀
            = 0                              for E < E₀   (20)

The rate of the protein molecule leaving the local minimum state is simply the sum of the rates for the molecule going from the local minimum state to the states connected to the local minimum state, so

R = R₀ Σᵢ exp[−(Eᵢ − E₀)/T]   (21)
g_LM(E₀,p) = Nν g(E₀,p) exp[−Nν ∫_{−∞}^{E₀} dE g(E,p)]   (14)

We evaluate the above integral and approximate the resulting error functions to find

g_LM(E₀,p) = Nν g(E₀,p) exp{−(Nν/2)[1 + erf((E₀ − Ē(p))/(2^{1/2}ΔE(p)))]}   (15)

Inspecting (15) we see that, for E₀ < Ē(p) − (2 log Nν)^{1/2}ΔE(p), the second term in the exponential is negligible, and for E₀ > Ē(p) − (2 log Nν)^{1/2}ΔE(p) the second term is large and negative, so g_LM(E₀,p) is close to zero in this regime. Therefore, we approximate g_LM(E₀,p) by

g_LM(E₀,p) ≈ Nν g(E₀,p)   for E₀ < E_c(p)
           ≈ 0            for E₀ > E_c(p)   (16)
where Eᵢ is the energy of each connected state. Using the distribution g_conn(E,p) we find that the average value of R is

R_LM(E₀,p) = R₀Nν exp[−(Ē(p) − E₀)/T + ΔE(p)²/2T²] erfc[(E₀ − Ē(p) + ΔE(p)²/T)/(2^{1/2}ΔE(p))] / erfc[(E₀ − Ē(p))/(2^{1/2}ΔE(p))]   (22)

where

E_c(p) = Ē(p) − (2 log Nν)^{1/2} ΔE(p)   (17)

Recall that the number of protein states with pN native amino acids is given by Ω(Np) in eq 9, so the energy of the lowest lying state with pN native amino acids is approximately E_low(p), where Ω(Np) g(E_low(p),p) ≈ 1. Solving for E_low(p) gives

E_low(p) = Ē(p) − [2S*(p)]^{1/2} ΔE(p)   (18)

where

S*(p) = log Ω(Np) = N[−p log p − (1 − p) log ((1 − p)/ν)]   (19)
We assume that E₀ < Ē, which should be true for almost all of the local minima. The arguments of the error functions in eq 22 grow as N^{1/2}, so we use the asymptotic expression for the error function to obtain (i) for E₀ + ΔE²/T < Ē,

R_LM(E₀,p) = R₀Nν exp[−(Ē(p) − E₀)/T + ΔE(p)²/2T²]   (23a)

and (ii) for E₀ + ΔE²/T > Ē,

R_LM(E₀,p) = R₀Nν exp[−(Ē(p) − E₀)²/2ΔE(p)²]   (23b)

since N ≫ log N for large N. We may now calculate the distribution of average escape rates from local minima

P_LM(R,p) = ∫_{−∞}^{∞} dE₀ g_LM(E₀,p) δ(R − R_LM(E₀,p))   (24)

We shall meet both E_low(p) and S*(p) later in this paper. A better understanding of folding dynamics may be acquired by studying the lifetimes of these metastable states. In the strictest sense this is a very difficult problem, because we cannot really say that a protein molecule has "escaped" from a particular local energy minimum until it has left a certain region, which we will call the basin of attraction of the local minimum, where it has a high probability of falling back into the local minimum state. However, each state is connected to many (Nν) other states and these states will generally have a broad range of energies, so the protein molecule will usually leave the basin of attraction of a local minimum after taking only a couple of steps (by steps we mean

Substituting eq 16, 22, and 23 into eq 24 gives P_LM(R,p)
(eq 25 and 26) for R_fast(p) > R > R_slow(p), and

P_LM(R,p) = 0   (27)

for R > R_fast(p) or R < R_slow(p), where

R_fast(p) ≡ R_LM(E_c,p)
R_slow(p) ≡ R_LM(E_low,p)
use this physical application to introduce the approach. The transport of electrons (or holes) in an amorphous insulator can be modeled as hopping between trapping sites in the solid. The rate of escape from a trap varies from trap to trap. A given electron voyaging through the solid encounters many traps of varying escape rates. Following the review of Pfister and Scher,²⁰ we fold this distribution of trapping events into a single function of time, Ψ(t). The function Ψ(t) is the probability density for a particle to remain in a trap for a time t, and it is the result of a weighted sampling of the distribution of escape rates. More specifically, if P(R) is the probability for an electron to fall into a trap with escape rate R, then
R_slow(p) = R₀Nν exp[−(2S*(p))^{1/2}ΔE(p)/T + ΔE(p)²/2T²]   (28)
and E_c(p) and E_low(p) are defined in eq 17 and 18. The salient feature of these distributions is the fact that the distribution of eq 26 is much flatter than the distribution of eq 25. For the set of states with pN native amino acids there are two important temperatures

T_high(p) = ΔE(p)/[2 log Nν]^{1/2}   (29)

and

T_g(p) = ΔE(p)/[2S*(p)]^{1/2}   (30)
where S*(p) is defined in eq 19. When T > T_high(p), then R_SP(p) > R_fast(p) > R_slow(p), so the distribution of escape rates is always given by eq 25; that is, it is log normal, and very long escape times are fairly uncommon. When T_high(p) > T > T_g(p), then R_fast(p) > R_SP(p) > R_slow(p), so the distribution of faster rates (R > R_SP(p)) follows eq 26 and the distribution of slower rates (R < R_SP(p)) follows eq 25. In this regime the distribution of escape rates is fairly flat from the fastest rate for escape from a metastable state down to R_SP(p), and then the distribution falls off more quickly for slower rates. For these temperatures a few very long lived intermediates will be observed. When T_g(p) > T, then R_fast(p) > R_slow(p) > R_SP(p), so the distribution of escape rates is always given by eq 26. We call this low-temperature regime the glassy phase of folding dynamics. In the glassy phase the distribution of escape rates is quite flat all the way from the fastest down to the slowest escape rate in the system. In this regime very slow escape rates are reasonably common. Notice that the dynamic glass transition occurs at exactly the same temperature as the equilibrium glass transition in ref 7.

III. Time Scale of Folding

Next we investigate the effect of the intermediates on the time required for a protein to fold. Our calculation has three goals. First, we would like to find out what new kinetic effects arise from the presence of many metastable states with a broad distribution of lifetimes. Second, we would like to get a very rough order of magnitude estimate of the folding time for our model. Finally, we would like to develop mathematical techniques that can later be used to study more realistic models of protein folding. In our calculation of the folding time we will use several mathematical tools. We pause here to introduce and briefly review these tools for readers who may not be familiar with all of them.
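The two temperatures are easy to evaluate from eqs 3 and 19; eq 30 is taken here in the random-energy-model form T_g(p) = ΔE(p)/[2S*(p)]^{1/2}. The parameter values below (N = 100, ν = 10, Δε = ΔL = 1, p = 1/2) are illustrative assumptions only:

```python
import math

def delta_E(p, N, d_eps, d_L):
    """Eq 3: width of the energy distribution at fixed fraction p."""
    return math.sqrt(N * (d_eps ** 2 * (1 - p) + d_L ** 2 * (1 - p ** 2)))

def S_star(p, N, nu):
    """Eq 19: log of the number of states with pN native amino acids."""
    return N * (-p * math.log(p) - (1 - p) * math.log((1 - p) / nu))

def T_high(p, N, nu, d_eps, d_L):
    return delta_E(p, N, d_eps, d_L) / math.sqrt(2 * math.log(N * nu))

def T_g(p, N, nu, d_eps, d_L):
    return delta_E(p, N, d_eps, d_L) / math.sqrt(2 * S_star(p, N, nu))

# illustrative (assumed) parameters, not fits to any protein
N, nu, d_eps, d_L, p = 100, 10, 1.0, 1.0, 0.5
print(T_high(p, N, nu, d_eps, d_L), T_g(p, N, nu, d_eps, d_L))
```

Because S*(p) is of order N while log Nν grows only logarithmically, T_g(p) lies well below T_high(p), so the glassy regime opens up only at low temperature.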
We also give references to some more complete elementary treatments of these tools. Our first tool is the continuous time random walk (CTRW).¹⁵,¹⁶ The CTRW approach has been used extensively and successfully in studying charge transport in amorphous solids,¹⁷⁻²⁰ and we

(15) Montroll, E. W.; Weiss, G. H. J. Math. Phys. 1965, 6, 167.
(16) Montroll, E. W.; Shlesinger, M. F. In Studies in Statistical Mechanics, Vol. 11, Nonequilibrium Phenomena II: From Stochastics to Hydrodynamics; Lebowitz, J. L., Montroll, E. W., Eds.; North Holland: Amsterdam, 1984; p 1.
(17) Scher, H.; Lax, M. Phys. Rev. B 1973, 7, 4491.
Ψ(t) = ∫₀^∞ dR P(R) R exp(−Rt)   (31)
since R exp(−Rt) is the probability density for an electron to remain for a time t in a trap with escape rate R. The function Ψ(t) is called the waiting time distribution for the process. Now we model the heterogeneous lattice with different escape rates at each site by a homogeneous lattice with the waiting time distribution Ψ(t) at each site and the probability p(r) for an electron to be displaced by r after leaving the trap. In this new homogeneous system it is easy to calculate G(r,t|r',0), the probability for an electron to be at r at time t if it was at r' at time 0, once we are given Ψ(t) and p(r). We will illustrate a formal calculation of G(r,t|r',0) later in this paper. The function G(r,t|r',0) can be used to calculate many properties of charge transport in the solid. The kinetic intermediates calculation suggests a continuous time random walk approach to calculating the time scale of folding. We can model each state of the protein as a trap with an escape rate, R, which is a function of its energy, E, and the fraction of amino acids that are in their native state, p. Then we can approximate this hopping model of protein folding by a CTRW with the waiting time distribution Ψ(t,p)
Ψ(t,p) = ∫₀^∞ dR P(R,p) R exp(−Rt)   (32)

where

P(R,p) = ∫_{−∞}^{∞} dE g(E,p) δ(R − R(E,p))   (33)
Notice that eq 32 is just eq 31 with a p dependence. Also notice that eq 33 has exactly the same structure as eq 24, so we can use the mathematics of the kinetic intermediates calculation to find Ψ(t,p). We will pursue this tactic further in our folding time scale calculation. This CTRW approach is fairly simple and captures some important aspects of our problem, such as the broad distribution of escape rates we found in the previous section and the variation of escape rate with p. The master equation is another frequently used approach to stochastic processes.²¹⁻²³ Consider a set of discrete states {x}, let w(x,x') be the rate of the system going from x to x', and let n(x,t) be the probability that the system is in state x at time t. The master equation is simply the expression

dn(x,t)/dt = Σ_{x'} [w(x',x) n(x',t) − w(x,x') n(x,t)]   (34)
Equation 34 assumes that the probability of the system moving from one state to another is independent of the time spent at the first state; in other words, there is no history dependence in eq 34. We can make our equations even more general by making

(18) Scher, H.; Lax, M. Phys. Rev. B 1973, 7, 4502.
(19) Scher, H.; Montroll, E. W. Phys. Rev. B 1975, 12, 2455.
(20) Pfister, G.; Scher, H. Adv. Phys. 1978, 27, 747.
(21) van Kampen, N. G. Stochastic Processes in Physics and Chemistry; North Holland: Amsterdam, 1981.
(22) Haken, H. Synergetics, An Introduction, 3rd ed.; Springer: Berlin, 1983.
(23) Gardiner, C. W. Handbook of Stochastic Methods; Springer: Berlin, 1983.
the probability of moving from one state to another a function of the time spent at the first state. More precisely, we define the rate for a system that has arrived at state x at time zero to move to state x' between time t and t + dt to be Φ(x,x',t) dt. This generalization yields

dn(x,t)/dt = Σ_{x'} ∫₀^t dt' [Φ(x',x,t−t') n(x',t') − Φ(x,x',t−t') n(x,t')]   (35)
Equation 35 is called the generalized master equation, and it is commonly used in the study of stochastic processes.¹⁶,²⁴ Kenkre et al. have shown that the CTRW discussed above is equivalent to a generalized master equation.²⁴ Our folding time scale calculation will exploit this equivalence. We will show that the continuum limit of the generalized master equation is a simple generalization of the well-known Fokker-Planck (or Smoluchowski) equation

∂P(x,t)/∂t = ∂/∂x {D(x)[∂P(x,t)/∂x + β U'(x) P(x,t)]}   (36)
protein. Each state has the same connectivity as in our original model, and Ψ(t,p) is determined by eq 32 and 33. We will discuss the form of the function g(E,p) later. Since we are interested in folding and because, by definition, the protein folds by going from a low p to a high p state, we are particularly interested in events that change p. In our continuous time random walk model there is a waiting time distribution Ψ₀(t,p) associated with leaving the set of states with Np native amino acids. Out of the Nν ways to leave each state there are N[1 + (ν − 1)p] paths which lead to states with Np + 1 or Np − 1 native amino acids, so the probability of leaving the set of states with Np native amino acids on a particular jump is λ(p) ≡ 1/ν + (1 − 1/ν)p. Now we can calculate Ψ₀(t,p) in terms of Ψ(t,p) by using the self-consistency requirement
Ψ₀(t,p) = λ(p) Ψ(t,p) + [1 − λ(p)] ∫₀^t dt' Ψ₀(t−t',p) Ψ(t',p)   (37)
Equation 37 may easily be solved by the method of Laplace transforms to find

Ψ̂₀(s,p) = λ(p) Ψ̂(s,p) / {1 − [1 − λ(p)] Ψ̂(s,p)}   (38)

In eq 36, x is now a continuum variable, D(x) is the diffusion constant, U(x) is a potential energy, and β is the inverse temperature. Our task is to find the time that it takes for a protein molecule, starting from an unfolded state, to find its native, folded state. More precisely, we divide the (ν + 1)^N different protein microstates into two disjoint sets: U, the unfolded states, and F, the folded states. We are interested in finding the time required for a molecule starting in a typical state in U to find its way to one of the states in F. (Here "typical state in U" means an unfolded state with a large Boltzmann weight.) Problems of this sort are called first passage time problems.²⁵⁻²⁷ Since we are considering a stochastic process, the first passage time will be a random variable, so we are interested in the distribution of first passage times and, in particular, the average or mean first passage time. Mean first passage times are related to reaction rates.²⁶,²⁷ We will find a simple expression for the mean first passage time for a system described by a generalized Fokker-Planck equation. The calculation of folding time scales is rather long, so we provide the reader with a road map. First, we express our folding problem in terms of a continuous time random walk and show that the waiting time distributions for this problem are related to the distributions of escape rates from each site. Then we derive the generalized master equation that represents our continuous time random walk. Next we show that the continuum limit of our generalized master equation is a generalized Fokker-Planck equation. After that we calculate the mean first passage time for the generalized Fokker-Planck equation and show that the mean first passage time is related to the first inverse moment of the distribution of the rate of escape from the sites. Then we calculate the inverse moments.
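As a reference point for the first passage time calculation sketched in the road map, the memoryless (ordinary Markov) special case is worth writing down: for a nearest-neighbor hopping chain that is reflecting at site 0 and absorbing at site M, the mean first passage time is a standard double sum over forward and backward rates. This is only the textbook birth-death result, not the paper's generalized Fokker-Planck expression (eq 114):

```python
def mfpt_birth_death(kf, kb):
    """Mean first passage time from site 0 to site M = len(kf) for a
    nearest-neighbor hopping chain with forward rates kf[i] (i -> i+1)
    and backward rates kb[i] (i -> i-1); site 0 reflecting (kb[0]
    ignored), site M absorbing.  Standard birth-death formula:
    T = sum_j sum_{i<=j} (1/kf[i]) prod_{l=i+1..j} (kb[l]/kf[l])."""
    M = len(kf)
    total = 0.0
    for j in range(M):
        term = 0.0
        prod = 1.0
        for i in range(j, -1, -1):   # i = j, j-1, ..., 0
            term += prod / kf[i]
            if i > 0:
                prod *= kb[i] / kf[i]
        total += term
    return total

# Uniform unbiased rates give the diffusive result M(M+1)/2
print(mfpt_birth_death([1.0] * 5, [1.0] * 5))   # 5*6/2 = 15.0
```

Strongly inward-biased rates (kb much larger than kf) make this sum exponentially large, which is the one-dimensional caricature of escape from deep traps.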
Finally, we put everything together by deriving a simple approximate expression for the mean first passage time, which is in the spirit of the transition-state theory of reaction rates,²⁸⁻³⁰ and use this approximation to find the time scale of folding for a chemically reasonable set of model parameters. The calculation of the generalized Fokker-Planck equation and mean first passage time is rather long and technical, so the reader who is not interested in technical details may take note of the definition of eq 40 and the free energy expression of eq 74 and then skip directly to eq 114 for the mean first passage time. In the continuous time random walk approximation to the above model each protein state has a waiting time distribution Ψ(t,p) which depends only on the number of native amino acids in the
(24) Kenkre, V. M.; Montroll, E. W.; Shlesinger, M. F. J. Stat. Phys. 1973, 9, 45.
(25) Weiss, G. H. J. Stat. Phys. 1986, 42, 3.
(26) Szabo, A.; Schulten, K.; Schulten, Z. J. Chem. Phys. 1980, 72, 4350.
(27) Schulten, K.; Schulten, Z.; Szabo, A. J. Chem. Phys. 1981, 74, 4426.
(28) Pelzer, H.; Wigner, E. Z. Phys. Chem. B 1932, 15, 445.
(29) Eyring, H. J. Chem. Phys. 1935, 3, 107.
(30) Evans, M. G.; Polanyi, M. Trans. Faraday Soc. 1935, 31, 875.
By direct Laplace transform of eq 32 we find Ψ̂(s,p) in terms of the distribution of rates

Ψ̂(s,p) = ∫₀^∞ dR P(R,p) R/(R + s)   (39)
so if we let h(R) be some function of the escape rates and define

⟨h(R)⟩_p ≡ ∫₀^∞ dR P(R,p) h(R)   (40)
then we may write eq 39 as

Ψ̂(s,p) = ⟨R/(R + s)⟩_p   (41)
The expression for Ψ̂₀(s,p) in terms of the distribution of escape rates can be found by substituting eq 41 into eq 38 to yield

Ψ̂₀(s,p) = λ(p)⟨R/(R + s)⟩_p / {1 − [1 − λ(p)]⟨R/(R + s)⟩_p}   (42)
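For a discrete distribution of escape rates, eqs 32 and 41 can be checked directly: Ψ(t) is a mixture of exponentials and its Laplace transform is the average ⟨R/(R + s)⟩. The rates and weights below are arbitrary stand-ins, with the p dependence suppressed:

```python
import math

rates = [0.5, 2.0, 10.0]   # hypothetical escape rates R_i
probs = [0.2, 0.5, 0.3]    # P(R_i): probability of falling into each trap type

def psi(t):
    """Eq 32 for a discrete P(R): mixture of exponential waiting times."""
    return sum(P * R * math.exp(-R * t) for P, R in zip(probs, rates))

def psi_hat(s):
    """Eq 41: the Laplace transform is the average <R/(R+s)>."""
    return sum(P * R / (R + s) for P, R in zip(probs, rates))

def laplace_numeric(f, s, T=60.0, n=60000):
    """Midpoint-rule Laplace transform, as a numerical cross-check."""
    dt = T / n
    return sum(f((k + 0.5) * dt) * math.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

s = 1.0
print(psi_hat(s), laplace_numeric(psi, s))
```

Note that psi_hat(0) = 1, i.e., Ψ(t) is normalized, and the broadness of P(R) translates directly into the long-time tail of Ψ(t).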
In reading the following calculations the reader may find it easier to think of our model as a one-dimensional array of N sites. Each site has a waiting time distribution Ψ₀(t,p) and the protein can only move one site at a time. When a protein molecule leaves the set of states with Np native amino acids it can go to a state with Np + 1 or Np − 1 native amino acids. We call the probability of the molecule going to a state with Np + 1 native amino acids p₊(p) and the probability of the molecule going to a state with Np − 1 native amino acids p₋(p). Clearly p₊(p) + p₋(p) = 1. We will calculate p₊(p) and p₋(p) later. The equivalence of the continuous time random walk and the generalized master equation was first demonstrated by Kenkre and co-workers for a translationally invariant lattice.²⁴ This equivalence was extended to lattices without translational invariance by Kenkre and Knox³¹ and by Shugard and Reiss.³² Our derivation of a generalized master equation that is equivalent to our continuous time random walk model loosely follows the work of Shugard and Reiss. First we define

δp ≡ 1/N   (43)

so Np ± 1 = N(p ± δp), and the "jump" function

j(t,p,p') = Ψ₀(t,p')[p₊(p') δ(p,p'+δp) + p₋(p') δ(p,p'−δp)]   (44)

(31) Kenkre, V. M.; Knox, R. S. Phys. Rev. B 1974, 9, 5279.
(32) Shugard, W. J.; Reiss, H. J. Chem. Phys. 1976, 65, 2827.
where

δ(x,y) = 1   if x = y
       = 0   otherwise   (45)
Notice that j(t,p,p') is the probability of jumping to p from p' after waiting a time t at p'. Also we define G(p,t) to be the probability of being at p at time t if we started at p = 0 at time zero, and

Q₀(t,p) ≡ 1 − ∫₀^t dt' Ψ₀(t',p)   (46)
which is just the probability of remaining at p for time t. Next we formally solve for G(p,t). Our strategy is to show that this formal solution is the same as the formal solution of the generalized master equation. Summing over all paths from (0,0) to (p,t) gives

G(p,t) = Q₀(t,p) δ(p,0) + ∫₀^t dt' Q₀(t−t',p) j(t',p,0) + Σ_{p'} ∫₀^t dt' ∫₀^{t'} dt'' Q₀(t−t',p) j(t'−t'',p,p') j(t'',p',0) + …   (47)

and all other K(p,p',s) = 0. The expressions for K(p,p',t) in eq 61 and 62 show that the generalized master equation of eq 54 does indeed describe our CTRW and give expressions for the coefficients Φ(p,p',t). We can simplify our notation by taking advantage of the form of the functions Φ(p,p',t) and defining

Φ±(p,t) ≡ Φ(p,p±δp,t)   (63)

In this new notation eq 54 becomes

dG(p,t)/dt = ∫₀^t dt' {Φ₋(p+δp,t−t') G(p+δp,t') + Φ₊(p−δp,t−t') G(p−δp,t') − [Φ₊(p,t−t') + Φ₋(p,t−t')] G(p,t')}   (64)
If we Laplace transform eq 47 we find G ( P J ) = Q O ( ~ , P )~ ( P , O )+ Qo(s,p) j(s,p,O) + Q O ( ~ , P ) SAP') j(s,p’,O)
+ ...
(48)
Equation 48 may be summed formally if we define the matrices [ Q o ( ~ ) l ~3, ~Qo(s,p) ! %P’) i(s>P,P’)
W I p , p t
(49)
The functions @&,p) can be easily expressed in terms of averages over P(R,p), the distribution of escape rates defined in eq 29, by combining eq 42 and 65 to find
(50)
and the vectors
PI,
6(P,O)
G(PJ)
[G(s)lp
(52)
Then we have C(s) = Qo(s).[l
- j(s)]-I*I
(53)
We can use the long time properties of generalized master equations to calculate the function pa@). First we show that for a function f(t) and its Laplace transformf(s) lim sf(s) = lim f(t ) (67) s-0
where 1 is the identity matrix. A generalized master equation that describes our process is
1-.-
Let kmt)]be the Laplace transform ofAt). From the properties of the Laplace transform of a derivative we have lim sf(s) = lim L#’(t)] f(0) S-0
+
s-0
Equation 54 is of the form = limf(t) I-.-
The Laplace transform of eq 64, the generalized master equation, is
The Laplace transform of eq 55 is sG(p,s) - 6 ( P , O ) = E.K(p,p’,s) G(p’,s)
(56)
P‘
so if we define the matrix [K(s)l,,t
= K(P,P’J)
(57)
and recall the vectors defined in eq 51 and 52, then eq 56 may be formally solved to yield C ( S )= [sl - K(s)]-’-I
(58)
The continuous time random walk and generalized master equation descriptions of our model are equivalent if eq 53 and 58 give the same C(s), which yields K(s) = s l - [ l
- j(s)]QO(s)-l
(59)
From the definition of Qo(s) (eq 49) we find r
If we substitute eq 60 and our definitions from eq 49-52 and 57 into eq 59 we find that
~ G ( P J-) ~ ( P , O ) = @ - ( P + ~ P J ) G(p+Gp,s) + @+(P-~PJ) G ( p - 6 ~ ) [ @ + ( P J ) + @-(PJ)IG(PJ) (69)
We define n e b )to be the equilibrium occupation of the set of states with Np native amino acids so n e b ) = lim 1-.-
WPJ)
(70)
If we multiply eq 69 by s and take the limit as s goes to zero we find @ - ( P + ~ P , O ) ne(p+gp)
+ @ + ( P - ~ P , O ) ne(p-bp) = [@+(P,O) + @ - b , O ) I n e b )
(7 1)
Equation 71 imposes a constraint on our model. Notice that eq 71 is always satisfied if we impose the following generalization of the detailed balance principle, (72) @-(p+Gp,O) ne(p+6p) = @+W) neb) Notice that the usual principle of detailed b a l a n ~ e ~obtains ’-~~ when @&,s) is independent of s. For our purposes the power of eq 72 lies in the fact that the equilibrium distribution n,(p) must be a Boltzmann distribution, so
n_e(p) ∝ exp[−βF(p)]  (73)

where F(p) is the free energy as found in ref 7,

F(p) = N{−(ε − Δε²/2T)p − (L − ΔL²/2T)p² + T[p log p + (1 − p) log (1 − p) − (1 − p) log ν]}  (74)

and β ≡ 1/T. Now we may use eq 66 to rewrite eq 72 as eq 75, where we have defined the functions appearing there in eq 76. The function τ(p) is a typical time for a protein molecule to be "stuck" in the set of states with Np native amino acids. We are interested in the case of large N and therefore small δp, so we find simple expressions for p±(p) that are valid in the limit of small δp. Probably the easiest way to do this calculation is to define

Δp(p,δp) ≡ p₋(p,δp) − p₊(p,δp)  (77)

so

p±(p,δp) = ½[1 ∓ Δp(p,δp)]  (78)

Notice that in eq 77 and 78 we have made the dependence of p±(p) on the size of δp explicit. We simply expand Δp(p,δp) for small δp,

Δp(p,δp) = Δp₀(p) + Δp₁(p)δp + ...  (79)

and use eq 78 to find the first two coefficients of this expansion. Setting δp = 0 in eq 78 we find Δp₀(p) = 0. In retrospect this result is obvious because Δp(p,δp) reflects the difference between "sites" at p + δp and p − δp, and when δp = 0 these two "sites" become identical. Expanding both sides of eq 78 to first order in δp we find eq 80, and after substituting eq 80 into eq 79 for Δp(p,δp) we find for small δp

Δp(p,δp) = ½γ(p)δp  (81)

where we have defined γ(p) in eq 82. Henceforth we will suppress the δp dependence of p±(p) to save space. Now we can readily show that our generalized master equation goes over to a generalized Fokker-Planck equation in the continuum limit. First we smooth our path to this goal by making several definitions. We define the function D(p,s) in eq 83. Later we will show that D(p,s) is a frequency- and position-dependent diffusion constant for motion along the p "direction". Notice that

D(p,0) = (δp)²/2τ(p)  (84)

which is strongly reminiscent of the definition of the diffusion constant in one dimension. Differentiating eq 84 with respect to p and substituting the result into the definition of γ(p) gives eq 85, from which eq 86-89 follow. Substituting the expression for Δp(p,δp) in eq 81 into eq 89 gives eq 90, so the coefficients of our generalized master equation take the form given in eq 91. It has been shown that master equations with (frequency independent) coefficients of the form of eq 92 lead to Fokker-Planck equations in the δx → 0 limit.²³ The coefficients of our generalized master equation given in eq 91 are a frequency-dependent generalization of the coefficients of eq 92, so it should come as no surprise that our generalized master equation becomes a generalized Fokker-Planck equation as δp gets small, as we will show. Substituting eq 91 into the Laplace transform of our generalized master equation as given in eq 69 yields

sG(p,s) − δ(p,0) = (1/2δp)[D(p+δp,s) γ(p+δp) G(p+δp,s) − D(p−δp,s) γ(p−δp) G(p−δp,s)] + (1/(δp)²)[D(p+δp,s) G(p+δp,s) + D(p−δp,s) G(p−δp,s) − 2D(p,s) G(p,s)]  (93)

In the limit of small δp eq 93 becomes

sG(p,s) − δ(p) = (∂/∂p)[D(p,s) γ(p) G(p,s)] + (∂²/∂p²)[D(p,s) G(p,s)]  (94)

We can cast eq 94 into a more familiar looking form. We define the function U(p,s) in eq 95; then, from eq 85, we have eq 96, and our generalized Fokker-Planck equation becomes

sG(p,s) − δ(p) = (∂/∂p){D(p,s) exp[−U(p,s)] (∂/∂p)[G(p,s) exp[U(p,s)]]}  (97)
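Equation 72 fixes the ratio of forward and backward hop rates at s = 0, and eq 73 then forces the stationary occupations to be Boltzmann. This is easy to verify numerically for a frequency-independent master equation on a short chain; the free energy profile and unit forward rates below are arbitrary stand-ins, not the model's Φ±(p,s) or the F(p) of eq 74.

```python
import math

def relax_to_stationary(F, beta=1.0, dt=0.01, steps=100000):
    """Relax the master equation dn_i/dt = sum_j W_ij n_j on a 1-D chain
    whose nearest-neighbor hop rates satisfy detailed balance with
    respect to exp(-beta*F), and return the stationary occupations."""
    M = len(F)
    up = [1.0] * (M - 1)                          # rate i -> i+1
    down = [math.exp(beta * (F[i + 1] - F[i]))    # rate i+1 -> i, fixed by
            for i in range(M - 1)]                # the detailed balance condition
    n = [1.0 / M] * M
    for _ in range(steps):
        flux = [up[i] * n[i] - down[i] * n[i + 1] for i in range(M - 1)]
        for i in range(M - 1):                    # explicit Euler step
            n[i] -= dt * flux[i]
            n[i + 1] += dt * flux[i]
    return n

F = [0.0, 0.5, 1.5, 0.8, 1.8, 0.9, 0.3, 0.6]      # arbitrary rough profile
n = relax_to_stationary(F)
Z = sum(math.exp(-f) for f in F)
boltzmann = [math.exp(-f) / Z for f in F]
```

Because the rates are built from the balance condition, the relaxed occupations agree with exp(−βF)/Z to numerical accuracy for any profile F; frequency-dependent coefficients only need to satisfy the same condition at s = 0.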
If the diffusion constant D(p,s) is independent of s, then U(p,s) = βF(p) and eq 97 is

sG(p,s) − δ(p) = (∂/∂p){D(p) exp[−βF(p)] (∂/∂p)[G(p,s) exp[βF(p)]]}  (98)

We use the notation D(p) ≡ D(p,0) and inverse Laplace transform eq 98 to find

∂G(p,t)/∂t = (∂/∂p){D(p) exp[−βF(p)] (∂/∂p)[G(p,t) exp[βF(p)]]}  (99)

Equation 99 is just the usual Fokker-Planck equation used in many applications in physics and chemistry,³³,³⁴ and so we are justified in referring to eq 97 as a generalization of the usual Fokker-Planck equation.

We calculate the mean first passage time for our generalized Fokker-Planck equation by integrating eq 97. Recall that eq 97 was derived with the initial condition that all of the molecules start in the p = 0 states. When the initial distribution of molecules is given by n_i(p), eq 97 becomes

sG(p,s) − n_i(p) = (∂/∂p){D(p,s) exp[−U(p,s)] (∂/∂p)[G(p,s) exp[U(p,s)]]}  (100)

Our original equation is recovered when n_i(p) = δ(p). We will call a protein molecule "folded" when p ≥ p_f, where typically p_f = 0.9, and we will assume that initially no protein molecules are folded, so n_i(p) = 0 for p_f ≤ p ≤ 1. As we discussed previously, the folding time of a molecule is the time required for the molecule to reach p_f for the first time, that is, the first passage time. We denote the distribution of first passage times by F_P(t) and define Z(t) to be the probability that a molecule has not made a first passage by time t. The distributions F_P(t) and Z(t) are related by

F_P(t) = −dZ/dt  (101)

The mean first passage time is given by

t̄ = ∫₀^∞ dt t F_P(t)  (102)

and substituting eq 101 into eq 102 and integrating by parts gives us a useful expression for the mean first passage time,

t̄ = ∫₀^∞ dt Z(t)  (103)

Suppose that we make the "site" p_f a perfect absorber, so a molecule that makes a first passage simply leaves the system. The probability of a molecule being in the system at time t is then given by Z(t), so

Z(t) = ∫₀^{p_f} dp G(p,t)  (104)

Notice that the states between p = p_f and p = 1 are irrelevant to our present considerations because we assumed that n_i(p) = 0 for p_f ≤ p ≤ 1, and there is no way for a molecule to go from p < p_f to p > p_f with our absorbing boundary, so for p > p_f, G(p,t) = 0 for all t. Combining eq 103 and 104 gives

t̄ = ∫₀^{p_f} dp G(p,s=0)  (105)

Our method for obtaining the mean first passage time is a simple generalization of the method of Deutch.³⁴ Following Deutch we write eq 100 as

sG(p,s) − n_i(p) = −∂j(p,s)/∂p  (106)

where

j(p,s) = −D(p,s) exp[−U(p,s)] (∂/∂p)[G(p,s) exp[U(p,s)]]  (107)

Our boundary conditions are a reflecting boundary condition at p = 0, j(0,s) = 0, and an absorbing boundary condition at p_f, G(p_f,s) = 0. Integrating eq 106 from 0 to p and using the reflecting boundary condition gives

j(p,s) = −∫₀^p dp' [sG(p',s) − n_i(p')]  (108)

and combining this result with the expression for j(p,s) in eq 107 yields eq 109. Next, we integrate eq 109 from p to p_f and use the absorbing boundary condition to obtain eq 110 for G(p,s). We are assuming that eventually all of the molecules will make a first passage and therefore be absorbed, so from eq 67

lim_{s→0} sG(p,s) = lim_{t→∞} G(p,t) = 0  (111)

Taking the limit of eq 110 as s goes to zero and using eq 111 yields eq 112, which is the promised expression for G(p,0). Finally, we substitute eq 112 into eq 105 and use the property U(p,0) = βF(p) to obtain our mean first passage time expression

t̄ = ∫₀^{p_f} dp ∫_p^{p_f} dp' ∫₀^{p'} dp'' n_i(p'') exp{β[F(p') − F(p)]} / D(p',0)  (113)

Equation 113 simplifies when n_i(p) is peaked about some p_i, so n_i(p) ≈ δ(p − p_i); then

t̄ = ∫_{p_i}^{p_f} dp ∫₀^p dp' exp[βF(p) − βF(p')] / D(p,0)  (114)

Equations 113 and 114 give us expressions for the folding time in terms of the parameters of our model.

(33) Chandrasekhar, S. Rev. Mod. Phys. 1943, 15, 1. (34) Deutch, J. M. J. Chem. Phys. 1980, 73, 4700. (35) The expert reader may notice a superficial resemblance between eq 115 and some expressions that occur in the theory of random one-dimensional systems, for example, the work of Alexander et al.³⁶ and a paper by our birthday celebrant, Bob Zwanzig.³⁷ This resemblance is misleading. The average in eq 115 is taken over states with particular values of p, whereas in the cited studies the analogous average is taken over the transition rates of the entire one-dimensional system. (36) Alexander, S.; Bernasconi, J.; Schneider, W. R.; Orbach, R. Rev. Mod. Phys. 1981, 53, 175. (37) Zwanzig, R. J. Stat. Phys. 1982, 28, 127.

The function F(p) is given in eq 74, and from eq 84 we have³⁵ the expression of eq 115 for D(p,0) in terms of the inverse moment ⟨1/R⟩(p). Recall that ⟨1/R⟩(p) is given by eq 116, where
R(E,p) is the rate of a protein molecule leaving a state of energy E with Np native amino acids, and g(E,p) is the probability that a protein molecule with Np native amino acids is in a state with energy E. The discussion of g(E,p) will be more clear if we write it as

ñ(E,p) = C(E,p) g(E,p)  (117)

where, as before, g(E,p) is the probability of a state with Np native amino acids having energy E and is given by eq 1. The function C(E,p) is proportional to the probability that a particular state with energy E and Np native amino acids is occupied. The exact calculation of C(E,p) for a general nonequilibrium problem is likely to be quite difficult. However, in experimental studies of protein folding thermodynamics many proteins exhibit an all-or-none behavior reminiscent of a phase transition in a finite system,³⁸⁻⁴⁰ which leads one to suspect that there is a free energy barrier (at least of order several k_BT) between the folded and unfolded states. Our model is consistent with this suspicion. Furthermore, in our hopping model of protein folding the molecule has the opportunity to change conformation many times without changing p, so it is likely that a set of states with a particular value of p will be reasonably well explored in the process of folding. These situations suggest that we approximate C(E,p) by a Boltzmann distribution, so

C(E,p) ∝ exp(−βE)  (118)

This approximation is in the same spirit as the similar approximations used in the transition-state theory of chemical reaction rates⁴,²⁸⁻³⁰ and by Kramers in his treatment of diffusion over a potential barrier.⁴¹ Later in this paper we will briefly discuss what happens when C(E,p) deviates from a Boltzmann distribution. If we require ñ(E,p) to be normalized so that⁴²

∫ ñ(E,p) dE = 1  (119)

then combining eq 1, 118, and 119 gives the explicit expression for ñ(E,p) of eq 120.

The calculation of R(E,p) is almost identical with the calculation of R_LM(E₀,p) in eq 21-23. The difference between the two calculations is that R_LM(E₀,p) is the average rate of escape from a local minimum state with energy E₀ and Np native amino acids, whereas R(E,p) is the average rate of escape from an arbitrary state with energy E and Np native amino acids. The rate of a protein molecule leaving a particular given state is simply the sum of the rates for a molecule going from that state to the states connected to it. Therefore, if the given state has energy E, the states connected to the given state have energies {E_i}, and we use the Metropolis dynamics of eq 6, the rate of leaving our state is

R = R₀ Σ_{E_i > E} exp[−β(E_i − E)] + R₀ Σ_{E_i < E} 1  (121)

The second term in eq 121 is absent in eq 21 because eq 21 is valid only for local minima. In our random energy model the distribution of energies connected to a state with Np native amino acids is closely approximated by g(E,p), so the average value of R in eq 121 is given by

R(E,p) = R₀Nν ∫_E^∞ dE' g(E',p) exp[−β(E' − E)] + R₀Nν ∫_{−∞}^E dE' g(E',p)  (122)

Substituting eq 1 for g(E,p) into eq 122 gives eq 123, where we have defined the function

ζ(x) ≡ ½[1 − sgn(x) erf(|x|)]  (124)

We may use the asymptotic expression for the error function to obtain simple limits of eq 123: (i) for E > Ē(p), R(E,p) ≈ R₀Nν (eq 125); (ii) for Ē(p) > E > Ē(p) − βΔE(p)², R(E,p) is given by the Gaussian form of eq 126; and (iii) for Ē(p) − βΔE(p)² > E, R(E,p) is given by the activated form of eq 127. These expressions determine the distribution of escape rates for R_fast(p) ≥ R ≥ R_slow(p), where, once again, R_fast(p) and R_slow(p) are defined in eq 28.

(38) Ikegami, A. Adv. Chem. Phys. 1981, 46, 363. (39) Gō, N. Annu. Rev. Biophys. Bioeng. 1983, 12, 183. (40) Gō, N. J. Stat. Phys. 1983, 30, 413. (41) Kramers, H. A. Physica 1940, 7, 284. (42) The purist would insist on a slightly different normalization of g(E,p); the function so obtained differs from our g(E,p) by a constant which lies between 1 and 2. Since we are not keeping track of constants of order 1, we will ignore the difference.

For the sake of completeness we calculate the function ⟨1/Rⁿ⟩(p), where n > 0. First we will treat the regime where T > T_g(p). Some readers may wish to skip to eq 145 and 146, which give the results of this calculation. The desired function is the integral given in eq 131,
and in this regime the above integral naturally breaks into three terms (eq 132), which are defined in eq 133-135. Substituting eq 129 into eq 134 gives eq 136, so if we define the dimensionless quantity y of eq 137, eq 136 becomes the integral of eq 138. For n = 1 the integral in eq 138 can be evaluated exactly to yield

P₂₁(p) = 2 sinh[½β²ΔE(p)²] / [(2π)^{1/2} R₀Nν βΔE(p)]  (139)

For n > 1 the integral in eq 138 cannot be evaluated exactly; however, it is of the form

I = ∫_a^b dx exp[f(x)]  (140)

where f(x) is an increasing function of x. We may approximate I by doing a Taylor expansion of f(x) about x = b to find

I ≈ exp[f(b)] / f'(b)  (141)

Applying the simple approximation of eq 141 to the integral of eq 138 gives

P₂ₙ(p) = exp[½nβ²ΔE(p)²] / [(2π)^{1/2} (R₀Nν)ⁿ nβΔE(p)]  (142)

Substituting eq 130 into eq 135 gives eq 143. The integral in eq 143 may be expressed in terms of the ζ functions defined in eq 124 to yield eq 144. Putting eq 132, 133, 139, 141, and 144 together gives us the expressions of eq 145 and 146 for ⟨1/Rⁿ⟩(p) in this regime.

In the T < T_g(p) regime the protein molecule is trapped in one of the low-energy states with energy E = E_low(p). As we discussed in our previous paper,⁷ this low-temperature regime is the frozen, glassy phase of protein folding. The rate for leaving one of these low-energy states is given by R_slow(p) = R(E_low(p),p), which was previously defined in eq 28. For T < T_g(p)

R_slow(p) = R₀Nν exp[−¼β_g(p)²ΔE(p)²]  (147)

which may be rewritten by using the definition of T_g(p) given by eq 30 to obtain

R_slow(p) = R₀Nν exp[−S*(p)]  (148)

From the definition of S*(p), eq 148 may be expressed as

R_slow(p) = R₀Nν / Ω(pN)  (149)

where Ω(pN) is the number of states with pN native amino acids and is given in eq 9. Since all of the protein molecules are trapped in states with energy E = E_low(p), the calculation of ⟨1/Rⁿ⟩(p) is easy for this temperature regime,

⟨1/Rⁿ⟩(p) = R_slow(p)⁻ⁿ = (R₀Nν)⁻ⁿ exp[nS*(p)]  (150)

(43) Wetlaufer, D. B. Proc. Natl. Acad. Sci. U.S.A. 1973, 70, 691.

These results (eq 148-150) deserve several comments. First we point out a relationship between the rate we calculated for leaving a state in the frozen phase and some classical ideas about protein folding. Early discussions about the existence of folding pathways focused on the necessity of finding a way of folding a protein without searching through all of the possible states for the minimum energy structure.⁴³ In these discussions the time required for finding the minimum energy state was estimated by a Poincaré recursion argument à la Boltzmann,⁴³ which we now
repeat for our model. In our model there are (ν + 1)^N different structures and these structures interconvert at a rate R_c = R₀Nν. The time t required to sample each structure is

t ≈ (ν + 1)^N / (R₀Nν)  (151)

For reasonable values of R₀, N, and ν the time t is enormous. This result led to the postulation of folding pathways and kinetic theories of folding. Consider the dynamics of our model in the frozen phase. Recall that the typical time for a protein molecule to be "stuck" in the set of states with Np native amino acids is given by τ(p) of eq 76, which becomes eq 152 in the frozen phase. We estimate the time for folding in the spirit of the above argument to obtain eq 153.
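To see how prohibitive the exhaustive search is, the sampling time t ≈ (ν + 1)^N/(R₀Nν) can be evaluated directly; the values N = 100, ν = 2, and R₀ = 10⁹ s⁻¹ used below are merely illustrative choices.

```python
import math

def sampling_time(N, nu, R0):
    """Poincare-recursion estimate of the time needed to visit all
    (nu + 1)**N structures at an interconversion rate R0*N*nu."""
    return (nu + 1) ** N / (R0 * N * nu)

t = sampling_time(N=100, nu=2, R0=1e9)      # in seconds, if R0 is in s^-1
log10_t = math.log10(t)
```

The result, roughly 10³⁶ s, dwarfs the age of the universe (about 10¹⁷ s), which is the Levinthal argument in numbers.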
Thus the frozen phase of folding is kinetically like the "Levinthal limit" of folding dynamics. Although the energy surface for folding in our model is quite rough, the folding time in the frozen phase is similar to the folding time for a model where the native state has energy 0 and all other states have positive energy E.⁴⁴ The escape rate in the frozen phase has a natural, physical interpretation. Consider a state with energy E = E_low(p). The average number of states with energy between E_low(p) and E_low(p) + dE which are connected to our low-energy state is

N_low(p) = Nν g(E_low(p),p) dE  (154)

Substituting the expression for E_low(p) from eq 18 into eq 154 gives eq 155. In our dynamic formalism, the rate of going from the original low-energy state to the set of states with energies between E_low(p) and E_low(p) + dE is

R_low(p) = R₀ N_low(p) = R₀Nν / Ω(pN)  (156)

Notice that R_low(p) in eq 156 is equal to the rate for leaving a protein state in the slow phase. The meaning of this result is clear if we recall that in the frozen phase the protein molecule is trapped in one of these low-energy states with E = E_low(p) and that dE scales as N^α, where 0 < α < 1, so eq 149 and 156 tell us that in the frozen phase the molecule only makes transitions from the low-lying state to states with energies which differ from the low-lying state by an amount that is nonextensive in the size of the system (i.e., the energy difference does not scale as N). More precisely, if the molecule is in a state with energy E = E_low(p) and this state is connected to states with energies E_i = E_low(p) + ΔE_i, then in the frozen phase the molecule will only make transitions to states where ΔE_i ~ N^α, with 0 < α < 1. [Actually there is nothing to absolutely forbid the molecule from making transitions to other states, but the Arrhenius factors for these other transitions are so low in the frozen phase that they make a negligible contribution to the escape rate.] Finally, these results point out the limits of our methods. In constructing our model we assumed that the distribution of energy levels is given by a smooth function g(E,p). We had to make dE large enough for g(E,p) to be smooth. Following the arguments of Derrida¹³ we required dE ~ N^α with 0 < α < 1. The price we pay for the convenience of working with a smooth g(E,p) is that energy levels that differ in energy by a nonextensive amount are considered to have the same energy. This limitation of our methods is responsible for the temperature independence of the escape rate for molecules in the frozen phase. The average escape rate of a molecule in the frozen phase is determined by transitions to states that differ in energy from the frozen low-energy state by a nonextensive amount. In our calculations these states are treated as if they have the same energy as the frozen low-energy state. In our Metropolis dynamics the transition rate between states of the same energy is R₀, not explicitly dependent on temperature, so the total rate of escape from the frozen low-energy state is not explicitly temperature dependent. In a "real" system nonextensive energy differences are still energy differences, and there are still temperature-dependent Arrhenius factors associated with these energy differences.

(44) An interesting situation in the theory of optimization may be relevant here. Many optimization problems can be mapped to energy functions. Typically, problems which are NP-hard have very rough energy surfaces with many local minima.⁴⁵ However, it has been shown that for a sufficiently perverse choice of energy function these intractable computations can be embedded in a golf-course energy surface like the energy surface we just discussed.⁴⁶ (45) Kirkpatrick, S.; Gelatt, C. D., Jr.; Vecchi, M. P. Science 1983, 220, 671-680. (46) Baum, E. B. Phys. Rev. Lett. 1986, 57, 2764.
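The Metropolis averaging behind eq 121 and 122 can be mimicked with a short Monte Carlo sketch: draw the connected-state energies from a Gaussian standing in for g(E,p) and average the escape rate. All parameters below (unit-width Gaussian centered on the mean energy, 20 connected states) are illustrative assumptions.

```python
import math
import random

def mean_escape_rate(E, n_conn, beta, sigma=1.0, R0=1.0, trials=2000, seed=1):
    """Monte Carlo version of the average taking eq 121 to eq 122: draw
    the n_conn connected-state energies from a Gaussian (mean 0, width
    sigma) and average the Metropolis escape rate
    R0 * sum_i min(1, exp(-beta*(E_i - E)))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for _ in range(n_conn):
            Ei = rng.gauss(0.0, sigma)
            # min(1, e^{-beta dE}) reproduces the two sums of eq 121:
            # rate R0 downhill, an Arrhenius factor uphill
            total += min(1.0, math.exp(-beta * (Ei - E)))
    return R0 * total / trials

r_deep = mean_escape_rate(E=-5.0, n_conn=20, beta=1.0)   # far below the mean
r_high = mean_escape_rate(E=+5.0, n_conn=20, beta=1.0)   # far above the mean
```

Note that min(1, e^{−βΔE}) equals 1 whenever ΔE ≤ 0 regardless of temperature, which is the same fact that makes the frozen-phase escape rate temperature independent for transitions between states of (nearly) equal energy.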
The dynamics of the transitions between states with nonextensive energy differences in the frozen phase of the random energy model has been studied by De Dominicis, Orland, and Lainée⁴⁷ and by Koper and Hilhorst.⁴⁸ These papers use dynamical schemes that are much simpler than our Metropolis dynamics and treat only the frozen phase. At present it is not at all clear how to extend these rather complex calculations to either a Metropolis dynamics or to the unfrozen phase.

Before proceeding with the rest of this investigation we briefly discuss what happens when C(E,p), defined in eq 117, deviates from a Boltzmann distribution. These considerations are likely to be relevant when the protein is in, or close to, the frozen phase, because in the frozen phase the dynamics are so slow that the protein molecule will explore only a few states during the observation time of a typical experiment. In particular, the local minimum states are likely to be very long lived. This situation suggests that we study the effects of a non-Boltzmann C(E,p) through a simple model where most of the time is spent escaping from local minima, and escapes from states that are not local minima make a negligible contribution to the inverse moments of the escape rates. To emphasize non-Boltzmann behavior as much as possible we will take the probability of a local minimum being occupied to be independent of its energy (a "white average"). As before, the inverse moments are given by eq 157, and for our new model P(R,p) is equal to the distribution of escape rates from local minima, P_LM(R,p). We calculated P_LM(R,p) and gave the results in eq 25-27. We also found that P_LM(R,p) has three different temperature regimes: a high-temperature regime where T > T_high(p), an intermediate regime where T_high(p) > T > T_g(p), and a low-temperature, frozen regime where T_g(p) > T; T_high(p) and T_g(p) are defined in eq 29 and 30. The non-Boltzmann effects are most likely to show up at low temperatures, so we will only study the low-temperature phase of our "white average" model. In the low-temperature phase the distribution of escape rates is always given by eq 26, and so the inverse moments for T < T_g(p) are given by the integral of eq 157.

(47) De Dominicis, C.; Orland, H.; Lainée, F. J. Phys. Lett. 1985, 46, L463. (48) Koper, G. J. M.; Hilhorst, H. J. Europhys. Lett. 1987, 3, 1213.
If we substitute eq 137 into eq 157 and notice that in the frozen phase R_slow(p) = R₀Nν exp[−S*(p)] and R_fast(p) = R₀, then we obtain eq 158. For n = 1 the integral in eq 158 can be evaluated exactly (eq 159). For n ≥ 2 we can use the approximation technique of eq 140 and 141 to find that the inverse moments grow as exp[(n − 1)S*(p)] (eq 160). With the Boltzmann average the spread of the inverse rates in the frozen phase is essentially zero (eq 161), whereas with the white average it is enormous (eq 162). There is a simple physical explanation for this difference in behavior. With the Boltzmann average all of the molecules have energy E = E_low(p) in the frozen phase, so they all change state with a rate R ≈ R_slow(p) = R(E_low(p),p), and the spread of inverse rates is essentially zero. With the white average the molecules have a broad distribution of energies in the frozen phase and therefore a broad distribution of escape rates, so the spread of inverse rates is huge. The practical message of this exercise is that deviations from the Boltzmann average lead to spreading in the distribution of escape rates, and these deviations are likely to occur in the frozen phase.

We return to the mainstream of our calculation, the estimation of the folding time scale. We have argued that a good estimate of the folding time in our model is given by

t̄ = ∫_{p_i}^{p_f} dp ∫₀^p dp' exp[βF(p) − βF(p')] / D(p,0)  (114)

Substituting the expressions for ⟨1/R⟩(p) into eq 115 for D(p,0) gives eq 163. The expression for D(p,0) in eq 163 is unwieldy, but for large N it can be approximated by simple expressions. If we use the definition of ζ(x) in eq 124 and the asymptotic expression of the error function for large argument, then we find the approximations of eq 164 and 165 for T > 2T_g(p), and of eq 166 for 2T_g(p) > T > T_g(p).⁴⁹ Equation 166 may be simplified further by noticing that for large N, N >> log N, and using the definition of T_g(p) in eq 30, which suggests writing eq 166 as eq 167 for 2T_g(p) > T > T_g(p). These approximations of eq 165 and 167 have the pleasant property of being continuous at T = 2T_g(p).

The expression for the diffusion constant for T > 2T_g(p) given in eq 165 is identical with the expression for the effective diffusion constant found by Zwanzig in his elegant paper on diffusion in a rough one-dimensional potential.⁸ Similar results occur in the theory of transport in disordered semiconductors.⁵⁰ At lower temperatures our results differ from earlier work because our system undergoes a glass transition, caused by the paucity of states at low energies, an effect not taken into account by the pure Gaussian models. More generally, our work studies a many-body system diffusing through its multidimensional state space, although we keep track of the motion only along a one-dimensional projection of this state space, whereas earlier work focused on a single particle diffusing in a potential, or on systems with only one degree of freedom.

Now that we have expressions for F(p) and D(p,0) we will develop a simple, physically insightful procedure for estimating the mean first passage time. Our procedure is essentially a generalization of the time-honored transition-state method for calculating reaction rates.⁴,²⁸⁻³⁰ Our derivation leans on the work of Schulten et al.²⁷ and emphasizes the generality of our methods and results. Let D₀ be some position-independent reference diffusion constant for our system. Our results are independent of the value of D₀. We define

X(p) ≡ log [D(p,0)/D₀]  (168)

and

F̃(p) ≡ F(p) − T X(p)  (169)

so eq 114 for the mean first passage time becomes

t̄ = (1/D₀) ∫_{p_i}^{p_f} dp ∫₀^p dp' exp[βF̃(p) − βF(p')]  (170)

We assume F(p) has a double-well structure with the bottom of one well at p_b, close to p_i, the bottom of the other well at p_f, and the top of the barrier at p_t (see Figure 1). Next we define

Z(p) ≡ ∫₀^p dp' exp[−βF(p')]  (171)

(49) The alert reader will notice that the breakdown of eq 165 when T is close to 2T_g(p) is irrelevant to our work because this failure is due to the breakdown of the asymptotic expansions of the ζ functions for large argument in this region. (50) Grünewald, M.; Pohlmann, B.; Movaghar, B.; Würtz, D. Philos. Mag. B 1984, 49, 341.
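The double integral of eq 170 is easy to evaluate numerically once the free energy profile is specified. The sketch below uses an arbitrary quartic double well in place of F(p), takes the diffusion coefficient constant (so F̃ = F and X = 0), and starts the molecules at p = 0; none of these choices are the model's actual functions.

```python
import math

def mfpt(F, beta, D0, p_f, n=2000):
    """Mean first passage time of eq 170 with Ftilde = F (constant
    diffusion coefficient) and the molecules starting at p = 0:
        tbar = (1/D0) int_0^{p_f} dp e^{beta F(p)} int_0^p dp' e^{-beta F(p')}
    evaluated with the trapezoidal rule."""
    h = p_f / n
    grid = [k * h for k in range(n + 1)]
    w = [math.exp(-beta * F(p)) for p in grid]
    inner = [0.0] * (n + 1)              # running integral of e^{-beta F}
    for k in range(1, n + 1):
        inner[k] = inner[k - 1] + 0.5 * h * (w[k - 1] + w[k])
    outer = 0.0
    for k in range(n + 1):
        weight = 0.5 * h if k in (0, n) else h
        outer += weight * math.exp(beta * F(grid[k])) * inner[k]
    return outer / D0

def double_well(height):
    # quartic profile with wells near p = 0.2 and 0.8, barrier at p = 0.5
    return lambda p: height * (1.0 - ((p - 0.5) / 0.3) ** 2) ** 2

t_low = mfpt(double_well(3.0), beta=1.0, D0=1.0, p_f=0.9)
t_high = mfpt(double_well(6.0), beta=1.0, D0=1.0, p_f=0.9)
```

Doubling the barrier height raises log t̄ by roughly the added barrier, less a prefactor correction from the changed curvatures, which is the dominant-barrier behavior that the single-exponential estimate of eq 185 isolates.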
6914 The Journal of Physical Chemistry, Vol. 93, No. 19. 1989
Bryngelson and Wolynes
I
I
Finally, we estimate the folding time scale of our model by using the expressions for the mean first passage time in eq 178-183, the definition of E ( p ) in eq 168 and 169, the expression for D(p,O) in eq 164, 165, and 167, and the expression for F(p) given in eq 74. When N is large, N >> log N so if we set the position-independent reference diffusion constant Do = Ro,then for T > 2Tg(p)
X ( p ) = P2aE(p)2
= -S*(p) + tbgb) - P 1 2 w P ) 2
The main contribution to Z(p) will come from the part of the integral close to pb. For p < pb, Z(p) will be small and for p b p < ph Z(p) will be approximately equal to the value of the integral around p = pb, so we approximate z(p)by
- Pb)Wb
z(p)
To calculate the constant F(p)
wb
%b)
(173)
- Pa)’
(174)
for p near p b and approximate the integrand in eq 171 by a Gaussian centered at pb to obtain
(1 84)
Similarly we can collapse eq 178, 180, and 183 into a single expression for the folding time 1% = PFmax - PF(Pb) (185) where we are ignoring terms that are logrithmic in N again and we have defined Fmax
we expand F(p) about p b to obtain
+ YZWb’b
> 2- 7 TJP)
for Tg(p) > T
= -S*(p)
Figure 1. A schematic diagram of a double-well free energy function.
for 2Tg(P)
= max [ & J ) I
(1 86)
where pb Ip Ipf. Equation 185 is the desired simple expression for the time scale of folding. Given values for the free parameter of our model (e, Ae, L, AL, T, v, Ro)one can use eq 185, in conjunction with the relations AE(p)’ = N[Ae’( 1
- p)
+ AL’( 1 - p’)]
(3)
therefore f
(
DOWb
eXp[-PF(pb)]ShPb dp eXp[PF(p)]
(176)
We evaluate the integral in eq 176 with the same steepest descent technique we used to approximate Z(p). The maximum value of p(p) for pb Ip I pt w u r s at some value of p, which we denote by p t . There are three cases we must consider, namely pb p t < pf, p t = pb, and pt = pf. In the first case expand F(p) about pt to obtain
and eq 184 to calculate the folding time scale. We make these calculations more concrete by considering a practical example. In our previous work’ we found that a free energy function with reasonable characteristics is obtained for
$)= -0.2T ( L - $) = 2.6T
(e
for p near pt, and approximate the integrand in eq 176 to obtain
In the second case where pt = pb we may expand F(p) about p = pb to obtain
= F(pb) - E b b - Pb)
(179)
for p close to pb, and approximate the integrand in eq 176 by an exponential to find
The exponential approximation we used in deriving eq 180 is reasonable because most of the contribution to the integral in eq 176 comes from the region close to pb in this case. Substituting eq 168 and 169 in eq 180 gives
for this regime. In the third case we expand F(p) about p = pf to obtain
-
The free energy function that follows from using these parameters is shown in Figure 2 and the equilibrium phase diagram for these parameters with e = Ae = 2.18T is shown in ref 7. We have used eq 185 to calculate the folding time scale for the parameters in eq 187 and A€ = AL,and the results are shown in Figure 3. Some features of this graph are worth discussion. As noted in Figure 3, p b = 0.15, pf = 0.90 and we are taking pi Ipb. If A€ < 0.3 then p t is between p b and pf. If Ac > 0.3 then p t = pb. When Ae = 1.6 the protein molecule enters the frozen phase for p = pb This shows up in Figure 3 as a leveling off of the folding time scale. This leveling is a result of the temperature independence (and hence, A€/T independence) of the folding time in the frozen phase in our dynamic model. This transition does not occur at the equilibrium value of p, so it is not the same as the equilibrium freezing transition discussed in ref 7. The equilibrium phase transition occurs at higher values of At and AL. The parameter If we set Ro= lo9 Rohas been estimated to be 108-10’2 s-l, then with the parameters of eq 187 and Ae = AL = 0.30 a 100 amino acid protein will fold in s, which is a reasonable folding time. In this case all of the parameters, e, L, Ae, and AL,
-
s-1.51952
~~
QP)
=b
f )
+ Ef(P-Pf)
(182)
for p close to pf, and approximate the integrand in eq 176 to obtain
(51) Anfinsen, C. B.; Scheraga, H. A. Adu. Prorein Chem. 1975,29, 205. (52) Careri, G . ;Fasella, P.; Gratton, E. Annu. Reo. Biophys. Bioeng. 1979, 8, 69.
Figure 2. The free energy function for our model with ε − (Δε²/2T) = −0.2T and L − (ΔL²/2T) = 2.6T.

Figure 3. The folding time for our model with ε − (Δε²/2T) = −0.2T, L − (ΔL²/2T) = 2.6T, and Δε = ΔL, plotted against Δε/T.
are of order T, which is chemically reasonable. Thus, there is a sensible set of parameters that yields a sensible folding time scale in our model.
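The order of magnitude of such estimates can be checked with a generic Arrhenius-type barrier-crossing formula, τ ≈ R_0⁻¹ exp(ΔF‡/T). This is only a sketch of the arithmetic: it is not eq 185 of the model, and the barrier heights swept below are illustrative values, not the parameters of eq 187. With the attempt rate set to the low end of the quoted 10^8-10^12 s^-1 range, a barrier of roughly 20 k_BT already yields a folding time of order seconds:

```python
import math

def folding_time(R0, dF_over_T):
    """Generic Arrhenius-type barrier-crossing estimate: tau = exp(dF/T) / R0.

    This is a transition-state-style order-of-magnitude formula, not eq 185
    of the model in the text.  R0 is the elementary attempt rate (s^-1) and
    dF_over_T the dimensionless barrier height."""
    return math.exp(dF_over_T) / R0

# attempt rate at the low end of the 10^8-10^12 s^-1 range quoted in the text
R0 = 1.0e9
for barrier in (10.0, 20.0, 30.0):
    print(f"dF/T = {barrier:4.1f} -> tau = {folding_time(R0, barrier):9.3e} s")
```

The steep exponential dependence on ΔF‡/T is the point of the exercise: a factor of 3 in the dimensionless barrier moves the time scale by roughly nine orders of magnitude, so "reasonable" folding times constrain the barrier rather tightly.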
IV. Conclusions

Protein folding is an example of a chemical reaction that is qualitatively more complex than the reactions of small molecules. In this paper we used protein folding to illustrate a general approach to complex chemical reactions based on stochastic model Hamiltonians. We developed a simple model of protein folding and showed that it has many metastable states. The distribution of lifetimes of these states is fairly broad in the ordered phase of protein folding and considerably broader in the glassy phase. We
used a continuous time random walk (CTRW) approach to find the time scale of folding and, after making a series of approximations, arrived at a simple expression for the folding time. We used this expression to find the folding time for a chemically reasonable set of parameters. We also examined the pitfalls of simple-minded applications of our ideas. There are many ways to improve this treatment of protein folding. First, and probably most important, the model should be modified to account for correlations in the energies of protein states.53,54 Second, a better treatment of dynamics near and below the glass transition temperature is needed. Third, there are many approaches to dynamics in disordered media,55 and we have used only one of them, the continuous time random walk, in our calculations. The use of other approaches is likely to be interesting, particularly because the notion of folding pathways does not exist in the continuous time random walk approach. Last, the second and higher moments of the folding time are probably interesting. We conclude with a discussion of the implications of this work for the general problem of complex chemical reactions. The new feature of this work that is most general and probably most important is the use of stochastic potentials. The potential used in this paper is naive. A more sophisticated potential would include more information about the system, for example, the statistics of correlation between the energies of states53,54 and the existence of several energy scales. The methods used to calculate the properties of metastable states could be used in any stochastic potential model, but our results are specific to the model of this paper. If protein folding is viewed as a unimolecular reaction, then the folding time we calculated is the inverse of the reaction rate, so the continuous time random walk method can be used to find rates.
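The spirit of the CTRW calculation can be conveyed by a minimal Monte Carlo sketch. The following toy model is an illustration only, not the paper's formalism: a walker moves on a 1-D chain of states with independent Gaussian random trap energies (a REM-flavored assumption), waits in each state for an exponentially distributed time whose mean grows with trap depth, and the mean first-passage time to the final ("folded") state plays the role of the folding time. All names and parameter choices here are hypothetical.

```python
import math
import random

def ctrw_first_passage(n_states=30, beta=1.0, sigma=1.0, n_walks=200, rng=None):
    """Monte Carlo continuous-time random walk on a 1-D chain of states.

    Each state carries an independent Gaussian trap energy; the waiting
    time in a state is exponential with mean exp(-beta * E), so deep
    (negative-energy) traps hold the walker longer.  Returns the mean
    first-passage time from state 0 to the absorbing state n_states - 1.
    A toy illustration of the CTRW idea, not the model's eq 176/185."""
    rng = rng or random.Random(2)
    energies = [rng.gauss(0.0, sigma) for _ in range(n_states)]
    total = 0.0
    for _ in range(n_walks):
        pos, t = 0, 0.0
        while pos < n_states - 1:
            # exponential waiting time; mean set by the local trap depth
            mean_wait = math.exp(-beta * energies[pos])
            t += rng.expovariate(1.0 / mean_wait)
            # unbiased nearest-neighbour step, reflecting wall at state 0
            pos = max(0, pos + rng.choice((-1, 1)))
        total += t
    return total / n_walks

print(f"mean first-passage time: {ctrw_first_passage():.1f} (arbitrary units)")
```

Lowering the temperature (raising beta) makes the deepest traps dominate the total time, which is the mechanism behind the broad lifetime distributions and the slow glassy dynamics discussed above.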
However, we must again point out that other methods can and should be used to calculate reaction rates. The generalized Fokker-Planck equation will occur whenever one takes the continuum limit of a continuous time random walk. The generalization of transition-state theory can be used whenever the free energy in a generalized Fokker-Planck equation has a well-defined double-well structure upon which there is superimposed a ubiquity of statistically defined metastable states.

Acknowledgment. We thank D. Stein for many interesting discussions. J.D.B. also thanks S. Solla, W. Nadler, and J. J. Hopfield for sharing their insights. The research described in this paper was started at the Institute for Theoretical Physics, Santa Barbara, CA, and we thank that institution for its warm hospitality. Support for this work was provided by the National Science Foundation through grants DMR-86-12860, CHE-84-18619, and PHY-82-17853, supplemented by NASA.

(53) Derrida, B. J. Phys. Lett. 1985, 46, L401.
(54) Derrida, B.; Gardner, E. J. Phys. C 1986, 19, 2253.
(55) Ziman, J. M. Models of Disorder; Cambridge University Press: Cambridge, U.K., 1979.