The problem of how and why proteins adopt folded conformations


J. Phys. Chem. 1985, 89 (No. 12), 2452–2459

FEATURE ARTICLE

The Problem of How and Why Proteins Adopt Folded Conformations

Thomas E. Creighton

Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 2QH, England (Received: June 20, 1984; In Final Form: October 31, 1984)

The physical basis of the stability of the folded, three-dimensional structures of globular proteins is reviewed. At best, it may be rationalized by using an empirical approach that takes into account the unimolecular nature of the interactions in folded proteins. The need for a rigorous physical chemical explanation is emphasized. Progress has been made in elucidating the nonrandom pathways by which such folded conformations are acquired. The only pathway that has been experimentally determined is described.

Introduction

An immensely important physical chemical problem is how each protein acquires and maintains its folded conformation. Proteins are linear polymers but are special in having unique covalent structures, determined principally by the genetic information, and the ability to adopt relatively fixed three-dimensional conformations. The detailed conformations of well over 100 different proteins are known from high-resolution X-ray diffraction analysis,¹ but their complexities and individualities have provided few clues as to how or why these conformations are attained. It might be thought that the folded conformation is simply that of lowest free energy, reached by random fluctuations of the polypeptide chain. However, simple estimates of the number of conformations possible with even a small polypeptide chain of, say, only 50 amino acid residues imply that an astronomical length of time, many orders of magnitude longer than the age of the universe, would be required for all of them to be sampled randomly.² Consequently, the folding process must be directed in some way; this raises the question of whether the folded conformation is that of lowest free energy, since the conformations accessible to a protein might be kinetically restricted. The folding problem is basically one of physical chemistry, since folding is a self-assembly process that occurs spontaneously under the appropriate conditions,³ directed by physical interactions between different parts of the protein and with the solvent. Most proteins will refold spontaneously after being unfolded, unless they have been covalently modified in a way that blocks refolding. No catalysts are required, nor have any been detected in biological systems, other than an enzyme that catalyzes disulfide rearrangements.⁴ Which folded conformation is attained is determined by the amino acid sequence of the protein. However, the conformation specified by a given amino acid sequence is not unique to that sequence.
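The random-search argument can be made concrete with a back-of-envelope calculation. The minimal Python sketch below assumes 10 conformations per residue and 10⁻¹³ s to sample each conformation. Both numbers are illustrative assumptions, not values from this article, and the conclusion is sensitive to them: with only 3 states per residue, the same estimate falls to a few thousand years.

```python
import math

# Illustrative inputs: all three numbers below are assumptions, not values
# taken from the article.
N_RESIDUES = 50                    # chain length used in the text's example
STATES_PER_RESIDUE = 10            # assumed conformations available per residue
SECONDS_PER_CONFORMATION = 1e-13   # assumed time to sample one conformation
AGE_OF_UNIVERSE_S = 4.35e17        # about 13.8 billion years, in seconds


def random_search_time(n_residues: int, states: int, dt_s: float) -> float:
    """Seconds needed to visit every conformation once, one visit per dt_s."""
    return float(states) ** n_residues * dt_s


t = random_search_time(N_RESIDUES, STATES_PER_RESIDUE, SECONDS_PER_CONFORMATION)
excess = math.log10(t / AGE_OF_UNIVERSE_S)
print(f"exhaustive search: {t:.1e} s, about 10^{excess:.0f} times the age of the universe")
```

The estimate is a lower bound of sorts: it counts each conformation only once. A truly random search would revisit conformations and take longer still.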
(1) (a) F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol., 112, 535 (1977); (b) G. E. Schulz and R. H. Schirmer, "Principles of Protein Structure", Springer-Verlag, New York, 1979; (c) M. G. Rossmann and P. Argos, Annu. Rev. Biochem., 50, 497 (1981); (d) J. S. Richardson, Adv. Protein Chem., 34, 167 (1981); (e) T. E. Creighton, "Proteins: Structures and Molecular Properties", W. H. Freeman, New York, 1984.
(2) (a) C. Levinthal, J. Chim. Phys. Phys.-Chim. Biol., 65, 44 (1968); (b) D. B. Wetlaufer, Proc. Natl. Acad. Sci. U.S.A., 70, 697 (1973).
(3) C. B. Anfinsen, Science, 181, 223 (1973).
(4) (a) R. F. Goldberger, C. J. Epstein, and C. B. Anfinsen, J. Biol. Chem., 238, 628 (1963); (b) P. Venetianer and F. B. Straub, Biochim. Biophys. Acta, 67, 166 (1963); (c) T. E. Creighton, D. A. Hillson, and R. B. Freedman, J. Mol. Biol., 142, 43 (1980); (d) N. Lambert and R. B. Freedman, Biochem. J., 213, 225 (1983).

Changes in amino acid sequence have occurred during evolutionary divergence, yet all the resulting variant proteins have very similar conformations, even when the remaining sequence similarities are minimal. No proteins with related sequences have been found to have different folded conformations. The folded conformation, much more than the amino acid sequence, has been maintained throughout evolution, presumably by natural selection. There must then be considerable redundancy in the rules relating sequence to conformation. Proteins can fold within about a minute or so of their biosynthesis as the linear polypeptide chain, or after being transferred from unfolding conditions to those where the folded conformation is stable.⁵ Consequently, random conformational fluctuations must be considerably restricted, since the time required is much less than would seem necessary for a random search. On the other hand, the time is also sufficiently long for a very large number of conformational transitions to have taken place; molecular dynamics simulations of proteins have only been feasible on a time scale of 10⁻¹⁰ s.⁶ Therefore, the folding process is too complex to be either experimentally elucidated or computationally simulated in complete detail. Further complexity is introduced by the need to consider the interactions of the aqueous solvent. Nevertheless, it should be possible in principle to predict the folded conformation of a protein from just its amino acid sequence. Solving the problem of how and why proteins fold would be of practical importance for the design of novel proteins by genetic engineering or chemical synthesis; to be useful, any such protein must be able to acquire and to maintain the proper three-dimensional structure. Another practical benefit would be the possibility of deducing the biological function of a protein from its amino acid sequence or from the nucleotide sequence of its gene. With the recent revolution in genetic technology, many genes of biological importance have been isolated and sequenced, so the amino acid sequence of the protein product is known.
In some cases, the biological function is unknown, but it might be deduced if the folded conformation of the protein could be predicted. In fortunate cases, the primary structure of the new gene product is found to be homologous to a protein of known structure, so it may safely be concluded to have a very similar folded conformation;⁷ this may permit inference of its biological function. With other amino acid sequences, there is little that can be done at the present time. With the ever-increasing accumulation of amino acid sequences of proteins, the importance of solving the protein folding problem is bound to grow. Considerable information has been gained about how and why proteins fold, but many aspects are understood only rudimentarily and need to be given a solid basis; this article will describe the current status of the subject. The reader should not expect to learn anything new about physical chemistry, but the need to apply this subject to the problem of protein folding should become apparent.

(5) R. L. Baldwin, Annu. Rev. Biochem., 44, 453 (1975).
(6) (a) J. A. McCammon and M. Karplus, Annu. Rev. Phys. Chem., 31, 29 (1980); (b) M. Karplus and J. A. McCammon, CRC Crit. Rev. Biochem., 9, 293 (1981).
(7) For recent examples, see (a) B. Furie, D. H. Bing, R. J. Feldmann, D. J. Robison, J. P. Burnier, and B. C. Furie, J. Biol. Chem., 257, 3875 (1982); (b) R. K. Wierenga and W. G. J. Hol, Nature (London), 302, 842 (1983); (c) P. Argos, M. Hanei, J. M. Wilson, and W. N. Kelley, J. Biol. Chem., 258, 6450 (1983); (d) T. L. Blundell, B. L. Sibanda, and L. Pearl, Nature (London), 304, 273 (1983); (e) A. Laughon and M. P. Scott, Nature (London), 310, 25 (1984).

© 1985 American Chemical Society

Protein Structure and Stability

Over 100 different protein structures have been determined to high resolution and described in detail.¹ Although each is unique, they share the following common properties. The overall structure is remarkably compact; within the interior, atoms occupy on average about 75% of the volume, similar to crystals of small organic molecules.⁸ There are few cavities, and any buried water molecules are apparently integral parts of the structure, forming hydrogen bonds with polar groups of the protein. Within the protein interior, there are only rarely charged groups or polar groups not paired in hydrogen bonds.⁹ These close-packing characteristics are achieved in spite of the stereochemical restrictions imposed by the covalent connectivity of the polypeptide chain, without requiring substantial unfavorable rotations about the individual bonds. Most such torsion angles of both the main and side chains are close to one of the values favored in the isolated structural unit, and it has been calculated that there is, at most, about 1–2 kcal/mol of strain energy per amino acid residue in accurately determined structures.¹⁰ Although rather irregular, the polypeptide chain pursues a moderately straight course across the entire breadth of the structure and then turns and continues in a more or less direct path to the other side, without forming a knotted topology. Very often, the internal backbone follows a regular course as part of the secondary structure elements of α-helices or β-sheets; these allow efficient pairing of the backbone polar groups within the protein interior. The reverse turns occur on the surface of the protein and are now recognized as vital structural elements;¹¹ 80% of the residues in known proteins are in helices, β-strands, or reverse turns.
The distribution of these regular elements varies widely in different proteins, some being composed almost entirely of α-helices, others of β-sheets.¹² Some rules governing the topology of secondary structure elements have been uncovered,¹³ but the diversity of protein structure at this level is more striking than its regularity. Polypeptide chains of more than 100 or 200 amino acid residues are usually folded into two or more relatively independent structural units, generally designated as domains, although a wide variety of definitions of this term are used. Very large proteins are composed of multiple folded and aggregated domains, either part of the same or different polypeptide chains. The individual domains appear to form by folding individually and then aggregating.¹⁴ Some conformational changes probably occur in the latter step, but the primary problem of protein folding is to determine how an individual domain folds. The remainder of this article will be limited to single-domain proteins. The interior atoms of crystal structures of proteins are generally well fixed in space, but less so than in small-molecule crystals; this is believed to be due to both crystal lattice disorder and various types of flexibility.¹⁵

(8) F. M. Richards, Annu. Rev. Biophys. Bioeng., 6, 151 (1977).
(9) C. Chothia, Nature (London), 254, 304 (1975); J. Mol. Biol., 105, 1 (1976).
(10) C. Chothia, personal communication.
(11) (a) C. M. Venkatachalam, Biopolymers, 6, 1425 (1968); (b) P. Y. Chou and G. D. Fasman, J. Mol. Biol., 115, 135 (1977); (c) J. A. Smith and L. G. Pease, CRC Crit. Rev. Biochem., 8, 315 (1980).
(12) M. Levitt and C. Chothia, Nature (London), 261, 552 (1976).
(13) (a) J. S. Richardson, Proc. Natl. Acad. Sci. U.S.A., 73, 2619 (1976); (b) C. Chothia, M. Levitt, and D. Richardson, Proc. Natl. Acad. Sci. U.S.A., 74, 4130 (1977); (c) C. Chothia, Annu. Rev. Biochem., 53, 537 (1984).
(14) R. Jaenicke, Biophys. Struct. Mech., 8, 231 (1982).

The ability of all labile hydrogen atoms to exchange with those of the solvent was an early indication of protein flexibility in solution.¹⁶ This topic has recently become the subject of intense experimental¹⁷ and theoretical⁶ study, and detailed dynamic information has been obtained by nuclear magnetic resonance¹⁸ and hydrogen exchange.¹⁹ In all the instances of flexibility studied, the protein conformation serves primarily to limit, by introducing steric barriers, the flexibility that would otherwise be present in a comparable small molecule. Only the most rapid and frequent motions in proteins have been studied thus far, but there must be a wide variety of motions with decreasing frequency and increasing amplitude. In the extreme, the relatively low stabilities of folded protein structures imply that complete unfolding must occur with a frequency of 10⁻⁴–10⁻¹⁰ s⁻¹ under conditions optimal for stability.²⁰ This ultimate expression of flexibility occurs only very rarely, and it seems clear from the general insensitivity of protein conformation to different crystal lattices and to interactions with a variety of ligands that there are substantial energy barriers to deformations of the folded conformation. The relationship between stability and flexibility of the folded state has only recently been considered.¹⁸,²¹

The unfolded state of a protein is more difficult to characterize. Under at least some denaturing conditions, an unfolded protein has the hydrodynamic properties expected of a random coil,²² but it is unlikely to be completely random, because the diversity of the various groups on a protein makes it impossible for a single solvent to interact favorably with all of them. Intramolecular interactions between nearby groups favorably situated stereochemically on the unfolded polypeptide can be present,²³ and marginally unstable conformations probably can be stabilized by just one additional interaction.
On the other hand, apart from such local interactions, there is no evidence that unfolded small proteins or peptides maintain any "residual structure" or stable nonrandom conformation in aqueous solution; individual interactions between different parts of polypeptide chains are very weak.

(15) (a) P. J. Artymiuk, C. C. F. Blake, D. E. P. Grace, S. J. Oatley, D. C. Phillips, and M. J. E. Sternberg, Nature (London), 280, 563 (1979); (b) R. Huber, Trends Biochem. Sci. (Pers. Ed.), 4, 271 (1979).
(16) A. Hvidt and S. O. Nielsen, Adv. Protein Chem., 21, 287 (1966).
(17) F. R. N. Gurd and T. M. Rothgeb, Adv. Protein Chem., 33, 73 (1979).
(18) G. Wagner, Q. Rev. Biophys., 16, 1 (1983).
(19) (a) A. Wlodawer and L. Sjolin, Proc. Natl. Acad. Sci. U.S.A., 79, 1418 (1982); (b) A. A. Kossiakoff, Nature (London), 296, 713 (1982).
(20) T. E. Creighton in "Structural Aspects of Recognition and Assembly in Biological Macromolecules", M. Balaban, Ed., Balaban ISS, Rehovot, 1981, pp 57–73.
(21) (a) G. Wagner and K. Wüthrich, Nature (London), 275, 247 (1978); (b) P. L. Privalov and T. N. Tsalkova, Nature (London), 280, 693 (1979).
(22) C. Tanford, Adv. Protein Chem., 23, 121 (1968).
(23) (a) K. Wüthrich and A. DeMarco, Helv. Chim. Acta, 59, 2228 (1976); (b) A. Bundi, R. H. Andreatta, and K. Wüthrich, Eur. J. Biochem., 91, 201 (1978); (c) R. Mayer, G. Lancelot, and G. Spach, Biopolymers, 18, 1293 (1979).
(24) (a) P. L. Privalov and N. Khechinashvili, J. Mol. Biol., 86, 665 (1974); (b) C. N. Pace, CRC Crit. Rev. Biochem., 3, 1 (1975); (c) W. Pfeil, Mol. Cell. Biochem., 40, 3 (1981).
(25) D. P. Goldenberg and T. E. Creighton, Anal. Biochem., 138, 1 (1984).
(26) T. E. Creighton, J. Mol. Biol., 129, 235 (1979).
(27) M. Hollecker and T. E. Creighton, Biochim. Biophys. Acta, 701, 395 (1982).
(28) A. W. Burgess, L. I. Weinstein, D. Gabel, and H. A. Scheraga, Biochemistry, 14, 187 (1975).
(29) L. G. Chavez, Jr., and H. A. Scheraga, Biochemistry, 16, 1849 (1977).

The transition between the folded and unfolded states has been studied extensively.²⁴,²⁵ An example of using a simple electrophoretic technique to monitor the urea-induced unfolding of proteins is illustrated in Figure 1. The folding transition of small proteins is usually found to be two-state at equilibrium, with only the two limiting conformational states populated substantially; intermediate, partially folded states are relatively unstable thermodynamically. Evidence apparently to the contrary is not difficult to obtain, e.g., by using probes of unfolding, such as proteolytic enzymes²⁸ or antibodies,²⁹ that interact to different degrees with the folded and unfolded states, either by binding or by producing covalent changes, and thereby pull the equilibrium between them to varying extents. Unfolding induced by denaturants may also appear to be multistep when followed spectrally, probably due to the tendency of denaturant molecules to bind to the folded state³⁰ or to produce minor structural perturbations. For example, the unfolding of cytochrome c is apparently two-state when measured calorimetrically³¹ or by the change in gross conformation using denaturants (Figure 1), but it is multistate when measured spectrally.³² The spectral "intermediates" are probably perturbed forms of the folded state.³³ The most common authentic exceptions to two-state transitions are exhibited by proteins constructed with two or more domains, which may unfold independently. On the other hand, there are reports in some proteins of a third conformational state, stable under some conditions: the "molten globule", which is reasonably compact and has secondary structure but an extremely flexible interior;³⁴ its structural properties remain to be determined.

[Figure 1 image not reproduced: protein band across a horizontal gradient from 0 M to 8 M urea.]

Figure 1. Urea-induced reversible unfolding of horse cytochrome c demonstrated by urea-gradient electrophoresis. The electrophoretic mobility of a protein through a poly(acrylamide) gel depends upon its conformation, with the unfolded protein generally migrating the more slowly.²⁵ Ferricytochrome c was applied across the top of a slab gel of poly(acrylamide) in which there was a horizontal linear gradient of urea, so that the protein band migrated electrophoretically (pH 4.0, 15 °C) at continuously varying urea concentrations.²⁶ The unfolding transition is observed as an abrupt decrease in mobility. The continuous band of protein through the transition indicates that unfolding and refolding were rapid relative to the time of the electrophoretic separation (1.5 h); also, the same pattern was observed starting with unfolded protein. Consequently, the band shape represents the equilibrium unfolding transition. The relative stabilities of the folded and unfolded states are known through the transition region and may be extrapolated to other urea concentrations by assuming that the difference in free energies is linearly proportional to the urea concentration. The free energy scale for extrapolating the middle of the transition is defined by the values of −2RT and +2RT for the mobilities of the N and U states, respectively.²⁷ This method of investigating protein unfolding transitions has the advantages of being easy, rapid, and useful with protein mixtures, of detecting heterogeneity that could otherwise produce an apparently complex unfolding transition, and of giving a graphic visual display of each unfolding transition. Reprinted with permission from ref 26. Copyright 1979, Academic Press.

(30) L. S. Hibbard and A. Tulinsky, Biochemistry, 17, 5460 (1978).
(31) P. L. Privalov, Adv. Protein Chem., 33, 167 (1979); 35, 1 (1982).
(32) (a) T. Y. Tsong, J. Biol. Chem., 252, 8778 (1977); (b) Y. P. Myer, L. H. McDonald, B. C. Verma, and A. Pande, Biochemistry, 19, 199 (1980); (c) Y. Saito and A. Wada, Biopolymers, 22, 2105 (1983).
(33) E. Stellwagen and J. Babul, Biochemistry, 14, 5135 (1975).
(34) (a) D. A. Dolgikh, R. I. Gilmanshin, E. V. Brazhnikov, V. E. Bychkova, G. V. Semisotnov, S. Yu. Venyaminov, and O. B. Ptitsyn, FEBS Lett., 136, 311 (1981); (b) M. Ohgushi and A. Wada, FEBS Lett., 164, 21 (1983).

Extensive thermodynamic data on the folding transitions of a number of well-characterized proteins have been obtained from the calorimetric studies of Privalov,³¹ but detailed interpretation of these data is limited by our incomplete understanding of aqueous solutions in general. Also, changing a single amino acid residue can produce substantial alterations of the thermodynamics of unfolding³⁵ without significantly altering the folded conformation.³⁶ The physical basis of unfolding induced by denaturants, such as urea (Figure 1) or the guanidinium salts, is even less well understood, in spite of the large amount of experimental data available on a variety of related denaturants and their effects upon proteins³⁷ and model compounds.³⁸,³⁹ At best, it may be qualitatively concluded that denaturants act by solvating all portions of the unfolded polypeptide more equally, increasing the aqueous solubility of the hydrophobic portions while maintaining the hydrogen-bonding capability of the aqueous solvent.³⁹,⁴⁰ They act only at high concentrations, where they make up half the mass of the solvent.⁴⁰ The more popular theories envisage preferential binding of urea and guanidinium ion to the unfolded state,⁴¹ even though guanidinium sulfate increases, rather than decreases, the stabilities of proteins.⁴² Urea and guanidinium chloride weaken the hydrophobic interaction (Figure 2) by more than enough to account for their efficacy as denaturants. For example, on the basis of Figure 2, 8 M urea would be expected to lower the relative stability of the folded state of bovine pancreatic trypsin inhibitor (BPTI) by 40 kcal/mol. The observed effect is only 6.5 kcal/mol;⁴⁵ the discrepancy is probably due to greater binding to the folded state. Numerous other compounds, such as salts of the Hofmeister series, have substantial effects on protein stability.⁴² That they, urea, and the guanidinium salts affect all proteins in a similar way, probably indirectly through the surface tension properties of water and the hydrophobic effect,⁴⁶ is shown by a quantitative relationship between their effects on some diverse physical phenomena, e.g., on the stabilities of folded and partially folded proteins and on the solubilities of proteins and small molecules.⁴⁵

[Figure 2 image not reproduced: panels H₂O → 8 M urea and H₂O → 6 M GuHCl; free energy of transfer versus surface area (Å², 0–150); Ala labeled.]

Figure 2. Relationship between the favorable free energy of transfer from water to 8 M urea or 6 M guanidinium chloride (GuHCl)⁴³ and the accessible surface area for several amino acid side chains. The slopes of the lines, 7.1 and 8.1 cal/Å², respectively, indicate that these two denaturants diminish the normal unfavorable hydrophobic interaction of these groups with water (22 cal/Å²)⁴⁴ by about a third. The curves do not pass through the origin, possibly because the accessible surface area was measured with water as the surface probe, rather than the larger denaturants. Reprinted with permission from ref 26. Copyright 1979, Academic Press.

(35) (a) C. R. Matthews, M. M. Crisanti, G. L. Gepner, G. Velicelebi, and J. M. Sturtevant, Biochemistry, 19, 1290 (1980); (b) K. Yutani, N. N. Khechinashvili, E. A. Lapshina, P. L. Privalov, and Y. Sugino, Int. J. Pept. Protein Res., 20, 131 (1982); (c) R. Hawkes, M. G. Grütter, and J. Schellman, J. Mol. Biol., 175, 195 (1984).
(36) M. G. Grütter, R. B. Hawkes, and B. W. Matthews, Nature (London), 277, 667 (1979).
(37) (a) W. Pfeil and P. L. Privalov, Biophys. Chem., 4, 33 (1976); (b) V. Prakash, C. Loucheux, S. Scheufele, M. J. Gorbunoff, and S. N. Timasheff, Arch. Biochem. Biophys., 210, 455 (1981).
(38) (a) J. C. Lee and S. N. Timasheff, Biochemistry, 13, 231 (1974); (b) K. P. Prasad and J. C. Ahluwalia, Biopolymers, 19, 211 (1980).
(39) M. Roseman and W. P. Jencks, J. Am. Chem. Soc., 97, 631 (1975).
(40) J. Schellman, Biopolymers, 17, 1305 (1978).
(41) C. Tanford, Adv. Protein Chem., 24, 1 (1970).
(42) P. H. von Hippel and K.-Y. Wong, J. Biol. Chem., 240, 3909 (1965).
(43) Y. Nozaki and C. Tanford, J. Biol. Chem., 238, 4074 (1963); 245, 1648 (1970).
(44) C. Chothia, Nature (London), 248, 338 (1974).
(45) T. E. Creighton, J. Mol. Biol., 144, 521 (1980).
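The extrapolation described in the Figure 1 caption rests on a two-state equilibrium, K = [U]/[N], with ΔG = −RT ln K assumed to vary linearly with urea concentration. The minimal Python sketch below builds a hypothetical transition and then recovers its parameters. The stability in water (5.0 kcal/mol) and the slope (1.25 kcal/mol per molar urea) are invented for illustration and are not data from the figure. The sketch converts each fraction unfolded back to ΔG and fits a least-squares line, which it extrapolates to 0 M urea.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def delta_g_unfolding(f_unfolded: float, temp_k: float) -> float:
    """Two-state model, dG = G_U - G_N = -RT ln K, with K = fU / (1 - fU)."""
    k = f_unfolded / (1.0 - f_unfolded)
    return -R_KCAL * temp_k * math.log(k)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical protein: both parameters are assumptions for illustration only.
T = 288.15        # 15 degrees C, the temperature used in Figure 1
DG_WATER = 5.0    # kcal/mol, stability in the absence of urea (assumed)
M_VALUE = 1.25    # kcal/(mol M), dependence of dG on [urea] (assumed)

# Simulated fraction unfolded, sampled only within the transition region,
# where both states are populated enough to be measured.
urea = [3.0, 3.5, 4.0, 4.5, 5.0]
f_u = []
for c in urea:
    k = math.exp(-(DG_WATER - M_VALUE * c) / (R_KCAL * T))
    f_u.append(k / (1.0 + k))

# Convert back to free energies and extrapolate linearly to 0 M urea.
dgs = [delta_g_unfolding(f, T) for f in f_u]
intercept, slope = fit_line(urea, dgs)
print(f"extrapolated dG(H2O) = {intercept:.2f} kcal/mol, slope = {slope:.2f} kcal/(mol M)")
print(f"transition midpoint = {intercept / -slope:.2f} M urea")
```

Because the simulated data are noise-free, the fit recovers the assumed parameters exactly. With real band shapes, the reliability of the extrapolated value depends on how well the linear assumption holds outside the transition region.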

Accounting for Protein Stability

In spite of the large amount of experimental data available, no comprehensive theory of protein structure is available. A major factor believed to stabilize the unfolded state is its greater conformational entropy, but there is not even an adequate estimate of its magnitude.⁴⁷ Theoretical calculations of protein conformation and stability have generally used semiempirical energy calculations⁴⁸ of the type developed with small molecules. They usually ignore the solvent and the entropy. They also do not take into account the vastly greater sizes of proteins and the consequent unique features that follow from having a compact structure that excludes water from the interior and contains a myriad of microscopic environments. Not surprisingly, such calculations do not find crystallographically observed protein structures to be those of minimum calculated "energy". Attempts to account for the net stability of the folded state have generally not been successful, and relatively few are in the literature, perhaps for that reason. The hydrophobic effect has come to be considered the only stabilizing interaction unique to the folded state, because all other interactions in folded proteins were considered also to be present, and energetically comparable, in the unfolded state, but involving the solvent.⁴⁹,⁵⁰ However, the net hydrophobic effect was estimated to be insufficient to overcome the greater conformational entropy of the unfolded state,⁴⁹ even though this calculation ignored other unfavorable interactions, such as conformational strain, that will require additional stabilizing interactions.
(46) (a) W. Melander and C. Horváth, Arch. Biochem. Biophys., 183, 200 (1977); (b) T. Arakawa and S. N. Timasheff, Biochemistry, 21, 6545 (1982).
(47) (a) M. Karplus and J. N. Kushick, Macromolecules, 14, 325 (1981); (b) H. Meirovitch, Macromolecules, 16, 1628 (1983).
(48) (a) M. Levitt, J. Mol. Biol., 82, 393 (1974); (b) P. K. Warme and H. A. Scheraga, Biochemistry, 13, 757 (1974).
(49) (a) W. Kauzmann, Adv. Protein Chem., 14, 1 (1959); (b) C. Tanford, J. Am. Chem. Soc., 84, 4240 (1962).
(50) (a) H. A. Scheraga, G. Nemethy, and I. Z. Steinberg, J. Biol. Chem., 237, 2506 (1962); (b) H. Edelhoch and J. C. Osborne, Jr., Adv. Protein Chem., 30, 183 (1976); (c) C. Tanford, "The Hydrophobic Effect", Wiley, New York, 1973.
(51) (a) A. Wada, Adv. Biophys., 9, 1 (1976); (b) W. G. J. Hol, L. M. Halie, and C. Sander, Nature (London), 294, 532 (1981).
(52) (a) D. C. Rees, J. Mol. Biol., 141, 323 (1980); (b) A. Warshel, S. T. Russell, and A. K. Churg, Proc. Natl. Acad. Sci. U.S.A., 81, 4785 (1984).
(53) C. Tanford and J. G. Kirkwood, J. Am. Chem. Soc., 79, 5333 (1957).
(54) C. Tanford and R. Roxby, Biochemistry, 11, 2192 (1972).
(55) S. J. Shire, G. I. H. Hanania, and F. R. N. Gurd, Biochemistry, 13, 2961 (1974).
(56) S. H. Friend and F. R. N. Gurd, Biochemistry, 18, 4612 (1979).

The other observation indicating that hydrophobicity is not the only stabilizing factor is that proteins generally decrease in stability with increasing temperature,³¹ whereas the hydrophobic effect in model systems increases in magnitude within the same temperature range.⁴⁹,⁵⁰ In other words, folded proteins have lower enthalpies than the unfolded states, whereas the opposite should be the case on the basis of hydrophobicity alone. The role of electrostatic interactions is also very uncertain,⁵¹ because the effective dielectric constant is uncertain in a complex, inhomogeneous structure like a protein immersed in water.⁵² The classic Debye–Hückel treatment by Tanford and Kirkwood⁵³ was unable to account satisfactorily for the ionization of individual groups,⁵⁴ but it was subsequently improved, apparently, by empirically weighting all electrostatic interactions by the inaccessibility to solvent of the charged groups.⁵⁵ This empirical analysis suggested that electrostatic interactions make substantial contributions (about 10 kcal/mol) to the net stability of the folded state.⁵⁶ However, experimental measurements²⁷ showed only small changes in net stability upon progressively reversing the charge on amino groups by succinylation. For example, converting normally basic ribonuclease A to an acidic protein by succinylating all 11 amino groups decreased its net stability by only 2–4 kcal/mol.²⁷ In this case, all the favorable electrostatic interactions involving amino groups should be reversed in sign, so the observed decrease in stability should be double the original net favorable contribution of the electrostatic interactions of the amino groups. Only a few specific interactions between ionized groups, presumably salt bridges, were observed with other proteins to contribute substantially to their stabilities.²⁷ On the other hand, electrostatic interactions between dipoles within the compact folded structure may well be extremely important.

The usual neglect of entropy in energy calculations seems to be a fundamental shortcoming, for interactions within a protein are intramolecular, whereas those between the protein and the solvent are intermolecular. It is well established that the same interaction can have a much lower free energy when intramolecular, simply because less entropy needs to be lost than in the intermolecular case. The two types of interactions can be compared directly by using the "effective concentration" of the interacting groups in the intramolecular case, which reflects the entropic difference between the two.⁵⁷⁻⁵⁹ In the extreme situation where the interacting groups are held in correct proximity and interact in an inflexible interaction, with no conformational strain, a maximum effective concentration of between 10⁸ and 10¹¹ M has been predicted on the basis of the entropy that would need to be lost in the intermolecular case. The actual value for any particular intramolecular interaction will be lower to the extent that there is flexibility when the interaction is both present and absent, and if the intramolecular interaction requires any unfavorable features.
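The entropic meaning of an effective concentration can be expressed as a free energy. If the intramolecular equilibrium constant equals the effective concentration multiplied by the intermolecular association constant, then the intramolecular interaction is more favorable than the intermolecular one, at a 1 M standard state, by RT ln(C_eff/1 M). The minimal Python sketch below evaluates that difference at 25 °C; the temperature and the particular input values are assumptions chosen for illustration. It shows that an effective concentration of 10⁸ M corresponds to roughly 11 kcal/mol, whereas one below 1 M is a net disadvantage.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def intramolecular_advantage(c_eff_molar: float, temp_k: float = 298.15) -> float:
    """Free energy (kcal/mol) by which an interaction is more favorable when
    intramolecular than the same intermolecular interaction at the 1 M
    standard state: dG_inter - dG_intra = RT ln(C_eff / 1 M)."""
    return R_KCAL * temp_k * math.log(c_eff_molar / 1.0)


# Illustrative effective concentrations (assumed inputs for this sketch).
for c_eff in (1e-2, 1.0, 1e4, 1e8):
    print(f"C_eff = {c_eff:.0e} M: advantage = {intramolecular_advantage(c_eff):+.1f} kcal/mol")
```

The sketch captures only the entropic bookkeeping implied by the definition of effective concentration; it says nothing about the intrinsic strength of any particular interaction.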
These expectations are generally confirmed, at least qualitatively, in the many values of effective concentrations measured in small molecules.sg Surprisingly, no detailed quantitative analysis of these data appears to have been undertaken. Intramolecular effective concentrations can be much greater than the concentrations feasible in solutions because the inherent disorder of liquids prevents molecules from being held in the correct proximity and orientation for interacting. Effective concentrations may be converted to equilibrium constants for the intramolecular interactions by multiplying them by the association constant for the intermolecular interaction of the same groups.60 If the latter is measured in water, it automatically includes the competing effects of the solvent. For hydrogen bond, ionic, and hydrophobic interactions, the association constants are low,6M2in the region of 0.005-0.4 M-I, reflecting both their intrinsic weakness and the competing effects of water. Within unfolded proteins, the effective concentrations of pairs of groups are low, due to the high conformational entropy, and vary with their relative positions within the chain. They are about lo-* M when moderately ~ e p a r a t e d ~and j , ~probably ~ decrease as the 3/2 power of the number of residues between them.6s Consequently, the equilibrium constants for interaction between moderately separated groups in an otherwise unfolde' polypeptide The chain would be expected to be only 5 X 10-s-4 interactions would be present only with very low frequency, consistent with the disordered nature of most small peptides; similar values of conformational equilibrium constants have been measured immunochemically in unfolded polypeptides.66 Only

-

(57) M. I. Page and W. P. Jencks, Proc. Natl. Acad. Sci. U.S.A., 68, 1678 (1971).
(58) (a) M. I. Page, Chem. Soc. Rev., 2, 295 (1973); (b) W. P. Jencks, Adv. Enzymol. Relat. Areas Mol. Biol., 43, 219 (1975).
(59) A. J. Kirby, Adv. Phys. Org. Chem., 17, 183 (1980).
(60) T. E. Creighton, Biopolymers, 22, 49 (1983).
(61) (a) J. A. Schellman, C. R. Trav. Lab. Carlsberg, Ser. Chim., 29, 230 (1955); (b) I. M. Klotz and J. S. Franzen, J. Am. Chem. Soc., 84, 3461 (1962); (c) I. M. Klotz and S. B. Farnham, Biochemistry, 7, 3878 (1968).
(62) (a) A. Katchalsky, H. Eisenberg, and S. Lifson, J. Am. Chem. Soc., 73, 5889 (1951); (b) J. A. Schellman, C. R. Trav. Lab. Carlsberg, Ser. Chim., 29, 223 (1955); (c) C. Tanford, J. Am. Chem. Soc., 76, 945 (1954); (d) B. Springs and P. Haake, Bioorg. Chem., 6, 181 (1977); (e) S. D. Christian, Faraday Symp. Chem. Soc., 17, 176 (1982).
(63) G. Illuminati and L. Mandolini, Acc. Chem. Res., 14, 95 (1981).
(64) T. E. Creighton and D. P. Goldenberg, J. Mol. Biol., 179, 497 (1984).
(65) J. A. Semlyen, Adv. Polymer Sci., 21, 41 (1976).


when the groups are in close, stereochemically favorable positions are their effective concentrations likely to be as high as 10³ to 10⁴ M and the interaction sufficiently stable to be present most of the time.23,67 A stable folded conformation is attained from an unfolded polypeptide chain with such weak interactions by using the entropic cooperativity between a large number of them. This occurs if the presence of one or more interactions increases the effective concentrations of other pairs of groups. In this way, additional stabilizing interactions are gained with less loss of conformational entropy than is required in the unfolded polypeptide chain. The culmination of this process is a fully folded state that can have very high effective concentrations; values between 1.9 × 10² and 3.7 × 10⁵ M have been measured (see below). Consequently, these interactions are present essentially all the time and may have lower free energies than those between an unfolded protein and water, even though water is normally present at 55 M. Therefore, all intramolecular interactions, even hydrogen bonds and other polar interactions, can provide significant net stabilization to the folded state,60 in addition to that provided by the hydrophobic effect. This conclusion contradicts the general belief that hydrogen bonds within a folded protein are comparable to those between the unfolded protein and water and therefore provide no net stability to the folded state; they had been considered simply to be necessary in order to avoid the energetically unfavorable burial of unpaired polar groups within the nonpolar protein interior. The present conclusion has important consequences for the thermodynamics of protein structure and probably explains why the folded state has a more negative enthalpy than the unfolded state, despite the opposing contribution of the hydrophobic effect.
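The arithmetic behind these estimates can be sketched as follows. This is a minimal illustration, not the author's calculation: it treats each interaction as a simple two-state equilibrium, multiplies an effective concentration by an intermolecular association constant to obtain the dimensionless intramolecular equilibrium constant, and applies the assumed n^(-3/2) chain-length scaling for an unfolded chain. The reference separation `n_ref` used for that scaling and all function names are hypothetical.

```python
def intramolecular_k(c_eff_molar: float, k_assoc_per_molar: float) -> float:
    """Dimensionless equilibrium constant of an intramolecular interaction:
    K_intra = C_eff (M) * K_assoc (M^-1)."""
    return c_eff_molar * k_assoc_per_molar


def fraction_present(k_intra: float) -> float:
    """Fraction of the time a two-state interaction is formed: K / (1 + K)."""
    return k_intra / (1.0 + k_intra)


def unfolded_c_eff(c_ref_molar: float, n_ref: int, n: int) -> float:
    """Effective concentration for groups n residues apart in an unfolded chain,
    scaled from a value at a hypothetical reference separation n_ref as n^(-3/2)."""
    return c_ref_molar * (n_ref / n) ** 1.5


if __name__ == "__main__":
    k_assoc_range = (0.005, 0.4)  # M^-1: weak hydrogen-bond, ionic, hydrophobic
    cases = {
        "unfolded, moderately separated": 1e-2,
        "close, favorable positions (low)": 1e3,
        "close, favorable positions (high)": 1e4,
        "folded state, lowest measured": 1.9e2,
        "folded state, highest measured": 3.7e5,
    }
    for label, c_eff in cases.items():
        ks = [intramolecular_k(c_eff, ka) for ka in k_assoc_range]
        fracs = [fraction_present(k) for k in ks]
        print(f"{label:36s} K = {ks[0]:.1e} to {ks[1]:.1e}, "
              f"present {fracs[0]:.1%} to {fracs[1]:.1%} of the time")
```

For the unfolded chain (10⁻² M) the interactions are formed well under 1% of the time, whereas at effective concentrations of 10³ M and above even the weakest of them is formed most of the time, which is the contrast the entropic-cooperativity argument relies on.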
The negative enthalpy contribution could arise from both the more intimate van der Waals contacts between atoms within the close-packed interior, as pointed out by Bello,69 and the greater stability of the hydrogen bonds, which results in their being present virtually all of the time. In contrast, the hydrogen bonds between proteins and water,70 and within water,71 under normal conditions are present only part of the time and are usually strained. If the enthalpic contribution of the increased van der Waals interactions is comparable to that observed in the enthalpy of fusion in small molecules, a net enthalpic contribution of at most only -1.4 kcal/mol for each hydrogen bond is required to account for the observed enthalpy of the folded state. Privalov previously concluded from his thermodynamic studies that each protein hydrogen bond probably contributes -1.5 kcal/mol to the net stability of the folded conformation, without explaining how this could occur.

Within the folded conformation, the high effective concentration of virtually every pair of interacting groups results from the stereochemical constraints of the neighboring groups, which tend to keep the interacting pair in proximity even if the interaction is momentarily broken. In turn, the positions of the neighboring groups are maintained by their interactions, including those with the first pair. Consequently, the stability of each interaction depends upon the presence and stabilities of all the other interactions, producing a very cooperative fully folded structure. This energetic description of a folded protein can at least qualitatively rationalize why conformational perturbations of the fully folded state are energetically unfavorable, why fully folded conformations

(66) (a) D. H. Sachs, A. N. Schechter, A. Eastlake, and C. B. Anfinsen, Proc. Natl. Acad. Sci. U.S.A., 69, 3790 (1972); (b) J. G. R. Hurrell, J. A. Smith, and S. A. Leach, Biochemistry, 16, 175 (1977).
(67) H. H. Jaffe, J. Am. Chem. Soc., 79, 2373 (1957).
(68) T. E. Creighton in "Functions of Glutathione: Biochemical, Physiological, Toxicological and Clinical Aspects", A. Larsson, S. Orrenius, A. Holmgren, and B. Mannervik, Eds., Raven Press, New York, 1983, pp 203-211.
(69) J. Bello, J. Theor. Biol., 68, 139 (1977); Int. J. Pept. Protein Res., 12, 38 (1978).
(70) (a) K. D. Watenpaugh, T. N. Margulis, L. C. Sieker, and L. H. Jensen, J. Mol. Biol., 122, 175 (1978); (b) C. C. F. Blake, W. C. A. Pulford, and P. J. Artymiuk, J. Mol. Biol., 167, 693 (1983).
(71) F. H. Stillinger, Science, 209, 451 (1980).

have been so conserved during evolution, why partially folded conformations are unstable, and why altering one small part of a protein can have consequences throughout the protein structure for those interactions where proximity and orientation are important, such as hydrogen bonds, but not for others, such as van der Waals interactions. Furthermore, the large but approximately compensating changes in both the enthalpy and the entropy of a protein caused by alteration of its covalent structure35 can be imagined to result from changes in the flexibility of the folded state. The dependence of the stability of an intramolecular interaction on conformational flexibility suggests that a favorable increase in conformational entropy would be accompanied by an unfavorable increase in enthalpy, and vice versa. Such alterations in thermodynamic functions may be undetectable crystallographically in the average structure36 but should be most apparent in the flexibility measured by the rates at which hydrogen atoms within the folded structure exchange with the solvent, especially with a closely related series of proteins, such as mutationally35 or chemically altered forms of a single protein. The thermodynamics of protein folding are usually complicated by those of the hydrophobic effect,50 but Privalov and Tsalkova noted the expected inverse correlation between the flexibility of the folded state and the density of intramolecular contacts by comparing a number of diverse proteins.

This empirical description of protein stability is clearly not rigorous and not entirely satisfactory; indeed, it has been described72 as "practising statistical mechanics without a licence". However, it was devised only because no alternative was available, and it provides a useful framework within which to think about protein stability and flexibility. Its presentation here should serve to illustrate the need for a more rigorous physical chemical description of protein structure.

Defining a Protein-Folding Pathway

Due to the large number of conformations possible with a protein and our poor understanding of the basis of protein structure, it is not surprising that no theoretical analysis has been able to predict how an unfolded polypeptide chain folds to its particular folded conformation. The preceding analysis in terms of effective concentrations would predict that the most favorable pathways will be those that maximize the entropic cooperativity between the favorable interactions while minimizing any energetically unfavorable situations, but it is not immediately obvious what conformations would have these properties. Experimental studies have been hampered by the cooperativity of the folding process, in that partially folded intermediate states are unstable relative to the fully folded and fully unfolded states. Some intermediates might accumulate transiently to detectable levels, but the kinetics of protein refolding are complicated by the heterogeneity of the unfolded state,5,73 in which different fractions of molecules refold at different rates. The major cause of the slow- and fast-refolding forms of unfolded proteins appears to be the existence in the unfolded state of both cis and trans isomers of the peptide bonds preceding proline residues, which are only slowly interconverted,74 whereas in the folded state each such bond is generally either cis or trans. Kinetic studies of folding at low temperatures using rapid electrophoresis in urea-gradient gels (see Figure 1)75 observed a universal tendency for transiently unfolded proteins under refolding conditions to equilibrate rapidly with compact, but not fully folded, conformations. In contrast, residual folded proteins under unfolding conditions showed no detectable tendency to unfold partially prior to complete unfolding. There are many other indications of partly folded intermediates in refolding but relatively

(72) Personal communication from anonymous referee.
(73) (a) J.-R. Garel and R. L. Baldwin, Proc. Natl. Acad. Sci. U.S.A., 70, 3347 (1973); (b) J.-R. Garel, B. T. Nall, and R. L. Baldwin, Proc. Natl. Acad. Sci. U.S.A., 73, 1853 (1976).
(74) (a) J. F. Brandts, H. R. Halvorson, and M. Brennan, Biochemistry, 14, 4953 (1975); (b) L.-N. Lin and J. F. Brandts, Biochemistry, 22, 559 (1983).
(75) T. E. Creighton, J. Mol. Biol., 137, 61 (1980).


few of intermediates in unfolding.76 This was interpreted to indicate that folding generally occurs by rapid equilibration between a few compact conformations that are energetically favored under folding conditions, with the rate-limiting transition occurring in a very compact conformation that would probably involve most of the polypeptide chain. The same transition state would limit unfolding under the same conditions and would probably be a distorted, high-energy form of the fully folded conformation.20

To elucidate the pathway of folding in terms of the intermediate conformations requires that the intermediates be accumulated to detectable levels, trapped, characterized, and their kinetic roles determined. This is relatively straightforward when the disulfide interaction between cysteine (Cys) residues is used, as its redox nature means that it may be manipulated by the concentrations of appropriate electron donors and acceptors in the solution. With a protein that requires disulfide bonds for stability of its folded conformation, such as BPTI77 or bovine ribonuclease A,78 the folding transition may be manipulated simply by altering the disulfide bond stability, via the redox potential, to favor either the unfolded (thiol) or folded (disulfide) states, with no requirement for other denaturants of uncertain role. Moreover, any disulfides present at any instant may be trapped simply by rapidly blocking all the thiol groups and thereafter avoiding any conditions that might catalyze disulfide interchange. Protein molecules with different disulfides and blocked thiols are then distinguishable and may be separated, especially if an appropriate blocking reagent is used. Figure 3 shows the ion-exchange chromatography patterns of species trapped with iodoacetate during refolding of reduced BPTI, where the separation is primarily on the basis of the number and disposition of the negatively charged blocking groups. The identities of the purified trapped intermediates have been determined by chemical methods.

The kinetic roles of the intermediates may be elucidated from their kinetics of accumulation under different redox conditions. This is most readily accomplished by using thiol-disulfide exchange with suitable reagents. With an "intermolecular" disulfide reagent, RSSR, the two steps in forming a single protein disulfide are

$$
\mathrm{P}^{\mathrm{SH}}_{\mathrm{SH}} + \mathrm{RSSR} \;\rightleftharpoons\; \mathrm{P}^{\mathrm{SH}}_{\mathrm{SSR}} + \mathrm{RSH} \;\overset{k_{\mathrm{intra}}}{\rightleftharpoons}\; \mathrm{P}^{\mathrm{S}}_{\mathrm{S}} + 2\,\mathrm{RSH} \qquad (1)
$$

[Figure 3 image: ion-exchange chromatograms of the trapped species, plotted against elution.]

Figure 3. Chromatographic analysis of the kinetics of disulfide bond formation in reduced BPTI. Reduced BPTI was incubated at 25 °C with 0.15 mM glutathione disulfide. At the indicated times, the species present were trapped with 0.1 M iodoacetate and separated by ion-exchange chromatography. Fully reduced BPTI, R, elutes first, due to the six negatively charged carboxymethyl groups. It is followed by the one-disulfide intermediates, then the two-disulfide intermediates, and finally the refolded BPTI, N. Peak N contains nativelike BPTI, with the three disulfides linking Cys 30 to 51, 5 to 55, and 14 to 38, plus nativelike molecules lacking the 30 to 51 disulfide and with the two thiol groups buried and unreactive. The other intermediates are identified by the residue numbers of the Cys residues paired in disulfides. Only minor quantities of mixed-disulfide protein species accumulate under these conditions. Reprinted with permission from ref 79. Copyright 1984, Academic Press.

The first step is simply the chemical reaction between a protein thiol and the reagent, but it gives information about the environment of the protein thiol, which depends upon the protein conformation. The second step is the one of interest for protein folding, for in it a second Cys thiol group comes into proximity to displace the mixed disulfide and form a protein disulfide. The mixed disulfide accumulates only if no second Cys residue can react with it at a sufficiently rapid rate, so its accumulation is diagnostic of slow steps in folding and disulfide formation. This reason for slow disulfide formation is readily distinguishable from that due to inaccessibility of the protein thiols, where the first reaction is slow and no mixed disulfide accumulates.80 When the second step is much more rapid than the first, the rate of disulfide formation is determined by the first step and no mixed disulfide accumulates. These kinetics are not very informative, but they are useful to ensure that all intermediates with different numbers of disulfides accumulate, since the rates of forming successive disulfides are comparable. Estimates of the rates of such fast intramolecular steps may be obtained from the rate of reduction by the thiol reagent, since the formation of the mixed disulfide is unfavorable and rapidly reversed.64 Large values of $k_{\mathrm{intra}}$ may also be measured from the rate of disulfide formation by using a cyclic disulfide reagent, such as oxidized dithiothreitol, $\mathrm{DTT}_{\mathrm{ox}}$. It forms only transiently a very unstable mixed disulfide, since the reverse reaction is intramolecular and very fast:

$$
\mathrm{P}^{\mathrm{SH}}_{\mathrm{SH}} + \mathrm{DTT}_{\mathrm{ox}} \;\overset{K_{\mathrm{DTT}}}{\rightleftharpoons}\; \mathrm{P}^{\mathrm{SH}}_{\mathrm{S\text{-}S\text{-}DTT\text{-}SH}} \;\xrightarrow{k_{\mathrm{intra}}}\; \mathrm{P}^{\mathrm{S}}_{\mathrm{S}} + \mathrm{DTT}_{\mathrm{red}}
$$

Consequently, the observed rate of disulfide formation with this reagent is determined by the product of the equilibrium constant for the first step, $K_{\mathrm{DTT}}$, and the unimolecular rate constant for the second, $k_{\mathrm{intra}}$. Using the measured values of $K_{\mathrm{DTT}}$, approximately 4 × 10⁻⁴ M⁻¹ at pH 781 and 3 × 10^(…) M⁻¹ at pH 8.7,64,79 we may obtain the value of $k_{\mathrm{intra}}$. Using both intra- and intermolecular types of reagents, we may measure rate constants for the intramolecular step in the range from 10⁵ s⁻¹ to as slow as the investigator's patience permits, and the values measured in both ways for the BPTI-folding pathway are remarkably consistent.64

The pathway determined for unfolding and refolding of BPTI is illustrated in Figure 4; the fully folded protein has three disulfides linking Cys residues 5 to 55, 14 to 38, and 30 to 51. Intermediates are designated by the numbers of the Cys residues linked in disulfide bonds. Formation of the first disulfide in reduced BPTI is essentially random, as expected from its very unfolded conformation. All six Cys thiols are as reactive as model thiols,82 but no mixed-disulfide form of the reduced protein accumulates, indicating that all Cys residues may readily participate in disulfide formation. The half-time for the sum of the intramolecular steps is about 80 ms. Only two one-disulfide intermediates accumulate to high levels, because the initial disulfides formed are rapidly rearranged intramolecularly and these two are favored energetically over the others. The major intermediate, (30-51), has a nativelike disulfide, but the other, (5-30), does not. Three different second disulfides are formed readily in (30-51); one, 14-38, is nativelike, but none of the resulting two-disulfide intermediates can readily form third disulfides. Instead, they rearrange intramolecularly to intermediate (30-51,5-55), which adopts a stable nativelike conformation and hence is also designated N(14SH,38SH). This intermediate is not formed directly from (30-51) by incorporating the 5-55 disulfide. Other rearrangements occur to produce intermediate (5-55,14-38), or N(30SH,51SH), which also has a stable nativelike conformation.83 In this case, the Cys 30 and 51 thiols are inaccessible and consequently cannot form the third disulfide.

(76) P. S. Kim and R. L. Baldwin, Annu. Rev. Biochem., 51, 549 (1982).
(77) T. E. Creighton, J. Mol. Biol., 87, 563, 603 (1974); 95, 167 (1975); 113, 275 (1977); Prog. Biophys. Mol. Biol., 33, 231 (1978).
(78) (a) C. B. Anfinsen, Harvey Lect., 61, 95 (1967); (b) T. E. Creighton, J. Mol. Biol., 113, 329 (1977); 129, 411 (1979).
(79) T. E. Creighton, Methods Enzymol., 107, 305 (1984); in press.
(80) T. E. Creighton, J. Mol. Biol., 151, 211 (1981).
(81) (a) W. W. Cleland, Biochemistry, 3, 480 (1964); (b) R. P. Szajewski and G. M. Whitesides, J. Am. Chem. Soc., 102, 2011 (1980).
(82) T. E. Creighton, J. Mol. Biol., 96, 777 (1975).

TABLE I: Correlation between Stability of a Disulfide and Its Contribution to the Conformational Stability of BPTI(a)

disulfide   stability in folded conformation         temp (°C) at which folded conformation
            (effective concentration of thiols, M)   without disulfide disappears
5-55        3.1 × 10⁵                                ~38
30-51       1.4 × 10³                                ~55
14-38       1.9 × 10²                                14

(a) Data from refs 64 and 94.

[Figure 4 image: scheme of the BPTI pathway between R and N, including N(30SH,51SH).]

Figure 4. Pathway for unfolding and refolding of BPTI accompanying disulfide bond breakage and formation, respectively. The polypeptide backbone is designated as a solid line, with the positions of the six Cys residues indicated. R is the fully reduced protein and N the fully folded protein with three disulfides. The numbers of the Cys residues paired in disulfides are indicated under each; intermediates that adopt nativelike conformations are designated N, with the state of the free Cys residues following. The bracket encloses the one-disulfide intermediates that are in equilibrium by rapid intramolecular disulfide interchange, with their relative levels given. The "+" sign between (30-51,5-14) and (30-51,5-38) signifies that they both play this kinetic role. Reprinted with permission from ref 64. Copyright 1984, Academic Press.
This species also arises from formation of a second disulfide in a very minor one-disulfide intermediate, probably (5-55) or (5-51).64 The intermediate (30-51,5-55) has the two remaining thiol groups on the surface of the protein and in reasonable proximity, so it readily forms the third disulfide; the intramolecular step has a half-time of about 400 μs.

The pathway of unfolding upon simply breaking the three disulfides is the reverse of the above pathway. The accessible and least stable disulfide, 14-38, is broken first. Disulfide rearrangements are the energetically most favorable means of unfolding the protein and of reducing the remaining two disulfides. These rearrangements are slow and extreme examples of protein flexibility, as they involve interchange of disulfides between Cys residues at almost opposite ends of the pear-shaped folded conformation.84 Because the values of all the rate constants, both forward and reverse, are known, the free energies of all the intermediates and transition states are known for all redox conditions, i.e., all ratios of disulfide to thiol reagent concentrations. The energetics

(83) D. J. States, C. M. Dobson, M. Karplus, and T. E. Creighton, J. Mol. Biol., 174, 411 (1984).
(84) J. Deisenhofer and W. Steigemann, Acta Crystallogr., Sect. B: Struct. Crystallogr. Cryst. Chem., B31, 238 (1975).

of the folding transition demonstrate that it is cooperative, in that all the intermediates have higher free energies than either the fully reduced or fully folded forms, and this has been confirmed experimentally. Moreover, the rate-limiting transition separates the folded forms from all the rest, consistent with the general conclusion that the free energy barrier is a high-energy distorted form of the folded conformation. Consequently, the folding transition elucidated with disulfide bonds has all the known properties of protein-folding transitions not involving disulfides. These energetic considerations can account for the conformational restrictions on disulfide formation, i.e., the difficulty of forming the 5-55 disulfide bond and the folded conformation in intermediates (30-51) and (30-51,14-38); the energies of these transition states are high because they are distorted versions of the native conformation. The surprising observation that disulfide rearrangements are the energetically most favorable means of getting into and out of the folded conformation suggests that this is a rather specific conformational transition.

These inferences are supported by the folding pathways of two proteins from black mamba venom that are homologous to BPTI and almost certainly have the same folded conformation.86 They fold by similar pathways, and in them too the highest free energy barrier separates the folded forms from all the others. This suggests that folding pathways have been conserved through evolution, although the relative rates of the various steps may be altered. In particular, disulfide rearrangement is not the most favorable pathway for these proteins; instead, it is the sequential formation of the 30-51, 5-55, and 14-38 disulfides, with the second being formed slowest.
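The kinetic and energetic bookkeeping behind this analysis rests on a few simple relations, sketched below under stated assumptions (25 °C; hypothetical function names). Half-times convert to first-order rate constants as k = ln 2 / t½; with oxidized DTT, treating the unstable mixed disulfide as a rapid pre-equilibrium gives $k_{\mathrm{obs}} = K_{\mathrm{DTT}}\,k_{\mathrm{intra}}$; and for a linear series of steps with known forward and reverse rate constants, each state's free energy relative to the first follows from ΔG = -RT ln(k_f/k_r), from which the cooperativity criterion (every intermediate above both end states) can be checked. The rate constants in the usage example are hypothetical placeholders, not the measured BPTI values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def rate_constant_from_half_time(t_half_s: float) -> float:
    """First-order rate constant (s^-1) from a half-time: k = ln 2 / t_half."""
    return math.log(2.0) / t_half_s


def k_intra_from_dtt(k_obs: float, k_dtt: float) -> float:
    """Intramolecular rate constant (s^-1) from the observed second-order rate
    constant with oxidized DTT (M^-1 s^-1), assuming a rapid pre-equilibrium
    for the mixed disulfide so that k_obs = K_DTT * k_intra (K_DTT in M^-1)."""
    return k_obs / k_dtt


def free_energy_profile(steps, temp_k: float = 298.15):
    """Free energies (kcal/mol) of the successive states of a linear pathway,
    relative to the first state, from (k_forward, k_reverse) pairs.
    Steps involving a redox reagent need pseudo-first-order constants at fixed
    reagent concentrations, so a profile applies to one redox condition."""
    g = [0.0]
    for k_f, k_r in steps:
        g.append(g[-1] - R_KCAL * temp_k * math.log(k_f / k_r))
    return g


def is_cooperative(g) -> bool:
    """True if every intermediate lies above both end states."""
    ends = max(g[0], g[-1])
    return all(x > ends for x in g[1:-1])


if __name__ == "__main__":
    # Half-times quoted in the text: ~80 ms and ~400 us.
    print(rate_constant_from_half_time(80e-3), rate_constant_from_half_time(400e-6))
    # Hypothetical rate constants (s^-1) for R -> I1 -> I2 -> N.
    profile = free_energy_profile([(1.0, 10.0), (0.5, 5.0), (100.0, 0.01)])
    print([round(x, 2) for x in profile], is_cooperative(profile))
```

With these placeholder constants both intermediates lie above R and N, which is the pattern that defines a cooperative transition; a profile in which any intermediate fell below an end state would fail the check.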
Presumably, this difference in the most favorable pathway is a result of the lower net stabilities (by 5.5 and 2.6 kcal/mol) of the folded black mamba proteins, which make their folded conformations easier to distort than that of BPTI. It is then possible that the crucial role of rearrangements in the BPTI pathway is atypical, in that this protein is exceptionally stable. On the other hand, the rate-limiting step with the less stable proteins is still a distorted form of the folded conformation and may involve conformational rearrangements that are not detected by the cysteine residues.

The equilibrium constant for forming a protein disulfide with an intermolecular disulfide reagent (reaction 1) gives the relative free energies of the two disulfides and the effective concentration of the protein thiols. In reduced BPTI, the effective concentrations of the various pairs are of the order of 10⁻¹ to 10⁻² M, due to its unfolded conformation. The values for the various pairs increase and decrease as folding progresses, reflecting the acquisition of nonrandom conformation. In the fully folded conformation, the values for the correct pairs are between 10² and 10⁵ M (Table I). The most stable disulfide, between the Cys residues with the highest effective concentration, would be expected to provide the greatest contribution to the stability of the folded conformation. This is confirmed by the different stabilities to thermal unfolding of the three proteins each lacking one of the three disulfide bonds (Table I), although this comparison is complicated by the different disulfide bonds in the three unfolded proteins. That the pathways of disulfide formation and breakage accurately reflect the conformational properties of the protein is shown by the effects on the pathway of altering the folding conditions45

(85) T. E. Creighton, J. Mol. Biol., 113, 295 (1977).
(86) M. Hollecker and T. E. Creighton, J. Mol. Biol., 168, 409 (1983).
(87) T. E. Creighton, J. Mol. Biol., 113, 313 (1977).

and the covalent structure of the protein.88 A novel modification of BPTI is to introduce a peptide bond between the amino and carboxyl termini, to produce a protein with a circular backbone.89 The circular protein unfolds and refolds by a slightly altered version of the normal pathway, and the rates of the various steps are different.90 The circular polypeptide chain may be cleaved at other peptide bonds to generate linear molecules with amino acid sequences that are circular permutations of that of BPTI.89 One such permuted variant has been shown to refold readily, but its pathway is not yet known. These results demonstrate that the ends of the polypeptide chain have no crucial role in folding, nor does the distribution of Cys residues along the chain.

Further information about the conformational forces that guide disulfide formation should be available from the conformations of the trapped intermediates. Although only the disulfide interaction has been trapped, it is almost a thermodynamic requirement that the presence of the disulfide stabilize, to the same extent, the conformation that stabilized it; i.e., the two should be linked functions. Hydrodynamic, immunochemical,91 absorbance,92 circular dichroism,93 and nuclear magnetic resonance94 studies all indicate the presence of nonrandom conformation in most of the trapped intermediates, but little detailed conformational information is available yet. The limited number of experimental techniques that give information about the conformations of flexible, partially folded proteins is a major difficulty.

Similar studies with ribonuclease, which has four disulfides and more than twice as many amino acid residues as BPTI, have demonstrated a much more random pathway of disulfide bond formation, with very many intermediates. Moreover, the trapped intermediates have no detectable nonrandom conformation,95 except for one with three native disulfide bonds96 and a very nativelike conformation. Otherwise, the energetics of disulfide bond breakage and formation are similar to those of BPTI, in that the rate-limiting barrier separates the folded state from all the other intermediates. It has been shown directly that no pathways avoid this barrier,78b which is presumably a distorted form of the folded conformation. Additional rate-limiting steps have been claimed to be significant with ribonuclease,97 but this conclusion was based on the assumption that each collection of intermediate species with the same number of disulfide bonds, but differing in their pairings and probably also in other aspects, behaved as a single, homogeneous kinetic species. These species have been shown directly not to behave kinetically as a single species, forming disulfides with a wide range of rates.78b The danger in assuming the kinetic homogeneity of even unfolded ribonuclease with a single set of disulfide bonds has been amply demonstrated by others.5,73,74

Protein Folding in Vivo

Although protein folding is a problem of physical chemistry when intact proteins are studied in the laboratory, it is also a biological problem when it occurs during or after biosynthesis of the polypeptide chain inside a cell. Very little is known about this process, although a start has been made.98 The use of disulfides described above should also be applicable to elucidating biosynthetic folding, if a suitable biosynthetic system is available. To that end, a gene coding for BPTI has recently been cloned99 and expressed in bacteria, and it is hoped to use it to direct the biosynthesis of this protein for such in vivo folding experiments. It should also be useful for genetically producing mutant forms of the protein for further folding and structural studies.

(88) T. E. Creighton and D. F. Dyckes, J. Mol. Biol., 146, 375 (1981).
(89) D. P. Goldenberg and T. E. Creighton, J. Mol. Biol., 165, 407 (1983).
(90) D. P. Goldenberg and T. E. Creighton, J. Mol. Biol., 179, 527 (1984).
(91) T. E. Creighton, E. Kalef, and R. Arnon, J. Mol. Biol., 123, 129 (1978).
(92) P. Kosen, T. E. Creighton, and E. R. Blout, Biochemistry, 19, 4936 (1980).
(93) P. Kosen, T. E. Creighton, and E. R. Blout, Biochemistry, 20, 5744 (1981); 22, 2433 (1983).
(94) D. J. States, Ph.D. Thesis, Harvard University, 1983.
(95) A. Galat, T. E. Creighton, R. C. Lord, and E. R. Blout, Biochemistry, 20, 594 (1981).
(96) T. E. Creighton, FEBS Lett., 118, 283 (1980).
(97) Y. Konishi, T. Ooi, and H. A. Scheraga, Biochemistry, 21, 4734, 4741 (1982).
(98) (a) L. W. Bergmann and W. M. Kuehl, J. Biol. Chem., 254, 5690 (1979); (b) G. Scheele and R. Jacoby, J. Biol. Chem., 257, 12277 (1982); (c) D. P. Goldenberg and J. King, Proc. Natl. Acad. Sci. U.S.A., 79, 3403 (1982).
(99) S. Anderson and I. B. Kingston, Proc. Natl. Acad. Sci. U.S.A., 80, 6868 (1983).