J. Phys. Chem. 1992, 96, 3987-3994
Helix-Coil Theories: A Comparative Study for Finite Length Polypeptides
Hong Qian and John A. Schellman*
Department of Chemistry and Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403
(Received: September 27, 1991)
At present, numerous polypeptides of known sequence and length are being synthesized by chemical or biosynthetic means. The observation of the helix-coil transition is one of the important methods of studying the factors which stabilize α-helices of short to moderate length in these molecules. As a result, the observation and interpretation of helix-coil transitions is likely to return to its prominence of 25 years ago, but with increased power because of the precise sequence information and the ability to observe the state of individual amino acid residues or peptide groups by nuclear magnetic resonance spectroscopy and hydrogen-exchange methods. In this paper we develop some of the details of the helix-coil model which are likely to be useful in the interpretation of these more incisive experiments.
1. Introduction
It is a pleasure to be a part of this celebration for Marshall Fixman. He has made fundamental contributions in so many areas that it is not surprising that we have encountered and made use of his work in a number of quite diverse fields of research. One of these is the theory of helix-coil transitions, where he made a number of important theoretical contributions and also some practical ones in the case of long, sequenced […]
In the late 1950s and early 1960s the theory of the helix-coil transition was developed into one of the most elegant and complete areas of macromolecular science. Though there have been many practical applications of the theory, and experiments designed to determine the values of the helix-coil parameters, the theory itself remains in essentially the same form it acquired 25 years ago. In recent years a considerable number of polypeptides of moderate length, excised from proteins or synthesized, have been found to form α-helices in aqueous solution. These systems are now serving as a proving ground for testing the validity of extant helix-coil theories and provide a new method of determining the aqueous nucleation and growth parameters for the various amino acids. Special interactions, in addition to those considered in the models for helix-coil theories, have already been revealed, such as salt bridges and electrostatic effects arising at the ends of the strongly polar α-helix. Fundamental questions concerning the origins of helical stability in water still exist, however, such as the contributions of the hydrogen bond and the hydrophobic interaction. This paper is a reexamination of some of the features of helix-coil theory that have been highlighted by these experimental studies. The emphasis will be on the physical aspects of the model rather than on the mathematical manipulations which are used to evaluate partition functions and obtain averages, since the latter are well-established and model-independent.
Section 2 will examine the physical assumptions behind the original helix-coil models, sections 3-5 will discuss the Lifson-Roig and Zimm-Bragg theories and their interrelations, section 6 will compare a variety of theories which are in current use, and section 7 will reexamine the physical postulates of helix-coil theory from the contemporary point of view.
2. The α-Helix-Coil Model
The standard helix-coil model has been thoroughly discussed in the literature5 and is reviewed here because the assumptions of the model are being investigated. It will be useful to recall the hydrogen-bonding pattern of the α-helix, which is shown in Figure 1A.
(1) Fixman, M.; Zeroka, D. J. Chem. Phys. 1968, 48, 5223.
(2) Ebhinger, B. E.; Fixman, M. Biopolymers 1970, 9, 205.
(3) Fixman, M. Biopolymers 1975, 14, 271.
(4) Fixman, M.; Freire, J. J. Biopolymers 1977, 16, 2693.
(5) Poland, D.; Scheraga, H. A. Theory of Helix-Coil Transitions; Academic Press: New York, 1970.
The Polypeptide Model. In discussing the helix-coil transition, it is most useful to describe peptide conformations in terms of bond lengths, bond angles, and torsional angles rather than atomic coordinates. Conformational properties vary most strongly with the torsional angles, and these are used to classify local conformations as α, β, etc. The conformation of the polypeptide backbone (secondary structure) is defined by the three main-chain torsional angles $\phi$, $\psi$, and $\omega$.6 Except for prolines, $\omega$ may be confined to the region near 180°, which is appropriate for trans peptides. With these conventions the conformation of a polypeptide is specified by a sequential list of values of $(\phi_i, \psi_i)$ for all residues $i$. Early studies with molecular models indicated that the polypeptide chain can be represented as "isolated" pairs of conformational angles, with $\phi_i$ and $\psi_i$ strongly interdependent but having a weak or negligible dependence on the conformational angles of neighboring residues.7 (The x-proline sequence is an exception.8) This assumed separation into independent residue conformations is the basis of the virtual-bond model for the polypeptide chain9 and of the success of two-dimensional energy maps in discussing the stability of peptide and protein conformations. Arguments based on molecular models and energy considerations ignore the contribution of solvent molecules. Conformational probabilities should be deduced from free energy maps (i.e., the potential of average force, or of average torque when the coordinates are angular10) rather than from steric or energy considerations. Though solvent molecules may seriously perturb the local conformational probabilities of polypeptide chains, it is nevertheless assumed in helix-coil theories that this independence of residues persists for the random coil on a free energy basis as well as on an energy basis. On the other hand, when helices or other secondary structures are formed, longer-range interactions are introduced, leading to an interdependence of local conformations, i.e., to cooperativity.
The Helix Model.
The thermodynamics of helix formation is based on the geometry of the α-helix, in which the carbonyl of peptide group $i$ is hydrogen-bonded to the NH of peptide group $i + 3$. Consequently, the formation of a helical hydrogen bond requires the spatial fixing of four consecutive peptide units or, expressed alternatively, the fixing of the $\phi$ and $\psi$ angles of the three intervening residues. The addition of a peptide unit to an already existing helix involves a free energy, $\Delta G_h$, which is considered to consist of the free energy associated with H-bond formation, the negative entropy of fixing the main-chain conformation, and the change in free energy of the side chains. The latter include interactions with the helix and losses in conformational freedom. Side chain-side chain interactions are ignored and must be superimposed on the theory. $\Delta G_h$ is associated with an equilibrium constant, $s$, called the helix-propagation constant; $\Delta G_h$ and $s$ depend on the nature of the side chain. The formation of the first H-bond is unique, since four peptide groups or three residue conformations must be fixed. $\Delta G$ for the first H-bond is written as $\Delta G_{\mathrm{nuc}} + \Delta G_h$, where $\Delta G_{\mathrm{nuc}}$ is the excess free energy associated with fixing the first two residues without the compensation of H-bond formation. The free energy of a helix with $n_h$ H-bonds is given by
$$\Delta G = \Delta G_{\mathrm{nuc}} + n_h\,\Delta G_h = \Delta G_{\mathrm{nuc}} + (n_r - 2)\,\Delta G_h$$
where $n_r$ is the number of consecutive helical residues, i.e., the number of residues in the helical conformation. The equilibrium constant for the formation of the helix is then
$$K = \sigma s^{\,n_r - 2}, \qquad RT\ln\sigma = -\Delta G_{\mathrm{nuc}}, \qquad RT\ln s = -\Delta G_h$$
$\Delta G_{\mathrm{nuc}}$ is often considered to be purely entropic, so that $\sigma$ is temperature independent.
In summary, the helix-coil model divides helix formation into a nucleation parameter and a growth parameter. The nucleation parameter is responsible for the cooperativity and is usually considered independent of temperature and often also of the nature of the amino acids involved. The growth parameter includes the 1-4 H-bond interaction between peptide groups, the entropy loss which results when the residue is fixed in the helix, and side chain-helix interactions. This formulation, which was based on the importance of the peptide H-bond, continues to be used even by authors who think that the H-bond does not contribute to the stability. The above is an essentially thermodynamic formulation of the statistical weight of a helix. In general, there are two types of theories of the helix-coil transition: (1) theories which count hydrogen bonds, which are essentially thermodynamic theories in which the nature of the interactions leading to nucleation and growth is undisclosed; and (2) statistical mechanical theories based on residues and conformational integrals.10,15 These constitute two different, valid approaches to the helix-coil problem. We have found that the relationship between the two approaches is in need of clarification, and this constitutes one of the main objectives of this work.
(6) Edsall, J. T.; Flory, P. J.; Kendrew, J. C.; Liquori, A. M.; Némethy, G.; Ramachandran, G. N.; Scheraga, H. A. J. Mol. Biol. 1966, 15, 399.
(7) Brant, D. A.; Flory, P. J. J. Am. Chem. Soc. 1965, 87, 2791.
(8) Schimmel, P. R.; Flory, P. J. J. Mol. Biol. 1968, 34, 105.
(9) Flory, P. J. Statistical Mechanics of Chain Molecules; John Wiley & Sons: New York, 1969.
(10) Lifson, S.; Oppenheim, I. J. Chem. Phys. 1960, 33, 109.
0022-3654/92/2096-3987$03.00/0 © 1992 American Chemical Society
[Figure 1 image not reproduced. Panel A labels: peptide units, residues; panel B rows, top to bottom: ZB weights, ZB code, LR code, LR weights.]
Figure 1. (A) The hydrogen bond pattern of the α-helix. A sequence of 10 peptide units or nine amino acid residues contains a small helical segment. Peptide units are drawn as rectangles, and residues, centered on the α-carbon atom, are drawn as circles. Note that there are six peptide units or five amino acid residues immobilized in the helix, which contains three hydrogen bonds. (B) Coding and weighting helices in the LR and ZB theories. The helix is the same as in (A). The two lines below the drawing are the coding and weighting of the LR theory. The two lines above are the coding and weighting of the ZB theory. The weight assignments are discussed in the text.
(11) Schellman, J. A. C. R. Trav. Lab. Carlsberg, Ser. Chim. 1955, 29, 230.
(12) Peller, L. J. Phys. Chem. 1959, 63, 1194.
(13) Gibbs, J. H.; DiMarzio, E. A. J. Chem. Phys. 1959, 30, 271.
(14) Zimm, B. H.; Bragg, J. K. J. Chem. Phys. 1959, 31, 526.
(15) Nagai, K. J. Phys. Soc. Jpn. 1960, 15, 407.
(16) Lifson, S.; Roig, A. J. Chem. Phys. 1961, 34, 1963.
3. The Lifson-Roig Theory
We find it useful to set up the theory in terms of the Lifson-Roig16 (LR) molecular model, in which the parameters are expressed as integrals over molecular phase space. This is a sufficiently general approach that may be used to examine the physical assumptions and mathematical procedures of other models. In the LR model the conformational unit of the polypeptide chain is the residue, which centers about the α-carbon atom of the amino acid. The residue conformation specifies the $\phi$ and $\psi$ angles associated with $C_\alpha$ and the conformational angles of the side chain, though the latter are usually ignored. In the LR theory the length of the polypeptide chain is the number ($N_r$) of $C_\alpha$ atoms which are flanked by peptide units on both sides, and it therefore depends on the state of the amino- and carboxyl-terminal ends of the chain. The number of residues in an α-helix ($n_r$) is the number of consecutive residues with the α-helical conformation. Other chain units have been used: $N_{aa}$, the number of amino acids in the chain; $N_p$, the number of peptide units in the chain; $N_h$, the maximum possible number of hydrogen bonds in a helix. In counting the number of units in a chain, $N_p - 1 = N_r = N_h + 2$. Small discrepancies in published formulas usually turn out to arise from different definitions of the number of units. The LR theory classifies the states of a residue by its location in $(\phi, \psi)$ space. If a residue lies in the narrow area of $(\phi, \psi)$ which is concordant with α-helix formation, it is considered to be a helical residue and given the symbol h. Note that h refers to the helical conformation and does not mean that the residue is part of a helix. The remainder of the conformational space is classified as nonhelical, and residues in this region are symbolized by c. With these rules the instantaneous conformation of a polypeptide chain can be specified as a sequence such as cchchccchhccchc, which specifies the state of each residue. The sequence just written would indicate a random coil, since it does not contain the three or more consecutive helical states which are required for helix formation.
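The counting rules above can be made concrete with a short sketch. It assumes the conformation is given as a string of 'h' and 'c' symbols; the function names are hypothetical and not part of the LR paper. A helix is taken to be any run of at least three h residues, each helix of $n_r$ residues carries $n_h = n_r - 2$ hydrogen bonds, and `helix_constant` evaluates the thermodynamic weight $K = \sigma s^{n_r - 2}$ of Section 2.

```python
import re


def lr_helices(code: str) -> list[tuple[int, int]]:
    """Return (start, n_r) for each run of at least three consecutive 'h' residues.

    Shorter runs of 'h' are treated as part of the random coil, because a
    helical H-bond requires three consecutive residues in the helical
    conformation. Indices are 0-based positions in `code`.
    """
    if set(code) - {"h", "c"}:
        raise ValueError("code may contain only 'h' and 'c'")
    return [(m.start(), len(m.group())) for m in re.finditer(r"h{3,}", code)]


def hydrogen_bonds(code: str) -> int:
    """Total helical H-bonds: a helix of n_r residues carries n_r - 2 of them."""
    return sum(n_r - 2 for _, n_r in lr_helices(code))


def helix_constant(n_r: int, sigma: float, s: float) -> float:
    """Thermodynamic weight K = sigma * s**(n_r - 2) of a single helix."""
    if n_r < 3:
        raise ValueError("a hydrogen-bonded helix needs n_r >= 3")
    return sigma * s ** (n_r - 2)


print(lr_helices("cchchccchhccchc"))   # [] : random coil, no run of three h's
print(lr_helices("cchhhhhcc"))         # [(2, 5)]
print(hydrogen_bonds("cchhhhhcc"))     # 3
```

The regular expression `h{3,}` is simply a compact way to find the maximal runs; any run-length scan would do.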
In accordance with the "isolated pair" model discussed in the previous section, the statistical weight of residue $i$, when it is in the coil state, is defined as the integral of the Boltzmann factor over all nonhelical states:
$$u'_i = \int_{c} e^{-W_1(i)/kT}\,d\phi_i\,d\psi_i \tag{1}$$
Here $W_1(i)$ is the free energy (or potential of average torque10) of the conformation $(\phi_i, \psi_i)$; i.e., $e^{-W_1(i)/kT} = \langle e^{-U_1(i)/kT}\rangle$, where $U_1$ is the local residue energy (see above) and the average is over side chain conformations and solvent coordinates. Both $u'_i$ and $W_1(i)$ depend on the amino acid in the $i$th position, since the nature of the side chain plays a role, especially for proline and glycine.
The statistical weight of a residue in the helical conformation, but not necessarily in a helix, is defined as the integral over the small helical region of conformational space:
$$v'_i = \int_{h} e^{-W_1(i)/kT}\,d\phi_i\,d\psi_i \tag{2}$$
The sum of $u'_i$ and $v'_i$ is the conformational integral over all $(\phi_i, \psi_i)$:
$$z_i = u'_i + v'_i \tag{3}$$
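Equations 1-3 can be illustrated numerically. The sketch below is not from the paper: it assumes a hypothetical free-energy surface `W(phi, psi)` supplied by the caller (in units of $kT$), replaces the helical region h by a rectangular box in degrees, and approximates the integrals by a simple grid sum. The ratio $v'/u'$ it returns is the LR parameter $v$ for that assumed surface.

```python
import math


def residue_integrals(W, helix_box, n=360, kT=1.0):
    """Grid estimates of u' (eq 1), v' (eq 2) and z = u' + v' (eq 3).

    W         : hypothetical callable W(phi, psi) -> free energy, same units as kT
    helix_box : ((phi_lo, phi_hi), (psi_lo, psi_hi)) in degrees, an assumed
                rectangular stand-in for the helical region h
    n         : grid points per angle over [-180, 180)
    """
    step = 360.0 / n
    (f0, f1), (s0, s1) = helix_box
    u_sum = v_sum = 0.0
    for i in range(n):
        phi = -180.0 + i * step
        for j in range(n):
            psi = -180.0 + j * step
            weight = math.exp(-W(phi, psi) / kT)      # Boltzmann factor
            if f0 <= phi <= f1 and s0 <= psi <= s1:
                v_sum += weight                        # helical region h
            else:
                u_sum += weight                        # nonhelical region c
    cell = step * step                                 # area element, deg^2
    u_prime, v_prime = u_sum * cell, v_sum * cell
    return u_prime, v_prime, u_prime + v_prime


# Flat landscape: the integrals reduce to areas of the two regions.
u_p, v_p, z = residue_integrals(lambda phi, psi: 0.0, ((-90.0, -30.0), (-70.0, -10.0)))
print(u_p, v_p, z)        # 125879.0 3721.0 129600.0
```

The angular units cancel in $v = v'/u'$, so the choice of degrees rather than radians does not affect the LR parameters.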
For simplicity the subscript $i$, which is needed whenever the peptide contains more than one type of amino acid (heteropolymer), will be dropped in most of the formulas which follow. In a homopolymer the probability of the helical conformation is given by $v'/z$ and that of the nonhelical conformation by $u'/z$, and the ratio of the helical to the coil conformation is $v'/u'$. The LR theory uses the coil state as the reference state and defines $v = v'/u'$ as the statistical weight of the helical state in the absence of a helix and $1 = u'/u'$ as the statistical weight of the coil state. Since the residue conformations are considered independent, it might be assumed that the conformational integral of the random coil can be represented as the product of residue conformational integrals
$$Z = \prod_{i=1}^{N_r} z_i \tag{4}$$
This, however, is only an approximation. The reason is that the product in eq 4 contains all possible conformations of the chain, including sequences of three or more residues which are in the helical conformation and therefore are parts of hydrogen-bonded helices. Equation 4 needs to be corrected by anticorrelation effects which prevent strings of residues in the helical conformation from appearing in random coil sequences. This correction is intrinsic to the LR theory and will be discussed below. The conformational independence of residues disappears in the α-helix, since the formation of each H-bond restricts three consecutive residues to the helical conformation, thereby leading to cooperativity. A third statistical weight, $w'$, is defined to describe a residue which is both helical and hydrogen-bonded. This statistical weight is given by
$$w'_i = \int_{h} e^{-W/kT}\,d\phi_i\,d\psi_i \tag{5}$$
where $W = W_1(i) + W_3(i-1, i, i+1)$. $W_1(i)$ represents the same local residue free energy as before, while $W_3(i-1, i, i+1)$ is a function of three $(\phi, \psi)$ pairs and represents the additional contributions arising from the longer-range interactions which occur in helix formation. At the time that the LR theory was developed, the hydrogen bond was considered to be the dominant contribution to this long-range term. Associated with a sequence of h's which is $n_r$ units long are $n_h = n_r - 2$ hydrogen bonds. The LR theory gives two of these units the statistical weight $v'$ and assigns the helical weight $w'$ to the remainder. How this is done is a matter of convention, so long as the weight of the sequence is $(v')^2 (w')^{n_r - 2}$. The two $v'$ residues could be at the amino-terminal end, at the carboxyl-terminal end, or one at each end; the latter was selected in the LR theory. In summary, there are three statistical weights, $u'$, $v'$, and $w'$, which are normalized to 1, $v$, and $w$ for practical calculations. $v$ is the equilibrium constant for the formation of the helical conformation in the random coil. It is considerably less than unity because of the low entropy associated with confinement in the helical region and because of electrostatic repulsions.17 The same weight is used for the two residues in a helix which are not compensated by H-bond formation. Thus, the nucleation parameter in the LR theory is $v^2$. This will be related later to $\sigma$, the nucleation parameter of the Zimm-Bragg and other theories.
(17) Brant, D. A.; Miller, W. G.; Flory, P. J. J. Mol. Biol. 1967, 23, 47.
The rules for assigning weights are as follows. A c-residue is always given the weight $u'$; an h-residue is given the weight $v'$, except when it is flanked on both sides by h-residues, when it is given the weight $w'$. The nucleation parameters $v$ and $\sigma$ are usually assumed to be independent of temperature, which implies that nucleation occurs without a significant change in enthalpy or heat capacity. These rules permit the assignment of a statistical weight to the central element of all eight triplets, and from these the statistical weight matrix is set up by regarding the triplets as sequential, overlapping pairs:
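$$
\begin{array}{c|cccc}
 & \bar h h & \bar h c & \bar c h & \bar c c\\ \hline
h\bar h & w' & v' & 0 & 0\\
h\bar c & 0 & 0 & u' & u'\\
c\bar h & v' & v' & 0 & 0\\
c\bar c & 0 & 0 & u' & u'
\end{array}
\;\longrightarrow\;
\begin{array}{c|ccc}
 & \bar h h & \bar h c & \bar c(h\cup c)\\ \hline
h\bar h & w' & v' & 0\\
h\bar c & 0 & 0 & u'\\
c(\overline{h\cup c}) & v' & v' & u'
\end{array}
\;\longrightarrow\;
\begin{pmatrix} w & v & 0\\ 0 & 0 & 1\\ v & v & 1 \end{pmatrix}
\tag{6}
$$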
where $h \cup c$ is the Lifson-Roig notation for the union of the local configuration spaces of h and c.16 Weights are assigned to the units indicated by the upper bars. In the 3 × 3 matrices the third-row label is branched, indicating that either h or c is represented; the selection is determined by the first element of the column label. Note that c is always given the statistical weight $u'$ regardless of its neighbors. The 4 × 4 matrix is singular and can be reduced to the 3 × 3 matrix in the middle of eq 6. Introduction of the renormalized weights $w$, $v$, 1 leads to the matrix on the right, which is commonly used. The matrix method for chain statistics is discussed in detail in ref 5, including the reduction of the size of the matrix. By standard methods the partition function is given by
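$$Z = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} M^{N_r} \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix} \tag{7}$$
where $M$ is the renormalized 3 × 3 matrix on the right of eq 6.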
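A minimal pure-Python sketch of eqs 6-8 for a homopolymer with $u' = 1$. It builds the right-hand matrix of eq 6 with states ordered hh, hc, c(h∪c), evaluates eq 7 by repeated row-vector products rather than by diagonalization, cross-checks the result against a brute-force sum over all h/c strings weighted by the LR rules, and estimates $\langle n_h\rangle = \partial\ln Z/\partial\ln w$ (eq 8) by a central finite difference. The function names and the finite-difference step are illustrative choices, not part of the original theory.

```python
import math
from itertools import product


def lr_matrix(w: float, v: float) -> list[list[float]]:
    """Renormalized LR matrix (right-hand form of eq 6), with u' = 1.

    Rows and columns are ordered hh, hc, c(h U c).
    """
    return [[w, v, 0.0],
            [0.0, 0.0, 1.0],
            [v, v, 1.0]]


def lr_partition(n_r: int, w: float, v: float) -> float:
    """Eq 7: Z = (0 0 1) M^N_r (0 1 1)^T, by repeated row-vector products."""
    m = lr_matrix(w, v)
    row = [0.0, 0.0, 1.0]
    for _ in range(n_r):
        row = [sum(row[k] * m[k][j] for k in range(3)) for j in range(3)]
    return row[1] + row[2]


def lr_brute_force(n_r: int, w: float, v: float) -> float:
    """Sum over all 2**n_r h/c strings using the LR weighting rules directly.

    Residues outside the chain count as non-helical, matching the end
    vectors of eq 7. Only practical for short chains; used as a check.
    """
    total = 0.0
    for states in product("hc", repeat=n_r):
        weight = 1.0
        for i, state in enumerate(states):
            if state == "c":
                continue  # c residues carry u' = 1
            left = i > 0 and states[i - 1] == "h"
            right = i < n_r - 1 and states[i + 1] == "h"
            weight *= w if (left and right) else v
        total += weight
    return total


def mean_hbonds(n_r: int, w: float, v: float, eps: float = 1e-6) -> float:
    """Eq 8, <n_h> = d ln Z / d ln w, by a central difference in ln w."""
    up = math.log(lr_partition(n_r, w * math.exp(eps), v))
    down = math.log(lr_partition(n_r, w * math.exp(-eps), v))
    return (up - down) / (2.0 * eps)


# Three residues: Z = 1 + 3v + 3v^2 + v^2 w, since only hhh forms an H-bond.
print(lr_partition(3, 2.0, 0.1))   # approximately 1.35
```

The brute-force routine grows as $2^{N_r}$ and is only meant to confirm that the matrix encodes the stated weighting rules.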
The partition function is evaluated by matrix diagonalization, and from the resulting analytical expression a number of average properties can be calculated, in particular the average number of helical hydrogen bonds per molecule, which is given by the formula
$$\langle n_h\rangle = \frac{\partial \ln Z}{\partial \ln w} \tag{8}$$
If circular dichroism (CD) at 222 nm is being used to follow the transition, then $\langle n_h\rangle/N_h$ is a good measure of the fractional helicity. Theoretical results18 indicate that the number of residues which contribute to this band should approximately equal the number of peptide carbonyls buried in the helix, and for a single helix this equals $\langle n_h\rangle = \langle n_r\rangle - 2$. Experimental results with protein CD indicate that the best empirical value is $\langle n_r\rangle - 2.5$.19 See the paper by Chakrabartty et al.20 for the application to helix-coil transitions. For chains with more than one helix, the total average number of helical residues, skipping single helical residues, is $\langle n_h\rangle + 2\langle \nu_h\rangle$, where $\langle \nu_h\rangle$ is the number of helices of at least two units. $\langle \nu_h\rangle$ can be determined from the derivative $\partial \ln Z/\partial \ln v_{12}$, where $v_{12}$ is the $v$ which appears in the first row, second column of the LR matrix.16 Other quantities that can be calculated are the average length of helices, $\langle n_h\rangle/\langle \nu_h\rangle$, and probabilities for individual units in the chain, such as the probability that a residue is in the first, last, or central position in a helix and the probability that its NH or CO group is hydrogen-bonded to another peptide unit.
4. The Zimm-Bragg Theory
The Zimm-Bragg theory (ZB) is also a matrix theory, essentially equivalent to LR but modeled in a quite different way. The units in the calculation are peptide units rather than residues, and these are classified by hydrogen-bond formation rather than by conformation. If the NH group of the peptide is involved in an α-helical hydrogen bond, the unit is given the symbol 1; if its NH is not involved in a hydrogen bond, it is given the symbol 0. (In the original paper ZB ordered the chain starting at the carboxyl end and counted CO hydrogen bonds; we have inverted this procedure to match modern conventions.) The first hydrogen bond in a helix as one proceeds to the right, i.e., toward the C-terminus, is given the weight $\sigma s$; subsequent contiguous hydrogen bonds are given the weight $s$. All non-H-bonded NH's are weighted unity. $s$ is the probability of helical continuation relative to the probability of termination and plays the role of a propagation equilibrium constant. $\sigma$ is a weight attached to the first unit and represents the barrier to helix initiation. The procedure for assigning statistical weights to a peptide sequence is most easily demonstrated by an example. Suppose we have a molecule which contains 10 peptide units, with units 3-8 forming a helix. The coding for this sequence is given in Chart I.
Chart I
peptide unit   1   2   3   4   5   6    7   8   9   10
coding         0   0   0   0   0   1    1   1   0   0
weights        1   1   1   1   1   σs   s   s   1   1
Although there are six helical peptide units, there are only three hydrogen bonds, as indicated for positions 6-8, and therefore only three "1"s in the coding. Because of the structure of the α-helix (Figure 1A), the first "1" of each helical run must be preceded by at least three consecutive "0"s (units 3-5 in Chart I). These are the three peptide units at the N-terminal end of the helix which have no NH hydrogen bonds. The weights assigned to the peptide units are given in the third row using the rules of ZB. To establish all the weights in a sequence requires knowledge of the states of four consecutive peptide units, corresponding to three residue conformations. This means that the statistical weight matrices have overlapping triplets as bases and are of dimension $8 = 2^3$. There are, however, only 11 nonvanishing elements of the matrix, and the secular equation is quartic. Poland and Scheraga have demonstrated the reduction of the ZB matrix to 4 × 4.5 However, the full 8 × 8 matrix theory has remained essentially unused.
(18) Madison, V.; Schellman, J. Biopolymers 1972, 11, 1041.
(19) Chen, Y.-H.; Yang, J.-T.; Chau, K. H. Biochemistry 1974, 13, 3350.
(20) Chakrabartty, A.; Schellman, J. A.; Baldwin, R. L. Nature 1991, 351, 586.
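The ZB weighting rules and the Chart I example can be sketched as follows. The code assumes a coding string of '0' and '1' peptide units, read from the N-terminus as in the modern convention above; function names are hypothetical. `zb_partition` is the exact ZB partition function obtained by brute-force enumeration of all codings, not by the 8 × 8 matrix itself, so it is only practical for short chains (roughly 20 units or fewer).

```python
from itertools import product


def zb_weight(code: str, sigma: float, s: float) -> float:
    """ZB statistical weight of a string of peptide-unit states '0'/'1'.

    '0' units weigh 1, the first '1' of a run weighs sigma*s, later '1's weigh s.
    Codings that the alpha-helix forbids (a run of '1's not preceded by at
    least three '0's) get weight 0.
    """
    weight = 1.0
    for i, unit in enumerate(code):
        if unit == "0":
            continue
        if i > 0 and code[i - 1] == "1":
            weight *= s                      # helix continuation
        elif i >= 3 and code[i - 3:i] == "000":
            weight *= sigma * s              # first H-bond of a new helix
        else:
            return 0.0                       # geometrically impossible coding
    return weight


def zb_partition(n_p: int, sigma: float, s: float) -> float:
    """Exact ZB partition function by enumerating all 2**n_p codings."""
    return sum(zb_weight("".join(bits), sigma, s)
               for bits in product("01", repeat=n_p))


# Chart I: units 1-10, H-bonds at units 6-8, so the weight is sigma * s**3.
print(zb_weight("0000011100", sigma=1e-3, s=1.5))
```

Since the first three units can never carry an H-bond, a chain of $N_p$ units has $N_h = N_p - 3$ possible H-bond positions, which the enumeration respects automatically.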
The principal reason the full 8 × 8 theory has remained unused is that ZB also presented an approximate 2 × 2 matrix solution to the problem, which is very simple and which provides a faithful representation of the helix-coil transition under a wide variety of circumstances. Poland and Scheraga have suggested an alternative 2 × 2 matrix starting with the LR method.21 Most work on the helix-coil transition has utilized one or the other of these simple theories. The relationship of these theories to the present work will be discussed in a later section. For the ZB model we note the following: (1) The nonhelical "0" units are not the same as the c units of the LR theory. The former represent coil units comprising all allowed conformations, including the helical conformation, whereas the helical conformation is excluded from "c". The "0" units correspond to $c \cup h$. Since the "c" and "0" states are the standard states of the two theories, this difference shows up in all other parameters. (2) The helical "0" units are given the same weight as the coil "0" units. We shall show that the $\sigma$ parameter adjusts for this difference. (3) The $w$ and $s$ parameters are not identical. $s$ is the probability of a hydrogen-bonded unit relative to that of a coil unit (conformational integral $z$, eq 3); $w$ is the probability of a hydrogen-bonded unit relative to a unit which is not in the helical conformation (conformational integral $u'$, eq 1).
(21) Poland, D.; Scheraga, H. A. J. Chem. Phys. 1965, 43, 2071.
5. Interconversion of ZB and LR Parameters
The relationship between the LR and ZB parameters can be established by using both theories to set up the statistical weight of an entire segment consisting of a helix embedded between coil regions and equating the results. We do this for the example discussed in the previous section. The details of the statistical weight assignments are shown in Figure 1B. The rectangles represent peptide groups, which are now numbered 0-9; the circles are the intervening α-carbon atoms, which are associated with the conformational angles $\phi$ and $\psi$. The binary coding of the H-bond model is indicated above the rectangles, while the top line represents the assigned weights. The ZB theory arbitrarily assigns the weight of unity to the random coil state. In order to compare the ZB result with the LR theory, we replace it with the weight $z$, recognizing that these weights are the average over the possible values of $(\phi, \psi)$. Similarly, $s'$ rather than $s$ is used for helix growth, in accord with the change in reference state. No statistical weight is given to the leftmost peptide unit, since the conformational integral deals with the relative orientations of the peptide units and the first unit establishes the position and orientation of the entire chain. The statistical weight of the entire segment is thus
$$z^6\,\sigma\,s'^3$$
The coding for the LR theory is straightforward except for the two terminal residues on each end. Residues 2 and 8 must be c residues to stop the helix at residues 3 and 7. On the other hand, residues 1 and 9 must be true coil residues, i.e., $h \cup c$ residues, since the example is of a helix embedded in a random coil. Nothing would be added by extending the random coil region on either side. The resulting weights for the LR method are shown in the bottom line. The statistical weight of the entire chain is thus $u'^2 v'^2 w'^3 z^2$. Equating the two results,
$$z^6 \sigma s'^3 = u'^2 v'^2 w'^3 z^2$$
and dividing by $z^9$ to recover the weighting of the ZB theory, with $s = s'/z$,
$$\sigma s^3 = \left(\frac{u'}{z}\right)^2 \left(\frac{v'}{z}\right)^2 \left(\frac{w'}{z}\right)^3$$
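which leads to the correlation formulas
$$s = \frac{w'}{z} = \frac{w}{1 + v} \tag{9}$$
$$\sigma = \left(\frac{u' v'}{z^2}\right)^2 = \frac{v^2}{(1 + v)^4} \tag{10}$$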
The final forms, on the right, were obtained by dividing numerator and denominator by $u'$ to put the formulas in the LR reference frame, using $z/u' = 1 + v$. For an infinite chain the helix-coil transition has its midpoint very close to $s = 1$, i.e., when the helix propagation constant equals the weight of the random coil. According to eq 9, $w$ takes a value very close to $1 + v$ at the midpoint of the transition. This is physically the same situation, since $1 + v$ is the weight of a random coil residue in the LR theory (see eq 11). The structure of $\sigma$ in the LR model also has an elementary explanation. A helix embedded in a coil has two transition regions, one on each end. The last coil residue is not random but must be confined to the c region; otherwise the helix will not terminate. The first helical residue is also unusual, having a statistical weight of $v$ rather than $w$. The probability of a c residue is $1/(1 + v)$; the probability of a non-hydrogen-bonded h residue is $v/(1 + v)$. Thus, the probability of the ch junction is $[1/(1 + v)][v/(1 + v)] = v/(1 + v)^2$. An identical factor arises from the hc junction at the other end of the helix, leading to eq 10. This result is related to the elementary problem of the intrinsic probability of obtaining a run of exactly two heads as part of a series when the probability of a head is $p$ and the probability of a tail is $q = 1 - p$. The probability is $p^2 q^2$, since not only must one have two consecutive heads but they must also be flanked on either side by a tail. It is easily shown that $\sigma$, as a function of $v$, has a maximum value of 1/16 when $v = 1$. When $v = 1$, the c and h regions are equally likely, and this is equivalent to the coin-tossing problem with $p = q = 1/2$. This interesting maximum property of $\sigma$ arises directly from the assumptions of the LR model. In this model the probability of the h part of an hc junction cannot be increased without decreasing the probability of the c part (see eq 3).
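The interconversion can be checked numerically with a short sketch (hypothetical function names; the primed weights passed in are arbitrary test numbers, not measured values). `lr_to_zb` applies eqs 9 and 10, `segment_weights` evaluates the Figure 1B segment weight in both theories, taking $s' = w'$ so that $s = s'/z$ reproduces eq 9, and a coarse scan over $v$ confirms the maximum $\sigma = 1/16$ at $v = 1$.

```python
def lr_to_zb(w: float, v: float) -> tuple[float, float]:
    """Eqs 9 and 10: ZB (s, sigma) from renormalized LR weights (w, v)."""
    s = w / (1.0 + v)
    sigma = v ** 2 / (1.0 + v) ** 4
    return s, sigma


def segment_weights(u_p: float, v_p: float, w_p: float) -> tuple[float, float]:
    """Weights of the Figure 1B segment in both theories, from primed LR weights.

    ZB: z^6 * sigma * s'^3, with z = u' + v', s' = w' and sigma = (u'v'/z^2)^2.
    LR: u'^2 * v'^2 * w'^3 * z^2.
    """
    z = u_p + v_p
    sigma = (u_p * v_p / z ** 2) ** 2
    zb = z ** 6 * sigma * w_p ** 3
    lr = u_p ** 2 * v_p ** 2 * w_p ** 3 * z ** 2
    return zb, lr


# sigma(v) peaks at v = 1 with the value 1/16.
best = max(lr_to_zb(1.0, i / 1000)[1] for i in range(1, 5001))
print(best)   # 0.0625
```

At the transition midpoint $w = 1 + v$, `lr_to_zb` returns $s = 1$, matching the statement about eq 9.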
If real $\sigma$ factors exceed the maximum value of 1/16, helix formation must depend on interactions which are not considered in the LR theory. This subject will be discussed in the concluding section.
The Zimm-Bragg 2 × 2 Model. Zimm and Bragg14 presented a simplified version of their theory in which the first H-bond immobilized only two peptide units rather than four. As a result, the statistical weight of a unit depended only on the state of the preceding unit. This leads to the statistical weight matrix
$$
\begin{array}{c|cc}
 & 1 & 0\\ \hline
1 & s & 1\\
0 & \sigma s & 1
\end{array}
$$
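A minimal sketch of the ZB 2 × 2 transfer matrix, assuming states ordered (1, 0), rows labeled by the preceding unit, and a coil unit preceding the chain; these boundary choices are illustrative assumptions. It also evaluates the single-sequence formula, eq 12 in this paper, so that the two can be compared: with $N_h = N_r - 2$ matrix factors, the 2 × 2 result agrees with eq 12 to first order in $\sigma$.

```python
def zb2_partition(n_units: int, sigma: float, s: float) -> float:
    """ZB 2 x 2 model: Z = (0 1) M^n (1 1)^T with M = [[s, 1], [sigma*s, 1]].

    State order is (1, 0); the unit preceding the chain is taken as coil ('0').
    """
    row = [0.0, 1.0]
    for _ in range(n_units):
        row = [row[0] * s + row[1] * sigma * s,   # current unit H-bonded ('1')
               row[0] + row[1]]                   # current unit coil ('0')
    return row[0] + row[1]


def single_sequence(n_r: int, sigma: float, s: float) -> float:
    """Eq 12 (single-sequence approximation); singular at exactly s = 1."""
    return 1.0 + sigma * s * ((n_r - 2) - (n_r - 1) * s + s ** (n_r - 1)) / (1.0 - s) ** 2


# With N_h = N_r - 2 matrix factors the two agree to first order in sigma.
n_r = 12
print(zb2_partition(n_r - 2, 1e-9, 1.3), single_sequence(n_r, 1e-9, 1.3))
```

Repeated row-vector products are used instead of the closed-form eigenvalue solution; for short chains they are equally exact and avoid special cases.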
Physically, the ZB 2 × 2 model is a serious distortion of the molecular event which produces the helix. It assumes that the two peptides connected to an α-carbon atom are hydrogen-bonded to one another. This permits sequences like 101 and 1001, which are not allowed for the real polypeptide system. The advantage of this simplification is that it permits a convenient closed-form solution to the problem. As will be shown in a later section, it also tracks the phenomenology of the helix-coil transition in an essentially quantitative fashion for a number of important cases, if minor adjustments are made in the interpretation. Inspection of the example discussed in the previous section shows that the ZB coding and statistical weight of the sequence are not changed in the 2 × 2 case. Thus, the interpretation of $\sigma$ and $s$ is precisely the same as for the full theory. Errors arise in counting the number of coil states. In the real helix of Figure 1 there are four nonhelical peptide units (0, 1, 8, 9); in the 2 × 2 representation there are six (0, 1, 2, 3, 8, 9). In a long molecule undergoing a helix-coil transition the number of nonhelical units is always overcounted with this representation, and this changes the distribution statistics and the partition function. The error becomes more serious when the average number of units between helices is small, i.e., when $s$ is large and the fraction of helix is greater than 0.5. Poland and Scheraga have also devised a 2 × 2 matrix method, which they call the nearest-neighbor Ising model, using the conformational description of LR. Their matrix is
$$
\begin{array}{c|cc}
 & h & c\\ \hline
h & w & 1\\
c & v^2/w & 1
\end{array}
$$
The $w$ in the denominator of the nucleation term is introduced to obtain the proper weight for a helix. On the other hand, this departs from physical reality by giving isolated h residues a weight which depends on $w$. Otherwise, the matrix has the same advantages and approximations as the 2 × 2 matrix of ZB.
The Random Coil. In all the theories that count H-bonds there is an approximation in the description of the random coil. This approximation will now be evaluated and shown to be small. In the ZB and the other theories the partition function of a random coil is written as $Z = \prod_i z_i$, since each residue is averaged over all conformations and the units are considered to be independent. However, if each residue is allowed to assume the helical conformation independently, the resulting sequence will contain runs of three or more helical residues and thus will not represent a random coil. No such assumption is made in the LR theory, so we can use this theory to evaluate the magnitude of the resulting error. This is done as follows. The LR chain is forced to represent a random coil by putting $w = 0$, and the partition function is evaluated for $N_r$ residues using the matrix theory. The results are expanded as a function of $v$ and compared with $(1 + v)^{N_r}$, which is the result for independent units when $u' = 1$. We find
$$Z = (1 + v)^{N_r} - (N_r - 2)v^3 + (\text{terms of order } v^4 \text{ and higher}) \tag{11}$$
We note that $(N_r - 2)v^3$ is just the probability of a triplet times the number of places that it can be placed in a sequence of $N_r$ residues. Higher terms deal with longer stretches of h's, which are not permitted in the random coil. The relative error, $(N_r - 2)v^3/(1 + v)^{N_r}$, is very small; for $v = 0.05$ it is of the order of $10^{-3}$ or less for any length of chain, and it becomes smaller for smaller $v$. We conclude that the assumption of independent elements in the random coil leads to negligible error except when $v$ is large.
[…] completely helical or completely random coil. This model, which has pedagogical value, exaggerates both the stability of the helix and the sharpness of the transition and is not suitable for the quantitative interpretation of experimental results.
Single-Sequence Approximation. When the polypeptide chains are reasonably short, it is possible to use the single-sequence approximation. The basis of this approximation, which preceded genuine helix-coil theories, is that because of the low probability of nucleation, a short polypeptide is unlikely to have more than one helix. The statistics of a single helix of varying length and position is a simple combinatorial problem, and it is possible to obtain closed formulas not only for the average helical properties but also for the distribution of helical lengths.22 In modern notation the formula for the single-sequence partition function is given by
$$Z = 1 + \sigma s\,\frac{(N_r - 2) - (N_r - 1)s + s^{N_r - 1}}{(1 - s)^2} \tag{12}$$
In the original formula22 the number of amino acids in the chain, rather than the number of residues, was used. The two formulas are connected by the relation $N_r = N_{aa} - 2$. It is also possible to obtain a single-sequence formula from any helix-coil theory by considering the limiting form of that theory at short chain lengths or very small $\sigma$. Equation 12 is in fact the first two terms of the partition function expressed as a power series in $\sigma$. The power of $\sigma$ corresponds to the number of helices in the chain. The ZB theory gives a formula identical to eq 12, which is expected since it is based on the same type of physical model. The LR version of the single-sequence formula is
$$Z = 1 + N_r v + \frac{N_r(N_r - 1)}{2}v^2 + v^2 w\,\frac{(N_r - 2) - (N_r - 1)w + w^{N_r - 1}}{(1 - w)^2} \tag{13}$$
The difference from eq 12 arises mainly from the fact that unity, the reference state, is not the statistical weight of a random coil in the LR theory. The first three terms of eq 13 are the number of ways that 0, 1, or 2 h residues may be distributed over $N_r$ residues, each weighted by $v$ per h residue. Interestingly, the single-sequence form of the ZB 2 × 2 theory may be adjusted so that it gives results identical to the full theory. As was demonstrated in the example above, the 2 × 2 theory gives the same weighting to an α-helix as the 8 × 8 theory. However, it neglects two peptide units at the amino end of the helix and therefore artificially assigns two extra peptides to the coil segment. It is easily shown from this that, if there is a single helical sequence, the 2 × 2 result for a chain of $N_p - 2$ peptide units is identical to the exact 8 × 8 result for a chain of $N_p$ units. Thus, the effective number of helical residues is $N_p - 2 = N_h + 1 = N_r - 1$. As a result, the 2 × 2 theory can be used with no error for short chains, provided one diminishes the number of peptide units in the 8 × 8 theory by 2. This is equivalent to assigning $N_r$ a value equal to the maximum number of possible hydrogen bonds in the real chain.
Though simple, eq 12 has caused confusion. It has appeared with $N_{aa}$, $N_r$, $N_p$, and $N_h$ as the variable, with resulting changes in the numerical coefficients, and also with adjusted chain lengths to match the 2 × 2 theories. In fact, all the published formulas are identical in content.
Short, Sequenced Polypeptides. When the chains are short, and especially when they have a specific known sequence, the partition function can be evaluated directly by computing the matrix product of eq 7. For a sequence of different amino acids one must use the sequenced product of matrices in evaluating $Z$:
$$Z = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} \left(\prod_{i=1}^{N_r} M_i\right) \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix}$$
(12)
(1 - s)2
zqo
0
,,.Mi(;) I
6. A Comparison of Theories Short Polypeptides. Two-State Transition. The simplest interpretation for melting of an a-helix is the two-state model in which the polypeptide is considered to exist only in states that are
(22) Schellman, J. A. J. Phys. Chem. 1958, 62, 1485.

where the matrices $M_i$ are constructed with the parameters appropriate for the amino acid at position i. One can not only evaluate the fractional helicity of the chain and other averages but also obtain the helical or hydrogen-bond probability of individual units in the chain by way of the formula

$$p_i = \frac{\partial \ln Z}{\partial \ln w_i}$$

where $p_i$ is the ensemble average of the probability that unit i has weight $w_i$, i.e., that it is associated with a hydrogen bond in a helix. The actual hydrogen bond is between the CO of residue i - 2 and the NH of residue i + 2. In practice, the derivative of Z with respect to ln w is accomplished by substituting the matrix
$w_i\,\partial M_i/\partial w_i$, which keeps only the $w_i$ element of $M_i$, for the LR matrix $M_i$ in the product of eq 14.

In the present era of high-speed computing, it is not necessary that chains be short, and calculations on extremely long sequenced DNA molecules have been performed with considerable success.$^{\ldots,23-25}$ Simplified models using 2 × 2 matrices have had continued success in interpreting the phenomenology of the helix-coil transition, i.e., thermodynamic properties which are averages over all molecules and all amino acids. Recent years have seen the use of techniques such as nuclear magnetic resonance (NMR) spectroscopy, hydrogen exchange, mutational effects, etc., which permit one to obtain information about the behavior of individual amino acids at specific positions in the chain. For the adequate interpretation of this type of work it is necessary to use models like LR and ZB, which provide a faithful representation of the molecular events. Such a study, combining the use of synthetic polypeptides of known and varied sequence, refined experimental methods, and helix-coil theory, has been under way for several years in the laboratory of R. L. Baldwin$^{20,26-30}$ and is now being pursued in a number of others.

Long Polypeptides. We have studied in some detail the comparative predictions of the two-state, single-sequence, LR, ZB, and 2 × 2 theories. The experimental parameters which are obtained from studies of helix-coil transitions are normally the position and slope of the transition. The position of the transition is defined as that value of s or w for which the system is 50% helical. The fractional helicity is conveniently defined as $\langle\theta_h\rangle = \langle n_h\rangle/N_h$, where $\langle n_h\rangle$ is the average number of hydrogen bonds in a molecule and $N_h$ is the maximum possible number, $N_r - 2$. This is convenient since $n_h$ is essentially what is measured by CD and hydrogen-exchange methods, including NMR spectroscopy. See the discussion below eq 8.
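The matrix-product evaluation of Z and the substitution trick for $p_i$ can be sketched in a few lines. This is a minimal sketch, not the authors' program: it assumes the standard 3 × 3 Lifson-Roig matrix with coil weight u = 1 and end vectors (0 0 1) and (0 1 1), and every function name is hypothetical. As cross-checks it compares the product with brute-force enumeration and with the small-v forms of eqs 11-13.

```python
from itertools import product


def row_times(row, m):
    """Row vector times a square matrix, using plain lists."""
    return [sum(row[k] * m[k][j] for k in range(len(row))) for j in range(len(m[0]))]


def lr_matrix(v, w):
    """Assumed standard 3x3 Lifson-Roig matrix for one residue (coil weight u = 1).

    State of residue i: 0 = h followed by h, 1 = h followed by c (or chain end), 2 = c.
    Only element (0, 0) contains w: an h residue flanked by h on both sides.
    """
    return [[w, v, 0.0],
            [0.0, 0.0, 1.0],
            [v, v, 1.0]]


def lr_partition(params, replace_at=None):
    """Z = (0 0 1) M_1 ... M_N (0 1 1)^T for a list of per-residue (v_i, w_i).

    With replace_at=i, M_i is replaced by w_i * dM_i/dw_i (only the w_i element kept),
    which turns the product into dZ/dln w_i.
    """
    row = [0.0, 0.0, 1.0]                      # as if preceded by a coil residue
    for i, (v, w) in enumerate(params):
        m = lr_matrix(v, w)
        if i == replace_at:
            m = [[w, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
        row = row_times(row, m)
    return row[1] + row[2]                     # last residue cannot be "h followed by h"


def hbond_profile(params):
    """p_i = dlnZ/dln w_i for every residue."""
    z = lr_partition(params)
    return [lr_partition(params, replace_at=i) / z for i in range(len(params))]


def fractional_helicity(params):
    """<theta_h> = <n_h>/N_h with <n_h> = sum_i p_i and N_h = N_r - 2."""
    return sum(hbond_profile(params)) / (len(params) - 2)


def brute_force_z(params):
    """Direct sum over all 2**N h/c strings (small chains only), as a cross-check."""
    n, total = len(params), 0.0
    for conf in product("hc", repeat=n):
        weight = 1.0
        for i, state in enumerate(conf):
            if state == "h":
                v, w = params[i]
                inner = 0 < i < n - 1 and conf[i - 1] == conf[i + 1] == "h"
                weight *= w if inner else v
        total += weight
    return total


def single_sequence_zb(n_res, sigma, s):
    """Eq 12 in closed form (s != 1)."""
    m = n_res - 2
    return 1 + sigma * s * (m - (m + 1) * s + s ** (m + 1)) / (1 - s) ** 2


def single_sequence_lr(n_res, v, w):
    """Eq 13 (w != 1): all terms with at most two factors of v."""
    m = n_res - 2
    helix = v ** 2 * w * (m - (m + 1) * w + w ** (m + 1)) / (1 - w) ** 2
    return 1 + n_res * v + n_res * (n_res - 1) / 2 * v ** 2 + helix


if __name__ == "__main__":
    # Hypothetical 13-residue chain with one weak helix former in the middle.
    seq = [(0.05, 1.6)] * 6 + [(0.05, 0.7)] + [(0.05, 1.6)] * 6
    print([round(p, 3) for p in hbond_profile(seq)])
    print(round(fractional_helicity(seq), 3))
    # Random coil (eq 11): w = 0 everywhere; the remainder printed is of order v**4.
    n, v = 12, 0.05
    print((1 + v) ** n - (n - 2) * v ** 3 - lr_partition([(v, 0.0)] * n))
```

Because each configuration uses the matrix of residue i exactly once, Z is linear in $w_i$, so the substituted product divided by Z is exactly $p_i$; the end residues automatically get $p_i = 0$.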
Formulas were obtained for finite chains by expanding the exact equations of the several theories in power series in σ and (s - 1), or in v and (w - (1 + v)), and examining the terms of low power. The results show that the ZB and LR theories are essentially identical, differing only in terms of the order of $\sigma^3$ or $v^6$. The ZB 2 × 2 theory is surprisingly good near the midpoint of the transition. The slope is the same at s = 1 for the two theories. There is an error of v/2 (about 0.025) in the values of ⟨θ⟩ at s = 1 for the 2 × 2 theory, but this results from the steepness of the ⟨θ⟩ vs s - 1 curve. The error in s at ⟨θ⟩ = 1/2 is very much
(23) Poland, D. Biopolymers 1974, 13, 1859.
(24) Poland, D. Int. J. Polym. Mater. 1975, 4, 1.
(25) Lerman, L. S.; Silverstein, K. Methods Enzymol. 1987, 155, 482.
(26) Padmanabhan, S.; Marqusee, S.; Ridgeway, T.; Laue, T. M.; Baldwin, R. L. Nature 1990, 344, 268.
(27) Strehlow, K. G.; Robertson, A. D.; Baldwin, R. L. Biochemistry 1991, 30, 5810.
(28) Scholtz, J. M.; Marqusee, S.; Baldwin, R. L.; York, E. J.; Stewart, J. M.; Santoro, M.; Bolen, D. W. Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 2854.
(29) Scholtz, J. M.; Qian, H.; York, E. J.; Stewart, J. M.; Baldwin, R. L. Biopolymers 1991, 31, 1463.
(30) Scholtz, J. M.; Baldwin, R. L. Annu. Rev. Biophys. Biophys. Chem., in press.
Figure 2. (A) Comparison between models. The squares are calculated according to Lifson-Roig's helix-coil transition model (v = 0.05). The solid lines are according to the one-sequence approximation, and the dashed lines are least-squares fits to the LR model by an all-or-none transition, $(w/w_0)^{N_r-2}/[1 + (w/w_0)^{N_r-2}]$, where $w_0$ = 2.10, 1.35, and 1.15 for $N_r$ = 10, 20, and 40, respectively. For $N_r$ = 40 the one-sequence approximation begins to deviate from the exact theory in the direction of the all-or-none approximation. (B) Comparison between different helix-coil theories. The symbols are calculated according to Lifson-Roig's model, and the solid lines according to Zimm-Bragg's 2 × 2 model, where s = w/(1 + v), σ = v²/(1 + v)⁴, and the total number of units is $N_b = N_r - 2$. (a) $N_r$ = 40, v = 0.05; (b) $N_r$ = 40, v = 0.1; (c) $N_r$ = 20, v = 0.1; (d) $N_r$ = 20, v = 0.05. When v and/or N is small, the models are almost indistinguishable. Even for an infinitely long chain, the maximum difference is only 3% and 6% for v = 0.05 and 0.1, respectively (data not shown).
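The comparison in panel B can be imitated with a small calculation. The sketch below is not the authors' program; it assumes the standard 3 × 3 LR matrix and a ZB 2 × 2 matrix in which a coil unit has weight 1, a helical unit after a coil unit σs, and a helical unit after a helical unit s. It applies the caption's translation s = w/(1 + v), σ = v²/(1 + v)⁴ with $N_b = N_r - 2$ units, and obtains each mean count by replacing one matrix at a time with its derivative with respect to the logarithm of the growth parameter. All function names are illustrative.

```python
def row_times(row, m):
    """Row vector times a square matrix, using plain lists."""
    return [sum(row[k] * m[k][j] for k in range(len(row))) for j in range(len(m[0]))]


def mean_count(n, matrix, marked, start, end):
    """<count> = sum_i Z_i / Z, where Z_i uses `marked` (x dM/dx) at position i."""
    def contract(marked_at):
        row = start
        for j in range(n):
            row = row_times(row, marked if j == marked_at else matrix)
        return sum(a * b for a, b in zip(row, end))
    return sum(contract(i) for i in range(n)) / contract(None)


def helicity_lr(n_res, v, w):
    """LR fractional helicity <n_h>/(N_r - 2), standard 3x3 matrix assumed."""
    m = [[w, v, 0.0], [0.0, 0.0, 1.0], [v, v, 1.0]]
    dm = [[w, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # w dM/dw
    return mean_count(n_res, m, dm, [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]) / (n_res - 2)


def helicity_zb(n_units, sigma, s):
    """ZB 2x2 fractional helicity <n>/N; states 0 = coil unit, 1 = helical unit."""
    t = [[1.0, sigma * s], [1.0, s]]
    dt = [[0.0, sigma * s], [0.0, s]]                            # s dT/ds
    return mean_count(n_units, t, dt, [1.0, 0.0], [1.0, 1.0]) / n_units


def zb_from_lr(v, w):
    """Translation quoted in the Figure 2 caption: sigma = v^2/(1+v)^4, s = w/(1+v)."""
    return v ** 2 / (1 + v) ** 4, w / (1 + v)


if __name__ == "__main__":
    n_res, v = 20, 0.05
    for w in (1.0, 1.2, 1.4, 1.6, 1.8):
        sigma, s = zb_from_lr(v, w)
        print(w, round(helicity_lr(n_res, v, w), 3), round(helicity_zb(n_res - 2, sigma, s), 3))
```

For $N_r$ = 20 and v = 0.05 the two printed columns stay close to each other, in line with the caption's remark that the models are almost indistinguishable when v and N are small.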
smaller, and this is the experimentally important way of using the curves. The details of these calculations do not appear to be of special interest to this paper, and we illustrate the results instead with curves for the various models, using the same parameters and the translation formulas of eqs 9 and 10. These comparisons are shown in Figure 2. It is to be expected that the ZB and LR theories show good agreement once the correspondences of eqs 9 and 10 are recognized, since they differ only by a minor approximation. The 2 × 2 calculation is also a good approximation because the curves of Figure 2 are mostly near the single-sequence limit of small σ and $N_r$, where all the theories can be brought into correspondence.

7. Have the Helix-Coil Assumptions Retained Their Validity?
The first theories of the helix-coil transition were developed at a time when conformational chemistry was in a relatively primitive state. Beginning with the two-state model of Schellman,$^{\ldots}$ the basic physical assumptions have been that (1) the nucleation barrier is essentially entropic and is caused by the fact that two residues (or three peptide groups) of a helix are uncompensated by hydrogen bonds and (2) hydrogen-bond formation is the key stabilizing event in helix formation. Kauzmann's paper on the importance of the hydrophobic interaction$^{31}$ appeared shortly after, or concurrent with, the pioneer papers in the helix-coil field. Thus, all the theories were fixed in form before it became generally believed that hydrophobic forces are the primary stabilizing forces in biological macromolecules. At present, opinion appears to be about equally divided between those who think that hydrogen bonds are negligible thermodynamically (but not structurally!) and those who believe that H-bonds play an important role, but one that is difficult to determine because of the usual dominance of hydrophobic interactions. Only in the past few years have there been papers which determine the contribution of "polar" interactions relative to hydrophobic interactions.$^{32,33}$ No one appears to have reexamined the helix-coil postulates in the light of later developments, and it is the purpose of this section to do so.

(31) Kauzmann, W. Adv. Protein Chem. 1959, 14, 1.

First, we note that there are two kinds of helix-coil theories. The first are the theories whose postulates are essentially thermodynamic. This includes all theories that count H-bonds, which employ the standard mathematics of homogeneous nucleation theory. They assume a nucleation parameter (σ or an equivalent) and a size for the nucleus. Though these theories may have been constructed with the chain-entropy and H-bond postulates mentioned above, they are in fact not committed to a physical model. The strength and the weakness of thermodynamic analyses are that they are noncommittal at the molecular level. The postulates of the theories of Nagai$^{15}$ and LR, on the other hand, are at the level of conformational integrals and are more detailed, more specific, and consequently less general. In terms of the LR theory the σ factor has a structure based on the probabilities of specific molecular conformations, as shown in eq 9. This is the type of theory that is required if we are to obtain a physical interpretation of the helix-coil parameters. It also makes predictions which can be checked against experiment. For example, if the σ parameter is observed to have a value greater than 1/16, this means that σ cannot be interpreted in terms of one pair of conformational probabilities, 1/(1 + v) and v/(1 + v), depending only on v.
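The bound of 1/16 follows from writing the translated nucleation parameter as a product of the two conformational probabilities: σ = v²/(1 + v)⁴ = [v/(1 + v)]²[1/(1 + v)]² = $(p_h p_c)^2$, and since $p_h + p_c = 1$, $p_h p_c \le 1/4$. A short numerical check (the function names are illustrative):

```python
def sigma_from_v(v):
    """ZB nucleation parameter implied by the LR weight v: sigma = v^2/(1+v)^4."""
    return v ** 2 / (1 + v) ** 4


def sigma_as_probabilities(v):
    """Same quantity written as (p_h * p_c)^2 with p_h = v/(1+v), p_c = 1/(1+v)."""
    p_h, p_c = v / (1 + v), 1 / (1 + v)
    return (p_h * p_c) ** 2


# d(sigma)/dv = 2v(1 - v)/(1 + v)^5 vanishes at v = 1, where sigma = 1/16.
grid = [k / 1000 for k in range(1, 20001)]        # v from 0.001 to 20
v_best = max(grid, key=sigma_from_v)
print(v_best, sigma_from_v(v_best))               # -> 1.0 0.0625
```

An observed σ above 0.0625 therefore cannot come from any single value of v in this translation.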
This question will be investigated further below. We will now perform a small transformation of the LR theory in order to bring it into a form closer to the ZB theory. This will also provide a better representation of nucleation. This is done by assigning the weight of each triplet to the right member of the triplet rather than to the central member. This is only a change in convention and produces no change in the results of the theory. The 4 × 4 LR matrix then becomes

[Eq 16: matrix elements not legible in the source; rows and columns are labeled by residue pairs such as hh, ch, and (h ∪ c)c.] (16)
This matrix has the same secular equation as the original LR matrix. Minor adjustments of the end vectors are required for finite chains. With the usual convention of starting at the left end of the chain, this matrix weights a helix by two consecutive v's, which is the nucleation step, and then proceeds with a series of w's. If we use the good approximation (see above) that a random coil stretch j units long may be represented by the product $z^j = (u + v)^j$, then the weighting of a particular chain configuration takes the form

$$\ldots z^{l}\,|uv^{2}|\,w^{m}\,|u|\,z^{n}\,|uv^{2}|\,w^{h}\,|u|\,z^{k}\ldots$$

or, in another form,

$$\ldots z^{l}\,|J|\,w^{m}\,|j|\,z^{n}\,|J|\,w^{h}\,|j|\,z^{k}\ldots$$

where l, m, n, h, and k are integers representing the lengths of homogeneous stretches of helix or coil, and J and j are nucleating junctions, or boundaries between homogeneous stretches. $J = uv^2$ is the nucleating junction for the beginning of a helix; three consecutive residues are fixed in regions of conformational space, as discussed above in connection with eq 9. $j = u$ is the weak nucleation parameter for the beginning of a coil. We can also transform the ZB theory to separate nucleation and growth by assigning all "1"s a weight of s and assigning a weight of σ to the "0" which is to the left of the first peptide in
a helix. We skip details, but this results in the separation of σ and s in the ZB matrix while giving the same secular equation and results. We then have for the configuration above

$$\ldots z^{l}\,|\sigma|\,s^{m}\,z^{n+1}\,|\sigma|\,s^{h}\,z^{k+1}\ldots$$

or

$$\ldots z^{l}\,|J|\,s^{m}\,z^{n+1}\,|J|\,s^{h}\,z^{k+1}\ldots$$

where J = σ. All nucleation processes are combined in the ZB theory, so that the coil nucleation parameter is invisible and the first coil unit is counted as a random coil unit rather than as a "c" unit. The J junctions involve three units in both theories (chh in the ZB theory).

We now examine two postulates of helix-coil theory. The first is the LR assumption that the units in chc, chh, and chh have an identical weighting v. Can one ascribe the same free energy to an isolated h conformation, to the first helical fold of the nucleus, and to the second helical fold? These triplets are in the convention of eq 16. The second is whether the stabilization of the helix arises predominantly from H-bond formation. LR did not make this assumption. The two postulates are related: if interactions other than H-bonds are responsible for stabilization, then they may contribute to the "v" steps in helix formation as well as to the "w" steps. Experimental s (or w) values are usually in the neighborhood of 1.5 for helix-forming amino acids.$^{\ldots,34}$ This corresponds to about 250 cal of free energy of stabilization. With this type of marginal stability, any type of interaction that takes place must be considered important. The free energy of formation of the peptide hydrogen bond in aqueous solution remains an open question. The ΔH of its formation has cycled down and up in the literature for more than three decades, and there are still no good estimates of the free energy. The problem of the H-bond in the α-helix could be approached by molecular simulation methods, but the effect is sufficiently marginal that it would be difficult to be certain of the reliability of the results. Our approach has been to make a crude estimate of the changes in solvation in forming the α-helix.

(32) Murphy, K.; Gill, S. J. J. Mol. Biol. 1991, 222, 699.
(33) Privalov, P. L.; Makhatadze, G. I. J. Mol. Biol. 1991, 213, 385.
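The statement that s ≈ 1.5 corresponds to about 250 cal follows from ΔG = -RT ln s per added helical residue. A quick check, assuming 25 °C (the temperature is not specified in the text) and R = 1.987 cal mol⁻¹ K⁻¹; the function name is illustrative:

```python
import math

R_CAL = 1.987  # gas constant in cal mol^-1 K^-1


def growth_free_energy(s, temp_k=298.15):
    """Free energy per added helical residue, -RT ln s, in cal/mol (negative = favorable)."""
    return -R_CAL * temp_k * math.log(s)


print(round(growth_free_energy(1.5)))  # -> -240
```

About -240 cal/mol at 25 °C is consistent with the "about 250 cal" quoted; the magnitude falls slightly at lower temperature for the same s.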
From the results we conclude that interactions other than H-bond formation are to be expected and that the two nucleation steps of the LR theory are likely to have different free energies. For this purpose we have generated a series of oligoalanines, (Ala)$_n$, in the extended and helical forms with $N_{aa}$ = 3-13, i.e., $N_r$ = 1-11. For each of these molecules the molecular surface area was estimated using the MS/MIDAS program (University of California, San Francisco), which is based on the method of Connolly.$^{35}$ For each n the difference in area of the helical and extended forms was calculated, $\Delta A_n = A_n^{\mathrm{ext}} - A_n^{\mathrm{hel}}$. The incremental area $\Delta\Delta A_n = \Delta A_n - \Delta A_{n-1}$ was then used as an estimate of the area change for the transfer of the nth residue from the extended to the helical conformation. The change in area is about 8 Å² for the first helical residue and about 20 Å² for the second and subsequent residues.$^{36}$ These are roughly the areas occupied by one and two water molecules, respectively. Without implying that the solvation free energy of a peptide unit is proportional to the exposed surface area, a number of interesting conclusions can be drawn: (1) When a residue changes from an extended conformation to the α-helical conformation, contacts with the solvent are replaced by contacts between two peptide groups. The electronic spectrum of the peptide group is profoundly affected by helix formation,$^{37}$ which is an indication of strong interactions of the electronic systems of the groups. These changes in interactions

(34) Altmann, K.-H.; Wojcik, J.; Vasquez, M.; Scheraga, H. A. Biopolymers 1990, 30, 107.
(35) Connolly, M. L. Science 1983, 221, 709.
(36) These values are approximate because the areas calculated by our version of the MS/MIDAS (May 1991) program on IRIS have a small dependence on the starting trajectory of the probe and therefore on the initial coordinates of the molecule. Although the variability is small on an absolute scale, it is increased to 10-20% when the double differences are taken. It will be necessary to use an average for each molecule, starting with a series of randomly selected initial positions and orientations. This work is in progress.
(37) Mandel, R.; Holzwarth, G. J. Chem. Phys. 1972, 57, 3469.
J. Phys. Chem. 1992, 96, 3994-3998
affect the parameter v for an isolated residue in the helical conformation. Though the magnitude of this solvent effect is not known, this indicates that peptide conformational maps should be based on free energy rather than on energy. According to Brant and Flory,$^{\ldots}$ another contribution to v arises from the fact that an isolated helical conformation has an unfavorable electrostatic free energy. (2) It is unlikely that the statistical weights for the first two nucleating residues of a helix are the same, since the second residue changes the solvent contact surface area by about twice as much as the first. This is evidence for contact of the third peptide with the first and indicates that, for a pair of adjacent helical conformations, the "isolated pair" model for the peptide chain is not valid. With $v_1$ and $v_2$ as the weights for the first and second h residues in the chh sequence, the standard LR matrix takes the form

$$M = \begin{pmatrix} w & v_2 & 0 \\ 0 & 0 & 1 \\ v_1 & v & 1 \end{pmatrix}$$
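The claim that $v_1$ and $v_2$ enter only as a product can be checked numerically. This sketch assumes the standard 3 × 3 LR layout (states: h followed by h, h followed by c, c) with $v_1$ at element (3,1), $v_2$ at (1,2), and v at (3,2), and end vectors (0 0 1) and (0 1 1); the helper names are hypothetical. For this layout the secular equation is $\det(xI - M) = (x - w)(x^2 - x - v) - v_1 v_2$, and in a finite chain every helical run of two or more residues uses element (3,1) once and element (1,2) once.

```python
def lr_matrix_split(v1, v2, v, w):
    """Assumed standard 3x3 LR layout with separate nucleation weights.

    States: 0 = h followed by h, 1 = h followed by c (or end), 2 = c.
    Element (3,1) in 1-based indexing -> [2][0] holds v1; (1,2) -> [0][1] holds v2;
    (3,2) -> [2][1] holds v, the weight of an isolated h residue.
    """
    return [[w, v2, 0.0],
            [0.0, 0.0, 1.0],
            [v1, v, 1.0]]


def char_poly(m):
    """Coefficients (c2, c1, c0) of det(x I - M) = x^3 + c2 x^2 + c1 x + c0 for 3x3 M."""
    trace = m[0][0] + m[1][1] + m[2][2]
    minors = (m[0][0] * m[1][1] - m[0][1] * m[1][0]
              + m[0][0] * m[2][2] - m[0][2] * m[2][0]
              + m[1][1] * m[2][2] - m[1][2] * m[2][1])
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return (-trace, minors, -det)


def partition(n_res, m):
    """Z = (0 0 1) M^N (0 1 1)^T for a homogeneous chain."""
    row = [0.0, 0.0, 1.0]
    for _ in range(n_res):
        row = [sum(row[k] * m[k][j] for k in range(3)) for j in range(3)]
    return row[1] + row[2]


# Two splittings with the same product v1 * v2 = 0.0025.
a = lr_matrix_split(0.05, 0.05, 0.05, 1.4)
b = lr_matrix_split(0.10, 0.025, 0.05, 1.4)
print(char_poly(a), char_poly(b))
print(partition(15, a), partition(15, b))
```

Both printed pairs agree to rounding error, while changing the product itself changes Z.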
Here element (3,1) of the matrix is for the first helical residue and (1,2) for the second. However, these elements are not separable parameters. It is easily shown that they occur only as a product in the secular equation. Thus, even when these two steps are not equivalent, one could define a single parameter v with $v^2 = v_1 v_2$, which would play the same role in nucleation as the standard parameter. This would then differ from element (3,2), which is the statistical weight for an isolated h residue. Since helix-coil transitions are dominated by the behavior of σ and s, it would be very difficult to establish this difference experimentally. Since the nucleation parameter, $v^2$ (or σ), is associated with changes in solvation as well as confinement in angular space, the assumption that it is mostly entropic and independent of temperature can be questioned. (3) When a peptide unit is hydrogen-bonded into a helix, these same solvation changes occur in addition to the formation of the peptide H-bond. We do not even know the sign of the contribution of these changes in solvation to the free energy, but with only 250 cal or so of stabilization, they could be making an important contribution. The statistical weight w, which by definition$^{16}$ depends on the interactions of three consecutive helical residues, need not be construed as arising solely from the hydrogen bond.

In summary, the main conclusion is that the form of a helix-coil theory is not strongly dependent on the specific assumptions which were made at the time of its formulation. Interactions other than H-bonds may be incorporated in w or s. If $v_1$ and $v_2$ are significantly different from one another, the v determined from helix-coil theory will be $(v_1 v_2)^{1/2}$, and a probably small error will be introduced in the weighting of random coil segments. Because of changes in solvation energy, the nucleation parameter should possess a temperature dependence, but since the equilibrium constant for the formation of a helix is $\sigma s^n$, where n is the length of the cooperative unit for large helices and roughly the chain length for short ones, the temperature factor of σ is probably obscured by the large temperature factor of $s^n$.

Acknowledgment. This research was supported by NIH Grant GM20195 and NSF Grant PCM8609113. We thank Drs. Robert L. Baldwin and J. Martin Scholtz for many useful discussions and for sharing numerous experimental results prior to publication.
Intrinsic Viscosity of Model Starburst† Dendrimers‡
Marc L. Mansfield* and Leonid I. Klushin§
Michigan Molecular Institute, 1910 West St. Andrews Road, Midland, Michigan 48640 (Received: September 27, 1991)
The hydrodynamic radii ($R_\eta$) calculated from intrinsic viscosities of poly(amido amine) (PAMAM) starburst dendrimers increase rapidly with increasing generation number. This would seem to support a structure with strongly stretched tiers, which is relatively hollow near the core and has most end groups near the surface. However, a computer simulation due to Lescanec and Muthukumar supports a more folded structure with higher density near the core and with end groups dispersed throughout the molecule. Here we calculate the hydrodynamic radii of the Lescanec-Muthukumar model using intrinsic viscosity formulas developed by Zimm and Fixman. The Lescanec-Muthukumar starbursts with relatively stiff spacers have hydrodynamic radii in good agreement with the experimental PAMAM radii, in spite of their folded structure. The hydrodynamic radius is sensitive both to the molecular size and to the density. As the number of generations increases, the molecules become more dense and the hydrodynamic radius increases more rapidly than the radius of gyration.
Introduction

A number of authors have reported syntheses of starburst dendrimers.$^{1-6}$ These are highly, regularly branched molecules having a treelike structure, displayed schematically in Figure 1. The most-studied starburst macromolecule is probably the so-called poly(amido amine) (PAMAM) starburst, whose synthesis was reported by Tomalia and co-workers.$^{1}$ The basic building block, or spacer, of the PAMAM dendrimer is this moiety:

>N-CH2-CH2-CO-NH-CH2-CH2-N<
† STARBURST is a trademark of the Michigan Molecular Institute.
‡ Dedicated, with highest regards, to Professor Marshall Fixman in honor of his 60th birthday.
§ Permanent address: Institute of Macromolecular Compounds, Academy of Sciences of the USSR, 199004 St. Petersburg, Bolshoi prosp. 31, USSR.
We define the generation number, m, of a starburst molecule to be the number of spacers between the central core of the molecule and any of the termini. By this convention, the structure in Figure 1 has a generation number of 6. (Other authors begin counting at zero, by which convention Figure 1 would have a generation number of 5.)

(1) Tomalia, D. A.; Baker, H.; Dewald, J.; Hall, M.; Kallos, G.; Martin, S.; Roeck, J.; Ryder, J.; Smith, P. Polym. J. 1985, 17, 117.
(2) Tomalia, D. A.; Hedstrand, D. M.; Wilson, L. R. Encyclopedia of Polymer Science and Engineering, 2nd ed.; Wiley: New York, 1990; Index vol.; pp 46-92.
(3) Tomalia, D. A.; Naylor, A. M.; Goddard, W. A., III Angew. Chem., Int. Ed. Engl. 1990, 29, 138.
(4) Hawker, C. J.; Fréchet, J. M. J. Macromolecules 1990, 23, 4726.
(5) Morikawa, A.; Kakimoto, M.; Imai, Y. Macromolecules 1991, 24, 3469.
(6) Newkome, G. R.; Lin, X. Macromolecules 1991, 24, 1443.
0022-3654/92/2096-3994$03.00/0 © 1992 American Chemical Society