Statistical Studies of Flexible Nonhomogeneous Polypeptide Chains

Biomacromolecules 2005, 6, 3010-3017


Petras J. Kundrotas
Department of Biosciences at Novum, Karolinska Institutet, SE-141 57 Huddinge, Sweden
Received May 12, 2005; Revised Manuscript Received September 6, 2005

Unfolded proteins are attracting increasing attention because of accumulating experimental evidence that they play an important role in various biological processes. Therefore, studies of the statistical properties of flexible protein-like polypeptide chains are becoming increasingly important as well. This paper presents distributions (histograms) of distances between atoms of titratable residues for flexible polypeptide chains with various residue compositions, taking the hard-sphere potential into account. The factors influencing the parameters of the obtained histograms have been identified and analyzed. It was found that the sensitivity of the distributions to the internal structure of the intermediate residues increases with the number of residues between the charged residues considered. Branching at the Cβ atoms of the side chains of the intermediate residues was shown to be among the most important factors influencing the shape of the distance distribution and the average distance between atoms in flexible chains. Despite the simplicity of the model, the results of the calculations can be applied to systems in which other types of interactions are present, and this was demonstrated for charge-charge interactions. In particular, these interactions were shown to have a significant effect on distances between unlike charges, whereas the effect for like charges is much less pronounced. Predictions based on the presented calculations are compared with some experimental data, and possible applications of the theoretical concept described in the paper are discussed.

I. Introduction

The formulation of the structure-function paradigm more than 100 years ago1 was a “big bang” that created the universe of modern protein science.
Nevertheless, experimental evidence has been accumulating for the existence of a considerable number of proteins that lack ordered structure in large domains (more than 50 amino acids), or even entirely, under physiological conditions. This has recently prompted a reappraisal of the protein structure-function paradigm,2 replacing it with the protein trinity paradigm,3 later extended to the protein quartet paradigm.4 According to these concepts, the function of a protein (or of its functional regions) can arise from any of three (trinity model) or four (quartet model) distinct thermodynamic states and/or transitions among them. Each of these states, identified as the folded state, random coil, molten globule, and premolten globule (quartet model only), can be viewed as a native state of a protein. The current list of “natively unfolded” or “intrinsically unstructured” proteins contains more than 100 entries and covers proteins with 28 distinguishable functions5 within four broad categories: molecular recognition, molecular assembly, protein modification, and entropic chains (for reviews, see refs 3, 4, 6, and 7). This class of proteins also plays an important role in biomedical processes. The most dramatic manifestation of this is that the deposition of some intrinsically disordered proteins (such as τ-protein, studies of which gave birth to the term “natively unfolded”8) plays a crucial role in the development of a number of neurodegenerative disorders, with examples including Alzheimer’s9,10 and Parkinson’s11,12 diseases (for a complete review, see the work by Uversky6 and references therein).

Much work has recently been done on identifying intrinsically disordered proteins and their functions,11,13-15 analyzing their sequences,16-19 and developing predictors of protein disorder.5,7,20,21 The author has previously proposed the spherical model of unfolded proteins for calculating the electrostatic contribution to protein stability.22,23 At the same time, Zhou24 proposed the Gaussian-chain model for treating electrostatic interactions in unfolded proteins. Goldenberg performed Monte Carlo simulations of flexible polypeptide chains based on the sequences of real proteins.25 Fitzke and Rose26 carried out Monte Carlo modeling of a somewhat artificial model based on the sequences of real proteins, but with only ∼8% of the residues considered flexible. Nevertheless, systematic biophysical studies focusing on the statistical properties of natively unfolded proteins remain scarce. This paper addresses one of the many biophysical problems related to unfolded proteins, namely, the statistical distributions of distances between titratable sites of flexible protein-like polypeptide chains of various compositions. A distinguishing feature of natively unfolded proteins is a low content of hydrophobic groups and a high content of charged groups, so they carry a high net charge under physiological conditions.6 Therefore, the distance distributions considered here can provide information about energetically favorable distances between protein charges.27 This, in turn, is of great value for developing simplified virtual-chain models of unfolded proteins. For example, it was shown in our previous paper27 that correctly incorporating the protein charge sequence into the spherical model of unfolded proteins substantially improves the prediction of pK values for the unfolded N-terminal SH3 domain of the Drosophila protein drk. In that study, the distance distributions were considered for homogeneous and nonhomogeneous polypeptide chains consisting of titratable residues only. As was demonstrated there,27 the distance distributions are strongly affected by chain composition, especially for residues located close to each other in the protein sequence. However, with the set of chains studied in ref 27, the role of polar and hydrophobic residues in restricting the spatial freedom of protein charges remained unclear. Here, we revisit our earlier analysis of the distance distributions between titratable atoms of charged groups, but for an extended pool of chains in which any residue whose side chain does not directly restrict the conformational freedom of the backbone (as the proline side chain does) can be placed between the two residues of interest. The results of this study can be useful for understanding the functions of entropic chains (proteins whose functions arise from intrinsic disorder), where excluded-volume effects and charge-charge interactions are of major importance.

10.1021/bm0503266 CCC: $30.25 © 2005 American Chemical Society Published on Web 10/13/2005

II. Calculation Methods

A. Computation and Analysis of Distance Histograms. To achieve the goals formulated above, computer simulations were carried out for polypeptide chains consisting of a standard protein backbone and various sets of side chains. A generalized sequence of the considered chains can be written as

$$\mathrm{A_5\,Z\,X_m\,Z'\,A_5} \tag{1}$$

where distances are calculated and analyzed between a pair of residues of interest, Z and Z′, separated by m variable residues X (hereafter, m and X are referred to as the separation parameter and the intermediate residues, respectively). The case m = 0 (neighboring residues) was described in detail in previous work,27 and in this paper, values of m ranging from 1 to 6 are considered. Z and Z′ can be any of the titratable residues (aspartic (D) or glutamic (E) acid, arginine (R), or lysine (K)), and X can be any of the 19 residues whose side chains do not restrict the conformational flexibility of the backbone. Proline residues were excluded from consideration because they drastically reduce the rotational freedom of the chain backbone and are therefore outside the scope of the present investigation. Conformations of the chains were generated by random sampling of the backbone dihedral angles φ and ψ. The side-chain dihedral angles χ were randomly assigned one of three values (-60°, +60°, +180°) that keep the side chains in energetically favorable staggered conformations.28 Each generated conformation was checked for steric clashes and accepted only if all interatomic distances were larger than the corresponding Ramachandran criteria.29 The distances between the atoms on which charges can reside (Oδ1 and Oδ2 for aspartic acid, Oε1 and Oε2 for glutamic acid, Nζ for lysine, and NH1 and NH2 for arginine) were calculated and stored for each accepted conformation. For side chains with two possible charge positions, no significant statistical difference was found between the results obtained with the charge in either position. All distance distributions presented and analyzed in the paper were obtained from 4 × 10^7 conformations.

The choice of the sequence (eq 1) was determined by two factors. First, as was pointed out in one of our previous works,30 the statistical properties of flexible chains are influenced more by the mere presence of side chains than by their length. Therefore, alanine (and not the simplest residue, glycine) was placed at the chain ends. Second, to emphasize the impact of the sequence connecting the residues of interest, it is desirable to keep the rest of the chain unchanged. To ensure that the distance distributions are not affected by the choice of chain tails, some calculations were also performed for longer chains with 10 and 20 alanine residues at both ends and for tails with residues other than alanine. No significant statistical difference was found in the resulting distance distributions in either case. On the other hand, a total absence of chain tails could distort the statistics. Therefore, five residues at both ends of the chain were chosen as the best compromise between computational simplicity and the complexity of real proteins. In this study, the main focus was the analysis of histograms d(m) as functions of m and chain composition for small m values (from 1 to 6).
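The sampling procedure described above (random backbone torsions, hard-sphere rejection, and binning of the accepted distances) can be illustrated with a deliberately simplified sketch. It assumes a toy chain of identical atoms with a hypothetical bond length (1.5 Å), bond angle (111°), and a single contact distance (2.9 Å). These stand in for the all-atom polypeptide geometry and the Ramachandran criteria used in the paper. Only the bin layout (NB = 5000 bins between 0 and 100 Å) follows the text. Atoms are placed with the standard NeRF (natural extension reference frame) construction; this is our choice and not necessarily the author's.

```python
import math
import random

# Toy geometry: HYPOTHETICAL values, not the polypeptide geometry of the paper.
BOND = 1.5                 # bond length, Å
ANGLE = 111.0              # fixed bond angle, degrees
MIN_DIST = 2.9             # single hard-sphere contact distance, Å
N_BINS = 5000              # NB, as in the text
R_LO, R_HI = 0.0, 100.0    # histogram range, Å


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: position of atom d with |cd| = bond, angle bcd, torsion abcd."""
    th, ph = math.radians(angle_deg), math.radians(torsion_deg)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    dx = -bond * math.cos(th)
    dy = bond * math.sin(th) * math.cos(ph)
    dz = bond * math.sin(th) * math.sin(ph)
    return tuple(c[k] + dx * bc[k] + dy * m[k] + dz * n[k] for k in range(3))


def sample_chain(n_atoms, rng):
    """One conformation with torsions drawn uniformly from [-180°, 180°)."""
    th = math.radians(180.0 - ANGLE)
    atoms = [(0.0, 0.0, 0.0), (BOND, 0.0, 0.0),
             (BOND + BOND * math.cos(th), BOND * math.sin(th), 0.0)]
    while len(atoms) < n_atoms:
        torsion = rng.uniform(-180.0, 180.0)
        atoms.append(place(atoms[-3], atoms[-2], atoms[-1], BOND, ANGLE, torsion))
    return atoms


def is_clash_free(atoms):
    """Hard-sphere test: atoms three or more bonds apart must not overlap."""
    for i in range(len(atoms)):
        for j in range(i + 3, len(atoms)):
            if math.dist(atoms[i], atoms[j]) < MIN_DIST:
                return False
    return True


def end_to_end_histogram(n_atoms, n_trials, seed=1):
    """Rejection sampling plus binning of end-to-end distances."""
    rng = random.Random(seed)
    counts = [0] * N_BINS
    width = (R_HI - R_LO) / N_BINS
    accepted = 0
    for _ in range(n_trials):
        atoms = sample_chain(n_atoms, rng)
        if not is_clash_free(atoms):
            continue  # reject sterically forbidden conformations
        accepted += 1
        r = math.dist(atoms[0], atoms[-1])
        i = int((r - R_LO) / width)  # inverse of r = R_LO + i * width
        if 0 <= i < N_BINS:
            counts[i] += 1
    return counts, accepted, width


if __name__ == "__main__":
    counts, accepted, width = end_to_end_histogram(n_atoms=10, n_trials=10000)
    r_av = sum((i + 0.5) * width * c for i, c in enumerate(counts)) / accepted
    print(f"accepted {accepted} conformations, r_av = {r_av:.2f} Å (bin centers)")
```

Rejection sampling keeps the accepted conformations uniformly distributed over the sterically allowed part of torsion space. This is why such histograms describe a purely hard-sphere (entropic) system.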
To analyze these histograms, the average distance (rav) between a pair of titratable residues and the variation of the histogram shape with chain composition were examined. Further, a quantity (hereafter referred to as the “normalized asymmetry parameter”)

$$\delta = \frac{r_{\max} - r_{\mathrm{av}}}{r_{\mathrm{av}}} \tag{2}$$

has been calculated and analyzed to assess the asymmetry of the obtained histograms. The most widely used model for describing flexible polymers, the Gaussian-chain model,31 has an asymmetric distribution of end-to-end distances. In this distribution, the distance at the histogram maximum, rmax, is smaller than the average distance, rav, computed from the histogram (δ < 0). The chains considered in the present study have less conformational freedom than Gaussian chains because of the rigidity of the covalent angles and the explicit inclusion of side chains in the simulations. Therefore, the δ analysis can show how these additional conformational constraints affect the distance distributions and, consequently, whether the Gaussian-chain model is still applicable to such constrained chains. The considered range of m values is the most relevant one for proteins, since the frequency of occurrence of titratable residues in proteins is ∼25%32 (i.e., on average, roughly every fourth residue is titratable). On the other hand, the asymptotic behavior of average quantities is not yet reached at m < 6,27 and hence, the quantities of interest have to be analyzed individually for each m value.

To assess quantitatively the variations of the histograms with chain composition, the mean-square deviation between two histograms was defined as

$$\Delta(XX', ZZ', m) = \frac{1}{N_B}\sum_{i=1}^{N_B}\left[d_i(ZX_mZ') - d_i(ZX'_mZ')\right]^2 \tag{3}$$

where NB is the number of histogram bins (NB = 5000 for all histograms presented in the paper), and di(ZXmZ′) and di(ZX′mZ′) are the histogram values for the distance r, indexed by i, between residues Z and Z′ separated by m residues X and X′, respectively. The distance r is obtained from the index i through the relation r = rmin + i × (rmax - rmin)/NB, where rmin and rmax are the minimal and maximal allowed distances used in the histogram calculations (here, rmin = 0 Å and rmax = 100 Å; these bounds are distinct from rmax in eq 2). The number of generated conformations ensures a low level of statistical noise in the obtained histograms. For instance, the statistical errors in the determination of rmax and rav do not exceed ten histogram bins, i.e., ∆rmax, ∆rav ≤ 0.1 Å. The statistical error of the asymmetry parameter δ can be calculated from the error propagation formula (see, e.g., ref 33). The level of uncertainty for ∆ can be evaluated by calculating this quantity from two histograms for chains of identical composition (X = X′ in eq 3) obtained from two computationally independent runs (i.e., with different sequences of pseudorandom numbers). For the chains considered in this paper, the statistical noise level is ∆(X = X′) ≈ 10^-7 to 10^-8.

B. Calculations of Free-Energy Profile. The method described above produces histograms of distances between titratable sites of the charged residues in a system where only hard-sphere potentials are taken into account. A natural question, therefore, is whether these histograms are of practical relevance for real proteins under physiological conditions, where those sites usually bear a charge and charge-charge interactions cannot be neglected. The answer is affirmative: in statistical physics, it is well-known that histograms of quantities obtained in the absence of certain interactions can be recalculated (within certain limits) for a system in which those interactions are present. For the distributions considered in the paper, this can be done by calculating the free energy34,35

$$\Delta G_{ij}(r) = -\beta^{-1}\ln\frac{d(m,r)\,\exp[-\beta W_{ij}(r)]}{\sum_{r} d(m,r)\,\exp[-\beta W_{ij}(r)]} \tag{4}$$

as a function of the distance between the atoms (the free-energy profile). In eq 4, m = |i - j| is the separation parameter between groups i and j along the sequence, and β = 1/RT, with R and T being the universal gas constant and the temperature (in K), respectively. The quantity d(m, r) is the number of states in which the titratable sites of a pair of groups with separation parameter m are r Å apart. The d(m, r) values are obtained from the corresponding “hard-sphere” histograms presented and analyzed below. The term Wij(r) can, for example, stand for the electrostatic energy of a system of two charges separated by r Å, which can be calculated from the following expression:

$$W_{ij}(r) = \left(332\ \mathrm{\AA\,kcal\,mol^{-1}\,e^{-2}}\right) q_i q_j\, W(r) \tag{5}$$

where qi and qj are the charges of groups i and j, respectively, and W(r) is the interaction potential between two charges separated by distance r. At equilibrium, the system tends to adopt the distance between the charges that corresponds to the free-energy minimum, rmin. Failing that, it adopts a distance r at which the energy barrier ∆∆G(r) = ∆Gij(r) - ∆Gij(rmin) does not exceed the energy of thermal fluctuations, RT (0.58 kcal/mol at room temperature). Hence, the functions ∆Gij(r) for each pair of titratable groups provide information about the energetically favorable distances between protein charges. The expression for the interaction potential W(r) differs between interaction environments, and here, two model systems are considered. First, a simple Coulomb law was considered:

$$W(r) = \frac{1}{\varepsilon r} \tag{6}$$

where ε is the dielectric permittivity of the interaction environment (for water, εw = 78.4 at room temperature). The second, more realistic, system comprises charges interacting at the boundary between two dielectric media with different values of ε. One of the media represents water and has ε equal to that of water, εw, while the other represents the protein moiety and takes values εp < εw. In this paper, two values of εp are considered, εp = 20 and 4. The first corresponds to the value used in the spherical model of unfolded proteins22,23,27 (the physical principles and parameters of the spherical model have been described in detail earlier22). The second is the value widely used for electrostatic calculations in folded proteins. In this case, analytical expressions for the interaction potential are available only for certain simple geometries, one of them being charges interacting on the surface of a dielectric sphere. Even in this simple case, the expression for W(r) is cumbersome, and interested readers are referred to ref 36 for the exact formula.

III. Results and Discussions

A. Histograms of the Distances between Atoms of Various Residues. Figure 1 displays a representative selection of the obtained distributions of distances (histograms) between titratable atoms for several separation parameters, m, and residues X. The distances are for the shortest (aspartic acid, D) and longest (lysine, K) charged side chains considered in this study. Each panel presents histograms for X representing charged, polar, and hydrophobic side chains as well as glycine. As is seen from Figure 1, the factors influencing the shape and parameters of the histograms can be ranked in order of decreasing importance as follows: (1) the separation parameter m, (2) the length of the side chains of residues Z and Z′ (see the generalized sequence, eq 1), (3) the absence or presence of side chains in the intermediate residues X, and (4) the internal structure of the side chains of the intermediate residues X. The relative unimportance of the last factor is clearly illustrated by tryptophan (W), which has the bulkiest, double-ring side chain. For all considered m, the histogram with X = W does not differ considerably from the distribution with X being aspartic acid (D) or valine (V), both of which have fairly short side chains (see Figure 1, panels A, C, and E). Also, there is no significant distinction between the distributions with X being methionine (M, long, unbranched side chain) and threonine (T, short, branched side chain) (panels B, D, and F in Figure 1). The obtained distributions typically have a single peak, but for m = 1, especially for residues Z and Z′ with short side chains (see panel A in Figure 1), traces of a second peak are observed. As was shown in previous work,27 this second peak is most pronounced for m = 0 (no residues between) and originates from the interplay of a small number of rotational degrees of freedom between the functional groups of interest. Another interesting observation from Figure 1 is that the influence of the residues X on the histogram parameters is minimal at moderate m values (2-4), while for both larger (m > 4) and smaller (m < 2) values, this influence is more pronounced (compare panels B and D with panel F, or panel C with panels A and E, in Figure 1). This unexpected result can probably be explained by the interplay of two factors. First, as m increases, the number of rotational degrees of freedom between the atoms of interest grows, which decreases the sensitivity of the histograms to chain composition. Second, as m increases, the relative weight of the volume occupied by the intermediate residues also increases, which raises the sensitivity of the histograms to chain composition.

Figure 1. Distributions of distances between Oδ atoms of asp (panels A, C, and E) and between Nζ atoms of lys (panels B, D, and F) side chains separated by one (panels A and B), three (panels C and D), and five (panels E and F) residues X. Letters at the curves denote the residue X for which the corresponding curve was obtained. All histograms were obtained from 4 × 10^7 randomly generated conformations (see the text for details). The three upper curves in each panel are shifted upward for clarity.

Table 1. Variation of Parameters of the Obtained Distance Distributions^a

               Average distance, Å         Distance at histogram maximum, Å
m    Z    Z′   Minimal      Maximal        Minimal      Maximal
1    D    D    8.75 (I)     9.19 (G)       9.14 (I)     9.64 (G)
1    D    K    9.82 (I)     10.18 (G)      10.44 (V)    10.81 (G)
1    E    K    10.46 (I)    10.75 (G)      11.18 (L)    11.35 (G)
1    K    K    10.67 (I)    11.08 (G)      11.55 (V)    11.82 (G)
2    D    D    10.65 (E)    10.95 (V)      11.41 (S)    11.70 (V)
2    D    K    11.66 (W)    11.93 (V)      12.36 (G)    12.76 (T)
2    E    K    12.21 (G)    12.57 (V)      12.92 (G)    13.77 (T)
2    K    K    12.49 (G)    12.80 (V)      13.15 (G)    13.76 (I)
3    D    D    12.24 (G)    12.91 (V)      12.97 (S)    13.50 (H)
3    D    K    13.04 (G)    13.76 (V)      13.74 (G)    14.42 (H)
3    E    K    13.51 (G)    14.31 (V)      14.23 (G)    15.13 (H)
3    K    K    13.77 (G)    14.50 (V)      14.39 (G)    15.15 (H)
4    D    D    13.49 (G)    14.83 (V)      14.29 (G)    15.71 (V)
4    D    K    14.23 (G)    15.62 (V)      14.78 (G)    16.42 (V)
4    E    K    14.66 (G)    16.13 (V)      15.16 (G)    17.19 (V)
4    K    K    14.90 (G)    16.30 (V)      15.39 (G)    17.16 (V)
5    D    D    14.63 (G)    16.63 (V)      15.18 (G)    17.56 (V)
5    D    K    15.33 (G)    17.34 (V)      15.66 (G)    18.33 (V)
5    E    K    15.73 (G)    17.82 (V)      16.07 (G)    18.89 (V)
5    K    K    15.95 (G)    17.97 (V)      16.28 (G)    18.86 (V)
6    D    D    15.69 (G)    18.34 (V)      15.93 (G)    19.43 (V)
6    D    K    16.35 (G)    19.00 (V)      16.52 (G)    19.94 (V)
6    E    K    16.72 (G)    19.45 (V)      16.90 (G)    20.48 (V)
6    K    K    16.93 (G)    19.59 (V)      17.12 (G)    20.57 (V)

^a Letters in parentheses denote the intermediate residues X for which the corresponding value was obtained.

B. Analysis of the Histograms. The main parameters of the obtained distributions (average distances rav, asymmetry δ, and mean-square deviations ∆) are categorized with respect to the intermediate residues X in Figures 2-4 and Table 1. Data for rav (Figure 2) and δ (Figure 3) for different X are sorted in increasing order, while for ∆ (Figure 4), the residues X are grouped according to the ordinary classification scheme (charged, polar, and hydrophobic residues). As is seen from Figures 2-4, there is almost no correlation between two sets of properties. The first is the steric properties of the chain, which determine the studied distributions. The second is the physicochemical properties of the residues, which underlie the ordinary classification scheme. Note that the set of minimal allowed distances (the Ramachandran criteria29) used in the present calculations takes into account not only simple stereochemical effects but, to some extent, also other physicochemical properties of the residues. For instance, the propensity of oxygen and nitrogen to form hydrogen bonds is reflected in the fact that the minimal allowed distance for O-N contacts is significantly smaller (2.6 Å) than, e.g., for hydrophobic carbon-carbon contacts (3.2 Å). Nevertheless, at fixed m, a pronounced correlation can be seen between the volume occupied by the intermediate side chains and their internal structures. For m > 3, residues with no side chains or short, unbranched side chains (G, A, S, and C; E is an exception) occupy the smallest volume, and consequently, their rav values are the smallest. The second group, occupying intermediate volumes, consists of residues having long, unbranched side chains (M, K), side chains branching at Cγ or more distant atoms (L, N, D, Q, R), or side chains with ring(s) beginning at Cγ atoms (W, F, Y, H). Finally, the largest volume is occupied by residues with side chains branching at Cβ atoms (I, T, V). However, at m = 1, the situation is the opposite: side chains branching at Cβ atoms occupy smaller volumes than residues with short side chains. This can be explained by a nontrivial increase in the ratio between the volume occupied by the side chains and the total volume occupied by the whole flexible chain (the relative volume). Branching at the Cβ atoms creates sterically forbidden space between the branching point and the backbone, which leads to the largest increase in the relative volume with increasing m. Other side chains do not create such regions, and therefore, their relative volumes increase more slowly with m. Further, for a fixed residue X, the average distances increase with the total length of the side chains of residues Z and Z′, which is expected since longer side chains occupy larger volumes. The average distances plotted in Figure 2 show the same nontrivial sensitivity of the studied distributions to the intermediate residues X as those in Figure 1. The difference between the minimal and maximal values of rav at fixed m varies little between different residues Z and Z′ (see Table 1). Both the absolute and relative differences first decrease slightly from m = 1 to m = 2 and then increase nonlinearly with m.

Figure 2. Average separation obtained from the distributions of distances (see Figure 1) between (i) Oδ atoms of asp (dark gray rectangles; see also the legend at the top of the figure), (ii) Oδ and Nζ atoms of asp and lys (downward-stroked rectangles), (iii) Oε and Nζ atoms of glu and lys (light gray rectangles), and (iv) Nζ atoms of lys (upward-stroked rectangles) side chains separated by m residues X (the m value for each panel is shown to the right of the corresponding panel). Letters at groups of bars denote the residue X for which the corresponding set of values was obtained. Bold letters denote charged residues, bold italic letters polar residues, and bold gray letters hydrophobic residues. Data are sorted in ascending order of distances for the DXmD chains.

Figure 3. Normalized asymmetry, δ, of the distributions of distances between (i) Oδ atoms of asp (dark gray rectangles; see also the legend at the top of the figure), (ii) Oδ and Nζ atoms of asp and lys (downward-stroked rectangles), (iii) Oε and Nζ atoms of glu and lys (light gray rectangles), and (iv) Nζ atoms of lys (upward-stroked rectangles) side chains separated by m residues X (the m value for each panel is shown to the right of the corresponding panel). Letters at groups of bars denote the residue X for which the corresponding set of values was obtained. Bold letters denote charged residues, bold italic letters polar residues, and bold gray letters hydrophobic residues. Data are sorted in ascending order of distances for the DXmD chains.
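The statement about the spread of rav can be checked directly against Table 1. The short sketch below copies the minimal and maximal rav values from the table and computes two quantities for each pair Z, Z′: the absolute spread (in Å) and the spread relative to the minimal value. The dictionary and function names are hypothetical.

```python
# Minimal and maximal average distances rav (Å) over intermediate residues X,
# copied from Table 1 for each pair (Z, Z') and separation parameter m.
RAV_RANGE = {
    "DD": {1: (8.75, 9.19), 2: (10.65, 10.95), 3: (12.24, 12.91),
           4: (13.49, 14.83), 5: (14.63, 16.63), 6: (15.69, 18.34)},
    "DK": {1: (9.82, 10.18), 2: (11.66, 11.93), 3: (13.04, 13.76),
           4: (14.23, 15.62), 5: (15.33, 17.34), 6: (16.35, 19.00)},
    "EK": {1: (10.46, 10.75), 2: (12.21, 12.57), 3: (13.51, 14.31),
           4: (14.66, 16.13), 5: (15.73, 17.82), 6: (16.72, 19.45)},
    "KK": {1: (10.67, 11.08), 2: (12.49, 12.80), 3: (13.77, 14.50),
           4: (14.90, 16.30), 5: (15.95, 17.97), 6: (16.93, 19.59)},
}


def rav_spread(pair):
    """Return {m: (absolute spread in Å, relative spread)} for one (Z, Z') pair."""
    return {m: (hi - lo, (hi - lo) / lo) for m, (lo, hi) in RAV_RANGE[pair].items()}


if __name__ == "__main__":
    for pair in RAV_RANGE:
        row = ", ".join(f"m={m}: {a:.2f} Å ({100 * r:.1f}%)"
                        for m, (a, r) in rav_spread(pair).items())
        print(pair, row)
```

For the Table 1 values, the spread averaged over the four pairs is smaller at m = 2 than at m = 1, and it grows steadily from m = 2 to m = 6 for every pair. For the E-K pair alone, the spread at m = 2 (0.36 Å) is slightly larger than at m = 1 (0.29 Å).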


Figure 4. Mean-square deviations, ∆(AX, ZZ′, m) (eq 3), between distance distributions obtained for four sets of flexible polypeptide chains. Dark gray rectangles stand for ∆ values for the distributions of distances between Oδ atoms of asp side chains (Z = Z′ = D) separated by m residues X, taken relative to the corresponding distributions for asp side chains separated by m alanine residues (for the rest of the notation, see the legend at the top of the figure). The m value for each panel is shown to the right of the corresponding panel. Letters on the horizontal axis denote the residue X for which the corresponding value was obtained.

All the distributions presented in this paper have an asymmetric shape with rav < rmax (δ > 0), which is inconsistent with the Gaussian-chain model (rav > rmax). It was also shown previously27 that the Gaussian-chain model is not applicable in the considered m range, since for small m the tails of the Gaussian-chain distributions extend into sterically forbidden regions. The distances rmax follow the same trends as rav (Figure 2 and Table 1) and are therefore not discussed separately. The normalized histogram asymmetry δ (Figure 3) does not show trends as clear as those of rav and rmax, probably because δ is more sensitive to statistical errors. One can see from Figure 3 that, for m ≥ 3, the asymmetry of the histograms depends (within statistical accuracy) neither on residues Z and Z′ nor on residue X. The exception is X = G (and, partly, the short-side-chain serine and alanine residues). For glycine, the histograms clearly tend toward a symmetric Gaussian shape for larger m and/or longer Z and Z′ residues. Finally, Figure 4 summarizes the examination of the histogram shapes. The data in this figure are based on the mean-square deviations ∆ (eq 3) in which one of the two chains has intermediate alanine residues, but they allow certain general conclusions. First, the largest ∆ values (the largest differences in histogram shape) are observed at all considered m for residues whose side chains branch at Cβ atoms (I, T, and V). Second, for larger m, the histogram shapes are more sensitive to the intermediate residues X than to the residues of interest Z and Z′. At m = 1, however, the situation is the opposite, and the histogram shape is determined to a larger extent by the number of rotational degrees of freedom between the titratable atoms in residues Z and Z′. The last conclusion was also reached in the previous statistical study27 with a smaller pool of chains.
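The histogram descriptors used in this section can be sketched as follows: rav, rmax, the asymmetry δ of eq 2, and the mean-square deviation ∆ of eq 3. As a reference, the sketch evaluates them on the Gaussian-chain end-to-end distribution, P(r) ∝ r² exp(-3r²/2⟨R²⟩), binned on the grid given in the text (NB = 5000 bins over 0-100 Å). For this distribution, rmax = (2⟨R²⟩/3)^1/2 and rav = (8⟨R²⟩/3π)^1/2, so analytically δ = √π/2 - 1 ≈ -0.114 (rmax < rav). The simulated chains, in contrast, give δ > 0. Two assumptions apply: bin centers are used as representative distances, and the root-mean-square end-to-end distance of 10 Å is an arbitrary illustrative value.

```python
import math

N_BINS = 5000
R_LO, R_HI = 0.0, 100.0
WIDTH = (R_HI - R_LO) / N_BINS


def bin_center(i):
    return R_LO + (i + 0.5) * WIDTH


def gaussian_chain_histogram(r_rms):
    """Binned P(r) ∝ r^2 exp(-3 r^2 / (2 r_rms^2)), normalized to sum 1."""
    w = [bin_center(i) ** 2 * math.exp(-1.5 * bin_center(i) ** 2 / r_rms ** 2)
         for i in range(N_BINS)]
    total = sum(w)
    return [x / total for x in w]


def r_av(hist):
    """Average distance computed from the histogram."""
    return sum(bin_center(i) * h for i, h in enumerate(hist)) / sum(hist)


def r_max(hist):
    """Distance at the histogram maximum."""
    return bin_center(max(range(len(hist)), key=hist.__getitem__))


def asymmetry(hist):
    """Normalized asymmetry parameter delta (eq 2)."""
    rav = r_av(hist)
    return (r_max(hist) - rav) / rav


def msd(hist_a, hist_b):
    """Mean-square deviation between two histograms on the same bins (eq 3)."""
    return sum((a - b) ** 2 for a, b in zip(hist_a, hist_b)) / len(hist_a)


if __name__ == "__main__":
    hist = gaussian_chain_histogram(r_rms=10.0)
    print(f"{asymmetry(hist):.2f}")  # -0.11, i.e., negative as for any Gaussian chain
    print(msd(hist, hist))           # 0.0 for identical histograms
```

The discretized δ differs from the analytical value only by the bin resolution (0.02 Å). As in the text, ∆ is meaningful only when both histograms share the same bins and the same normalization.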

C. Influence of Charge-Charge Interactions. The histograms presented and analyzed in the previous subsections can be viewed as an upside-down, scaled free-energy profile (eq 4) for the corresponding pair of charges in two cases: (i) when the temperature is so high that the energy of thermal fluctuations is much larger than the charge-charge interaction energy (RT ≫ Wij(r)), and (ii) when at least one residue in the pair is in its neutral form (Wij(r) = 0). In these cases, the maximum of the histogram corresponds to the minimum of the free-energy profile. The situation is more complicated when both residues in a pair are in their charged forms. This is demonstrated below for the case m = 1 (cases with m > 1 are qualitatively similar). Figure 5 shows typical examples of the free-energy profiles (eq 4) for the lysine-aspartic acid (panel A) and aspartic acid-aspartic acid (panel B) pairs with an alanine residue in between (m = 1). Profiles are shown both for the “uncharged” case (curves 1) and for different models of the electrostatic interactions. For all cases considered, electrostatic attraction (Figure 5A) affects the free-energy profiles more strongly than electrostatic repulsion does (Figure 5B). For instance, in the attraction case the minimum of ∆Gij(r), rmin, shifts by 1.55 to 1.67 Å depending on the calculation model, while in the repulsion case this shift is almost nonexistent (∼0.2 Å). Another noticeable effect of charge-charge interactions is a change in the flatness of ∆Gij(r) around the minima: when the charges attract each other, ∆Gij(r) becomes flatter, while in the repulsion case the effect is the opposite (Figure 5).
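The reweighting of eq 4 with the Coulomb potential of eqs 5 and 6 can be sketched in a few lines. The simulated histograms are not reproduced here, so the sketch uses a hypothetical stand-in for d(m, r): a Gaussian-chain shape with a hard core at 3 Å. The numbers it prints are therefore illustrative and are not the values discussed above. Room temperature is assumed to be 293.15 K, which gives RT ≈ 0.58 kcal/mol as quoted in the text. Bins with d(m, r) = 0 are assigned an infinite free energy.

```python
import math

R_GAS = 1.987204e-3   # gas constant, kcal/(mol K)
T_ROOM = 293.15       # assumed "room temperature", K
RT = R_GAS * T_ROOM   # about 0.58 kcal/mol
COULOMB = 332.0       # prefactor of eq 5, Å kcal/(mol e^2)
EPS_WATER = 78.4      # dielectric permittivity of water

N_BINS, R_LO, R_HI = 5000, 0.0, 100.0
WIDTH = (R_HI - R_LO) / N_BINS
R = [R_LO + (i + 0.5) * WIDTH for i in range(N_BINS)]  # bin centers, Å


def toy_histogram(r_rms=9.0, r_contact=3.0):
    """HYPOTHETICAL stand-in for d(m, r): Gaussian-chain shape with a hard core."""
    return [0.0 if r < r_contact else r * r * math.exp(-1.5 * r * r / r_rms ** 2)
            for r in R]


def coulomb(qi, qj, r, eps=EPS_WATER):
    """W_ij(r) from eqs 5 and 6, in kcal/mol: 332 * qi * qj / (eps * r)."""
    return COULOMB * qi * qj / (eps * r)


def free_energy_profile(d, qi, qj, eps=EPS_WATER):
    """Eq 4: reweight the hard-sphere histogram d by exp(-W_ij / RT)."""
    w = [di * math.exp(-coulomb(qi, qj, r, eps) / RT) if di > 0 else 0.0
         for di, r in zip(d, R)]
    z = sum(w)
    return [-RT * math.log(wi / z) if wi > 0 else math.inf for wi in w]


def r_at_minimum(dg):
    """Distance (bin center) of the free-energy minimum."""
    return R[min(range(N_BINS), key=dg.__getitem__)]


def barrier(dg, r):
    """Delta Delta G(r) = Delta G_ij(r) - Delta G_ij(r_min)."""
    return dg[int((r - R_LO) / WIDTH)] - min(dg)


if __name__ == "__main__":
    d = toy_histogram()
    for label, (qi, qj) in {"neutral": (0, 0), "attraction": (1, -1),
                            "repulsion": (-1, -1)}.items():
        dg = free_energy_profile(d, qi, qj)
        print(f"{label:10s} r_min = {r_at_minimum(dg):5.2f} Å  "
              f"ddG(3.01 Å) = {barrier(dg, 3.01):.2f} kcal/mol")
```

With this toy histogram, an attractive pair shifts the minimum of ∆Gij(r) toward shorter distances and a repulsive pair toward longer ones. The size and asymmetry of these shifts depend on the actual hard-sphere histogram d(m, r).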
Notice the model usually applied to calculate electrostatic interactions in folded proteins: two dielectric environments, with the dielectric permittivity of the protein moiety εp = 4. For this model the free-energy profile is so flat that, in principle, all distances between the charges from 3 to ∼12 Å are equally accessible energetically. The reason is that the difference ∆∆G(r = 3 Å) = ∆Gij(r = 3 Å) - ∆Gij(rmin) = 0.37 kcal/mol is less than the energy of thermal fluctuations at room temperature, 0.58 kcal/mol.

For the model previously applied to calculate electrostatic interactions in unfolded proteins,22,23,27 the barrier is ∆∆G(r = 3 Å) = 0.63 kcal/mol. Therefore, the range of energetically favorable distances (where ∆∆G < 0.58 kcal/mol) is slightly more restricted than in the previous case: 3.8-13.9 Å (see curve 3 in Figure 5A).

Comparing the theoretical results presented here with experimental data is difficult, since experimental data on distance distributions in unfolded proteins are not yet available. However, some comparable data can be extracted from X-ray structures of folded proteins. It is plausible to suggest that pairs of charged residues located on loops between secondary-structure elements and/or protein domains tend, during crystallization, to adopt distances determined by the minimum of the free-energy profile described above.

Two independent X-ray structures37,38 of the same protein, human pro-matrix metalloproteinase-2 (PDB entries 1EAK and 1CK7), were chosen for such a comparison; the results are given in Table 2. The data in this table are not intended as a comprehensive statistical analysis of experimental distances. They are rather an illustration of the practical relevance of the distributions calculated in this work.

The agreement between the experimental and theoretical data is good, especially when the energy barriers ∆∆G at the experimental distances are considered. For the charged-form profiles, all barriers are at most 0.32 kcal/mol, i.e., below the thermal energy of 0.58 kcal/mol. This holds even for pairs whose experimental distance differs from the predicted one by several ångströms.

Figure 5. Free-energy profiles, ∆Gij(r), defined by formula 4 for the lysine-aspartic acid (panel A) and aspartic acid-aspartic acid (panel B) pairs with an alanine residue in between. Curves labeled 1 represent the case in which both residues of the pair are in their neutral state. The other curves correspond to different models for calculating electrostatic interactions: the simple Coulomb law (curves 2), and two different dielectric environments with the dielectric permittivity of the protein moiety εp = 20 (curves 3) or εp = 4 (curves 4). Curves 2, 3, and 4 are shifted upward for clarity.

Table 2. Experimental Distances between Lysine-Aspartic Acid Pairs in Loops of Human Pro-Matrix Metalloproteinase-2 and Their Comparison to the Energetically Favorable Distances Predicted by Free-Energy Profilesᵃ

                        distance, Å                     energy barrier ∆∆G, kcal/mol
pair of       residue   expt    theory      theory      charged form   neutral form
residues      between           (charged)   (neutral)

Data from 1EAK PDB file
K118A-D120A   W         9.55    9.26        10.87       0.01           0.04
D185A-K187A   G         10.03   9.64        11.19       0.01           0.01
D268A-K270A   G         8.96    9.64        11.19       0.02           0.14
D326A-K328A   K         9.61    9.22        10.88       0.01           0.05
D370A-K372A   G         5.20    9.64        11.19       0.23           0.91
K118B-D120B   W         8.29    9.26        10.87       0.04           0.19
D185B-K187B   G         11.27   9.64        11.19       0.02           0.00
D370B-K372B   G         4.46    9.64        11.19       0.31           1.24
K118C-D120C   W         5.21    9.26        10.87       0.15           0.79
D185C-K187C   G         11.12   9.64        11.19       0.02           0.00
D268C-K270C   G         7.39    9.64        11.19       0.10           0.38
D370C-K372C   G         4.92    9.64        11.19       0.25           0.98
D384C-K386C   R         8.84    9.17        10.86       0.01           0.11
K118D-D120D   W         5.90    9.26        10.87       0.15           0.63
D185D-K187D   G         9.37    9.64        11.19       0.00           0.09
D268D-K270D   G         7.77    9.64        11.19       0.07           0.31
D326D-K328D   K         9.00    9.22        10.88       0.00           0.09
D370D-K372D   G         5.33    9.64        11.19       0.23           0.85
D384D-K386D   R         7.97    9.17        10.86       0.04           0.22

Data from 1CK7 PDB file
K118A-D120A   W         5.55    9.26        10.87       0.15           0.69
D185A-K187A   G         13.25   9.64        11.19       0.30           0.20
D370A-K372A   G         5.64    9.64        11.19       0.23           0.81
D437A-K439A   I         10.10   8.84        10.86       0.03           0.02
K470A-D472A   Q         13.08   9.22        10.91       0.32           0.19
K519A-D521A   I         5.97    8.84        10.86       0.22           0.57

ᵃ Experimental values are taken from two different X-ray structures (PDB codes 1EAK and 1CK7).37,38

IV. Conclusions
In this study, distance distributions (histograms) between the titratable sites of charged residues in flexible polypeptide chains of various compositions have been obtained by means of computer simulations. The assessment of the results focused on identifying and classifying the factors that influence the parameters of the obtained distributions.

It was found that the internal structure of the intermediate residues plays a minor role when only a few residues separate the residues of interest. For larger separations, the histograms become increasingly sensitive to the structure of the intermediate side chains. The present study goes far beyond the celebrated Ramachandran studies,29 because it analyzes sterically allowed distances between remote residues rather than sterically allowed conformations of individual residues. Further, it was shown that intermediate residues whose side chains branch at the Cβ atom significantly increase the average distance between the residues of interest and considerably change the shape of the histogram.

The results of this study are useful from two standpoints. First, they can help in understanding the functions of entropic chains, proteins in which excluded-volume effects are the crucial factor determining function. Second, they help in analyzing the statistical thermodynamic properties of a particular flexible chain with significant electrostatic interactions. There, the distributions presented in this paper provide information about energetically favorable distances between any pair of charges, as described previously.27

The applicability of the free-energy profiles described in subsection IIB is not restricted to simple charge-charge interactions: the energy Wij(r) in formula 4 can represent other types of interactions as well. For instance, if Wij(r) is the energy of dipole-dipole interactions, the free-energy profile (eq 4) together with the relevant histograms can be used to analyze the influence of hydrogen bonding on the distance between the atoms forming the hydrogen bond.


Finally, yet another possible application of the theoretical concept described in this paper can be envisaged. The obtained distance histograms typically have a single peak. This implies that the position of the minimum in the free-energy profile varies smoothly with temperature, i.e., without the abrupt changes characteristic of phase transitions. However, for m = 1, traces of a second peak are observed, especially for residues Z and Z′ with short side chains (see panel A in Figure 1). Therefore, the m = 1 distributions could transform into free-energy profiles with two minima, whose relative depths depend on the temperature and on the interaction model. This can lead to a situation in which, at a certain temperature and for a given set of interactions, the two minima become equally deep; again, these need not be the simplest charge-charge interactions. This is the classical situation for a discontinuous, or first-order, phase transition.

An example of such a transition in a biological system is the transition of the polyglutamine (polyQ) tract of the protein huntingtin (htt) from a noncompact to a compact state.39 In Huntington’s disease and other related neurodegenerative diseases, a polyQ sequence containing fewer than 36 residues (disordered, noncompact structure) is benign. A sequence with only 2-3 additional glutamines (compact structure), by contrast, is associated with disease risk.40 Another example of such a transition, involving charged residues, is the formation of amyloid fibrils by two fragments. The first is a 10-residue fragment of the Alzheimer Aβ protein,41 which is responsible for Alzheimer’s disease and the associated dementia. The second is a 20-residue fragment of the human (mouse) prion protein,42 which is responsible, in particular, for Creutzfeldt-Jakob disease.

Acknowledgment. The author would like to thank Dr. Eugeni Starikow and Dr. Ekaterina Morgunova for critical reading of the manuscript and useful discussions, and Prof. Rudolf Ladenstein for unreserved support and useful discussions. The calculations were partly conducted using the resources of the Swedish National Supercomputer Centre (NSC) at Linköping University, Sweden.

References and Notes
(1) Fischer, E. Ber. Dtsch. Chem. Ges. 1894, 27, 2985.
(2) Wright, P. E.; Dyson, H. J. J. Mol. Biol. 1999, 293, 321.
(3) Dunker, A. K.; Brown, C. J.; Lawson, J. D.; Iakoucheva, L. M.; Obradovic, Z. Biochemistry 2002, 41, 6573.
(4) Uversky, V. N. Protein Sci. 2002, 11, 739.
(5) Vucetic, S.; Brown, C. J.; Dunker, A. K.; Obradovic, Z. Proteins: Struct., Funct., Genet. 2003, 52, 573.
(6) Uversky, V. N. Eur. J. Biochem. 2002, 269, 2.
(7) Radivojac, P.; Obradovic, Z.; Smith, D. K.; Zhu, G.; Vucetic, S.; Brown, C. J.; Lawson, J. D.; Dunker, A. K. Protein Sci. 2004, 13, 71.

(8) Schweers, O.; Schönbrunn-Hanebeck, E.; Marx, A.; Mandelkow, E. J. Biol. Chem. 1994, 269, 24290.
(9) Ueda, K.; Fukushima, H.; Masliah, E.; Xia, Y.; Iwai, A.; Yoshimoto, M.; Otero, D. A. C.; Kondo, J.; Ihara, Y.; Saitoh, T. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 11282.
(10) Lee, V. M. Y.; Balin, B. J.; Otvos, L.; Trojanowski, J. Q. Science 1991, 251, 675.
(11) Hokenson, M. J.; Uversky, V. N.; Goers, J.; Yamin, G.; Munishkina, L. A.; Fink, A. L. Biochemistry 2004, 43, 4621.
(12) Arima, K.; Ueda, K.; Sunohara, N.; Hirai, S.; Izumiyama, Y.; Tonozuka-Uehara, H.; Kawai, M. Brain Res. 1998, 808, 93.
(13) Dunker, A. K.; Brown, C. J.; Obradovic, Z. Adv. Protein Chem. 2002, 62, 25.
(14) Iakoucheva, L. M.; Radivojac, P.; Brown, C. J.; O’Connor, T. R.; Sikes, J. G.; Obradovic, Z.; Dunker, A. K. Nucleic Acids Res. 2004, 32, 1037.
(15) Uversky, V. N. J. Biomol. Struct. Dyn. 2003, 21, 211.
(16) Munishkina, L. A.; Henriques, J.; Uversky, V. N.; Fink, A. L. Biochemistry 2004, 43, 3289.
(17) Uversky, V. N.; Fink, A. L. Biochim. Biophys. Acta: Proteins Proteomics 2004, 1698, 131.
(18) Uversky, V. N. FEBS Lett. 2002, 514, 181.
(19) Tcherkasskaya, O.; Uversky, V. N. Protein Peptide Lett. 2003, 10, 239.
(20) Obradovic, Z.; Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C. J.; Dunker, A. K. Proteins: Struct., Funct., Genet. 2003, 53, 566.
(21) Tcherkasskaya, O.; Davidson, E. A.; Uversky, V. N. J. Proteome Res. 2003, 2, 37.
(22) Kundrotas, P. J.; Karshikoff, A. Phys. Rev. E 2002, 65, 011901.
(23) Kundrotas, P. J.; Karshikoff, A. Protein Sci. 2002, 11, 1681.
(24) Zhou, H. X. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 3569.
(25) Goldenberg, D. P. J. Mol. Biol. 2003, 326, 1615.
(26) Fitzkee, N. C.; Rose, G. D. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 12497.
(27) Kundrotas, P. J.; Karshikoff, A. Biochim. Biophys. Acta: Proteins Proteomics 2004, 1702, 1.
(28) Branden, C.; Tooze, J. Introduction to Protein Structure, 2nd ed.; Taylor & Francis Group: New York, 1998.
(29) Ramachandran, G. N.; Sasisekharan, V. Adv. Protein Chem. 1968, 23, 283.
(30) Kundrotas, P. J.; Karshikoff, A. J. Chem. Phys. 2003, 119, 3574.
(31) Tanford, C. Physical Chemistry of Macromolecules; Wiley: New York, 1961.
(32) McCaldon, P.; Argos, P. Proteins: Struct., Funct., Genet. 1988, 4, 99.
(33) Abramowitz, M.; Stegun, I. Handbook of Mathematical Functions and Formulas, 9th printing; Dover: New York, 1972.
(34) Kundrotas, P. J.; Lapinskas, S.; Rosengren, A. Phys. Rev. B 1995, 52, 9166.
(35) Lee, J. Y.; Kosterlitz, J. M. Phys. Rev. Lett. 1990, 65, 137.
(36) Kundrotas, P. J.; Karshikoff, A. Phys. Rev. E 2002, 65.
(37) Morgunova, E.; Tuuttila, A.; Bergmann, U.; Tryggvason, K. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7414.
(38) Morgunova, E.; Tuuttila, A.; Bergmann, U.; Isupov, M.; Lindqvist, Y.; Schneider, G.; Tryggvason, K. Science 1999, 284, 1667.
(39) Starikov, E. B.; Lehrach, H.; Wanker, E. E. J. Biomol. Struct. Dyn. 1999, 17, 409.
(40) Chen, S. M.; Ferrone, F. A.; Wetzel, R. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 11884.
(41) Ippel, J. H.; Olofsson, A.; Schleucher, J.; Lundgren, E.; Wijmenga, S. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 8648.
(42) Kuwata, K.; Matumoto, T.; Cheng, H.; Nagayama, K.; James, T. L.; Roder, H. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 14790.
