Interresidue Contacts in Proteins and Protein-Protein Interfaces and Their Use in Characterizing the Homodimeric Interface

Rudra Prasad Saha, Ranjit Prasad Bahadur, and Pinak Chakrabarti*

Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme 7M, Calcutta 700 054, India

Received April 26, 2005
The environment of amino acid residues in protein tertiary structures and in three types of interfaces formed by protein-protein association (in complexes, homodimers, and crystal lattices of monomeric proteins) has been analyzed in terms of the propensity values of the 20 amino acid residues to be in contact with a given residue. On the basis of the similarity of the environment, the twenty residues can be divided into nine classes, which may correspond to a reduced amino acid alphabet. There is no appreciable change in the environment in going from the tertiary structure to the interface, with the residues participating in crystal contacts showing the maximum deviation. Contacts between identical residues are very prominent in homodimers and crystal dimers and arise from the 2-fold related association of residues lining the axis of rotation. These two types of interfaces, representing specific and nonspecific associations, are characterized by the types of residues that partake in ‘self-contacts’, most notably Leu in the former and Glu in the latter. The relative preference of residues to be involved in ‘self-contacts’ can be used to develop a scoring function to identify homodimeric proteins from crystal structures. Thirty-four percent of such residues are fully conserved among homologous proteins in the homodimer dataset, as opposed to only 20% in crystal dimers. Results point to Leu being the stickiest of all amino acid residues, hence its widespread use in motifs such as leucine zippers.

Keywords: protein-protein interaction • residue environment • contact preferences • amino acid classification • identification of homodimers from crystal structures
Introduction

Noncovalent interactions between amino acid residues are the basis for the stability and the specificity of the folding process and for how protein chains interact among themselves to form oligomeric assemblies and complexes. Specific recognition between protein subunits leads to the formation of functionally relevant assemblies,1 and the physicochemical properties of the interfaces arising out of protein-protein interactions have been studied in great detail.2-6 It has been shown that such interfaces are characterized by a core region whose composition is quite similar to the core that is found in protein tertiary structures.7,8 The crystal packing contacts, on the other hand, provide a category of low-affinity, nonspecific interactions.9-12 Yet, the noncovalent forces that engage residues within a molecule also hold the molecules together in both types of associations. Indeed, the same geometrical parameters can identify a particular type of interaction occurring intramolecularly or intermolecularly in small molecule crystal structures.13,14 Also, the packing densities of the atoms across the interface are similar to those in the interior of protein monomers.15 In addition to residue frequencies, the pairing preferences across biologically relevant interfaces have also been analyzed.16-19 It is of interest to know if such contact preferences are the same as those found within protein structures,20-22 or in crystal packing interfaces. The work presented here addresses the issue. We find that residue contacts provide a telltale signature of biological homodimers, as opposed to 2-fold symmetry related interfaces that are observed only in crystals. The similarity in the environments of residues in protein structures also provides a means to classify the amino acid residues, and thereby reduce the number of letters in the amino acid alphabet, such that a sequence of lesser complexity and yet capable of acquiring a native fold can be designed. Though contact preferences are derived from the folded structures, the favored residue pairings may also be important for the initiation of folding, may provide the driving forces responsible for protein-protein recognition, and may thus be useful for protein engineering experiments.

* To whom correspondence should be addressed. Fax: +91-33-2334-3886. E-mail: [email protected]; [email protected].
Journal of Proteome Research 2005, 4, 1600-1609. Published on Web 08/24/2005
Methods

Files were collated from the Protein Data Bank (PDB) at the Research Collaboratory for Structural Bioinformatics (RCSB)23 in the following categories (with details provided in the cited references): (a) protein tertiary structures (a total of 555 chains in 531 files);24 (b) protein-protein complexes (70);5 (c) homodimeric proteins (122);8 and (d) pairs of molecules involved in lattice contact (with interface area, for both the components together, > 800 Å2) in monomeric protein crystals,12 which can be further subdivided into contacts with (103) or without (85) 2-fold symmetry.

10.1021/pr050118k CCC: $30.25 © 2005 American Chemical Society

Calculation of Association Parameters. The propensity20 of a residue X to be in the environment of Y (the central residue) is as follows

$$P_{XY} = \frac{N_{XY} \Big/ \sum_{X=1}^{20} N_{XY}}{N_{X} \Big/ \sum_{X=1}^{20} N_{X}}$$
where N_XY is the number of atoms of residue X found close (within 4.5 Å) to atoms of residue Y, N_X is the total number of atoms contained in residue type X in the entire database, and the summations are over all the 20 residue types. All of the atoms of residue type X within a distance of 4.5 Å (a threshold value that was found suitable in an earlier study24) from any atom of residue Y were assumed to be interacting. For the interactions within the tertiary structure, two sets of calculations were performed: one using all the atoms in the residue, another restricted to side chain atoms (Cβ onward) only. For interactions across the interface, for a ‘central’ residue located on one protein chain, the interacting atoms came from the other chain only. Interface residues from both the chains were considered as central residues for protein-protein complexes, while these were restricted to one subunit only for interfaces related by 2-fold symmetry. For interface residues, only the atoms located in the interface were used in the calculation.

Interface Atoms and Area. The interface atoms are the ones that showed a reduction in the accessible surface area (ASA) by more than 0.1 Å2 as the two isolated subunits were brought to the binding mode. ASA was calculated using the program NACCESS,25 which implements the algorithm of Lee & Richards.26 The default probe radius of 1.4 Å and Z-slice (thin slices through the molecule over which the exposed arc lengths for each atom are calculated and the summation over all z-values gives the final area) of 0.05 Å were used. The interface area is the ASA (from a given subunit) that is buried on complexation. In cases with a 2-fold symmetry relating the protein subunits, a residue close to the 2-fold axis may interact with the same residue from the other subunit, thus making up a pair of self-contacting residues. The contribution of a self-contacting residue to the interface area is designated as the self-contacting area (and this may include area in excess of that buried between the self-contacting mates, if the residue happens to interact with more residues across the interface). A self-contacting score, SC, was defined for each 2-fold related interface by multiplying the self-contacting area (ΔASA_i) for a residue (i) by the percentage contribution (C_i) of that residue to the self-contacting area in homodimers (Figure 4) and then summing over all such residues of that interface.

$$SC = \sum_{i=1}^{20} \Delta ASA_i \, C_i$$
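As a minimal sketch of the self-contacting score, the Python function below sums, over residue types, the self-contacting area multiplied by that type's contribution to the self-contacting area in homodimers. The function name and the dictionary layout are hypothetical, and entering the contributions as fractions (13% as 0.13) is an assumption; it appears to be the form that yields scores of the magnitude reported in the Results, but the source does not state it explicitly.

```python
def self_contact_score(area_by_residue, contribution_by_residue):
    """Return SC = sum_i dASA_i * C_i over residue types.

    area_by_residue: self-contacting area (A^2) buried per residue type.
    contribution_by_residue: contribution of each residue type to the
        self-contacting area in homodimers, entered as a fraction
        (assumption: 13% is written as 0.13).
    """
    score = 0.0
    for residue, area in area_by_residue.items():
        # Residue types without a tabulated contribution add nothing.
        score += area * contribution_by_residue.get(residue, 0.0)
    return score


# Hypothetical interface with a single self-contacting Leu burying 105 A^2,
# using the 13% contribution quoted for Leu in homodimers.
example_score = self_contact_score({"Leu": 105.0}, {"Leu": 0.13})
print(round(example_score, 2))
```

The per-residue contributions would in practice come from the homodimer composition in Figure 4; only the Leu value used here is quoted in the text.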
The molecular diagrams were made using GRASP27 and PyMOL.28
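The propensity defined under Calculation of Association Parameters can be illustrated with a short Python sketch. It assumes the atom-level contact counts (N_XY) and the database atom totals (N_X) have already been tallied; the function name, the nested-dictionary layout, and the toy numbers are all hypothetical and are not taken from the paper.

```python
def contact_propensities(contact_counts, atom_totals):
    """Return P[Y][X] = (N_XY / sum_X N_XY) / (N_X / sum_X N_X).

    contact_counts[Y][X]: atoms of residue type X found within the contact
        distance of atoms of central residue type Y (N_XY).
    atom_totals[X]: total atoms of residue type X in the database (N_X).
    """
    total_atoms = sum(atom_totals.values())
    propensities = {}
    for central, counts in contact_counts.items():
        row_total = sum(counts.values())
        if row_total == 0:
            continue  # no contacts recorded around this central residue
        propensities[central] = {
            neighbor: (n_xy / row_total) / (atom_totals[neighbor] / total_atoms)
            for neighbor, n_xy in counts.items()
        }
    return propensities


# Toy two-letter alphabet with made-up counts (hypothetical numbers).
toy_totals = {"Leu": 80, "Glu": 20}
toy_contacts = {"Leu": {"Leu": 90, "Glu": 10}}
toy_propensities = contact_propensities(toy_contacts, toy_totals)
print(toy_propensities["Leu"])
```

In this toy case Leu is enriched around Leu (value above 1) and Glu is depleted (value below 1); the real calculation runs over all 20 residue types.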
Results and Discussion

A multitude of relatively weak nonbonded interactions results in the stable native fold of proteins, as well as the strong and specific interactions between them. Protein-protein associations can be of two types: the permanent association between protein subunits leading to the quaternary structure, and protein chains that exist independently in vivo forming complexes, with a time scale of binding that can vary widely. Among the oligomeric proteins only the homodimers (122 in number),8 together with 70 complexes in the second category,5 have been studied. One can envisage another type of association that is observed in vitro. When protein crystals are formed, a given chain is packed in three dimensions using crystallographic symmetries and lattice translations, and interfaces of various sizes are formed between neighboring molecules. Though these are nonphysiological in nature, we analyze 188 interfaces found in crystals of monomeric proteins,12 as the comparison of the results with those observed in biologically relevant associations would provide useful insight into the process of biomolecular recognition and specificity.

Residue Contact Preferences in the Tertiary Structure. The propensity values for all the 20 amino acid residues to be in the environment of a particular residue are given, row-wise, in Figure 1a. A value of 1.0 suggests a neutral association or preference of a residue X to be in the environment of a ‘central’ residue, Y; larger values indicate a preferential association, while smaller values stand for avoidance. The matrix is not symmetric. For example, His does not prefer Asp in its total environment as much as Asp prefers His. As expected, within protein tertiary structures the hydrophobic residues are in contact with each other, as are the oppositely charged residues, and the trend is much clearer when only the side-chain atoms (Figure 1b) are considered rather than the whole residue (Figure 1a).
Many of the contact preferences can be explained by hydrogen bonding29,30 between side chains, and by electrostatic and hydrophobic interactions.20,21 Though we are concerned with nonbonded interactions, S-S covalent bridges also get included as they are within the limiting distance used, thus contributing to the high Cys-Cys contact preferences. Also, Cys residues can be indirectly brought into close proximity by virtue of their binding to the same metal ion, as can His ligands that are part of the same metal center.30 Pairing between Met residues also has high occurrence. Pro has a strong preference for aromatic residues, a result of the stabilizing C-H···π interactions that can form when the two types of rings have specific relative orientations.24 Similar stabilizing interactions also exist between the sulfide/disulfide group and the face of the aromatic ring, making the Met-aromatic and Cys (or rather half-cystine)-aromatic pairs stand out.31-33

Classification of Residues Based on Their Environment. The values (corresponding to the color-code) in a given row in Figure 1a are the propensities of different residues to be around a central residue (say, Y) cited on the left. Hence, the correlation coefficient, CC_YY′, between the values in two rows indicates the degree of similarity in the environment of two residues Y and Y′. We calculated CC_YY′ values for all possible pairs and subjected the (1 - CC_YY′) values (which provide a measure of the ‘distance’ between the two residues) to complete-linkage cluster analysis.34 Results presented in Figure 2 provide a classification of amino acid residues based on the similarity of the environment. There are nine classes; using the one-letter amino acid code they are {G,S,T}, {A,V}, {P,F,Y,W}, {H}, {C}, {M,L,I}, {D,E}, {N,Q}, and {R,K}. Interestingly, His, which is normally placed along with Arg and Lys as a basic residue, also has an aromatic ring. Because of this dual character it constitutes a unique group.
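The clustering step described above (distance 1 - CC between propensity rows, followed by complete-linkage merging up to a threshold) can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the authors' program: the function names and the four toy rows are hypothetical, and the 0.30 cutoff is the threshold distance quoted in the Figure 2 caption.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length rows."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def complete_linkage(rows, threshold):
    """Merge clusters while the complete-linkage distance is below threshold.

    rows: dict mapping a residue name to its row of propensity values.
    """
    dist = {
        frozenset((p, q)): 1.0 - pearson(rows[p], rows[q])
        for p, q in combinations(rows, 2)
    }
    clusters = [{name} for name in rows]

    def linkage(c1, c2):
        # Complete linkage: the distance between two clusters is the
        # largest distance over all cross-cluster pairs.
        return max(dist[frozenset((p, q))] for p in c1 for q in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) >= threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Made-up four-element rows (hypothetical values, not from Figure 1a).
toy_rows = {
    "Asp": [0.5, 1.0, 2.0, 1.8],
    "Glu": [0.6, 1.1, 1.9, 1.7],
    "Leu": [2.0, 1.5, 0.5, 0.6],
    "Ile": [1.9, 1.6, 0.6, 0.5],
}
toy_clusters = complete_linkage(toy_rows, threshold=0.30)
print(sorted(sorted(c) for c in toy_clusters))
```

With real data each row would hold the 20 propensities of Figure 1a; a row of identical values has zero variance and would need special handling before the correlation is computed.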
Short polar residues Ser and Thr appear to have a very similar environment to Gly, and together they constitute a class. The strong association between Pro and the aromatic residues24 has made them constitute a single group. There is evidence indicating that the existing complexity of naturally occurring sequences is not an absolute requirement for achieving the unique native structure and that proteins can be designed with a reduced set of amino acids.35-39 It has been shown that the minimum number of letters required to fold a protein is around ten,40,41 quite similar to what we find. Other residue properties have also been used for devising sets of reduced amino acid alphabets. For example, using the distribution of φ, ψ, χ1 conformational angles one could have only six distinct classes of residues.42 When one is interested in local structures in protein molecules, such a classification can be used to distinguish between a conservative and a nonconservative substitution. The present classification is based on the similarity of the environment of residues and can be useful in making substitutions on one component of a complex or on both the components across the interface so as to improve the binding affinity. For example, with a Pro on one side it may be beneficial to have an aromatic residue on the other side, although it is equally important to have the correct geometry of interaction to achieve this.24 The degree of evolutionary conservation within a family of homologous sequences has been measured by dividing the 20 amino acids into six classes, one of which consists of Gly and Pro.43 However, their conformational properties are too divergent42 to conform to a single class, and even from the point of view of their environment they are quite dissimilar. The classification proposed here has been used to calculate the sequence entropy at the core and rim regions of protein-protein interfaces,5,8 and the contrast between the computed values appears to be stronger with the present classification than with the other.44

Figure 1. Color-coded representation of propensities of different residues (in columns) to be in the environment of any given residue (row) in protein structures (a,b), physiologically observed interfaces (c-e), and interfaces formed in crystals of monomeric proteins (f-h). The propensities within the same protein chains were calculated considering the atoms in (a) the whole residue and (b) belonging to the side chains only. For the dimeric interfaces, in addition to the normal one (c), calculations were also done neglecting the atoms of residues (close to the 2-fold axis) involved in ‘self-contact’ (d). For the monomeric proteins also, the values were calculated for the whole dataset (f) and after splitting it into two categories: interfaces formed between 2-fold (crystallographic or noncrystallographic) related subunits (g) and the rest involving all other crystal symmetries (h). The colors corresponding to different ranges of propensities are explained on top right. The ordering of residues is as follows: Gly (no side chain), Ala (no side-chain torsion angle), Pro (restricted torsion angles), Cys and Met (S-containing residues), Leu, Val, and Ile (aliphatic branched side chains), Phe, Tyr, Trp, and His (aromatics, the last one can also be charged), Arg, Lys, Asp, and Glu (charged), and Asn, Gln, Ser, and Thr (neutral polar).

Figure 2. Minimum spanning tree obtained for (1 - CC_YY′) values (where CC_YY′ is the correlation coefficient of propensity values of 20 residues to be in contact with two central residues Y and Y′) using complete-linkage cluster analysis with a threshold distance of 0.30. The distance between two residues or the maximum of all the distances between two clusters is indicated when they are below the threshold.
Residue Contacts in the Interfaces. On average an interface residue is in contact with 2.5 (in homodimers), 2.4 (complexes), and 2.1 (crystal interfaces) residues in the other subunit, with very little difference between datasets or between residues (for example, in homodimers, the small Gly has a value of 2.0, while a large residue like Trp has a value of 3.2). The striking feature of the dimeric interface is the contact between the same types of residues (along the diagonal) (Figure 1c). This is because a residue that is close to the 2-fold axis can be in contact with the residue with the same number from the other subunit, which we call ‘self-contact’ (Figure 3). When the atoms of such residues are excluded, the calculated values (Figure 1d) are very similar to the propensities (Figure 1a) found within protein chains. The propensities of residues to be in contact across crystal interfaces are also similar, and of these the subclass containing the interfaces formed by 2-fold related molecules has values (Figure 1g) that, like those of the homodimeric interfaces, are large along the diagonal. However, crystal dimers show contacts involving pairs of residues, such as Glu, Lys, and Asp, that are not there in biological dimers. The depletion of salt bridges in homooligomers and the extreme general preference for interactions between identical amino acid residues have been noted earlier, though no explanation, other than the possible ‘evolutionary advantage’, was provided for the latter observation.18 All of our datasets indicate the preferential contact between oppositely charged residues, and we also establish the reason for the interaction between identical residue types in homodimers. Contacts between the same types of residues are not very prominent in interfaces found in complexes (Figure 1e) and are indeed avoided for Cys, Ile, Tyr, Trp, His, and Ser. Of the aromatic residues, Phe is an exception in that it has a strong preference to contact another Phe across the interface.
His-His contacts in tertiary structures are brought about by binding to the same metal center,30 and their absence in complexes indicates that interactions between two protein chains are not usually stabilized by such cation-mediated contacts. Val and Ile seem to interact with the hydrophobic part of Gln, but not Asn (which has a smaller nonpolar component in its side chain).

We compare the set of propensities observed around a given residue in protein tertiary structures to equivalent values calculated for the interfaces. For row-wise comparison between the matrix in Figure 1a and the other matrices in Figure 1, a Euclidean distance ΔP can be computed in the 19-dimension space of propensity, using

$$(\Delta P)^2 = \frac{1}{19} \sum_{X=1}^{20} \left( P_X - P'_X \right)^2$$
Figure 3. Two displays of homodimeric molecules (subunits in different colors), (a) L-2-haloacid dehalogenase (PDB file, 1aq6) and (b) hypoxanthine phosphoribosyltransferase (1tc1). On the left is the surface representation (overlaid onto the stick model) looking down the 2-fold axis (encircled). The self-contacting residues are excluded, resulting in a channel around the axis. On the right is the stereoview (perpendicular to the direction used in the former) showing only the self-contacting residues, with the 2-fold axis being vertical.
Figure 4. Histogram showing the percentage composition of residues in the self-contacting area of the interface in (a) homodimers (red) and (b) crystal dimers (blue).
where P_X and P′_X are the propensities of residue type X to be around a given residue in the two matrices. Distances provided in Table 1 indicate that the maximum difference occurs for Cys, especially for the interfaces formed in the complexes and in the crystals of monomers, no doubt brought about by the absence of disulfide bonds or metal-bound Cys residues in such systems. Another residue showing a rather different environment in crystal interfaces is Trp. Comparing Figure 1, parts a and f, shows that while in tertiary structures many residues have a neutral preference for Trp (color code, white), the contrast is greater in crystal contacts (residues are usually either highly favored, as are all aromatics and Pro, or highly disfavored). Being a surface residue, Gly is found in the environment of most other residues (column under ‘G’ in Figure 1f) in crystal contacts, while hydrophobic residues with aliphatic chains, such as Leu, Val, and Ile, are avoided. Pro is also found in the environment of many other residues, notably aromatics. Overall, however, the Euclidean distance between the tertiary structure and the different interfaces is small, reaching its maximum (0.78) for the crystal contacts that exclude the 2-fold related ones, which indicates that the environment of amino acid residues is rather invariant. This conclusion is also reflected in the validity of the same set of potentials of mean force for interresidue interactions in both the intramolecular and the intermolecular regimes.22

Propensities of Residues to be Involved in Self-Contact. On average, based on numbers, 7.5% of interface residues are involved in self-contact in homodimers; this translates to 11.9% in terms of area. As self-contacts are important for packing of residues around the 2-fold axis, we determined the average contribution of each residue type to the total surface area buried by different self-contacting residues in dimeric interfaces (Figure 4). Leu has by far the largest contribution (13%), followed by Phe (8%), Val and Arg (7%), and Ile, Met, Tyr, and Thr (6%).
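The row-wise distance used for Table 1 follows the (ΔP)² definition given earlier in this section. A minimal Python sketch is shown here; the function name and the two toy rows are hypothetical, and the sketch simply assumes each row holds the 20 propensity values for one central residue.

```python
from math import sqrt


def propensity_distance(row_a, row_b):
    """Return dP, where dP^2 = (1/19) * sum_X (P_X - P'_X)^2.

    Each row holds the 20 propensities around one central residue,
    taken from two different matrices.
    """
    if len(row_a) != 20 or len(row_b) != 20:
        raise ValueError("each row must contain 20 propensity values")
    squared = sum((p - q) ** 2 for p, q in zip(row_a, row_b))
    return sqrt(squared / 19)


# Made-up rows (hypothetical): every propensity differs by 0.1.
tertiary_row = [1.0] * 20
interface_row = [1.1] * 20
toy_distance = propensity_distance(tertiary_row, interface_row)
print(toy_distance)
```

The sketch returns ΔP itself rather than its square, following the wording "a Euclidean distance ΔP" in the text.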
Table 1. Euclidean Distance of Propensity Values of Residues to Be Located around a Given Residue Type in Protein Tertiary Structures and Different Types of Interfaces

                                  type of interface(a)
amino acid   dimer    dimer, no SC   complex   monomer   monomer, 2F only   monomer, no 2F
residue      (1c)     (1d)           (1e)      (1f)      (1g)               (1h)
Gly          0.106    0.076          0.129     0.169     0.226              0.587
Ala          0.049    0.040          0.247     0.138     0.267              0.114
Pro          0.069    0.056          0.192     0.152     0.215              0.226
Cys          0.078    1.314          4.533     0.597     1.097              4.029
Met          0.115    0.071          0.267     0.284     0.559              3.756
Leu          0.034    0.024          0.123     0.121     0.226              0.302
Val          0.061    0.032          0.086     0.153     0.216              0.324
Ile          0.058    0.064          0.235     0.452     0.500              0.741
Phe          0.113    0.047          0.171     0.126     0.428              0.649
Tyr          0.071    0.074          0.102     0.178     0.336              0.256
Trp          0.100    0.074          0.233     2.247     3.220              2.621
His          0.181    0.118          0.260     0.430     0.756              0.744
Arg          0.075    0.057          0.145     0.127     0.217              0.292
Lys          0.087    0.092          0.190     0.118     0.151              0.206
Asp          0.133    0.141          0.186     0.139     0.204              0.305
Glu          0.087    0.095          0.153     0.133     0.175              0.193
Asn          0.163    0.112          0.146     0.158     0.278              0.295
Gln          0.159    0.095          0.136     0.093     0.278              0.274
Ser          0.094    0.077          0.145     0.174     0.430              0.180
Thr          0.107    0.066          0.167     0.127     0.210              0.246
All(b)       0.09     0.13           0.37      0.29      0.48               0.78

(a) Matrices correspond to Figure numbers given in parentheses. Abbreviations used: SC = self-contact; 2F = 2-fold related interfaces. (b) Calculation done using all the 400 elements of the matrix.
An example of three pairs of self-contacting Leu residues is shown in Figure 3b. Considering the 2-fold related interfaces in crystals of monomeric proteins, a smaller percentage (6.7%) of the interface area originates from self-contacting residues, and the residues contributing the most to the self-contacting interface area are Glu (12%), Arg (9%), Leu (8%), and Asp and Tyr (7%). Thus, the crystal contacts are characterized by the presence of a higher percentage of charged residues in the self-contacting positions. There are instances of genetically engineered monomeric versions of proteins that exist in the dimeric form. This was achieved in the case of Trypanosoma triose-phosphate isomerase by altering residues in the loop region.45 From our results, targeting the self-contacting residues (for example, mutating a Leu to a charged residue) may achieve a similar effect.

Distinguishing Specific from Nonspecific Interfaces. Protein-protein interactions are integral to many mechanisms of cellular control, including protein localization, competitive inhibition, allosteric regulation, gene regulation, and signal transduction. Specific interactions between protein subunits also result in oligomeric assembly of protein molecules. Under conditions of crystallization, protein molecules associate in ways related by the various symmetries of the crystal lattice. As a result, a 2-fold symmetry observed in a protein crystal could indicate a homodimeric molecule (i.e., the symmetry would be retained under physiological conditions), or a monomeric molecule, in which case the nonspecific interaction generated by the symmetry exists only in the crystal.
Various parameters have been devised, the size of the interface being one of them, to distinguish the physiological dimers from the ‘pseudo’-dimers that can exist in the crystals of monomeric proteins.12,46,47 There are many recalcitrant cases where a single parameter is insufficient to make a proper distinction, and a combination of parameters (usually three) can make the correct categorization with greater than 90% accuracy.12 On the basis of what has been presented in the previous section, the extent and the types of residues making up the self-contacting area can also be used as additional features to discriminate between homodimeric and monomeric proteins in the crystalline state. The average numbers of residues involved in self-contacts in homodimers and crystal dimers are 4 (±2) and 2 (±1), respectively, suggesting a lesser contribution of such interactions across crystal interfaces. Figure 5a provides a comparison of the surface area (in one subunit) that gets buried due to the self-contacting residues, the average values being 248 (±119) and 139 (±70) Å2 for the homodimers and crystal dimers, respectively. Nine out of 122 structures in the former category and 10 out of 103 in the latter have no residue involved in self-contacts and are not included in the statistics. Of the homodimers, 46% have a value greater than 240 Å2, while the figure is only 6% for crystal dimers. At the other end (area < 90 Å2), the values are 5% and 27%, respectively. A scoring function, SC, was designed involving both the self-contacting area in a file and the composition of self-contacting residues in homodimers, and the distributions shown in Figure 5b may be more discriminatory than those in Figure 5a. The average values are 16 (±8) and 8 (±5) Å2, respectively. Of the homodimers, 51% have a value greater than 15 Å2, in contrast to only 4% for crystal dimers. Thus, self-contact on its own cannot completely distinguish homodimers from crystal dimers, but it is a new feature that may be used in conjunction with other parameters. Specifically, in an earlier work12 it was not possible, using a combination of three parameters, to conclusively discriminate 8 homodimeric interfaces from those observed in crystals of monomeric proteins. However, if an additional condition of an SC value greater than 15 is used, five of these interfaces can be correctly classified (Table 2).

Figure 5. Histograms of two features distinguishing homodimeric interfaces (red) from the 2-fold related crystal interfaces of monomeric proteins (blue). (a) Surface area (Å2) buried due to self-contact; (b) SC, the self-contacting score. Numbers on the horizontal scale are upper limits of the bins in the histograms. Nine and 10 PDB files in the two categories, respectively, did not have any self-contacting residue and are not included in the plots.
Table 2. Self-Contacting Residues and SC Score for Interfaces that Could Not Properly be Identified12 as Belonging to Homodimers

PDB file   protein                               self-contacting residues                         SC (Å2)   remark(a)
1a3c       Pyrimidine operon regulator PyrR      Met 118, Asp 119, Pro 144, Arg 146               21.08     dimer
1bif       6-P-fructo-2-kinase/bisphosphatase    Tyr 224, Val 225, Tyr 239, Met 242               12.65     inconclusive
1hss       Alpha-amylase inhibitor               Leu 90, Ser 94, Val 98                           16.79     dimer
1m6p       Mannose-6-P receptor                  Met 93, Met 116                                  4.33      monomer
1reg       T4 Reg A                              Arg 91, Thr 92, Met 94                           20.68     dimer
1sox       Sulfite oxidase                       Arg 377, Asn 437, Val 438, Pro 440, Asp 441      16.14     dimer
2mcg       IgG lambda light chain dimer (Mcg)    Gln 40, Phe 122, Thr 166, Ser 179, Cys 215       16.00     dimer
2sqc       Squalene-hopene cyclase               Asp 246, Leu 249, Lys 285                        10.69     inconclusive

(a) Identifiable as monomer, dimer, or inconclusive.
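The additional SC condition can be checked directly against the values in Table 2. The short Python sketch below applies only the SC > 15 cutoff quoted in the text; it does not reproduce the earlier three-parameter procedure, and the variable names are hypothetical.

```python
# SC scores (A^2) transcribed from Table 2, keyed by PDB code.
table2_sc = {
    "1a3c": 21.08, "1bif": 12.65, "1hss": 16.79, "1m6p": 4.33,
    "1reg": 20.68, "1sox": 16.14, "2mcg": 16.00, "2sqc": 10.69,
}
SC_THRESHOLD = 15.0  # cutoff quoted in the text

# Entries whose score exceeds the cutoff are flagged as likely homodimers.
flagged_dimers = sorted(
    pdb for pdb, sc in table2_sc.items() if sc > SC_THRESHOLD
)
print(flagged_dimers)
```

Five of the eight entries pass the cutoff, in line with the five interfaces marked as dimer in the remark column.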
Evolutionary Conservation of Self-Contacting Residues. In the absence of any self-contacting residue there would be a narrow channel running along the 2-fold axis of homodimers, as can be seen in Figure 3, left panel. Thus, these residues avoid the presence of any void between the subunits and contribute to their efficient packing. There are examples of transient and permanent oligomers18,49 and it would be of interest to see if the self-contacting residues have any role in determining the time scale of interactions between the two subunits in these two categories.

Figure 6. Surface representation of one subunit of wheat germ agglutinin (9wga), with the interface region indicated in red and the 2-fold axis marked. In the homodimer in (a), self-contacting residues (Met10, Glu11, Asn57, Asn58, Pro82, and Arg84) are indicated in stick model, while in the interface formed due to contact in the crystal (b), there is no such residue.
The usefulness of self-contacts as a discriminatory criterion can be exemplified. For multi-chain proteins, some PDB files do not necessarily contain the coordinates of the correct quaternary structure, and a symmetry transformation has to be applied to the subunits to get the correct assembly. For such files, the Protein Quaternary Structure (PQS) server generates the biologically relevant assembly by analyzing the interfaces formed by the application of all possible symmetry operators.48 Wheat germ agglutinin provides a typical example. The PDB file (9wga) contains a small interface (area, 224 Å2) formed due to crystal contact, while the homodimeric interface, seen in the file in the PQS server, has a much larger area (2288 Å2). With no self-contacting residues in the former and six in the latter (Figure 6), one can just as well use this condition to identify the correct quaternary structure for the molecule. Additionally, Ponstingl et al. provide a list of proteins for which homologous pairs, one existing as a monomer and the other as a dimer, are known (Table 3).46 Of the six cases, in four the 2-fold related crystal packing interface of the monomer does not contain any self-contacting residue; in the last case, the correct classification can easily be achieved using SC. Only for galectin, whose classification was also difficult following the earlier procedure,46 was the distinction less convincing; but even here the SC parameter for the monomer was smaller than that of the dimer. As already mentioned, nine homodimers do not have any self-contacting residues. The PDB codes for these are: 1aa7 (with interface area, 1125 Å2), 1b67 (1607), 1b8a (4391), 1brw (1083), 1bxg (1041), 1coz (1050), 1cvu (2436), 2ilk (4542), and 8prk (969).
Though the interface areas for five of them are on the lower side, these are still within one standard deviation of the average value of 1940 (±1100) Å2 for homodimers,8 and there is no apparent reason these interfaces should be devoid of self-contacting residues.
Numerous studies have shown that, based on the conservation of residues, one can distinguish the binding sites from exposed protein surfaces.44,50-57 During molecular evolution, of all the oligomer contacts it may be easier to optimize the interactions involving a pair of self-contacting residues, as this depends on mutating a single residue, while a change in any other interface residue would entail compensating mutations in several other positions. If a self-contacting residue is important in a homodimeric structure it should be conserved at the same position in all other members of the same protein family, as delineated in the HOMSTRAD structural alignment database.58 Only those structures were considered for which HOMSTRAD had at least two homologous members. Data in Table 4 show that 34% of the self-contacting residues are fully conserved. Though the value is not very high, it may be pointed out that while conservative changes, such as exchanges between hydrophobic residues, are normally permitted in such studies,51 we have been rather strict in not allowing even such changes, restricting ourselves to complete conservation. Also, we have not restricted the analysis to residues for which the self-contacting interaction is only through the side chain (with the consequent greater evolutionary pressure for conservation), or excluded the ones that are near the periphery of the interface (and can thus participate in other polar interactions also). However, the significance of the result can be further appreciated if one considers it against the background of a value of 20% conservation obtained for self-contacting residues in crystal dimers. Though present in rather small numbers, the most conserved residue (60%) is Trp (Figure 7), as has also been noted in protein-protein binding sites in general.54 Some other residues with high levels of conservation are Cys (55%) and Gly (50%).
To find out if a short stretch of the polypeptide chain in homodimers can contribute a large percentage of self-contacting residues, we identified the structures that have at least three such residues in a 10-residue long span. There are 46 such examples in 38 files, 19 of which have the residues in an α-helical conformation, a typical example being shown in Figure 8. Thus, packing between α-helices from two subunits can optimally place residues in self-contacting positions. If one looks for the shortest stretch having the maximum number of self-contacting residues, then there are 36 examples in 30 files
Table 3. Closely Homologous Monomer/Dimer Pairs (a)

| family | form | PDB file | interface area (Å2) | self-contact area (Å2) | self-contacting residues | SC (Å2) |
| --- | --- | --- | --- | --- | --- | --- |
| Ribonuclease | dimer | 1bsr | 1888 | 155 | Leu 28, Cys 32 | 13.4 |
| Ribonuclease | monomer | 1afk | 816 | | | |
| Galectin (S-lectin) | dimer | 1slt | 536 | 142 | Val 5, Ala 6, Cys 130, Val 131 | 8.1 |
| Galectin (S-lectin) | monomer | 1bkz | 739 | 141 | Pro 15, Gly 16, Val 18 | 7.2 |
| Cu, Zn superoxide dismutase | dimer | 1xso | 661 | 74 | Ile 111, Val 146 | 4.9 |
| Cu, Zn superoxide dismutase | monomer | 1eso | 303 | | | |
| Hemoglobin | dimer | 3sdh | 873 | 114 | Tyr 75 | 6.8 |
| Hemoglobin | monomer | 1fip | 269 | | | |
| Diphtheria toxin | dimer | 1tox | 3720 | 105 | Leu 390 | 13.6 |
| Diphtheria toxin | monomer | 1mdt | 559 | | | |
| Aminopeptidase/creatinase | dimer | 1chm | 3170 | 252 | Leu 128, Asn 129, Leu 130, Gln 200, Tyr 332 | 18.1 |
| Aminopeptidase/creatinase | monomer | 1xgs | 1161 | 182 | Val 148, Ile 164, Arg 171 | 12.4 |

(a) The structures are taken from Table 2 of ref 46, excluding three cases where the monomeric proteins do not give rise to any 2-fold related crystal dimers. The crystal packing was generated using the program CRISPACK.68
Table 4. Conservation of the Self-Contacting Residues in Homologous Structures

| interface type | no. of self-contacting residues: total | considered (a) | conserved | % conserved |
| --- | --- | --- | --- | --- |
| Homodimers | 469 | 298 | 100 | 34 |
| Crystal dimers | 225 | 147 | 30 | 20 |

(a) Only those PDB files are considered for which HOMSTRAD has two or more homologous members.
Figure 8. Stereoview of the 22-residue long α-helix having seven self-contacting residues in NADPH FMN-oxidoreductase (1vfr), with the 2-fold axis of the homodimer indicated.
Figure 7. Histogram of the numbers of self-contacting residues and those that are conserved (in black) among homologous structures in homodimers.
in which 3 out of 6 contiguous residues are in self-contacting positions. If one considers 298 homodimeric residues (Table 4), then 44% of them are in α-helix and only 15% in β-sheet, whereas the corresponding values for 147 residues located in crystal dimers are 29% and 20%, respectively.

Leu, The Stickiest Residue. In addition to the role in dimeric association, the stickiness of Leu residues has also been used by nature for the construction of other stable noncovalent linkages, such as the leucine zipper, whose characteristic motif is a periodic repeat of leucines at every seventh position (a heptad repeat) in a segment of 22-29 residues.59 The chain forms an α-helix such that Leu side chains interdigitate with those from another helix to form a coiled-coil interaction.60 Moreover, nonfunctional conserved residues, which form a cluster in c-type cytochromes and globins and are assumed to be requisite for the fast and correct folding of these protein families, contain interacting Leu residues.61 Not only in α-helical proteins but even in all-β-sheet proteins, the clustering of hydrophobic residues, Leu in particular, has been observed in folding intermediates.62 Leu-rich repeats, with 20-29 residue sequence motifs, which provide a versatile structural framework for protein-protein interactions, contain β-α units and intervening loops with a highly conserved pattern LxxLxLxxN(or C)L (x being any amino acid) corresponding to the segment surrounding the β-strand.63 Some of the Leu residues from the neighboring units are brought into contact as the repeats are arranged in tandem, contributing to the horseshoe fold, with α-helices lining its outer circumference and a parallel β-sheet along the inner. Leu can sometimes be replaced by Val, Ile, and Phe, residues which interestingly also have high propensities to be involved in self-contacts (Figure 4). Thus, the self-association between Leu residues, observed in various structural motifs in protein structures, is a direct reflection of the high propensity of Leu to be a self-contacting residue in homodimeric interfaces.

We studied if any rotamer42,64 is particularly favored for self-contacting Leu residues (Figure 9). If the conformational angles in bins of size 60° centered at -60°, +60°, and 180° are represented by symbols -, +, and t, respectively, the two combinations of χ1χ2 that are found almost exclusively are -t (47% of the cases) and t+ (23%), a very similar distribution to that of Leu residues in general.65,66
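The binning of χ1 and χ2 into the symbols -, +, and t can be sketched as follows. This is only an illustration of the 60°-wide bins described above: the function names and the toy angle list are hypothetical, the half-open bin edges are an assumption (the text does not say how boundary values are treated), and angles falling outside all three bins are simply left unclassified.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def chi_bin(angle_deg: float) -> Optional[str]:
    """Map a torsion angle to '-', '+', or 't' (60-degree bins), else None."""
    a = (angle_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if -90.0 <= a < -30.0:   # bin centered at -60 degrees
        return "-"
    if 30.0 <= a < 90.0:     # bin centered at +60 degrees
        return "+"
    if a >= 150.0 or a < -150.0:  # bin centered at 180 degrees
        return "t"
    return None


def rotamer_label(chi1: float, chi2: float) -> Optional[str]:
    """Combine the chi1 and chi2 bins into a label such as '-t' or 't+'."""
    b1, b2 = chi_bin(chi1), chi_bin(chi2)
    if b1 is None or b2 is None:
        return None
    return b1 + b2


def rotamer_fractions(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of classifiable (chi1, chi2) pairs falling in each rotamer."""
    labels = (rotamer_label(c1, c2) for c1, c2 in pairs)
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


# Toy (chi1, chi2) values in degrees, made up for illustration only.
toy_angles = [(-65.0, 175.0), (-70.0, -178.0), (-177.0, 62.0), (-60.0, 10.0)]
toy_fractions = rotamer_fractions(toy_angles)

if __name__ == "__main__":
    for label, frac in sorted(toy_fractions.items()):
        print(f"{label}: {frac:.2f}")
```

In the toy list the last pair has χ2 = 10°, which lies outside every bin, so it is excluded from the fractions; real data would decide whether such outliers should be counted in the denominator.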
Figure 9. Distribution of χ1 and χ2 torsion angles (deg) of self-contacting Leu residues.

Conclusion

Residue contacts within a protein chain stabilize the tertiary structure, whereas those occurring at the interface between protein chains stabilize the quaternary structure and protein-protein complexes. Although there are differences in residue composition and other physicochemical features between functional protein-protein interfaces and crystal-packing interfaces,12 we observe that the environment of residues, as seen within protein structures, is generally maintained in all types of interfaces; the maximum deviation occurs for the nonphysiological interfaces formed in crystals. This corroborates what has been noticed for the hydration of interfaces,67 which has distinct overall characteristics for the physiological and crystal dimers; however, the hydration of individual residues remains quite similar between the two categories. On the basis of the similarity of environment in protein tertiary structures, amino acid residues have been divided into nine groups: {G,S,T}, {A,V}, {P,F,Y,W}, {H}, {C}, {M,L,I}, {D,E}, {N,Q}, and {R,K}. Such a classification encompasses various conventional features that are used for residue categorization, such as hydrophobicity, size, aliphatic or aromatic character, and conformational flexibility of the side chain. Interactions between residues of the same type are quite prominent in interfaces formed by 2-fold symmetry. This is due to the ‘self-contact’ involving residues lining the 2-fold rotational axis. In homodimers, of all the self-contacting residues, Leu contributes the maximum (13%) to the interface area, while in crystal dimers the charged residue Glu is the most conspicuous (12%). On average, 4 and 2 residues per structure are self-contacting in the two datasets, respectively, and the residues involved may be useful in discriminating the two types. The importance of the self-contacting residues in homodimers is revealed by their greater degree of conservation in homologous structures, as compared to what is seen in crystal dimers. Leu, which is the stickiest residue across the 2-fold axis, is also the residue of choice in structural motifs, such as leucine zippers and leucine-rich repeats, whose structural integrity also depends on the interaction between Leu residues.

Acknowledgment. The work was supported with fellowships to R.P.S. and R.P.B. and a research grant to P.C. from the Council of Scientific and Industrial Research, India.

References
(1) Nooren, I. M. A.; Thornton, J. M. EMBO J. 2003, 22, 3486. (2) Jones, S.; Thornton, J. M. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 13. (3) Argos, P. Protein Eng. 1988, 2, 101. (4) Lo Conte, L.; Chothia, C.; Janin, J. J. Mol. Biol. 1999, 285, 2177. (5) Chakrabarti, P.; Janin, J. Proteins 2002, 47, 334. (6) Tsai, C. J.; Lin, S. L.; Wolfson, H. J.; Nussinov, R. J. Mol. Biol. 1996, 260, 604. (7) Miller, S.; Lesk, A. M.; Janin, J.; Chothia, C. Nature 1987, 328, 834. (8) Bahadur, R. P.; Chakrabarti, P.; Rodier, F.; Janin, J. Proteins 2003, 53, 708. (9) Janin, J.; Rodier, F. Proteins 1995, 23, 580. (10) Carugo, O.; Argos, P. Protein Sci. 1997, 6, 2261. (11) Dasgupta, S.; Iyer, G. H.; Bryant, S. H.; Lawrence, C. E.; Bell, J. A. Proteins 1997, 28, 494. (12) Bahadur, R. P.; Chakrabarti, P.; Rodier, F.; Janin, J. J. Mol. Biol. 2004, 336, 943. (13) Dunitz, J. D. X-ray Analysis and The Structure of Organic Molecules; Cornell University Press: Ithaca, 1979. (14) Desiraju, G. R. Acc. Chem. Res. 2002, 35, 565. (15) Walls, P. H.; Sternberg, M. J. E. J. Mol. Biol. 1992, 228, 277. (16) Moont, G.; Gabb, H. A.; Sternberg, M. J. E. Proteins 1999, 35, 364. (17) Glaser, F.; Steinberg, D. M.; Vakser, I. A.; Ben-Tal, N. Proteins 2001, 43, 89. (18) Ofran, Y.; Rost, B. J. Mol. Biol. 2003, 325, 377. (19) Aloy, P.; Russell, R. B. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 5896. (20) Narayana, S. V. L.; Argos, P. Int. J. Pept. Protein Res. 1984, 24, 25. (21) Karlin, S.; Zuker, M.; Brocchieri, L. J. Mol. Biol. 1994, 239, 227. (22) Keskin, O.; Bahar, I.; Badretdinov, A. Y.; Ptitsyn, O. B.; Jernigan, R. L. Protein Sci. 1998, 7, 2578. (23) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235. (24) Bhattacharyya, R.; Chakrabarti, P. J. Mol. Biol. 2003, 331, 925. (25) Hubbard, S. J. NACCESS: A Program for Calculating Accessibilities. 
Department of Biochemistry and Molecular Biology; University College of London; London, 1992. (26) Lee, B.; Richards, F. M. J. Mol. Biol. 1971, 55, 379. (27) Nicholls, A.; Sharp, K.; Honig, B. Proteins 1991, 11, 281. (28) DeLano, W. L. The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA. http://www.pymol.org; 2002. (29) McDonald, I. K.; Thornton, J. M. J. Mol. Biol. 1994, 238, 777. (30) Bhattacharyya, R.; Saha, R. P.; Samanta, U.; Chakrabarti, P. J. Proteome Res. 2003, 2, 255. (31) Pal, D.; Chakrabarti, P. J. Biomol. Struct. Dyn. 2001, 19, 115. (32) Bhattacharyya, R.; Pal, D.; Chakrabarti, P. Protein Eng. Design Selection 2004, 17, 795. (33) Pal, D.; Chakrabarti, P. J. Biomol. Struct. Dyn. 1998, 15, 1059. (34) Johnson, R. A.; Wiechert, D. W. Applied Multivariate Statistical Analysis; Prentice Hall of India: New Delhi, 1996. (35) Wolynes, P. G. Nat. Struct. Biol. 1997, 4, 871. (36) Plaxco, K. W.; Riddle, D. S.; Grantcharova, V.; Baker, D. Curr. Opin. Struct. Biol. 1998, 8, 80. (37) Clarke, N. D. Curr. Opin. Biotech. 1995, 6, 467. (38) Beasley, J. R.; Hecht, M. H. J. Biol. Chem. 1997, 272, 2031. (39) Akanuma, S.; Kigawa, T.; Yokoyama, S. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 13549. (40) Fan, K.; Wang, W. J. Mol. Biol. 2003, 328, 921. (41) Thomas, P. D.; Dill, K. A. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 11628. (42) Chakrabarti, P.; Pal, D. Prog. Biophys. Mol. Biol. 2001, 76, 1. (43) Mirny, L. A.; Shakhnovich, E. I. J. Mol. Biol. 1999, 291, 177. (44) Guharoy, M.; Chakrabarti, P., under submission. (45) Borchert, T. V.; Abagyan, R.; Kishan, K. V. R.; Zeelen, J. P.; Wierenga, R. K. Structure 1993, 1, 205. (46) Ponstingl, H.; Henrick, K.; Thornton, J. M. Proteins 2000, 41, 47. (47) Ponstingl, H.; Kabir, T.; Thornton, J. M. J. Appl. Crystallogr. 2003, 36, 1116. (48) Henrick, K.; Thornton, J. M. Trends Biochem. Sci. 1998, 23, 358. (49) Nooren, I. M. A.; Thornton, J. M. J. Mol. Biol. 2003, 325, 991. (50) Lichtarge, O.; Bourne, H. R.; Cohen, F. E. J.
Mol. Biol. 1996, 257, 342. (51) Elcock, A. H.; McCammon, J. A. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 2990. (52) Valdar, S. J.; Thornton, J. M. J. Mol. Biol. 2001, 313, 399. (53) Valdar, S. J.; Thornton, J. M. Proteins 2001, 42, 108. (54) Ma, B.; Elkayam, T.; Wolfson, H.; Nussinov, R. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 5772.
(55) Caffrey, D. R.; Somaroo, S.; Hughes, J. D.; Mintseris, J.; Huang, E. S. Protein Sci. 2004, 13, 190. (56) Panchenko, A. R.; Kondrashov, F.; Bryant, S. Protein Sci. 2004, 13, 884. (57) Lichtarge, O.; Sowa, M. E. Curr. Opin. Struct. Biol. 2002, 12, 21. (58) Mizuguchi, K.; Dean, C. M.; Blundell, T. L.; Overington, J. P. Protein Sci. 1998, 7, 2469. (59) Landschulz, W. H.; Johnson, P. F.; McKnight, S. L. Science 1988, 240, 1759. (60) Cohen, C.; Parry, D. A. D. Proteins 1990, 7, 1. (61) Ptitsyn, O. B.; Ting, K. L. H. J. Mol. Biol. 1999, 291, 671. (62) Gronenborn, A. M.; Clore, G. M. Science 1994, 263, 536.
(63) Kobe, B.; Kajava, A. V. Curr. Opin. Struct. Biol. 2001, 11, 725. (64) Dunbrack, R. L., Jr. Curr. Opin. Struct. Biol. 2002, 12, 431. (65) Ponder, J. W.; Richards, F. M. J. Mol. Biol. 1987, 193, 775. (66) Lovell, S. C.; Word, J. M.; Richardson, J. S.; Richardson, D. C. Proteins 2000, 40, 389. (67) Rodier, F.; Bahadur, R. P.; Chakrabarti, P.; Janin, J. Proteins 2005, 60, 36. (68) Rodier, F.; Chiadmi, M.; Crosio, M. P. Acta Crystallogr. A 1990, 46, 37.