Local and Nonlocal Environments around Cis Peptides - Journal of




Local and Nonlocal Environments around Cis Peptides
Brent Wathen and Zongchao Jia*
Department of Biochemistry, Queen’s University, Kingston, Ontario, Canada K7L 3N6
Received June 28, 2007

Although the vast majority of peptide bonds in folded proteins are found in the trans conformation, a small percentage are found in the less energetically favorable cis conformation. Though the mechanism of cis peptide bond formation remains unknown, the role of local aromatics has been emphasized in the literature. This paper presents results from a comprehensive statistical analysis of both the local and nonlocal (i.e., tertiary) environment around cis peptides. In addition to an increased frequency of aromatic residues in the local environment around cis peptides, a number of nonlocal differences in protein secondary and tertiary structure between cis and trans peptides are found: (i) coil regions containing cis peptides are almost twice as long as those without cis peptides and include more Tyr and Pro residues; (ii) cis peptides occur with high frequencies in coil regions near large β-structures; (iii) there is a nonlocal enrichment of Cys, His, Tyr, and Ser in the tertiary environment surrounding cis peptides when compared to trans peptides; and (iv) on average, cis peptides make fewer medium-range and more long-range contacts than trans peptides do. On the basis of these observations, it is concluded that nonlocal factors play a significant role in cis peptide formation, a role that has not been fully appreciated previously. An autocatalytic model for cis peptide formation is discussed, as are the consequences for protein folding.

Keywords: cis peptides • isomerization • protein folding • nonlocal environments

Journal of Proteome Research 2008, 7, 145–153. Published on Web 12/13/2007.
10.1021/pr0704027 CCC: $40.75 © 2008 American Chemical Society
* Author to whom correspondence should be addressed [e-mail [email protected] queensu.ca; telephone (613) 533 6277].

Introduction

The peptide bond joining two adjacent amino acids acquires a partial double-bond character through a redistribution of electron density from the amide nitrogen of one amino acid to the carbonyl oxygen of the other. This electron delocalization results in a rotational barrier about this bond of between 13 and 20 kcal/mol,1–9 sufficient to ensure that the linkage between successive amino acids in a polypeptide chain remains planar. The overwhelming majority of the peptide bonds found in protein structures solved thus far by NMR or X-ray crystallography are in the trans conformation. A small percentage of these bonds, however, adopt the energetically less favorable cis conformation. The energy difference between the trans and cis conformations has been estimated to be ∼2.5 kcal/mol, primarily due to an increase in steric hindrance between adjacent α carbons that are brought into close proximity in the cis conformation.10 Typically, >80% of cis peptides are reported to occur within imide (X-Pro, where X is any amino acid) bonds and <20% within amide bonds (X-nonPro). The high abundance of cis imide versus cis amide bonds is a consequence of the lower energy difference between the trans and cis conformations of the imide bond.10–13 X-Pro peptide bonds have unfavorable steric interactions in both the trans and cis conformations, and the resulting energy difference in imide bonds is only ∼0.5 kcal/mol.14

Numerous reports of cis imide and cis amide bond frequencies have appeared in the literature, with values ranging from 4.7 to 6.5% for cis imides11,15–17 and ∼0.03% for cis amides.15 These low-frequency values (indeed, much lower than are expected on the basis of reported energy differences between trans and cis conformations of imide and amide bonds11) immediately pose a conundrum for a folding protein: because the translation process is thought to produce only trans peptides, how does a folding protein acquire the cis conformation in a select few of its peptide bonds while retaining the trans conformation for the remaining ones? This remains an open problem in the area of protein-folding research.

To date, the majority of the work on this topic has focused on the local interactions between the residues in and around cis peptides. Numerous bioinformatics studies have examined the residues involved in cis peptide bonds, seeking out relationships between these residues and their sequential neighbors. Overall, aromatic residues have been reported to have higher occurrences than expected in both cis imide and cis amide peptide bonds10,18–20 (including in the position following a cis imide bond19), and some studies have found the Pro-Pro and Gly-Pro peptide sequences to have among the highest frequencies of cis peptide conformations.10,16 Conversely, the beta-branched aliphatic residues Val, Ile, and Thr, together with Leu and Asp, have been found to have low occurrences around cis peptides.10,16,18 The preference for aromatic residues in both cis imide and cis amide bonds has led numerous researchers to suggest that local nonbonding interactions or π-stabilization between aromatic rings and either the Pro ring (in imides) or neighboring β carbon atoms (in amides) leads to greater stabilization of the cis peptide bond.10,18,19,21,22 In addition, evolutionary studies have concluded that the residue preceding a Pro residue has the greatest effect on determining the conformational state of the imide bond,19 and a number of NMR studies of small peptides have lent support to the idea that local forces play an important role in establishing cis peptide bonds.19,20,23 Collectively, then, although no specific local mechanisms for cis peptide formation have been determined, many of the bioinformatics, structural, and NMR data in this field suggest that the primary interactions that determine the isomeric state of peptide bonds are local.

The role of nonlocal, tertiary factors in cis peptide formation has received far less attention in the literature. The most compelling evidence suggesting a formative role for nonlocal factors comes from mutational studies that change the Pro residue in an X-Pro cis imide bond to a non-Pro residue. Whereas some of these mutations resulted in either misfolded proteins24 or folded proteins that adopted a trans peptide bond in place of the original cis peptide bond,24–26 a number have led to folded proteins that retained the cis peptide bond in the much rarer amide form.
Amide cis peptide bonds were retained following Pro mutations in carbonic anhydrase II,27 ribonuclease T1,28 ribonuclease A,12 aspartate aminotransferase,26 aspartate transcarbamoylase,29 tryptophan synthase,30 and the BmK M7 scorpion toxin.24 In most of these cases, protein stability or activity has been reduced, sometimes drastically.29 The retention of the less energetically favorable cis amide bond provides strong evidence that tertiary factors involving the overall fold of proteins can influence the formation of peptide isomers during protein folding as well as supply the necessary folding energy to compensate for the more energetically expensive cis amide bond.30 Beyond these mutational studies, however, little direct evidence linking nonlocal factors to cis peptide formation has been reported. Cis imide bonds are located almost exclusively in the coil regions of proteins, either in bends, type VI β-turns, or unstructured regions,10,11 making it difficult to characterize their relationship to tertiary components of folded proteins. Interestingly, although also predominantly found in coil regions, the much rarer amide bonds are found to a significant extent in or after β-strands. 
As noted by Pal and Chakrabarti, the juxtaposition of cis amide bonds near β-strands suggests that their occurrence may be determined more by folding considerations than by local interactions with neighboring residues.10 Other indirect evidence suggesting that cis peptide bonds might be the result of tertiary folding factors includes the observation that cis imide bonds occur more frequently on protein surfaces, whereas cis amide bonds are generally buried;10 the observation that the majority of Cys residues near cis peptide bonds are involved in disulfide bonds;10 and the results of early “double-jump” experiments demonstrating that cis-trans isomerization of imide bonds in ribonuclease under low-temperature conditions occurs 40-fold more quickly in native-like intermediates than in the unfolded state.31 Furthermore, computational models have suggested that interactions with non-neighboring residues in the native fold can significantly accelerate the cis to trans isomerization of imide bonds.32 Finally, the lack of direct evidence notwithstanding, many commentators have noted in passing the probable role that nonlocal, tertiary forces play in cis peptide bond formation.19,20,22,23

This uncertainty surrounding cis peptide formation has motivated us to take a renewed look at the local and nonlocal environments surrounding cis peptides. By local environment, we are referring specifically to the two residues that precede, and the two residues that follow, a cis peptide bond, whereas by the nonlocal, or tertiary, environment, we are referring to the larger secondary and tertiary surroundings in which cis peptides occur. We corroborate the finding of other groups that aromatic residues occur with high frequencies in the local environment around cis peptides, but we note that the propensity of Gly and Pro around cis peptides is diminished when the local secondary structure is taken into account. Furthermore, we find significant differences between the nonlocal environments surrounding cis and trans peptides, including differences in the length and composition of coil regions with and without cis peptides, differences in the secondary structure (particularly β-structure) adjoining these coil regions, and differences in the nature and types of tertiary neighbors around cis and trans peptides. We conclude that tertiary factors probably play an important, and perhaps even primary, role in cis peptide formation, and we speculate that an autocatalytic isomerization mechanism may be responsible for the formation of cis peptides in proteins. Implications of these findings for protein folding are discussed.

Table 1. General Peptide Statistics

                             no.         %
protein chains              1338
  with cis peptides          694      51.9
peptides                  330842
  trans                   329883      99.7
  cis                        959       0.3
trans peptides            329883
  trans amides            315034      95.5
  trans imides             14849       4.5
cis peptides                 959
  cis amides                 151      15.7
  cis imides                 808      84.3
amides                    315185
  trans                   315034    99.952
  cis                        151     0.048
imides                     15657
  trans                    14849    94.839
  cis                        808     5.161
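The conundrum raised in the Introduction, that observed cis frequencies are much lower than the reported energy differences would suggest, can be illustrated with a rough calculation. The sketch below is not part of the original analysis: it assumes a simple two-state Boltzmann equilibrium between trans and cis at 298.15 K, and the function name `expected_cis_fraction` is hypothetical. The energy differences (∼0.5 kcal/mol for imides, ∼2.5 kcal/mol for amides) are the values quoted in the Introduction.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872


def expected_cis_fraction(delta_g_kcal, temperature_k=298.15):
    """Equilibrium cis fraction for an assumed two-state trans/cis model.

    delta_g_kcal is the energy of cis relative to trans (positive means
    cis is less favorable). This is an illustrative assumption, not the
    authors' method.
    """
    trans_over_cis = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + trans_over_cis)


# Energy differences quoted in the Introduction.
imide_expected = expected_cis_fraction(0.5)
amide_expected = expected_cis_fraction(2.5)

print(f"expected cis imide fraction: {100 * imide_expected:.1f}%")
print(f"expected cis amide fraction: {100 * amide_expected:.2f}%")
```

Under these assumptions the model gives a cis fraction of roughly 30% for imides and somewhat above 1% for amides, compared with the reported 4.7 to 6.5% and ∼0.03%, which is the gap the text describes.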

Results and Discussion

We have collected statistics about the occurrence of peptides in our data set of 1338 nonredundant, well-defined protein chains (see the Methods section for details about our data set) to identify differences in the local and nonlocal environments around cis and trans peptide bonds. We define cis peptide bonds as those peptide bonds that create an ω dihedral angle between adjacent residues of 90° or less, consistent with other definitions in the literature.11 Using this definition, 694 of the 1338 protein chains in our data set had at least one cis peptide bond (51.9%), similar to what has been reported in other studies.10,20,23 As summarized in Table 1, our data set contained 330 842 peptide bonds in total, 99.7% (329 883) in the trans conformation and 0.3% (959) in the cis conformation. Of the 959 cis peptide bonds detected, 808 (84.3%) were cis imide bonds preceding a Pro residue, and the remaining 151 were cis amide bonds. The 808 cis imide bonds represent 5.16% of all imide bonds, and the residues participating in X-Pro imide bonds with the highest percentage of cis isomers are Trp (16.4%), Tyr (10.7%), Gly (9.1%), Pro (8.1%), Glu (7.5%), and Phe (6.5%). Conversely, imide bonds involving Asp, Ile, Met, Leu, Val, His, and Cys had the lowest percentage of cis isomers. Because of their smaller sample size, a similar breakdown of cis amide bonds is not statistically reliable. Overall, however, 151 cis amide bonds were found out of a total of 315 185 amide bonds, giving a cis amide frequency of 0.048%. These general statistics regarding the occurrence of cis peptides agree well with other published results,10,11,16,18,20,23 providing a measure of confidence in our data set.

We divide our results into two parts, beginning with an examination of the local residue frequencies in the two residues before, and the two residues after, cis and trans peptides. Subsequently, we report our findings for the nonlocal environments around peptides, including (i) our findings with regard to the differences in the structure of coil regions that do and do not contain cis peptides and (ii) our findings with regard to the differences in tertiary, through-space neighbors that surround cis and trans peptides.

Residues in the Local Sequence around Cis Peptides. We refer to the residue positions before and after a peptide bond as the N1 and C1 positions, respectively, and to their two immediate neighbors as the N2 and C2 positions, giving the linear sequence of positions of N2-N1-C1-C2. Normalized residue frequency (NRF) values have been calculated for residues in these positions, where normalization was done not only with regard to the frequency of residues around trans peptides in our data set but also with regard to residue frequencies around trans peptides in specific secondary structures. As can be seen in Table 2, three pairings account for 99.3% and 97.4% of all secondary structure combinations around cis imides and cis amides, respectively. These include (with cis imide/amide percentages given in parentheses) coil-coil (86.6%/62.3%), strand-coil (9.8%/19.9%), and strand-strand (2.8%/15.2%). The general NRF values for residues involved in cis peptides in our data set are given in Table 3.

Table 2. Secondary Structure around Cis Peptides^a

                                  amides            imides
secondary structure pairing     no.      %        no.      %
helix-helix                       0    0.0          4    0.5
helix-strand                      0    0.0          0    0.0
helix-coil                        2    1.3          0    0.0
strand-helix                      0    0.0          0    0.0
strand-strand                    23   15.2         23    2.8
strand-coil                      30   19.9         79    9.8
coil-helix                        0    0.0          0    0.0
coil-strand                       2    1.3          2    0.2
coil-coil                        94   62.3        700   86.6

a As determined by DSSP.33

Table 3. Normalized Residue Frequency (NRF) Values at Positions N2, N1, C1, and C2 around Cis Amide and Cis Imide Bonds^a

             amide NRF values               imide NRF values
residue     N2     N1     C1     C2        N2     N1     C1     C2
Gly       1.29   3.39   1.59   1.02      1.04   1.84      -   0.79
Ala       0.53   1.05   0.65   1.08      0.93   0.80      -   1.24
Val       0.75   0.64   0.53   0.47      1.13   0.60      -   0.89
Leu       1.39   0.38   0.43   1.07      0.69   0.58      -   1.00
Ile       1.82   0.36   0.79   0.96      0.76   0.49      -   0.83
Pro       2.02   1.39      -   0.85      1.22   1.63   1.00   1.49
Phe       1.03   1.33   1.26   1.02      0.90   1.29      -   1.70
Tyr       2.09   0.93   1.93   1.12      1.17   2.20      -   1.98
Trp       1.83   3.98   1.70   2.73      0.82   3.61      -   1.08
Cys       1.59   1.05   0.99   0.00      2.05   0.72      -   2.18
Met       0.31   0.31   0.30   1.29      0.79   0.55      -   0.78
Ser       0.59   0.58   1.78   0.95      0.93   1.25      -   0.99
Thr       0.73   0.12   1.14   1.83      1.06   0.74      -   0.84
Lys       0.72   0.47   0.45   0.83      0.93   0.86      -   0.88
Arg       0.69   0.27   0.90   0.82      1.22   1.13      -   1.05
His       0.59   0.88   1.09   1.17      1.10   0.69      -   1.57
Asp       1.25   1.01   1.60   0.69      1.02   0.48      -   0.60
Glu       0.31   0.89   0.86   0.62      0.82   1.48      -   0.43
Asn       0.64   0.63   1.18   1.59      1.02   1.02      -   1.25
Gln       0.75   1.67   0.71   0.95      1.52   1.15      -   0.89

a NRF values that are 50% lower/higher than corresponding trans NRF values are underlined/underlined-bold, respectively.

Considering first the residues at the N1 and C1 positions on either side of cis peptide bonds, we find that the aromatic residues Trp and Tyr have the highest NRF values at the N1 position of cis imides, followed by Gly, Pro, Glu, and Phe, whereas the hydrophobic residues, together with Met and Asp, have the lowest NRF values at this position, all consistent with other published results.10,16,18–20 Clear residue preferences are also evident when the NRF data around cis amides are considered, although the smaller data set size for these cis peptides renders the statistics less reliable. Trp and Gly are heavily preferred in the N1 position, along with Pro, Phe, and Gln, whereas Tyr, Asp, Ser, Trp, and Gly are favored in the C1 position. The hydrophobics Ala, Val, Leu, and Ile all have low NRF values in these cis amide positions. From the N2 and C2 positions, other patterns become apparent. The aromatics retain their elevated NRF values at the C2 position after cis imides, consistent with other published results,19 but not at the N2 position, whereas Arg and Gln have high NRF values at the N2 position but not at the C2 position. Pro has high NRF values at both of these positions, as do both Cys and (at the C2 position) His, two residues often associated with catalytic activity. With regard to cis amides, there is a clear increase in aromatics, as well as in Pro, Leu, and Ile residues, at the N2 position, whereas Trp, Thr, and Asn all have high NRF values in the C2 position following cis amides. Interestingly, both cis amides and cis imides have low NRF values for the negatively charged Asp and Glu residues in the C2 position, suggesting that electrostatics plays some role in cis-trans isomerization. Also, unlike in the C2 position following cis imides, no Cys residues were found at this position after cis amides. Overall, we find that the positions around cis peptides are dominated by aromatic residues, while hydrophobics generally have lowered NRF values. The importance of aromatic residues to both cis imide and cis amide formation has been the subject of several reports in the literature.18,20–22 In total, we found that the aromatic residues Phe, Tyr, and Trp occur in the N1 or C2 position in 30.2% (244 of 808) of cis imide bonds and 22.5% (34 of 151) of cis amide bonds, compared to only 17.5% of trans peptide bonds. Furthermore, the increased presence of both Gly and Pro preceding cis peptides, particularly cis amides, is noteworthy.
Gly was found with the highest frequency before a cis peptide bond, occurring before 11.1% of all cis imides and before 27.2% of all cis amides; this has been speculated to better facilitate chain bends that regularly occur due to the geometry of the cis bond.16 Moreover, both His and Cys have elevated NRF values in the N2 and C2 positions around cis peptides, particularly cis imides. Also, as mentioned, both cis amides and cis imides have low propensities for the negatively charged Asp and Glu residues in their C2 position. Finally, there is also general agreement between the cis amide and cis imide NRF values, particularly at position N1, where the values have a correlation of 0.78, but there are some curious differences. The aromatics Tyr and Trp have the most noticeable differences in their pattern of NRF values, and the hydrophobics Leu and Ile both have elevated NRF values at the N2 position before cis amides but low NRF values at this position before cis imides.
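The NRF comparison above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the exact NRF formula is not given in this section, so `normalized_residue_frequency` (a hypothetical name) assumes NRF is the fraction of a residue at a position around cis peptides divided by the corresponding fraction around trans peptides. The toy counts are invented for illustration only. The two N1 lists are the amide and imide N1 columns of Table 3, in the residue order Gly, Ala, Val, Leu, Ile, Pro, Phe, Tyr, Trp, Cys, Met, Ser, Thr, Lys, Arg, His, Asp, Glu, Asn, Gln, and the correlation is computed as an ordinary Pearson coefficient (also an assumption).

```python
import math


def normalized_residue_frequency(cis_counts, trans_counts):
    """Assumed NRF: (fraction among cis) / (fraction among trans)."""
    cis_total = sum(cis_counts.values())
    trans_total = sum(trans_counts.values())
    nrf = {}
    for residue, trans_n in trans_counts.items():
        if trans_n == 0:
            continue  # ratio undefined without trans observations
        cis_fraction = cis_counts.get(residue, 0) / cis_total
        trans_fraction = trans_n / trans_total
        nrf[residue] = cis_fraction / trans_fraction
    return nrf


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy counts, for illustration only.
toy_nrf = normalized_residue_frequency(
    {"Trp": 4, "Ala": 6}, {"Trp": 10, "Ala": 90}
)

# N1 columns of Table 3 (amide and imide), Gly ... Gln.
AMIDE_N1 = [3.39, 1.05, 0.64, 0.38, 0.36, 1.39, 1.33, 0.93, 3.98, 1.05,
            0.31, 0.58, 0.12, 0.47, 0.27, 0.88, 1.01, 0.89, 0.63, 1.67]
IMIDE_N1 = [1.84, 0.80, 0.60, 0.58, 0.49, 1.63, 1.29, 2.20, 3.61, 0.72,
            0.55, 1.25, 0.74, 0.86, 1.13, 0.69, 0.48, 1.48, 1.02, 1.15]

r_n1 = pearson(AMIDE_N1, IMIDE_N1)
print(f"toy NRF: {toy_nrf}")
print(f"amide vs imide N1 correlation: {r_n1:.2f}")
```

With the tabulated N1 values, this Pearson calculation comes out close to the correlation of 0.78 quoted in the text.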

Few significant deviations from these trends were observed when we tabulated the NRF values adjusted for secondary structure (see Supporting Information). For coil-coil secondary structure, the already elevated frequencies of the aromatics Phe, Tyr, and Trp around cis peptides were found to increase slightly, particularly for cis amides, whereas the NRF values for both Pro and Gly decreased. With regard to the other two dominant secondary structure pairings, although the occurrence of cis peptides with either strand-strand (i.e., within β-strands) or strand-coil secondary structure is low, an examination of the strand-adjusted NRF values for these peptides shows one striking trend: Gly, not found with high frequency in β-strands, dominates the N1 position of strand-strand configurations for both cis imides (occurring 15 of 23 times) and cis amides (occurring 15 of 23 times). This suggests that chain flexibility is required either for cis peptide formation in the presence of β-strands or for the incorporation of cis peptides during β-strand formation. Interestingly, almost 1 in 6 cis amides occurs within β-strands, compared to <1 in 36 for cis imides. This difference may reflect an intrinsic difficulty in incorporating the less flexible Pro in the cis configuration within the β-strand architecture, or it may indicate that neighboring β-strands can preferentially increase the isomerization rate and/or cis peptide stability of amides over imides.

Secondary Structure and Cis Peptides. Cis peptides occur predominantly in coil regions, either in bends or in type VI β-turns, primarily because their geometry encourages chain reversals.10 Table 2 provides a breakdown of the immediate secondary structure (as determined by DSSP33) of the residues on either side of cis peptides; 86.6% of cis imides and 62.3% of cis amides occur in coil-coil secondary structure, whereas 9.8% of cis imides and 19.9% of cis amides occur on the boundary of β-strands and coil regions. Intriguingly, 15.2% of cis amides are found in β-strands, whereas only 2.8% of cis imides are found there.

We examined the coil regions of proteins in our data set to determine if there are any distinguishing characteristics that might explain the existence of cis peptides. We define a coil region as a continuous stretch of residues that has not been identified as either strand or helix by DSSP33 and that either terminates at a protein terminus or is flanked at each end by (i) a strand containing 3 or more residues or (ii) a helix containing 4 or more residues. Altogether, of the 21 014 identified coil regions, 660 contain a single cis imide bond, 85 contain a single cis amide bond, and 32 contain multiple cis imides and/or amides. Notably, as shown in Table 4, the average number of residues in coil regions containing a cis imide bond (12.4) is almost double that for all-trans coil regions (6.8). Coil regions that contain a single cis amide averaged 12.0 residues in length, still 76% larger than the all-trans coil region average, whereas the smaller set of coil regions with multiple cis peptides had the largest average of all, 17.2 residues. Such substantial differences suggest that the process of finding a folding solution containing a cis peptide requires longer coil regions to accommodate the attendant backbone reversal, a statement that makes most sense if neighboring secondary structure, including helices and β-sheets, has already formed prior to cis peptide formation. Indeed, if neighboring sheets and helices were to come into being after cis peptide formation, one might anticipate an increase in coil length on the order of 1 or 2 residues to account for any local steric clashes near the cis peptide, but the observed increase of 5 to 6 residues for coil regions with a single cis peptide is more than seems necessary to overcome such clashes.

Table 4. Average Length of Coil Regions^a

coil region                   no. of coils   total residues   av length
without cis peptides                 20237           136676         6.8
with single cis imide                  660             8201        12.4
with single cis amide                   85             1016        12.0
with multiple cis peptides              32              549        17.2
totals                               21014           146442         7.0

a See text for a definition of coil regions.

Table 5 shows the residue frequency values for coil residues at various distances from cis peptides, normalized against equivalent values around trans peptides in our data set. The differences between residue frequencies around cis and trans amide bonds are clearly more acute than those found around cis and trans imide bonds, owing to the smaller number of cis amides. The most notable trend for cis amides is the low frequency of aromatic residues at distances of 2, 4, and 8 positions beyond the N2 and C2 positions, in contrast to their prominent frequencies in the immediate vicinity of cis amides. Tyr provides the only exception to this rule, having increased frequency beyond the C2 position. Also of note are the elevated frequencies of His, Met, and Pro in the residues before the N2 and after the C2 positions and the low frequencies of the positively charged Arg and Lys residues in these positions. With regard to residue frequencies around imide bonds, although less cis-trans variation is seen for imides than is found for amide bonds, the increased frequencies of both Tyr and Pro are notable. Trp is also found to have a decreased frequency in the four residues before the N2 and after the C2 positions. Although it is admittedly challenging to assign a specific function to these residue variations, at a minimum they do emphasize that cis peptide formation is not simply a local phenomenon and possibly involves residues that are 4–10 positions away from the peptide in question.
These residues may play a role directly in the isomerization process itself, or they may be instrumental for targeting the isomerization process to a specific bond. Extending our investigation, we examined the helices and strands adjacent to coil regions; Table 6 contains a breakdown of the frequencies and characteristics of helices and strands next to coil regions with and without cis peptides. There are clear differences between the frequencies of cis and trans peptides at the termini of coil regions, within three peptides of neighboring secondary structure, particularly for amide residues. In general, we find that cis peptides are found more frequently near β-strands when normalized against trans peptides and less frequently near helices. Cis imides in coil regions are found near helices only about half as often as trans imides, as is the case for cis amides following helices at coil N termini. In contrast, we find that cis imides occur with a higher frequency near β-strands than trans imides (approximately 20% increase), whereas cis amides are found to occur at coil N termini following a strand almost 2.5 times as often as expected

research articles

Local and Nonlocal Environments around Cis Peptides

Table 5. Normalized Residue Frequency (NRF) Values at Various Positions before the N2 Position and after the C2 Position around Cis Peptidesa NRF values for residues around cis amides no. of residues before N2 position residue

Gly Ala Val Leu Ile Pro Phe Tyr Trp Cys Met Ser Thr Lys Arg His Asp Glu Asn Gln a

NRF values for residues around cis imides

no. of residues after C2 position

no. of residues after C2 position

8

4

2

2

4

8

8

4

2

2

4

8

0.92 1.00 1.02 1.09 1.16 1.52 0.52 0.69 0.79 0.77 1.70 1.06 1.03 0.92 0.61 1.35 0.83 1.02 0.84 1.18

0.99 1.05 0.94 0.82 1.28 1.76 0.73 1.02 0.47 0.45 2.05 1.10 0.99 0.90 0.59 1.30 0.75 0.87 0.72 1.01

1.04 1.35 0.97 0.72 1.95 1.78 0.65 0.73 0.85 0.00 2.16 1.24 0.78 0.68 0.53 0.92 0.82 0.65 0.85 0.71

0.93 1.07 1.39 0.67 0.54 0.89 0.56 2.41 0.00 1.45 1.95 1.24 0.60 0.37 0.91 1.97 0.88 1.07 1.59 0.61

0.80 0.94 1.36 0.66 0.77 1.08 0.63 2.17 0.00 1.61 1.47 0.87 0.77 0.54 0.91 2.01 0.86 1.13 1.61 1.02

0.89 1.24 1.02 0.78 0.98 1.15 0.80 1.89 0.24 1.23 1.17 0.86 0.66 0.70 0.66 1.29 1.02 0.93 1.68 0.97

0.95 1.03 1.06 1.07 0.92 1.02 0.93 1.29 1.02 1.17 1.06 1.09 0.93 0.86 1.07 0.86 0.87 0.98 1.16 0.89

0.94 0.96 1.13 0.99 0.96 1.10 0.95 1.31 1.07 0.97 1.00 1.05 0.99 0.87 1.12 0.83 0.81 0.96 1.21 0.98

0.94 1.06 1.12 1.01 0.98 1.10 1.16 1.23 0.82 1.09 0.98 1.04 0.96 0.75 1.15 1.08 0.67 0.94 1.20 0.98

0.96 0.68 1.01 0.88 1.11 1.57 1.00 1.44 0.46 0.68 0.81 1.05 1.15 0.85 1.09 0.94 1.03 0.83 0.93 0.81

0.98 0.82 1.00 0.89 1.12 1.27 1.07 1.20 0.71 1.00 0.87 1.11 1.12 0.94 0.85 0.92 0.96 0.86 1.07 0.92

0.99 0.84 1.04 0.87 1.01 1.18 1.16 1.10 0.88 0.98 0.87 1.03 1.11 0.94 0.87 1.14 0.96 0.90 1.03 1.05

Cis NRF values that are 50% lower/higher than corresponding trans NRF values are underlined/underlined-bold, respectively.

Table 6. Peptide Frequency at Coil Termini,a Adjacent to Secondary Structureb amides

adjacent to helices at coil NT at coil CT adjacent to strands at coil NT at coil CT a

no. of residues before N2 position

Table 7. Size of β-Structure near Coil Regionsa

imides

trans (%)

cis (%)

21.1 18.6

10.4 16.7

13.5 21.9

7.6 9.4

21.3 23.4

50.8 16.7

19.2 16.6

22.9 19.8

See text for a definition of coil regions.

b

trans (%)

cis (%)

As determined by DSSP.33

when compared to trans amides. Note that this increased frequency near β-strands is not shared by cis amides at the C termini of coil regions, where they are found with a 30% lower frequency than trans amides. These differences between cis and trans peptide propensities near secondary structure suggest that neighboring secondary structure, particularly β-sheets, may play an important role in cis peptide formation, as has been suggested by Pal and Chakrabarti.10 This proposition is reinforced by the correlations between cis peptide frequencies and the amount of protein secondary structure. Overall, there is a positive correlation of 0.40 between the number of residues and the number of cis peptides in protein chains in our data set. Not surprisingly (given that the vast majority of cis peptides occur in coil regions), the correlation is even stronger between the number of residues in coil regions and the number of cis peptides in a protein chain (0.45). However, unexpectedly, we find that the correlation between the number of residues in β-structure in a protein chain, and the number of cis peptides in that chain is almost as high as that linking cis peptides and coil residues: 0.43. By comparison, the correlation between the number of cis peptides and the number of residues in helices in a protein chain is only 0.15. This drop in correlation is not evident for trans bonds: the correlations between the number of trans peptides in a protein chain and the number of residues

β-structure

no.

all near cis amides near cis imides

4101 85 331

a

total residues

71499 3390 12081

av size

17.4 39.9 36.5

See text for a definition of coil regions.

in coil regions, strands, and helices are 0.96, 0.68, and 0.79. Further support for a link between cis peptides and β-structure comes from an examination of the size of the β-structure surrounding coil regions (Table 7). Overall, a total of 71 499 residues are assigned to 4101 β-sheets in our nonredundant data set, giving an average of 17.4 residues per β-sheet. The average β-sheet size near cis peptides, however, is considerably larger. β-Sheets that are within 3 residues of a cis imide bond in a coil region contain, on average, 36.5 residues, more than double the overall average β-sheet size. The average size of β-sheets near cis amides in coil regions is larger again at 39.9 residues. Together, these data strongly suggest that the presence of large β-sheets can significantly influence the formation and/or stability of cis peptides, particularly cis amides. Large, constrained structures near cis peptides may provide an anchoring point that encourages cis–trans isomerization by reducing the flexibility of the chain on one side of a peptide bond, thereby decreasing the conformational space available for folding and potentially increasing the rotational strain across this bond. Alternatively, they may play a scaffolding role, bringing together in space an arrangement of residues that encourages cis–trans isomerization.

Residues in the Nonlocal, Tertiary Environment around Cis Peptides. To investigate the nonlocal residues in the vicinity of cis peptides, we compared the residue frequencies in the tertiary environments at various sequential positions around cis and trans peptide bonds (Table 8). Here, X is considered to be a nonlocal, tertiary neighbor of Y if the distance between their Cα atoms is <8 Å and X and Y are separated in sequence by at least 4 residues. The cis/trans ratios at positions N1 and C1 show the tertiary differences in the immediate vicinity of the peptide bond, whereas those at positions farther removed show what changes, if any, occur in the tertiary environments of peptides as one moves away along the main chain. The cis/trans ratios for many residues are close to unity, particularly around imide bonds, although there are some discernible trends. The tertiary environments around cis imides were found to be enriched for Cys, His, Tyr, Asn, and Gln, compared to the environments around trans imides. Conversely, hydrophobic residues, including Met, were found with reduced frequencies in the environments around cis versus trans imides. With regard to cis amides, Tyr, Cys, Ser, and Gly, together with Trp (on the NT side of cis amides) and His (on the CT side of cis amides), are all found with increased frequencies around cis amides compared to trans amides, whereas the hydrophobic residues (Ala, Val, Leu, Ile, and Phe), Glu, and, to a lesser extent, Lys, Arg, and Gln (on the NT side of cis amides) are all found with reduced frequencies around cis amides. There is a positive correlation between the cis/trans ratios of tertiary neighbors around cis imides and cis amides, both at all positions (0.39) and at just the N1 and C1 positions (0.51). Overall, we find that Cys, His, Tyr, Gly, Ser, and Asn have the highest enrichments when comparing the tertiary environments around cis and trans peptides, whereas the hydrophobics and, to a lesser extent, Arg and Glu, have the lowest.

Table 8. Ratios of the Proportion of Residues in the Tertiary Environments^a around Cis and Trans Peptides^b

         ratios around residues at specified       | ratios around residues at specified
         positions near amide bonds                | positions near imide bonds
residue  N8-N5  N4-N2  N1     C1     C2-C4  C5-C8  | N8-N5  N4-N2  N1     C1     C2-C4  C5-C8
Gly      1.18   1.06   1.34   1.45   1.47   1.49   | 1.13   0.98   1.14   1.15   1.02   1.59
Ala      0.67   0.91   0.76   0.87   0.67   0.74   | 1.00   0.82   0.89   0.83   0.89   0.77
Val      1.41   1.30   0.86   0.80   0.81   0.70   | 0.79   0.89   0.83   0.97   0.90   0.70
Leu      0.89   0.77   0.62   0.72   0.79   0.84   | 0.98   0.76   0.64   0.79   0.69   0.85
Ile      1.10   1.50   0.90   0.80   0.76   0.67   | 0.86   0.92   0.72   0.96   0.92   0.89
Pro      1.83   1.16   1.03   1.10   1.08   1.16   | 1.06   1.05   1.28   1.06   1.02   0.86
Phe      0.91   0.89   0.82   0.78   0.99   0.80   | 0.85   1.07   0.93   1.13   1.09   0.95
Tyr      0.98   1.20   1.33   1.25   1.09   1.91   | 1.27   1.30   1.10   1.17   1.31   1.33
Trp      1.42   1.68   1.61   0.99   0.43   0.47   | 0.93   0.87   1.03   0.93   1.22   0.79
Cys      0.84   1.26   1.31   1.28   1.22   0.85   | 1.15   1.07   1.31   1.57   0.90   1.60
Met      0.77   0.93   1.31   0.94   0.89   0.79   | 0.91   1.22   0.77   0.77   1.03   0.82
Ser      0.75   1.29   1.37   1.56   1.20   1.19   | 0.93   1.20   1.25   1.07   1.30   1.11
Thr      0.73   0.82   1.39   0.96   1.13   1.44   | 0.99   1.15   1.05   0.97   1.14   1.10
Lys      1.13   0.88   1.00   0.75   0.72   1.16   | 1.09   1.25   0.89   0.78   0.86   1.08
Arg      1.00   0.76   0.86   0.65   0.82   0.85   | 0.98   0.91   1.05   0.86   0.86   1.03
His      0.90   0.69   0.91   1.69   1.98   1.12   | 1.15   1.07   1.19   1.35   1.11   1.09
Asp      0.95   1.05   1.22   1.17   1.32   1.21   | 1.01   1.16   1.40   0.95   0.96   1.15
Glu      0.82   0.50   0.51   1.02   0.89   0.72   | 0.92   0.74   1.01   1.05   1.06   0.88
Asn      1.12   0.68   1.59   1.13   1.74   1.23   | 1.26   1.13   1.23   1.07   1.28   1.31
Gln      0.85   0.75   0.62   0.90   1.18   1.05   | 1.05   1.17   1.16   1.28   1.24   0.90

a Residue X is in the tertiary environment of residue Y if the distance between their α carbons is <8 Å and abs(Y - X) ≥ 4. b Cis/trans ratios that are below 0.67 or above 1.5 are underlined/underlined-bold, respectively.

A breakdown of the nonlocal, tertiary neighbors into medium-range (3–9 residues away in sequence) and long-range (10+ residues away in sequence) shows further differences between cis and trans peptides. We find that the number of medium-range neighbors around both cis amides and cis imides is considerably less than is found around their trans counterparts: the cis/trans ratios around amides and imides are only 0.48 and 0.65, respectively. Conversely, there is an enrichment of long-range neighbors around cis peptides, with cis amides enriched over trans amides by a factor of 1.54 and cis imides over trans imides by 1.14. The reduction in medium-range neighbors around cis peptides suggests that the protein backbone beyond the cis bond is more extended than is found around trans bonds, consistent with the increased proximity of cis peptides to β-strands. In addition, under a framework model for protein folding, the shift from medium- to long-range contacts around cis peptides may be indicative of later folding events, because long-range contacts are expected to form later in the folding process.

Because cis peptides are often found in coil regions close to regular secondary structure, particularly β-strands, we also examined the tertiary neighborhoods around the N and C termini of strands and helices. Table 9 lists the ratios of cis and trans neighbors around the terminal 4 residues of helices, and 2 residues of strands, using a neighbor cutoff distance of 8 Å between the Cα atom of the helical/strand residues and the Cα atom of the residue in the N1 position next to peptide bonds. To focus on through-space neighbors, neighbors separated by 5 residues or less in sequence are excluded from these calculations. We found that the aromatics Phe, Tyr, and Trp, along with Gly, Cys, Ser, His, and Asp, all had a higher proportion of cis peptides in their tertiary environments when they were located at the termini of β-strands, whereas the hydrophobic (Val, Leu, and Ile) and large charged residues (Lys, Arg, and Glu) all had lower proportions of cis peptides in their environments. With regard to helices, although the sample size of neighboring cis peptides around helical termini is low (particularly around the C-terminal residues of helices), similar patterns emerge. The aromatic residues, together with Gly, Cys, Met, Ser, Thr, and His, all had a higher proportion of cis peptides in their tertiary environments when they were located at helical termini, whereas the hydrophobics (Ala, Val, Leu, and Ile) and charged residues (Glu, Asp, Lys, and, at helical C termini, Arg) had lower proportions of cis peptides in their tertiary environments.
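The neighbor criterion and the range breakdown described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' software: the function names (`tertiary_neighbors`, `range_class`, `proportion_ratio`), the toy hairpin coordinates, and the residue counts are all made up for the example. It assumes Cα coordinates in Å, the 8 Å cutoff and minimum sequence separation of 4 used for Table 8, the short (1–2), medium (3–9), and long (10+) separation classes used in this paper, and a cis/trans proportion ratio of the form (N_X,cis / N_total,cis) / (N_X,trans / N_total,trans).

```python
import math
from collections import Counter


def tertiary_neighbors(ca_coords, i, cutoff=8.0, min_sep=4):
    """Indices j whose C-alpha is within `cutoff` Å of residue i and |i - j| >= min_sep."""
    return [
        j for j, xyz in enumerate(ca_coords)
        if abs(i - j) >= min_sep and math.dist(ca_coords[i], xyz) < cutoff
    ]


def range_class(i, j):
    """Classify a residue pair by sequence separation: short (1-2), medium (3-9), long (10+)."""
    sep = abs(i - j)
    if sep <= 2:
        return "short"
    return "medium" if sep <= 9 else "long"


def proportion_ratio(cis_counts, trans_counts, residue):
    """(N_X,cis / N_total,cis) / (N_X,trans / N_total,trans) for residue type X."""
    cis_frac = cis_counts[residue] / sum(cis_counts.values())
    trans_frac = trans_counts[residue] / sum(trans_counts.values())
    return cis_frac / trans_frac


# Toy hairpin (made-up coordinates): residues 0-5 run out along x,
# residues 6-11 run back alongside them, 5 Å away.
coords = [(3.8 * k, 0.0, 0.0) for k in range(6)] + \
         [(3.8 * (11 - k), 5.0, 0.0) for k in range(6, 12)]

for j in tertiary_neighbors(coords, 0):
    print(f"residue 0 - residue {j}: {range_class(0, j)}-range neighbor")

# Made-up neighbor counts by residue type around cis and trans peptides.
cis_counts = Counter({"Cys": 6, "Leu": 4, "Gly": 10})
trans_counts = Counter({"Cys": 20, "Leu": 100, "Gly": 80})
print(f"Cys cis/trans ratio: {proportion_ratio(cis_counts, trans_counts, 'Cys'):.2f}")
```

A ratio above 1 means the residue type makes up a larger share of the tertiary neighbors around cis peptides than around trans peptides; a ratio below 1 means the opposite. Changing `cutoff` and `min_sep` is how other distance and separation thresholds would be explored in this sketch.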
Table 9. Ratios of the Proportion of Cis and Trans Peptides in the Tertiary Environments^a around Helical and Strand Termini^b,c

         helices        | strands
residue  NT     CT      | NT     CT
Gly      1.54   2.21    | 1.46   1.33
Ala      0.79   0.93    | 1.18   1.13
Val      0.68   0.87    | 0.96   1.09
Leu      0.60   1.07    | 0.82   0.70
Ile      0.85   0.79    | 0.75   0.78
Pro      1.01   0.79    | 0.28   1.06
Phe      1.14   1.16    | 1.43   1.37
Tyr      1.60   1.74    | 1.26   1.45
Trp      1.73   2.55    | 1.98   1.01
Cys      7.17   0.92    | 1.06   1.86
Met      1.06   1.44    | 0.87   1.02
Ser      2.01   1.25    | 0.95   1.37
Thr      1.76   1.15    | 0.79   1.00
Lys      0.82   0.62    | 0.80   0.61
Arg      1.23   0.76    | 0.72   0.39
His      0.90   1.26    | 1.84   1.27
Asp      0.51   0.53    | 1.41   0.92
Glu      0.38   0.58    | 0.68   0.79
Asn      1.25   1.07    | 1.69   0.35
Gln      0.49   0.83    | 0.78   1.23

a Residue X is in the tertiary environment of Y if the distance between their α carbons is <8 Å and abs(Y - X) ≥ 6. b As determined by DSSP.33 Values are averaged over four and two residues at the termini of helices and strands, respectively. c Cis/trans ratios that are below 0.67 or above 1.5 are underlined/underlined-bold, respectively.

Collectively, we find some general patterns in the tertiary environments around cis peptides. Hydrophobic residues, particularly Val, Leu, and Ile, are found less frequently surrounding cis peptides than they are around trans peptides, taking secondary structure into consideration, as is the case for the charged residues Arg, Lys, and, to a lesser extent, Glu. In contrast, Cys and His, the aromatics Tyr and Trp, and the polar residues Ser, Asn, Gln, and (to a lesser extent) Thr occur more frequently near cis peptides than near trans peptides.

Toward a General Model of Cis–Trans Isomerization. Although there has been considerable attention paid in the literature to local factors that may induce or stabilize cis peptides, it appears that global factors may play a stronger role than has previously been appreciated. It is true that aromatic residues do occur with higher than expected frequencies in the N1 and C1 positions around cis peptides, but the absolute number of cis peptides with aromatic residues in these positions is low. Studies of cis peptide bond formation in small peptides have determined favorable residue combinations that lead to increased concentrations of cis peptides at equilibrium, but it is unclear how relevant these studies of small peptides are to the stability of whole proteins, where peptide bond stability is undoubtedly influenced by the stability of the entire protein fold. Moreover, whereas neighboring aromatic residues may provide π-stability to the cis orientation of peptide bonds, such stability is thermodynamic rather than kinetic. Thermodynamic stability, particularly for cis imides where the energy difference between the cis and trans isomers is estimated to be only ∼0.5 kcal/mol,14 is of less importance than the kinetic speed of isomerization, both because the stability of any particular peptide bond has to be seen in relation to the overall stability of a folded protein and because trans–cis isomerization is generally viewed as a rate-limiting step in protein folding.22,23 In contrast, although the specific mechanisms by which nonlocal, tertiary interactions may induce a cis–trans isomerization remain unknown, the differences in coil regions and neighboring structures around cis and trans peptide bonds are unmistakable.
Coil regions containing cis peptides are, on average, almost double the length of coil regions without cis peptides and, generally, have less flexibility due to an increase in the amount of Pro. Moreover, the frequency and size of β-structures near cis peptides, particularly cis amides, are remarkable. Because it is unlikely that the presence of a cis peptide can induce greater β-structure, the conclusion that large β-structures are somehow involved in cis peptide formation seems to be unavoidable. What seems most likely is that there is a combination of local and nonlocal factors involved in cis–trans isomerization. The local increase in aromatics and decrease in hydrophobics around cis peptides, combined with increases in Cys and His at the N2 and C2 positions, the decrease in negatively charged Asp and Glu at the C2 position, a general decrease in Lys and, for cis amides, Arg in the local neighboring positions around cis peptides, the increase in Pro and Tyr in coil regions beyond the local positions, and the increase in aromatics, polar residues, and Cys and His, together with a decrease in the charged and hydrophobic residues, in the nonlocal, tertiary neighborhood around cis peptides, collectively suggest that a broader cis–trans isomerization mechanism may be responsible for the occurrence of cis peptides in folded proteins. Given this, we propose an autoisomerization model for cis peptide formation, wherein the correct spatial orientation of both local and nonlocal residues around a particular peptide bond during protein folding results in a catalytic process that encourages cis peptide formation.
In this light, our observations can be interpreted as follows: (i) the increased presence of aromatics around cis peptides may be involved in targeting the isomerization, as well as in constraining the backbone; (ii) the increased presence of Gly in the N1 position and the decreased presence of the bulky hydrophobics, Met, and β-branched Thr at this position may reflect the need for greater accessibility to the peptide bond during catalysis; (iii) the higher-than-expected frequencies of Cys and His at both the N2 and C2 positions around cis peptides and in the tertiary environment of cis peptides raise the possibility that these two residues may play critical roles in autocatalyzing the cis–trans isomerization process, particularly because they figure prominently in the active sites of both Pin1 and FKBP, two well-studied prolyl cis–trans isomerases;34,35 (iv) the low frequencies of negatively charged residues at the C2 position around cis peptides, as well as the low frequencies of the positively and negatively charged residues in the tertiary neighborhood of cis peptides, might be required to avoid interference with the autocatalytic residues, particularly His; (v) the increased presence of β-structure near cis peptides may reflect a scaffolding role for β architecture that promotes cis–trans isomerization by fixing key residues in close proximity to the peptide to be isomerized; and (vi) the longer coil lengths for coil regions containing cis peptides, together with the higher proportion of long-range contacts surrounding cis peptides, indicate that cis peptide formation occurs preferentially in extended coil regions where the peptides have greater exposure and would thus be more prone to autocatalysis.
The idea of a structure-based acceleration of the isomerization process has been proposed, albeit on a far more local scale, by Reimer et al., who found that the His-Pro moiety can dramatically enhance the cis–trans isomerization rate.36 Moreover, the reports of cis-Pro→X mutations that result in a cis amide in place of a cis imide,12,24,26–30 together with the early report of significantly increased rates of cis–trans isomerization in folded intermediates as compared to unfolded polypeptides,31 strongly support a cis–trans isomerization mechanism that relies on the secondary and tertiary structure of folding proteins.

Concluding Remarks: Implications for Protein Folding. We have examined both the local and nonlocal environments around cis peptides in this study. We have found, as others have, that there is an increased presence of aromatics in the local sequence around cis peptides. In addition, we have found nonlocal differences between cis and trans peptides, including differences in the coil regions in which cis peptides are predominantly found, differences in the neighboring structure, particularly β-structure, around these coil regions, and differences in the through-space, tertiary neighborhoods around cis and trans peptides. Because of their rarity, the formation of a cis peptide in a protein requires unusual folding circumstances, the study of which can shed light on general folding principles. Scheraga and co-workers, for example, making innovative use of Pro residues as folding probes, have suggested that the slow folding events involving cis imide bonds may well be chain-folding initiation sites (CFISs) and that the formation of type VI β-turns involving cis imide bonds constitutes an initial step in protein folding.12 We conclude by examining what implications can be drawn for protein folding from a consideration of cis peptide formation, particularly in light of the nonlocal, tertiary autocatalytic mechanism for cis–trans isomerization we are proposing. One implication of our proposed model for protein folding is that some elements of secondary and/or tertiary structure formation must precede cis peptide formation. Contrary to the conclusions of Scheraga and co-workers described above, this relationship implies that cis peptides are not folding initiation sites, but rather sites where protein folding is temporarily interrupted.
This position is also supported indirectly by the observation that the long time scales involved in cis–trans isomerization relative to protein folding would present problems for proteins that relied on cis peptides to initiate folding: the long time before folding initiation would leave a protein prone to (i) proteolysis and (ii) unwanted cis–trans isomerizations in peptide bonds that are to remain trans. A second implication of our model regarding protein folding is a consequence of the strength of the signals detected by our statistical analysis. One might well ask why the catalytic sites around cis peptides are not more easily recognizable if there is indeed a generalized structural mechanism for their formation. The answer is that cis peptide formation under this model is not viewed as the final step in protein folding. Instead, a three-stage model of protein folding emerges from our analysis, which consists of an early folding stage that creates the correct secondary and/or tertiary structures required for cis–trans isomerization, followed by a slow isomerization step that changes the conformation of one or more peptide bonds, which in turn is followed by a second folding stage that searches for a final folding solution that brings overall stability to a protein. There may well be considerable folding following cis–trans isomerization that reorganizes a partially folded protein so as to disrupt the autocatalytic site of isomerization. In essence, we interpret our statistical analysis as picking up a folding echo created during an intermediate stage of protein folding.

This model of protein folding is consistent with basic folding models that advocate the creation of specific, secondary structure early in the folding process, including the CFIS model of protein folding espoused by Scheraga and co-workers,12 despite the fact that we have drawn opposite conclusions regarding the role of cis peptides during folding. Moreover, a model that advocates early folding events provides a means to explain the discrepancy between the number of cis peptides expected based on energetics arguments alone and the number actually observed. Folded sections of a protein effectively sequester their peptide bonds, locking them in the trans orientation and thus reducing the occurrence of cis peptides. This argument is consistent with the kinetic explanation presented by Jabs et al. to explain the low number of cis peptides observed.18
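The "number of cis peptides expected based on energetics arguments alone" can be made concrete with a two-state Boltzmann estimate. The sketch below is illustrative and not part of the original analysis: it assumes a simple cis/trans equilibrium at 298.15 K and treats the energy differences quoted in this paper (∼2.5 kcal/mol for amide bonds, ∼0.5 kcal/mol for imide bonds) as free-energy differences. The function name `cis_fraction` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def cis_fraction(delta_g_kcal, temperature_k=298.15):
    """Equilibrium cis fraction for a two-state cis/trans model.

    delta_g_kcal is G(cis) - G(trans) in kcal/mol (assumed to be a free energy).
    """
    return 1.0 / (1.0 + math.exp(delta_g_kcal / (R_KCAL * temperature_k)))


amide = cis_fraction(2.5)  # X-nonPro bond, energy gap quoted in the text
imide = cis_fraction(0.5)  # X-Pro bond, energy gap quoted in the text
print(f"expected cis fraction, amide bonds: {amide:.1%}")
print(f"expected cis fraction, imide bonds: {imide:.1%}")
```

Under these assumptions the estimate comes out between 1% and 2% for amide bonds and at about 30% for imide bonds. The kinetic argument above addresses why the frequencies observed in folded proteins fall below such equilibrium expectations.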

Methods

We compiled a nonredundant data set of protein structures using the PDB-REPRDB database,37 which consists of representative protein chains from the RCSB Protein Data Bank, release 2007_01_27. Only protein chains solved by X-ray crystallography with resolutions of ≤2.0 Å and R factors of ≤0.25 were considered. In addition, to be included, protein chains had to be free of chain breaks, have coordinates for all non-hydrogen atoms, and be at least 40 residues in length. Membrane proteins, as well as those polypeptide chains that were part of larger structural complexes, were also discarded. Only one representative was chosen for our data set from among those protein chains that had sequence identities of >30% or structural alignments of <10 Å. In all, our data set consisted of 1338 protein chains. The DSSP program33 was used to identify the secondary structure of each residue in our data sets. To analyze these data sets, we adopted the ω dihedral angle definitions used by Stewart et al.;11 this gave two classes of peptide bonds, namely, cis [abs(ω) < 90°] and trans [abs(ω) ≥ 90°], where abs( ) refers to the absolute value function. Normalized residue frequency (NRF) values around cis amides and cis imides were calculated using the formula

$$\mathrm{NRF}_X(i) = \frac{f_{X,\mathrm{cis}}(i) / f_{\mathrm{total,cis}}(i)}{f_{X,\mathrm{trans}}(i) / f_{\mathrm{total,trans}}(i)} \qquad (1)$$

where $f_{X,\mathrm{cis}}(i)$ and $f_{X,\mathrm{trans}}(i)$ are the counts of residues of type X in the ith position around cis and trans peptides, respectively, and $f_{\mathrm{total,cis}}(i)$ and $f_{\mathrm{total,trans}}(i)$ are the total counts of all residues in the ith position around cis and trans peptides. Similarly, NRF values adjusted for secondary structure were tabulated by restricting the frequency counts to particular secondary structural pairs assigned by DSSP33 to the residues surrounding a peptide bond.

Statistics were tabulated to compare the frequency with which cis and trans peptide bonds are found in coil regions adjacent to both helices and strands. For this purpose, the secondary structure as assigned by DSSP33 was simplified, so that each residue was designated as part of a helix, strand, or coil region. A minimum of 4 adjacent residues for helices, and 3 for strands, was required. To calculate the percentage of cis (trans) peptide coil bonds adjacent to helices (strands), we took the total number of cis (trans) bonds in coil regions adjacent to helices (strands) and divided it by the total number of cis (trans) bonds in coil regions. More precisely, the percentage of cis peptide bonds at the CT end of helices was given by

$$\frac{n_{\mathrm{cis,nt}}}{n_{\mathrm{cis,coil\text{-}coil}} + n_{\mathrm{cis,helix\text{-}coil}} + n_{\mathrm{cis,coil\text{-}helix}}} \qquad (2)$$

where $n_{\mathrm{cis,nt}}$ is the number of cis peptide bonds in coil regions that are within 0, 1, or 2 bonds from the CT end of a helix (and so the NT end of the coil region) and $n_{\mathrm{cis,coil\text{-}coil}}$, $n_{\mathrm{cis,helix\text{-}coil}}$, and $n_{\mathrm{cis,coil\text{-}helix}}$ are the total numbers of cis peptide bonds in coil-coil, helix-coil, and coil-helix secondary structure pairs. Similar formulas were used for calculating cis and trans statistics at both ends of helices and strands.

To examine the tertiary neighborhood of a peptide bond, the tertiary neighbors around residues at various positions from that bond were determined. Residue X is defined as a tertiary neighbor of residue Y if the distance between their Cα atoms is less than a specified distance cutoff value and the sequential separation between X and Y is above a separation threshold. Distance cutoff values for tertiary neighbors between 6 and 12 Å were explored, and various separation threshold values between 3 and 10 residues were tried. A normalized score relating the relative number of tertiary neighbors around cis and trans peptides was determined using a formula similar to that used for calculating NRF values

$$\frac{N_{X,\mathrm{cis}} / N_{\mathrm{total,cis}}}{N_{X,\mathrm{trans}} / N_{\mathrm{total,trans}}} \qquad (3)$$

where $N_{X,\mathrm{cis}}$ and $N_{X,\mathrm{trans}}$ are the tertiary neighbor counts of residues of type X around cis and trans peptides, respectively, and $N_{\mathrm{total,cis}}$ and $N_{\mathrm{total,trans}}$ are the total counts of all tertiary neighbors around cis and trans peptides. When comparing the type of tertiary neighbors around cis and trans peptides, we further divided neighbors into short-range (separated by 1–2 residues in sequence), medium-range (separated by 3–9 residues in sequence), and long-range (separated by 10+ residues in sequence) classes. Custom software was built for analyzing our data sets using the C++ programming language.

Acknowledgment. This research is supported by the Canadian Institutes of Health Research (to Z.J.). B.W. is the recipient of a Natural Sciences and Engineering Research Council of Canada graduate studentship. Z.J. is a Canada Research Chair in Structural Biology. We are thankful to Jo Willan for her critical reading of the manuscript.

Supporting Information Available: A breakdown of the residue frequency data around cis and trans peptides by secondary structure pairing is available free of charge via the Internet at http://pubs.acs.org.

References
(1) LaPlanche, L. A.; Rogers, M. T. Cis and trans configurations of the peptide bond in N-monosubstituted amides by nuclear magnetic resonance. J. Am. Chem. Soc. 1964, 86, 337–341.
(2) Christensen, D. H.; Kortzeborn, R. N.; Bak, B.; Led, J. J. Results of ab initio calculations on formamide. J. Chem. Phys. 1970, 53, 3912–3922.
(3) Drakenberg, T.; Forsén, S. The barrier to internal rotation in monosubstituted amides. J. Chem. Soc., Chem. Commun. 1971, 1404–1405.
(4) Perricaudet, M.; Pullman, A. An ab initio quantum-mechanical investigation on the rotational isomerism in amides and esters. J. Pept. Protein Res. 1973, 5, 99–107.
(5) Radzicka, A.; Pedersen, L.; Wolfenden, R. Influences of solvent water on protein folding: free energies of solvation of cis and trans peptides are nearly identical. Biochemistry 1988, 27, 4538–4541.
(6) Jorgensen, W. L.; Gao, J. Cis-trans energy difference for the peptide bond in the gas phase and in aqueous solution. J. Am. Chem. Soc. 1988, 110, 4212–4216.
(7) Schulz, G. D.; Schirmer, R. H. Principles of Protein Structure; Springer-Verlag: New York, XXXX; pp 25.
(8) Schnur, D. M.; Yuh, Y. H.; Dalton, D. R. Molecular mechanics study of amide conformations. J. Org. Chem. 1989, 54, 3779–3785.
(9) Scherer, G.; Krammer, M. L.; Schutkowski, M.; Reimer, U.; Fischer, G. Barriers to rotation in secondary amide peptide bonds. J. Am. Chem. Soc. 1998, 120, 5568–5574.
(10) Pal, D.; Chakrabarti, P. Cis peptide bonds in proteins: residues involved, their conformations, interactions and locations. J. Mol. Biol. 1999, 294, 271–288.
(11) Stewart, D. E.; Sarkar, A.; Wampler, J. E. Occurrence and role of cis peptide bonds in protein structure. J. Mol. Biol. 1990, 214, 253–260.
(12) Dodge, R. W.; Scheraga, H. A. Folding and unfolding kinetics of the proline-to-alanine mutants of bovine pancreatic ribonuclease A. Biochemistry 1996, 35, 1548–1559.
(13) Wedemeyer, W. J.; Welker, E.; Scheraga, H. A. Proline cis-trans isomerization and protein folding. Biochemistry 2002, 41, 14637–14644.
(14) Maigret, B.; Perahia, D.; Pullman, B. Molecular orbital calculations on the conformation of polypeptides and proteins. IV. The conformation of the prolyl and hydroxyprolyl residues. J. Theor. Biol. 1970, 29, 275–291.
(15) Weiss, M. S.; Jabs, A.; Hilgenfeld, R. Peptide bonds revisited. Nat. Struct. Biol. 1998, 5, 676.
(16) Deane, C. M.; Lummis, S. C. R. The role and predicted propensity of conserved proline residues in the 5-HT3 receptor. J. Biol. Chem. 2001, 276, 37962–37966.
(17) Vitagliano, L.; Berisio, R.; Mastrangelo, A.; Mazarella, L.; Zagari, A. Preferred proline puckerings in cis and trans peptide groups: implications for collagen stability. Protein Sci. 2001, 10, 2627–2632.
(18) Jabs, A.; Weiss, M. S.; Hilgenfeld, R. Non-proline cis peptide bonds in proteins. J. Mol. Biol. 1999, 286, 291–304.
(19) Lorenzen, S.; Peters, B.; Goede, A.; Preissner, R.; Frommel, C. Conservation of cis prolyl bonds in proteins during evolution. Proteins 2005, 58, 589–595.
(20) Meng, H. Y.; Thomas, K. M.; Lee, A. E.; Zondlo, N. J. Effects of i and i + 3 residue identity on cis-trans isomerism of the aromatic(i+1)-prolyl(i+2) amide bond: implications for type VI β-turn formation. Biopolymers 2006, 84, 192–204.
(21) Dyson, H. J.; Rance, M.; Houghten, R. A.; Lerner, A. R.; Wright, P. E. Folding of immunogenic peptide fragments of proteins in water solution. J. Mol. Biol. 1988, 201, 161–200.
(22) Wu, W.-J.; Raleigh, D. P. Local control of peptide conformation: stabilization of cis proline peptide bonds by aromatic proline interactions. Biopolymers 1998, 45, 381–394.
(23) Reimer, U.; Scherer, G.; Drewello, M.; Kruber, S.; Schutkowski, M.; Fischer, G. Side-chain effects on peptidyl-prolyl cis/trans isomerisation. J. Mol. Biol. 1998, 279, 449–460.
(24) Guan, R.-J.; Xiang, Y.; He, X.-L.; Wang, C.-G.; Wang, M.; Zhang, Y.; Sundberg, E. J.; Wang, D.-C. Structural mechanism governing cis and trans isomeric states and an intramolecular switch for cis/trans isomerization of a non-proline peptide bond observed in crystal structures of scorpion toxins. J. Mol. Biol. 2004, 341, 1189–1204.
(25) Forstner, M.; Muller, A.; Rognan, D.; Kriechbaum, M.; Wallimann, T. Mutation of cis-proline 207 in mitochondrial creatine kinase to alanine leads to increased acid stability. Protein Eng. 1998, 11, 563–568.
(26) Birolo, L.; Malashkevich, V. N.; Capitani, G.; De Luca, F.; Moretta, A.; Jansonius, J. N.; Marino, G. Functional and structural analysis of cis-proline mutants of Escherichia coli aspartate aminotransferase. Biochemistry 1999, 38, 905–913.
(27) Tweedy, N. B.; Nair, S. K.; Paterno, S. A.; Fierke, C. A.; Christianson, D. W. Structure and energetics of a non-proline cis-peptidyl linkage in a proline-202 → alanine carbonic anhydrase II variant. Biochemistry 1993, 32, 10944–10949.
(28) Mayr, L. M.; Willbold, D.; Rosh, P.; Schmid, F. X. Generation of a non-prolyl cis peptide bond in ribonuclease T1. J. Mol. Biol. 1994, 240, 288–293.
(29) Jin, L.; Stec, B.; Kantrowitz, E. R. A cis-proline to alanine mutant of E. coli aspartate transcarbamoylase: kinetic studies and three-dimensional crystal structures. Biochemistry 2000, 39, 8058–8066.
(30) Wu, Y.; Matthews, C. R. A cis-prolyl peptide bond isomerization dominates the folding of the alpha subunit of Trp synthase, a TIM barrel protein. J. Mol. Biol. 2002, 322, 7–13.
(31) Cook, K. H.; Schmid, F. X.; Baldwin, R. L. Role of proline isomerization in folding of ribonuclease A at low temperatures. Proc. Natl. Acad. Sci. U.S.A. 1979, 76, 6157–6161.
(32) Levitt, M. Effect of proline residues on protein folding. J. Mol. Biol. 1981, 145, 251–263.
(33) Kabsch, W.; Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637.
(34) Ranganathan, R.; Lu, K. P.; Hunter, T.; Noel, J. P. Structural and functional analysis of the mitotic rotamase Pin1 suggests substrate recognition is phosphorylation dependent. Cell 1997, 89, 875–886.
(35) Van Duyne, G. D.; Standaert, R. F.; Karplus, P. A.; Schreiber, S. L.; Clardy, J. Atomic structure of FKBP-FK506, an immunophilin-immunosuppressant complex. Science 1991, 252, 839–842.
(36) Reimer, U.; Mokdad, N. E.; Schutkowski, M.; Fischer, G. Intramolecular assistance of cis/trans isomerization of the histidine-proline moiety. Biochemistry 1997, 36, 13802–13808.
(37) Noguchi, T.; Matsuda, H.; Akiyama, Y. PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res. 2001, 29, 219–220.

PR0704027 Journal of Proteome Research • Vol. 7, No. 01, 2008 153