Prevalent Structural Disorder in E. coli and S. cerevisiae Proteomes

Peter Tompa,* Zsuzsanna Dosztányi, and István Simon
Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
Received March 13, 2006

Intrinsically unstructured proteins, which exist without a well-defined 3D structure, carry out essential functions and, according to genome-wide predictions, occur with high frequency. The generality of this phenomenon, however, is called into question by the uncertainty about what fraction of a genome actually encodes expressed proteins. Here, we used two independent bioinformatic predictors, PONDR VSL1 and IUPred, to demonstrate that disorder prevails in the recently characterized proteomes and essential proteins of E. coli and S. cerevisiae, at levels exceeding those estimated from the genomes. The S. cerevisiae proteome contains three times as much disorder as that of E. coli, with 50-60% of proteins containing at least one long (>30 residues) disordered segment. This evolutionary advance can be explained by the observation that disorder is much higher in Gene Ontology categories related to regulatory, as opposed to metabolic, functions, and also in categories unique to yeast. Thus, protein disorder is a widespread and functionally important phenomenon, which needs to be characterized in full detail to understand complex organisms at the molecular level.

Keywords: intrinsic disorder • disordered protein • unstructured protein

* To whom correspondence should be addressed. Tel.: +361-279-3143. E-mail: [email protected].

Journal of Proteome Research 2006, 5, 1996-2000

Published on Web 07/15/2006

Introduction

Intrinsically unstructured/disordered proteins (IUPs) constitute a recently recognized class of proteins, characterized by the lack of a well-defined three-dimensional structure. Although these proteins or protein domains exist as an ensemble of rapidly interconverting conformations, they carry out important functions often associated with signal transduction and transcription regulation.1-5 The importance of protein disorder has been inferred from its occurrence in important proteins, such as p53,6 BRCA1,7 and the prion protein8 (cf. the DisProt database9), from the range of functional advantages structural disorder confers onto proteins,3,4 and from its high predicted frequency in various organisms.10-12 These observations justify the prior call for reassessing the classical structure-function paradigm,1 which stated that a well-defined three-dimensional structure is a prerequisite of protein function.

The underlying assumption concerning the high frequency of protein disorder, however, is based on predictions for the putative protein-coding genes of various genomes,10-12 the relevance of which is arguable because it is uncertain what fraction of these genes actually corresponds to expressed proteins. This latter point is convincingly illustrated by the differences between the genome and proteome of the model organisms E. coli13,14 and S. cerevisiae,15 and by the steadily decreasing number of putative protein-coding genes in the human genome.16 Thus, for a dependable assessment of the prevalence of protein disorder, one must predict disorder for the proteome(s).

The proteome has recently been determined in systematic high-throughput LC-MS experiments for E. coli13,14 and by epitope tagging/immunoblot analysis for S. cerevisiae.15 These experiments identified 1560 out of 4397 possible protein-coding genes for E. coli and 3854 out of 6666 for S. cerevisiae as expressed under the given conditions. The complement of essential proteins, deletion of which is lethal to the organism, has also been recently described for both E. coli17 and S. cerevisiae.18,19 For details, see Materials and Methods.
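As a quick arithmetic check on the counts quoted above, the fraction of putative genes detected as expressed can be computed directly. The snippet is only an illustration: the dictionary and variable names are hypothetical, and the numbers are the ones given in the text.

```python
# Counts quoted in the text: (expressed proteins, putative protein-coding genes)
counts = {
    "E. coli": (1560, 4397),
    "S. cerevisiae": (3854, 6666),
}

# Fraction of putative genes confirmed as expressed, per organism
fractions = {
    name: expressed / putative
    for name, (expressed, putative) in counts.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of putative genes detected as expressed")
```

Roughly a third of the putative E. coli genes and somewhat more than half of the yeast genes were detected, which is why disorder estimates based on the genome and on the proteome can differ.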

10.1021/pr0600881 CCC: $33.50 © 2006 American Chemical Society

Materials and Methods

Databases. Putative protein-coding genome sequences were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/) for E. coli, and from ftp://genome-ftp.stanford.edu/pub/yeast/data_download/sequence/genomic_sequence/orf_protein/orf_trans_all.fasta.gz for S. cerevisiae. The lists of expressed proteins (proteome) for E. coli reported in two independent works13,14 were downloaded from http://mcponline.highwire.org/cgi/data/M400030-MCP200/DC1/1 (Table 1.xls) and http://mcponline.highwire.org/cgi/data/D500006-MCP200/DC1/1 (Table 3.xls) and merged. For S. cerevisiae, the proteome was taken from the Supporting Information of Ghaemmaghami et al.15 Essential proteins for E. coli were taken from the PEC database,17 whereas for S. cerevisiae, the data reported in refs 18 and 19 were taken from http://www-sequence.stanford.edu/group/yeast_deletion_project/Essential_ORFs.txt.

Gene Ontology Categories. For E. coli, the genome GO biological process classification was taken from the GOA GO annotation project at EBI at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/18.E_coli_K12.goa.gz, and was filtered for expressed proteins. The detailed GO terms were mapped onto the SLIM terms of the yeast annotation using the program map2slim. For S. cerevisiae, GO biological process annotation was taken from the Saccharomyces Genome Database (SGD) FTP site20 at http://www.yeastgenome.org/, using the go_slim_mapping.tab file, in which only the most specific GO slim term is reported.

Table 1. Structural Disorder in E. coli and S. cerevisiae Genomes, Proteomes and Essential Proteins (a)

                           E. coli                          S. cerevisiae
                genome   proteome  essential      genome   proteome  essential
IUPred (%)
  residues        5.54       7.25      10.62       21.62      22.31      20.76
  >30             8.72      13.68      20.62        40.8      49.07       52.4
  full             1.9       2.07       4.12        3.95       4.28       2.46
PONDR (%)
  residues        3.62        4.7        7.8       22.74      22.99      21.58
  >30            10.38      14.94      24.23       52.57      60.72      64.72
  full            0.88       0.73       2.06        2.25       2.15       0.78

(a) Structural disorder is predicted for the E. coli and S. cerevisiae genomes, proteomes and essential proteins by two predictors relying on different principles, IUPred (top) and PONDR VSL1 (bottom). For each prediction three different numbers are shown: the percent of residues in disordered regions (residues), the percent of proteins with at least one long disordered segment (>30, cf. also Figure 1), and the percent of proteins that are disordered along their entire length (full).

Prediction of Structural Disorder. The disorder content was calculated with the predictors IUPred21 and PONDR VSL1.22 In both cases, the outputs were smoothed over 31 residues. The cutoff values were tuned slightly to make the predictions directly comparable and to keep the false positive rate low. Specifically, the cutoff values were set so that 5% of residues were predicted to be disordered. With these settings, less than 10% of proteins were predicted to contain one long (>30 residues) disordered segment in our filtered monomeric globular protein list, reported previously.23 In the case of PONDR VSL1, this effectively meant raising the default cutoff value (0.5) by 0.14. With both predictors, we calculated the average percentage of disordered residues in the dataset, the percentage of proteins with a disordered segment of at least 30 residues in length, and the percentage of fully disordered proteins. Proteins shorter than 30 residues were omitted. Fully disordered proteins were defined as proteins in which more than 50% of the residues were disordered and which did not contain any continuous ordered sequence longer than 30 residues.
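The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: it assumes each predictor returns one score per residue, reads "smoothed over 31 residues" as a centered moving average whose window is truncated at the chain ends, assumes the 5% target is calibrated on the residues of the globular reference set, and pools residues over the whole dataset when computing the percent of disordered residues. All function names are hypothetical.

```python
from itertools import groupby

WINDOW = 31         # smoothing window stated in the text
LONG_SEGMENT = 30   # a "long" segment has more than 30 residues (">30")
MIN_LENGTH = 30     # proteins shorter than 30 residues are omitted


def smooth(scores, window=WINDOW):
    """Centered moving average; the window is truncated at the chain ends (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def calibrate_cutoff(reference_scores, target_fraction=0.05):
    """Cutoff that labels about target_fraction of reference residues as disordered.

    With tied scores slightly more residues than the target may be labeled.
    """
    pooled = sorted(
        (s for protein in reference_scores for s in smooth(protein)),
        reverse=True,
    )
    n_disordered = max(1, int(target_fraction * len(pooled)))
    return pooled[n_disordered - 1]


def longest_run(flags, value):
    """Length of the longest stretch of consecutive residues whose flag equals value."""
    return max(
        (sum(1 for _ in group) for key, group in groupby(flags) if key == value),
        default=0,
    )


def classify(scores, cutoff):
    """Return (number of disordered residues, has long segment, fully disordered)."""
    flags = [s >= cutoff for s in smooth(scores)]
    n_disordered = sum(flags)
    has_long = longest_run(flags, True) > LONG_SEGMENT
    fully = (n_disordered > 0.5 * len(flags)
             and longest_run(flags, False) <= LONG_SEGMENT)
    return n_disordered, has_long, fully


def dataset_measures(proteins, cutoff):
    """The three measures reported in Table 1, as percentages."""
    kept = [p for p in proteins if len(p) >= MIN_LENGTH]
    results = [classify(p, cutoff) for p in kept]
    total_residues = sum(len(p) for p in kept)
    return {
        "residues": 100 * sum(r[0] for r in results) / total_residues,
        ">30": 100 * sum(r[1] for r in results) / len(kept),
        "full": 100 * sum(r[2] for r in results) / len(kept),
    }
```

For example, `dataset_measures(proteins, 0.64)` would apply the PONDR VSL1 cutoff quoted in the text (the default 0.5 raised by 0.14) to a list of per-residue score lists, whereas `calibrate_cutoff(globular_scores)` would derive a cutoff from a reference set of globular proteins.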

Results and Discussion

To obtain an unbiased assessment of disorder in the experimentally determined proteomes, one has to resort to bioinformatic predictors (for details, see Materials and Methods), since experimental data are available for only a couple of hundred proteins.9 Thus, we applied two predictors based on different principles, PONDR VSL1,22 and IUPred.21 PONDR VSL1 is a machine learning approach to recognizing disordered segments, which performs well on both short and long disordered segments. IUPred, on the other hand, is based on a physical model of protein disorder, which relies on the assumption that sequences that cannot form sufficient favorable interactions are disordered.23 As the parameters of IUPred were calculated by using globular proteins only, without relying on disordered protein datasets, this method provides an independent assessment of protein disorder. The predictors have been thoroughly tested at the sixth CASP (Critical Assessment of techniques for protein Structure Prediction) experiment, and have been found to consistently predict experimentally determined disorder at about 90% accuracy.24

We applied the predictors to the E. coli and S. cerevisiae genomes, proteomes and essential complements of the proteomes (Figure 1 and Table 1) to assess the level of disorder by three different, but related, measures: (i) the percent of residues that fall into locally disordered regions in the full dataset, which shows the overall importance of disorder but does not distinguish between scattered long disordered regions and frequent, short segments; (ii) the percent of proteins that have at least one long (>30 residues) disordered segment, which is generally considered to be functionally significant;10,12 and (iii) the percent of proteins that are fully disordered, i.e., at least 50% of their residues are disordered and they do not contain any continuous ordered sequence longer than 30 residues. Although the predictors are unrelated, their results fall very close to each other, and their false positive rates are very low (see Materials and Methods), which warrants confidence in three key conclusions with respect to the frequency of structural disorder: (1) disorder in proteomes exceeds that estimated from genomes, with disorder in essential proteins being generally even higher; (2) disorder reaches high levels in the eukaryote: more than 20% of all residues fall into locally disordered regions and 50-60% of all proteins have at least one long disordered segment; and (3) disorder by most measures is about three times higher in yeast than in the bacterium.

The first conclusion underlines the premise of our study that genome-based assessment of protein disorder is of uncertain relevance, and has actually resulted in an underprediction of this structural feature. The observed abundance in the proteome, and in essential proteins in particular, suggests that structural disorder is an important structural and functional trait, which is probably directly related to the functional advantages extreme flexibility confers onto proteins. As discussed in prior reviews,3-5,25,26 IUPs fit into six broad functional categories, in which function stems either directly from the ability of the polypeptide chain to attain multiple conformations or from molecular recognition, when the disordered protein binds other proteins, DNA, RNA, or small ligands. This kind of recognition is accompanied by partial or full-scale induced folding,27 which may provide for an increased speed of association, binding specificity without excessive binding strength, or a large surface for binding multiple partners. The apparent benefits of these functional modes in regulatory functions may explain the advance of protein disorder in eukaryotic evolution.10-12 In addition, structural disorder permits an exquisite adaptability in the binding process that enables binding to different partners, i.e., binding promiscuity,28 an extreme functional manifestation of which is the potential to fulfill more than one unrelated function, i.e., to moonlight.29 This capacity may be of prime importance in maintaining the complex intracellular networks required to support eukaryotic cell function.

To test the general validity of these inferences at the proteome level, we assessed the occurrence of disorder in various “biological process” Gene Ontology (GO) categories30 for both species. Predictions by the two predictors have been averaged, and are presented by the three measures (% of residues in disordered regions, % of proteins with at least one long disordered segment and % of proteins disordered along their entire length) in Tables 2 and 3. The level of disorder varies widely across the categories (e.g., the percent of proteins with at least one long disordered region ranges from 0% to more than 80%). A clear functional separation can be seen between categories that have very low and very high levels of disorder: in the low-disorder categories, almost all classes are associated with catalytic functions, such as metabolism and biogenesis, whereas in the high-disorder region regulatory functions such as cell cycle, signal transduction, and transcription dominate. In addition, functional categories unique to yeast (e.g., cell budding, sporulation or meiosis) invariably fall into the high-disorder half of the list (Table 3). The significant increase of disorder in common regulatory categories (e.g., from 15.25% to 80.99% in transcription or from 14.51% to 75.00% in signal transduction in the case of proteins with at least one long disordered region), and the abundance of disorder in the yeast-specific functions, translate into the large excess of disorder in S. cerevisiae over E. coli. This observation substantiates at the proteome level prior suggestions that disorder increases with increasing complexity of organisms.10,11,22

Figure 1. Structural disorder in E. coli and S. cerevisiae proteomes. Structural disorder is predicted for the E. coli (gray columns) and S. cerevisiae (empty columns) genomes, proteomes and essential proteins, by two predictors relying on different principles, IUPred (a) and PONDR VSL1 (b). For each prediction three different numbers have been determined (cf. section Prediction of Structural Disorder), of which the percent of proteins with at least one long (>30 residues) disordered segment is shown here. All data are summarized in Table 1.

Table 2. Structural Disorder in “Biological Process” GO Categories in E. coli (a)

GO category                                       no. of  disorder in        proteins with long  fully disordered
                                                proteins database (%) disordered region (%) (b)      proteins (%)
cell homeostasis                                       7         0.85                      0.00              0.00
vitamin metabolism                                    42         1.57                      0.00              0.00
carbohydrate metabolism                              170         2.67                      5.00              0.00
generation of precursor metabolites and energy        60         2.67                      5.00              0.00
lipid metabolism                                      62         2.31                      5.64              0.00
amino acid and derivative metabolism                 131         2.49                      6.49              0.00
cellular respiration                                  24         4.40                      8.33              0.00
electron transport                                    95         3.92                      9.47              0.00
protein modification                                  14         3.74                     10.71              0.00
transport                                            195         4.56                     11.03              0.26
cell wall organization and biogenesis                 24         4.30                     12.50              0.00
RNA metabolism                                        65         6.08                     13.08              0.00
morphogenesis                                         25         5.60                     14.00              0.00
signal transduction                                   31         5.30                     14.51              0.00
transcription                                        118         6.89                     15.25              1.27
ribosome biogenesis and assembly                      16         6.02                     15.62              0.00
response to stress                                    70         6.56                     18.57              0.71
protein biosynthesis                                  95         8.46                     20.00              4.73
DNA metabolism                                        68         9.73                     30.88              2.94
cell cycle/cytokinesis                                40        15.73                     31.25              2.50
organelle/membrane organization and biogenesis         9        19.32                     44.45              0.00

biological process unknown                           634         7.01                     16.24              1.58

(a) Disorder in proteins falling into various GO categories in the E. coli proteome is predicted with PONDR VSL1 and IUPred, and the average of the two predictions is calculated. The percent of residues in disordered regions, the percent of proteins with at least one long (>30 residues) disordered segment, and the percent of proteins that are disordered along their entire length are given. (b) Categories are ranked by the second parameter. Proteins for which the biological process is unknown are shown separately and are not discussed.

Table 3. Structural Disorder in “Biological Process” GO Categories in S. cerevisiae (a)

GO category                                       no. of  disorder in        proteins with long  fully disordered
                                                proteins database (%) disordered region (%) (b)      proteins (%)
electron transport                                     8         5.02                      0.00              6.25
amino acid and derivative metabolism                 105         6.35                     17.62              0.47
generation of precursor metabolites and energy        62         6.62                     24.20              4.84
carbohydrate metabolism                               66        11.82                     25.00              0.00
vitamin metabolism                                    30         9.41                     26.66              0.00
cellular respiration                                  54        13.87                     31.48              1.85
lipid metabolism                                      88        12.23                     38.07              0.00
protein biosynthesis                                 287        16.02                     43.38              3.49
cell homeostasis                                      36        23.41                     48.61              1.39
protein modification                                 256        20.56                     51.75              1.36
organelle/membrane organization and biogenesis       138        19.95                     53.26              2.90
response to stress                                   145        27.66                     55.52              6.55
transport                                            272        16.29                     55.52              1.48
protein catabolism                                    53        15.41                     56.61              0.94
*vesicle-mediated transport                          174        23.58                     64.08              2.87
RNA metabolism                                       276        22.41                     67.21              3.62
cell wall organization and biogenesis                 79        27.46                     67.72              2.53
*cell budding                                         25        32.06                     70.00              4.00
*nuclear organization and biogenesis                  31        29.81                     70.97              1.61
DNA metabolism                                       232        26.55                     71.34              1.94
*sporulation                                          22        33.42                     72.72              4.54
*cytoskeleton organization and biogenesis             85        32.55                     72.94              2.94
ribosome biogenesis and assembly                     119        25.52                     73.53              4.62
signal transduction                                   60        30.95                     75.00              2.50
*meiosis                                              61        31.88                     75.41              3.28
cell cycle/cytokinesis                               120        32.41                     77.92              5.00
conjugation                                           40        29.35                     78.75              2.50
morphogenesis                                         12        30.91                     79.16              4.17
transcription                                        242        37.48                     80.99              4.34
*pseudohyphal growth                                  36        41.19                     83.34              0.00

biological process unknown                          1055        20.14                     45.59              4.46

(a) Disorder in proteins falling into various GO categories in the S. cerevisiae proteome is predicted with PONDR VSL1 and IUPred, and the average of the two predictions is calculated. The percent of residues in disordered regions, the percent of proteins with at least one long (>30 residues) disordered segment, and the percent of proteins that are disordered along their entire length are given. (b) Categories are ranked by the second parameter. Proteins for which the biological process is unknown are shown separately and are not discussed. GO categories unique to S. cerevisiae (i.e., missing from E. coli, cf. Table 2) are marked with an asterisk.

Conclusions

Our observations reinforce the view that, because of its importance in basic regulatory functions, structural disorder constitutes a key mechanistic element of protein function at the level of the proteome. This notion puts renewed emphasis on previous claims1,3-5 that the identification and detailed characterization of IUPs is imperative for reassessing the structure-function paradigm of proteins.1 Accordingly, we may safely conclude that the quest for a coherent molecular picture of eukaryotic cell function has taken an exciting new twist.

Acknowledgment. This work was supported by Grant Nos. GVOP-3.2.1.-2004-05-0195/3.0, OTKA F043609, T049073, K60694, NKFP MediChem2, and the Wellcome Trust International Senior Research Fellowship ISRF 067595. We also acknowledge the Bolyai János fellowships for Z.D. and P.T. Thanks are due to Dr. Monika Fuxreiter for her critical comments on the manuscript.

References

(1) Wright, P. E.; Dyson, H. J. Intrinsically unstructured proteins: reassessing the protein structure-function paradigm. J. Mol. Biol. 1999, 293, 321-331.
(2) Uversky, V. N.; Gillespie, J. R.; Fink, A. L. Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins 2000, 41, 415-427.
(3) Dunker, A. K.; Brown, C. J.; Lawson, J. D.; Iakoucheva, L. M.; Obradovic, Z. Intrinsic Disorder and Protein Function. Biochemistry 2002, 41, 6573-6582.
(4) Tompa, P. Intrinsically unstructured proteins. Trends Biochem. Sci. 2002, 27, 527-533.
(5) Dyson, H. J.; Wright, P. E. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell. Biol. 2005, 6, 197-208.
(6) Dawson, R.; Muller, L.; Dehner, A.; Klein, C.; Kessler, H.; Buchner, J. The N-terminal domain of p53 is natively unfolded. J. Mol. Biol. 2003, 332, 1131-1141.
(7) Mark, W. Y.; Liao, J. C.; Lu, Y.; Ayed, A.; Laister, R.; Szymczyna, B.; Chakrabartty, A.; Arrowsmith, C. H. Characterization of segments from the central region of BRCA1: an intrinsically disordered scaffold for multiple protein-protein and protein-DNA interactions? J. Mol. Biol. 2005, 345, 275-287.
(8) Zahn, R.; Liu, A.; Luhrs, T.; Riek, R.; von Schroetter, C.; Lopez Garcia, F.; Billeter, M.; Calzolai, L.; Wider, G.; Wuthrich, K. NMR solution structure of the human prion protein. Proc. Nat’l. Acad. Sci. U.S.A. 2000, 97, 145-150.
(9) Vucetic, S.; Obradovic, Z.; Vacic, V.; Radivojac, P.; Peng, K.; Iakoucheva, L. M.; Cortese, M. S.; Lawson, J. D.; Brown, C. J.; Sikes, J. G.; Newton, C. D.; Dunker, A. K. DisProt: a database of protein disorder. Bioinformatics 2005, 21, 137-140.
(10) Dunker, A. K.; Obradovic, Z.; Romero, P.; Garner, E. C.; Brown, C. J. Intrinsic protein disorder in complete genomes. Genome Inform. Ser. Workshop Genome Inform. 2000, 11, 161-171.
(11) Ward, J. J.; Sodhi, J. S.; McGuffin, L. J.; Buxton, B. F.; Jones, D. T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 2004, 337, 635-645.
(12) Oldfield, C. J.; Cheng, Y.; Cortese, M. S.; Brown, C. J.; Uversky, V. N.; Dunker, A. K. Comparing and combining predictors of mostly disordered proteins. Biochemistry 2005, 44, 1989-2000.
(13) Lopez-Campistrous, A.; Semchuk, P.; Burke, L.; Palmer-Stone, T.; Brokx, S. J.; Broderick, G.; Bottorff, D.; Bolch, S.; Weiner, J. H.; Ellison, M. J. Localization, annotation, and comparison of the Escherichia coli K-12 proteome under two states of growth. Mol. Cell. Proteomics 2005, 4, 1205-1209.
(14) Taoka, M.; Yamauchi, Y.; Shinkawa, T.; Kaji, H.; Motohashi, W.; Nakayama, H.; Takahashi, N.; Isobe, T. Only a small subset of the horizontally transferred chromosomal genes in Escherichia coli are translated into proteins. Mol. Cell. Proteomics 2004, 3, 780-787.
(15) Ghaemmaghami, S.; Huh, W. K.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O’Shea, E. K.; Weissman, J. S. Global analysis of protein expression in yeast. Nature (London) 2003, 425, 737-741.
(16) International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature (London) 2004, 431, 931-45.
(17) Hashimoto, M.; Ichimura, T.; Mizoguchi, H.; Tanaka, K.; Fujimitsu, K.; Keyamura, K.; Ote, T.; Yamakawa, T.; Yamazaki, Y.; Mori, H.; Katayama, T.; Kato, J. Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome. Mol. Microbiol. 2005, 55, 137-149.
(18) Winzeler, E. A.; Shoemaker, D. D.; Astromoff, A.; Liang, H.; Anderson, K.; Andre, B.; Bangham, R.; Benito, R.; Boeke, J. D.; Bussey, H.; Chu, A. M.; Connelly, C.; Davis, K.; Dietrich, F.; Dow, S. W.; El Bakkoury, M.; Foury, F.; Friend, S. H.; Gentalen, E.; Giaever, G.; Hegemann, J. H.; Jones, T.; Laub, M.; Liao, H.; Liebundguth, N.; Lockhart, D. J.; Lucau-Danila, A.; Lussier, M.; M′Rabet, N.; Menard, P.; Mittmann, M.; Pai, C.; Rebischung, C.; Revuelta, J. L.; Riles, L.; Roberts, C. J.; Ross-MacDonald, P.; Scherens, B.; Snyder, M.; Sookhai-Mahadeo, S.; Storms, R. K.; Veronneau, S.; Voet, M.; Volckaert, G.; Ward, T. R.; Wysocki, R.; Yen, G. S.; Yu, K.; Zimmermann, K.; Philippsen, P.; Johnston, M.; Davis, R. W. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 1999, 285, 901-906.
(19) Giaever, G.; Chu, A. M.; Ni, L.; Connelly, C.; Riles, L.; Veronneau, S.; Dow, S.; Lucau-Danila, A.; Anderson, K.; Andre, B.; Arkin, A. P.; Astromoff, A.; El-Bakkoury, M.; Bangham, R.; Benito, R.; Brachat, S.; Campanaro, S.; Curtiss, M.; Davis, K.; Deutschbauer, A.; Entian, K. D.; Flaherty, P.; Foury, F.; Garfinkel, D. J.; Gerstein, M.; Gotte, D.; Guldener, U.; Hegemann, J. H.; Hempel, S.; Herman, Z.; Jaramillo, D. F.; Kelly, D. E.; Kelly, S. L.; Kotter, P.; LaBonte, D.; Lamb, D. C.; Lan, N.; Liang, H.; Liao, H.; Liu, L.; Luo, C.; Lussier, M.; Mao, R.; Menard, P.; Ooi, S. L.; Revuelta, J. L.; Roberts, C. J.; Rose, M.; Ross-Macdonald, P.; Scherens, B.; Schimmack, G.; Shafer, B.; Shoemaker, D. D.; Sookhai-Mahadeo, S.; Storms, R. K.; Strathern, J. N.; Valle, G.; Voet, M.; Volckaert, G.; Wang, C. Y.; Ward, T. R.; Wilhelmy, J.; Winzeler, E. A.; Yang, Y.; Yen, G.; Youngman, E.; Yu, K.; Bussey, H.; Boeke, J. D.; Snyder, M.; Philippsen, P.; Davis, R. W.; Johnston, M. Functional profiling of the Saccharomyces cerevisiae genome. Nature (London) 2002, 418, 387-391.
(20) Balakrishnan, R.; Christie, K. R.; Costanzo, M. C.; Dolinski, K.; Dwight, S. S.; Engel, S. R.; Fisk, D. G.; Hirschman, J. E.; Hong, E. L.; Nash, R.; Oughtred, R.; Skrzypek, M.; Theesfeld, C. L.; Binkley, G.; Lane, C.; Schroeder, M.; Sethuraman, A.; Dong, S.; Weng, S.; Miyasato, S.; Andrada, R.; Botstein, D.; Cherry, J. M. Saccharomyces Genome Database. http://www.yeastgenome.org/.
(21) Dosztányi, Z.; Csizmok, V.; Tompa, P.; Simon, I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005, 21, 3433-3434.
(22) Obradovic, Z.; Peng, K.; Vucetic, S.; Radivojac, P.; Dunker, A. K. Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins 2005, 61 Suppl 7, 176-182.
(23) Dosztányi, Z.; Csizmok, V.; Tompa, P.; Simon, I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol. 2005, 347, 827-839.
(24) Jin, Y.; Dunbrack, R. L., Jr. Assessment of disorder predictions in CASP6. Proteins 2005, 61 Suppl 7, 167-175.
(25) Gunasekaran, K.; Tsai, C. J.; Kumar, S.; Zanuy, D.; Nussinov, R. Extended disordered proteins: targeting function with less scaffold. Trends Biochem. Sci. 2003, 28, 81-85.
(26) Tompa, P. The interplay between structure and function in intrinsically unstructured proteins. FEBS Lett. 2005, 579, 3346-3354.
(27) Dyson, H. J.; Wright, P. E. Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol. 2002, 12, 54-60.
(28) Kriwacki, R. W.; Hengst, L.; Tennant, L.; Reed, S. I.; Wright, P. E. Structural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state: conformational disorder mediates binding diversity. Proc. Nat’l. Acad. Sci. U.S.A. 1996, 93, 11504-11509.
(29) Tompa, P.; Szasz, C.; Buday, L. Structural disorder throws new light on moonlighting. Trends Biochem. Sci. 2005, 30, 484-489.
(30) Gene Ontology Consortium. Creating the gene ontology resource: design and implementation. Genome Res. 2001, 11, 1425-1433.