Counting the Zinc-Proteins Encoded in the Human Genome

Claudia Andreini,†,‡ Lucia Banci,†,‡,§ Ivano Bertini,†,‡ and Antonio Rosato†,‡
Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy, Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy, and ProtEra s.r.l., Viale delle Idee 26, 50019, Sesto Fiorentino, Italy
Received October 26, 2005

Abstract: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for the regulation of their activities, or for structural purposes. Genome sequencing projects have provided a huge number of protein primary sequences, but, even though several different elaborate analyses and annotations have been enabled by a rich and ever-increasing portfolio of bioinformatic tools, metal-binding properties remain difficult to predict as well as to investigate experimentally. Consequently, the present knowledge about metalloproteins is only partial. The present bioinformatic research proposes a strategy to answer the question of how many and which proteins encoded in the human genome may require zinc for their physiological function. This is achieved by a combination of approaches, which include: (i) searching in the proteome for the zinc-binding patterns that, in turn, are obtained from all available X-ray data; (ii) using libraries of metal-binding protein domains based on multiple sequence alignments of known metalloproteins obtained from the Pfam database; and (iii) mining the annotations of human gene sequences, which are based on any type of information available. It is found that 1684 proteins in the human proteome are independently identified by all three approaches as zinc-proteins, 746 are identified by two, and 777 are identified by only one method. By assuming that all proteins identified by at least two approaches are truly zinc-binding and inspecting the proteins identified by a single method, it can be proposed that ca. 2800 human proteins are potentially zinc-binding in vivo, corresponding to 10% of the human proteome, with an uncertainty of 400 sequences. Available functional information suggests that the large majority of human zinc-binding proteins are involved in the regulation of gene expression. 
The most abundant class of zinc-binding proteins in humans is that of zinc-fingers, with Cys4 and Cys2His2 being the most common types of coordination environment.

* To whom correspondence should be addressed. Tel: +39 055 4574272. Fax: +39 055 4574271. E-mail: [email protected].
† Magnetic Resonance Center (CERM), University of Florence.
‡ Department of Chemistry, University of Florence.
§ ProtEra s.r.l.


Journal of Proteome Research 2006, 5, 196-201

Published on Web 12/15/2005

Keywords: zinc • metalloproteins • zinc finger • metalloprotease

Introduction Metal ions play crucial roles in most of the biochemical processes at the basis of Life.1 Indeed, many proteins need to bind one or more metal ions to be able to perform their function (hence called metalloproteins), either because the metal ion is involved in the catalytic mechanism or because it stabilizes/determines the protein tertiary or quaternary structure. Metal ions are also very important for the structure and function (in the case of RNA) of nucleic acids. The intracellular concentration of several metals as well as their distribution among the various cell compartments and their incorporation into metalloproteins is tightly controlled.2-4 A proper balance of the equilibria involved in these control processes is necessary for a healthy phenotype. Genome sequencing projects have provided us with the sequences of all the proteins that the various organisms can produce (proteome). To fully exploit these data, functional annotation is performed based on gene prediction algorithms and homology searches.5-7 The result in terms of attributing a function to each protein potentially produced by an organism is thus determined by features such as whether a previously characterized homologue is found, the level of homology detected (e.g., determined by the E-value, or by a sequence identity threshold), or the number of homologues identified. 
The same concepts apply to metalloproteins, where, however, the presence of a motif that is necessary for the protein to bind the metal ion(s) could be exploited to improve the identification and classification (including functional inferences) of proteins.8-10 Genome-wide identification and classification of metalloproteins, e.g., zinc proteins, may be difficult even experimentally,11 even though some systematic efforts in this direction have been recently deployed.12-14 Typical problems in the experimental detection of an uncharacterized metalloprotein are that the protein may be obtained in the demetalated form or that a nonmetalloprotein may be purified in association with a spurious metal ion. Zinc is essential for Life and is the second most abundant transition metal ion in living organisms after iron. In contrast to other transition metal ions, such as copper and iron, zinc(II) does not undergo redox reactions thanks to its filled d shell. In Nature, it has essentially two possible roles: catalytic or structural. Zinc may also modulate signaling events, as occurs in the process(es) maintaining zinc homeostasis, e.g., through zinc-regulated protein expression.15,16 In this work, we have addressed the problem of defining the content of zinc-binding proteins in the human proteome. This has been done by exploiting a previously proposed strategy to search for metalloproteins, where all structurally characterized metal binding patterns (MBP’s) are retrieved from the PDB and used together with sequence analysis to scan the proteins encoded in genomes.10 In parallel, a library of metal-binding protein domains based on multiple sequence alignments of known metalloproteins has been used (taken from Pfam17) to browse the human proteome. A third complementary approach is based on the annotations of human gene sequences, which are based on any type of information available. The results of this work are available from a web-accessible database at http://www.postgenomicnmr.net:8000/.

10.1021/pr050361j CCC: $33.50 © 2006 American Chemical Society

Methods All 2395 structures available in the PDB18 as of June 2005 and containing at least one Zn ion in the coordinate file have been retrieved. To obtain a nonredundant ensemble of structures, the proteins binding with the same pattern(s) and having a sequence identity greater than 98% were considered the same protein. A set of 890 structures of nonredundant protein sequences was thus collected. The latter have been analyzed by checking the publication describing the structure to remove proteins binding zinc spuriously (e.g., because of crystallization conditions, such as in 1GUD), resulting in 782 structures. The coordinate files of each structure were used to identify the residues coordinating the metal ion(s), which define the Metal Binding Pattern (MBP). Every residue having at least one heavy atom at a distance shorter than 2.8 Å from the metal was defined as a ligand. Note that more than one MBP can be associated with a single protein, depending on the number of metal ions bound. As a result of this analysis, a library of MBP’s was assembled, containing 683 distinct MBP’s. The entire PDB sequence repertoire (containing both proteins binding and not binding zinc) was then aligned against each of the zinc-proteins from the PDB (query sequence), using the MBP as a seed for alignment with the program PHI-BLAST.19 The coordinates of each protein retrieved by a given query were analyzed to separate cases in which the MBP actually binds a metal ion from those in which it does not. The ratio between the number of identical amino acids aligned around the MBP and the input PDB protein length (Id_global) was calculated as a quality parameter.10 Values greater than 0.2 corresponded to a 99% probability that the protein is a zinc-protein. After the above calibration, all sequences in the human proteome (release 35)20,21 have been similarly analyzed. 
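As a hedged illustration of the two numerical steps just described, the following minimal Python sketch flags ligands with the 2.8 Å cutoff and computes the Id_global ratio. The function names, the tuple layout used for atoms, and the toy coordinates are assumptions made for this example; they are not taken from the authors' software, and parsing of real PDB files is omitted.

```python
import math

CUTOFF = 2.8  # Å; the distance threshold quoted in the text


def find_ligands(zn_xyz, atoms, cutoff=CUTOFF):
    """Return residues with at least one heavy atom closer than `cutoff` to the zinc.

    `atoms` is a hypothetical pre-parsed list of
    (chain, residue_number, residue_name, element, (x, y, z)) tuples.
    """
    ligands = set()
    for chain, resnum, resname, element, xyz in atoms:
        if element == "H":  # only heavy atoms are considered
            continue
        if math.dist(zn_xyz, xyz) < cutoff:
            ligands.add((chain, resnum, resname))
    return sorted(ligands)


def id_global(identical_around_mbp, query_length):
    """Identical residues aligned around the MBP divided by the query protein length."""
    return identical_around_mbp / query_length


# Toy, invented coordinates: three His nitrogens near the metal, one distant
# oxygen, and one hydrogen that is ignored even though it is close.
zn = (0.0, 0.0, 0.0)
toy_atoms = [
    ("A", 10, "HIS", "N", (2.1, 0.0, 0.0)),
    ("A", 12, "HIS", "N", (0.0, 2.0, 0.0)),
    ("A", 40, "HIS", "N", (0.0, 0.0, 2.2)),
    ("A", 55, "THR", "O", (3.9, 0.0, 0.0)),
    ("A", 60, "LEU", "H", (1.0, 0.0, 0.0)),
]
print(find_ligands(zn, toy_atoms))
print(id_global(30, 120) > 0.2)  # a hit above the 0.2 threshold would be retained
```

In this toy case the three histidines are returned as the ligands defining the MBP, and a hit with 30 identical residues over a 120-residue query passes the 0.2 threshold.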
Predicted human zinc-proteins (Id_global > 0.2) were then grouped, by creating ensembles containing sequences with at least 40% pairwise identity. The multiple alignment of the sequences in each ensemble was then used to create a profile, which was used as input to the program HMMER, to perform searches with enhanced sensitivity.22 In this way, it was possible to retrieve some additional putative human zinc-binding protein sequences. BLAST23 searches were performed when starting from individual sequences. In all cases, the conservation of the MBP was imposed as an additional constraint. The use of profiles does not explicitly take into account the MBP, which is thus allowed some variability. Its conservation is checked a posteriori, using a filter where conservation of the ligands is strictly required but some variability in their spacing is allowed (within 20% when the distance between ligands is higher than 5 residues or one residue in all other cases). All of the proteins retrieved have been clustered again, and the ensembles obtained inspected to detect possible false positives. In particular, ensembles containing proteins with ambiguous annotations (e.g., “hypothetical”) or annotations indicating that the proteins did not bind zinc were checked, by verifying the presence of a common MBP as well as by looking in the literature for experimental evidence in homologous systems (identified with BLAST searches against the whole gene bank). To inspect annotations, the genome bank was queried for the words “zinc” or “Zn”, restricting the search to human proteins. The query inspects not only a protein’s annotation but also the references, protein domains, and all other notes associated with it. Proteins whose annotation contained the keywords in reference to another protein rather than the one retrieved were manually removed from the ensemble. To look for the presence of zinc-binding protein domains in human proteins, the Pfam library of domains17 was used. All domains whose description contained the words “zinc” or “Zn” were used as input for searches in the proteome. An E-value17 of 0.05 was taken as the threshold to assign a domain to a protein sequence. All proteins retrieved were kept. To obtain functional hints, all predicted zinc-protein sequences were analyzed against the entire Pfam domain library as well as in terms of their Gene Ontology (GO)24 functional annotation.
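The a posteriori MBP filter described above can be sketched as follows. This is only an illustration under stated assumptions: "distance between ligands" is read as the number of residues separating consecutive ligands, the 20% tolerance is taken relative to the reference spacing, and the bounds are treated as inclusive. The function names are hypothetical and do not come from the original work.

```python
def spacing_tolerance(ref_spacing):
    """Allowed absolute deviation for one ligand-to-ligand spacing (assumed reading)."""
    if ref_spacing > 5:
        return 0.2 * ref_spacing  # within 20% for spacings above 5 residues
    return 1                      # one residue in all other cases


def mbp_conserved(ref_ligands, ref_spacings, hit_ligands, hit_spacings):
    """True if the ligands are identical and every spacing is within tolerance."""
    if list(ref_ligands) != list(hit_ligands):
        return False  # ligand conservation is strictly required
    if len(ref_spacings) != len(hit_spacings):
        return False
    return all(
        abs(hit - ref) <= spacing_tolerance(ref)
        for ref, hit in zip(ref_spacings, hit_spacings)
    )


# Toy reference pattern C-x2-C-x12-H-x3-H, written as ligands plus spacings.
reference = ("CCHH", [2, 12, 3])
print(mbp_conserved(*reference, "CCHH", [3, 14, 3]))  # spacings within tolerance
print(mbp_conserved(*reference, "CCHH", [2, 15, 3]))  # middle spacing too long
print(mbp_conserved(*reference, "CCHC", [2, 12, 3]))  # ligand identity changed
```

Separating the tolerance rule from the comparison keeps the one place where the reading of the text is uncertain (relative versus absolute tolerance, inclusive versus exclusive bounds) easy to change.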

Results and Discussion An Estimate of the Number of Putative Human Zinc-Binding Proteins. The human proteome (the ensemble of all proteins encoded in the human genome)20,21,25 was searched for zinc-proteins using three complementary approaches: structure-, annotation-, and domain-based. The structure-based approach exploits the structural information available from the PDB to automatically gather a library of all possible MBP’s associated with zinc-binding. These are then searched for in the human proteome, using sequence identity around the MBP as a parameter to guide the identification. In the annotation-based approach, the text of the annotations (which often contain a description of the putative function) attached to genes in the human genome has been mined to extract all proteins functionally associated with zinc. Finally, in the domain-based approach all protein domains defined in the Pfam database have been analyzed to identify those functionally associated with zinc. This has been done initially by inspecting the description of the various domains contained in the database. Figure 1 summarizes the results of the three approaches and their overlap. The dataset of initial results has been extensively inspected manually to remove clear false positives, such as some Cys-rich proteins unlikely to have a physiologically relevant zinc-binding activity. It is then found that 1684 proteins are independently identified by all three methods as potential zinc-binding proteins, 746 are identified by two, and 777 are identified by only one approach. All three methods detect a similar share of so-called hypothetical proteins, i.e., proteins predicted to exist on the basis of the genome sequence but completely lacking experimental validation. By counting the proteins identified by at least two methods it is possible to set a most probable lower limit for the number of human zinc-proteins at 2430. 
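The bookkeeping behind these counts is simple set arithmetic. The sketch below is illustrative only: `tally_support` is a hypothetical helper that counts how many of the three methods retrieve each protein, and the identifiers in the toy example are invented. The reported totals are then checked directly from the numbers given in the text.

```python
from collections import Counter


def tally_support(structure_hits, annotation_hits, domain_hits):
    """Count proteins retrieved by exactly one, two, or three of the methods."""
    votes = Counter()
    for hits in (structure_hits, annotation_hits, domain_hits):
        votes.update(set(hits))  # each method contributes at most one vote per protein
    per_level = Counter(votes.values())
    return {level: per_level.get(level, 0) for level in (1, 2, 3)}


# Toy example with invented identifiers.
print(tally_support({"p1", "p2", "p3"}, {"p1", "p2"}, {"p1", "p4"}))

# Counts reported in the text, keyed by the number of supporting methods.
reported = {3: 1684, 2: 746, 1: 777}
lower_limit = reported[3] + reported[2]  # proteins found by at least two methods
total = sum(reported.values())           # all putative zinc-proteins
print(lower_limit, total)
```

The two derived values reproduce the lower limit of 2430 and the overall total of 3207 putative zinc-proteins quoted for Figure 1.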
Figure 1. Number of zinc-proteins in the human proteome predicted by the three approaches, and their overlap (number of proteins identified by one, two or all three of the methods). The total number of putative zinc-proteins is 3207, with 2430 identified by at least two methods.

It is instructive to look at the ensembles of proteins independently identified by two out of three methods, to identify possible limitations or caveats associated with each of the three search methods. In particular, the two most interesting groups are those of proteins identified by the annotation- and domain-based methods, counting 301 proteins, and of proteins identified by the structure- and domain-based methods, counting 348 proteins. The ensemble of putative zinc-binding proteins identified by the annotation- and structure-based approaches, but not by the domain-based method, is somewhat smaller, possibly indicating that for known proteins similar to others already structurally characterized, the coverage by the Pfam domain database is already relatively good. Proteins identified by the annotation- and domain-based methods but not by the structure-based method are mainly characterized by the presence of zinc-binding domains that are not structurally characterized and/or (more often) that are very short in sequence (20-40 amino acids), with relatively high variability of the sequence or of the ligand spacing, both of which prevent detection by our structure-based search method. An example of the first kind is the THAP domain, a ca. 90-residue domain restricted to animals, which is involved in DNA binding and not structurally characterized, featuring a conserved Cys2His2 coordination environment. A quite unique case is that of some members of the ADAMTS family of metalloproteases (isoenzymes 2, 11, 18, 22, 23, 32), which are annotated as zinc-binding and contain a conserved zinc-dependent protease domain, but in fact lack the zinc ligands, as previously shown,26 and thus are correctly ignored by the structure-based method. This example shows the importance of associating an MBP with protein sequences and/or domains, as a further filter to enhance the reliability of the predictions when performing genome-scale analyses. An example of proteins containing a structurally characterized zinc-binding domain with variable ligand spacing is the PHD domain, a ca. 50-residue interleaved type of Zn-finger chelating two zinc ions. 
Regarding the 348 putative zinc-binding proteins identified by the structure- and domain-based methods, the most common reason these are not retrieved by text-mining the annotations of the proteome is that metal-binding capabilities are referred to without specifying the nature of the metal bound. It is expected that, because annotations are reviewed and improved quite often, these discrepancies will disappear rapidly. It is not straightforward to assess the percentage of false zinc-proteins among the 777 sequences identified by a single method. The sequences retrieved only by the structure-based method correspond mostly to validated zinc-binding proteins, where however zinc-binding is not always conserved within the protein family, or where zinc-binding has not been highlighted in the Pfam description and in the genome annotations. An example of the latter case is that of histone deacetylases, where a number of structures containing zinc are available (for example PDB 1T64), but this is not described even in the Gene Ontology (GO) functional annotation. Another case worth mentioning is that of metallophosphatases, where there is still some ambiguity on the identity of the functionally relevant metal ions, also given the fact that occupancy of the catalytic site by a mixture of metals has been reported.27 The active site of metallophosphatases in fact contains a pair of divalent metal ions, which could be zinc/iron or manganese/manganese.28 Proteins in this family are retrieved by the structure-based method, due to the presence of structures in the PDB with zinc as the bound metal. Indeed, at present the hypothesis that the relevant metalation of metallophosphatases is zinc/iron is receiving most support.28,29 When analyzing protein sequences retrieved by text-mining the annotations of the human proteome, it is observed that the large majority of the sequences are annotated as putative zinc-fingers or, in a smaller number of instances, as containing protein domains associated with other zinc-binding domains, even if a Pfam analysis with default settings fails to reveal the presence of any zinc-finger or other zinc-binding domain. A few of the putative zinc-binding proteins detected only by the annotation-based approach are proteins for which interaction with zinc has been demonstrated only biochemically, e.g., as in the case of plakophilins.30 Overall, it is reasonable to assume that ca. half of the human protein sequences retrieved by a single search method can be taken as putative zinc-binding in vivo. 
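The arithmetic implied by this assumption can be made explicit. The sketch below combines the 2430 proteins found by at least two methods with half of the 777 single-method sequences, and relates the result to the 27960 sequences quoted in the text for the proteome release used. The way the uncertainty is bracketed here (none versus all of the single-method sequences being genuine) is only one plausible reading; the text does not state how the figure of 400 was obtained.

```python
at_least_two = 1684 + 746  # proteins identified by two or three methods
single_method = 777        # proteins identified by one method only
fraction_genuine = 0.5     # "ca. half", as assumed in the text
proteome_size = 27960      # sequences in the proteome release used

estimate = at_least_two + fraction_genuine * single_method
share = estimate / proteome_size

# Assumed bracketing: none, or all, of the single-method sequences are genuine.
lowest = at_least_two
highest = at_least_two + single_method
half_width = (highest - lowest) / 2

print(f"estimate   ~ {estimate:.0f} proteins")
print(f"share      ~ {100 * share:.1f}% of the proteome")
print(f"half-width ~ {half_width:.0f} sequences")
```

Rounded to the nearest hundred, the estimate and the half-width are in line with the values of ca. 2800 proteins, about 10% of the proteome, and an uncertainty of 400 sequences.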
It can thus be concluded that, based on current knowledge, some 2800 human proteins should be zinc-binding, corresponding to 10% of the 27960 sequences in the release of the human proteome used (www.ncbi.nlm.nih.gov/RefSeq/), with an uncertainty of 400 sequences. In the future, it is expected that the three approaches to prediction will converge, as more structures of zinc-proteins will be deposited in the PDB (see later) or biochemical characterization will prove or disprove the annotation of zinc-proteins.

Projection of the Coverage of Human Zinc-Binding Proteins. As the results described in the preceding section depend entirely on our current knowledge about zinc-binding proteins, it is important to address, at least qualitatively, the issue of what kind of variation in the predictions will result from the foreseeable future expansion of our knowledge. To estimate this, we reasoned that a gross correlation between the number of zinc-proteins identified biochemically and the number of their structures in the PDB is expected. In fact, increased biochemical information about the presence of zinc-binding sites will presumably help in structural determination of zinc-proteins, and, vice versa, detection of unexpected zinc ions in crystal structures will prompt biochemical studies. Therefore, the growth of the number of predicted zinc-proteins as a function of the number of zinc-protein structures available over the years should be a good indicator of the trend of variation of the numbers predicted. This is shown in Figure 2. It can be seen that a burst in prediction occurs when the data accumulated in the early 1990s are taken into account, due to the release of zinc-finger structures.31 Zinc fingers are the most common human zinc-proteins (see later). After the release of the first 75 experimental structures of different zinc-proteins, the subsequent nearly twelve-fold increase in the amount of experimental information available resulted in less than a doubling of the number of predicted zinc-proteins. This is due to the fact that new structures do not always contribute new MBP’s. In conclusion, the number of predicted zinc-proteins is expected to vary modestly in the future. Consequently, the present estimate of zinc-binding proteins being 10% of the human proteome should be relatively stable.

Figure 2. Number of zinc-proteins predicted from the same human proteome through the structure-based approach as a function of the number of experimental zinc-protein structures available from the PDB. The abscissa of each point corresponds to the number of structures released at the end of the indicated year. Dates are taken from the “Release date” field in the PDB file. Structures of proteins with sequence identity greater than 90% have been counted only once.

Physiological Role of Zinc. The overwhelming majority of the MBPs detected among putative human zinc-binding proteins retrieved by the structure-based search approach have four ligands; these constitute 83% of all zinc-binding patterns detected in the human proteome. MBPs with three ligands are found in 13% of the proteins retrieved. A distribution of the various types of patterns, based on the identity and position along the sequence of the ligands, is shown in Figure 3. It can be seen that nearly all (97%) of four-ligand MBPs contain at least one Cys ligand, with 40% of the MBPs containing four Cys, and 27% being of the type CCHH (ordered according to the sequence), which corresponds to the MBP of the most common human zinc fingers. Notably, even if MBPs may have the same ligands (e.g., two Cys and two His), there is often a strong preference for a given order in the amino acid sequence (e.g., the pattern CCHC is much more common than CHCC, or CCHH is much more common than CHCH, see Figure 3). In zinc-proteins where the metal ion is coordinated by three ligands, the most common MBP type is by far HHH (31%). This pattern is contained in several zinc-dependent enzymes, such as carbonic anhydrases and matrix metalloproteinases. The HHH type of pattern can thus be associated with the most common zinc-dependent human catalytic site. A functional classification of the human zinc-binding proteins can be attempted based on the available literature, the GO classification and their domain composition. 
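The pattern-type labels used above (CCHH versus CHCH, for instance) follow from ordering the ligands by their position in the sequence. A minimal sketch of that labeling, together with a tally of type frequencies, is given below; the function names and the toy ligand positions are assumptions for illustration and are not taken from the original analysis.

```python
from collections import Counter


def pattern_type(ligands):
    """Label an MBP by its ligands in N- to C-terminal order.

    `ligands` is an iterable of (sequence_position, one_letter_code) pairs.
    """
    return "".join(code for _, code in sorted(ligands))


def type_shares(mbps):
    """Fraction of MBPs belonging to each pattern type."""
    counts = Counter(pattern_type(mbp) for mbp in mbps)
    total = sum(counts.values())
    return {ptype: n / total for ptype, n in counts.items()}


# Same ligand composition (two Cys, two His) but different order along the sequence.
print(pattern_type([(30, "H"), (7, "C"), (26, "H"), (10, "C")]))
print(pattern_type([(5, "C"), (20, "H"), (23, "C"), (40, "H")]))

# Toy collection of four invented MBPs.
toy_mbps = [
    [(7, "C"), (10, "C"), (26, "H"), (30, "H")],
    [(3, "C"), (6, "C"), (19, "H"), (23, "H")],
    [(4, "C"), (7, "C"), (20, "C"), (23, "C")],
    [(94, "H"), (96, "H"), (119, "H")],
]
print(type_shares(toy_mbps))
```

Sorting by position before joining the ligand codes is what distinguishes types with identical composition, which is the distinction the frequency comparison in the text relies on.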
More than 40% of the putative zinc-proteins can be classified as transcription factors (comprising mostly zinc fingers) or zinc fingers with no additional functional information (Figure 4). This result parallels the strong preference for tetracoordinate zinc-binding sites, where zinc more often plays a structural rather than a catalytic role. The remaining 60% is composed mostly of enzymes, but also includes other proteins (e.g., involved in ion transport).

Figure 3. Distribution of the MBP types detected in putative human zinc-proteins. Only patterns with four (top) or three (bottom) ligands are shown, which cumulatively group ca. 96% of the proteins (83% and 13%, respectively). Pattern types are distinguished by the zinc ligands and by their order in the protein sequence (N- to C-terminal).

Zinc-binding proteins were identified for all the classes of enzymes (Figure 4). The most populated class is that of hydrolases, which contains peptidases and phosphatases. In general, both use a zinc-activated water molecule to perform a nucleophilic attack on the substrate.32 The second most numerous class of enzymes is that of transferases (Figure 4), largely composed of kinases, where zinc plays mainly a structural role and is not necessarily conserved. Among transport proteins, it is worth mentioning potassium voltage-gated channels. In these systems, zinc ions are essential for the assembly of the functional tetrameric channel.33,34 The zinc is tetrahedrally coordinated by 3 Cys and 1 His (the MBP is HX(5)CX(20)CX(0)C), in which the first Cys is actually provided by one monomer and the other three ligands by another monomer. In this example, zinc plays an interesting variation of its common structural role, i.e., it is responsible for stabilizing the quaternary structure. By comparing the analyses summarized in Figures 3 and 4, it is found that some MBPs are specifically associated with a functional role, whereas other MBPs are found in proteins belonging to different functional classes. For example, the common CCHH pattern is quite characteristic of the functional class of transcription factors. Instead, another very common MBP, the pattern CX(2)CX(10-24)CX(2)C, is contained in proteins with very different functions, such as transcription factors and enzymes with different catalytic activities, and is involved in a structural motif, the treble clef motif,35 which is present in different Pfam domains. From the physiological point of view, it can be speculated that the observed distribution of functions and binding patterns is strongly driven by the stringent need of the human organism for a tight regulation of gene expression in response to the complexity of the organism itself. 
Zinc-fingers (as well as other zinc-binding protein domains), with their characteristic CCHH MBP, constitute highly successful, and thus intensely used, flexible modules to be exploited for this purpose.36
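MBPs written in the notation used in this section, such as HX(5)CX(20)CX(0)C or CX(2)CX(10-24)CX(2)C, map naturally onto regular expressions, which gives a quick way to scan a sequence for candidate sites. The sketch below is an illustration only: `mbp_to_regex` and `find_sites` are hypothetical helpers, the toy sequence is invented, and a plain pattern match of this kind omits the sequence-identity criterion that the structure-based method relies on.

```python
import re

# A gap is written X(n) or X(n-m); any other capital letter is a ligand residue.
_TOKEN = re.compile(r"X\((\d+)(?:-(\d+))?\)|([A-WYZ])")


def mbp_to_regex(mbp):
    """Translate an MBP string into an equivalent regular expression."""
    parts, pos = [], 0
    while pos < len(mbp):
        m = _TOKEN.match(mbp, pos)
        if m is None:
            raise ValueError(f"unrecognised MBP token at position {pos}: {mbp!r}")
        low, high, ligand = m.groups()
        if ligand is not None:
            parts.append(ligand)
        elif high is None:
            parts.append(".{%s}" % low)             # fixed spacing
        else:
            parts.append(".{%s,%s}" % (low, high))  # variable spacing
        pos = m.end()
    return "".join(parts)


def find_sites(sequence, mbp):
    """Return (start, matched_segment) for every start position, overlaps included."""
    pattern = re.compile("(?=(" + mbp_to_regex(mbp) + "))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(sequence)]


print(mbp_to_regex("HX(5)CX(20)CX(0)C"))
print(mbp_to_regex("CX(2)CX(10-24)CX(2)C"))

# Invented toy sequence containing one C-x2-C-x12-C-x2-C arrangement.
toy_seq = "AAC" + "GG" + "C" + "A" * 12 + "C" + "TT" + "C" + "AA"
print(find_sites(toy_seq, "CX(2)CX(10-24)CX(2)C"))
```

The lookahead wrapper is used so that overlapping candidate sites are not skipped, which matters for Cys-rich sequences where several starting positions can satisfy the same pattern.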


Figure 4. Functional classification of all 3207 putative human zinc-proteins. Most zinc fingers are included among transcription factors. 427 zinc fingers that are not annotated as such have been maintained separate.

Finally, it is important to keep in mind that the zinc-proteins identified in this work are only putative, i.e., they are likely to be able to bind zinc in vitro, but whether they do bind zinc in vivo is not directly assessed. The same combination of ligands can be found in patterns binding to different metals in vivo (e.g., patterns with four Cys can bind zinc or iron-sulfur clusters). However, in this respect the structure-based method proves quite selective due to the fact that the number as well as the identity of the amino acids between and neighboring the ligands is explicitly taken into account. In cells, metalloproteins often acquire metals via specific delivery pathways, and thus metal occupancy is determined by their interactions with metallochaperones,2-4,37,38 which is kinetically controlled. Consequently, a given metal-binding site in vivo may be loaded with a metal that is not the one binding with the highest affinity, or even with different metals under different environmental conditions.39 These caveats need be addressed in the context of the living cell, and bioinformatics can provide hints to direct experimental analyses.

Conclusion In summary, this work represents a first step toward compiling a catalog of human zinc-proteins, based on a bioinformatic analysis of the genome sequence using an integration of different tools. About 10% of the human proteome is potentially zinc-binding. This research provides hints on the use of zinc in humans, especially highlighting the importance of this metal in the regulation of gene expression. Tables reporting the putative zinc-binding proteins grouped in the 7 ensembles shown in Figure 1 are available at the address: http://www.postgenomicnmr.net:8000/Ftp. The database can also be browsed online at http://www.postgenomicnmr.net:8000/.

Abbreviations: MBP: metal-binding pattern.

Acknowledgment. This work was supported by Ente Cassa di Risparmio di Firenze, the European Commission (SPINE project, contract QLG2-CT-2002-00988) and MIUR (COFIN 2003 and FISR).

References
(1) Bertini, I.; Sigel, A.; Sigel, H. Handbook on Metalloproteins; Marcel Dekker: New York, 2001; pp 1-1800.


(2) Changela, A.; Chen, K.; Xue, Y.; Holshen, J.; Outten, C. E.; O’Halloran, T. V.; Mondragon, A. Science 2003, 301, 1383-1387.
(3) Finney, L. A.; O’Halloran, T. V. Science 2003, 300, 931-936.
(4) Bertinato, J.; L’Abbe, M. R. J. Biol. Chem. 2003, 278, 35071-35078.
(5) Mathe, C.; Sagot, M. F.; Schiex, T.; Rouze, P. Nucleic Acids Res. 2002, 30, 4103-4117.
(6) Milanesi, L.; D’Angelo, D.; Rogozin, I. B. Bioinformatics 1999, 15, 612-621.
(7) Wiehe, T.; Gebauer-Jung, S.; Mitchell-Olds, T.; Guigo, R. Genome Res. 2001, 11, 1574-1583.
(8) Bertini, I.; Luchinat, C.; Provenzani, A.; Rosato, A.; Vasos, P. R. Proteins Struct. Funct. Genet. 2002, 46, 110-127.
(9) Degtyarenko, K.; Contrino, S. BMC Struct. Biol. 2004, 4, 3.
(10) Andreini, C.; Bertini, I.; Rosato, A. Bioinformatics 2004, 20, 1373-1380.
(11) Bertini, I.; Rosato, A. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 3601-3604.
(12) Scott, R. A.; Shokes, J. E.; Cosper, N. J.; Jenney, F. E.; Adams, M. W. W. J. Synchrotron Rad. 2005, 12, 19-22.
(13) Hogbom, M.; Ericsson, U. B.; Lam, R.; Bakali, H. M.; Kuznetsova, E.; Nordlund, P.; Zamble, D. B. Mol. Cell Proteomics 2005, 4, 827-834.
(14) Shi, W.; Zhan, C.; Ignatov, A.; Manjasetty, B. A.; Marinkovic, N.; Sullivan, M.; Huang, R.; Chance, M. R. Structure (Camb.) 2005, 13, 1473-1486.
(15) Gaither, L. A.; Eide, D. J. Biometals 2001, 14, 251-270.
(16) Hantke, K. Biometals 2001, 14, 239-249.
(17) Bateman, A.; Coin, L.; Durbin, R.; Finn, R. D.; Hollich, V.; Griffiths-Jones, S.; Khanna, A.; Marshall, M.; Moxon, S.; Sonnhammer, E. L.; Studholme, D. J.; Yeats, C.; Eddy, S. R. Nucleic Acids Res. 2004, 32 Database issue, D138-D141.
(18) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235-242.
(19) Zhang, Z.; Schaffer, A. A.; Miller, W.; Madden, T. L.; Lipman, D. J.; Koonin, E. V.; Altschul, S. F. Nucleic Acids Res. 1998, 26, 3986-3990.
(20) Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; et al. Science 2001, 291, 1305-1351.
(21) Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M. C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W.; Funke, R.; Gage, D.; Harris, K.; Heaford, A.; Howland, J.; Kann, L.; Lehoczky, J.; Levine, R.; McEwan, P.; McKernan, K.; Meldrim, J.; Mesirov, J. P.; Miranda, C.; Morris, W.; Naylor, J.; Raymond, C.; Rosetti, M.; Santos, R.; Sheridan, A.; Sougnez, C.; Stange-Thomann, N.; Stojanovic, N.; Subramanian, A.; Wyman, D.; Rogers, J.; Sulston, J.; Ainscough, R.; Beck, S.; Bentley, D.; Burton, J.; Clee, C.; Carter, N.; Coulson, A.; Deadman, R.; Deloukas, P.; Dunham, A.; Dunham, I.; Durbin, R.; French, L.; Grafham, D.; Gregory, S.; Hubbard, T.; Humphray, S.; Hunt, A.; Jones, M.; Lloyd, C.; McMurray, A.; Matthews, L.; Mercer, S.; Milne, S.; Mullikin, J. C.; Mungall, A.; Plumb, R.; Ross, M.; Shownkeen, R.; Sims, S.; Waterston, R. H.; Wilson, R. K.; Hillier, L. W.; McPherson, J. D.; Marra, M. A.; Mardis, E. R.; Fulton, L. A.; Chinwalla, A. T.; Pepin, K. H.; Gish, W. R.; Chissoe, S. L.; Wendl, M. C.; Delehaunty, K. D.; Miner, T. L.; Delehaunty, A.; Kramer, J. B.; Cook, L. L.; Fulton, R. S.; Johnson, D. L.; Minx, P. J.; Clifton, S. W.; Hawkins, T.; Branscomb, E.; Predki, P.; Richardson, P.; Wenning, S.; Slezak, T.; Doggett, N.; Cheng, J. F.; Olsen, A.; Lucas, S.; Elkin, C.; Uberbacher, E.; Frazier, M.; Gibbs, R. A.; Muzny, D. M.; Scherer, S. E.; Bouck, J. B.; Sodergren, E. J.; Worley, K. C.; Rives, C. M.; Gorrell, J. H.; Metzker, M. L.; Naylor, S. L.; Kucherlapati, R. S.; Nelson, D. L.; Weinstock, G. M.; Sakaki, Y.; Fujiyama, A.; Hattori, M.; Yada, T.; Toyoda, A.; Itoh, T.; Kawagoe, C.; Watanabe, H.; Totoki, Y.; Taylor, T.; Weissenbach, J.; Heilig, R.; Saurin, W.; Artiguenave, F.; Brottier, P.; Bruls, T.; Pelletier, E.; Robert, C.; Wincker, P.; Smith, D. R.; Doucette-Stamm, L.; Rubenfield, M.; Weinstock, K.; Lee, H. M.; Dubois, J.; Rosenthal, A.; Platzer, M.; Nyakatura, G.; Taudien, S.; Rump, A.; Yang, H.; Yu, J.; Wang, J.; Huang, G.; Gu, J.; Hood, L.; Rowen, L.; Madan, A.; Qin, S.; Davis, R. W.; Federspiel, N. A.; Abola, A. P.; Proctor, M. J.; Myers, R. M.; Schmutz, J.; Dickson, M.; Grimwood, J.; Cox, D. R.; Olson, M. V.; Kaul, R.; Shimizu, N.; Kawasaki, K.; Minoshima, S.; Evans, G. A.; Athanasiou, M.; Schultz, R.; Roe, B. A.; Chen, F.; Pan, H.; Ramser, J.; Lehrach, H.; Reinhardt, R.; McCombie, W. R.; de la Bastide, M.; Dedhia, N.; Blocker, H.; Hornischer, K.; Nordsiek, G.; Agarwala, R.; Aravind, L.; Bailey, J. A.; Bateman, A.; Batzoglou, S.; Birney, E.; Bork, P.; Brown, D. G.; Burge, C. B.; Cerutti, L.; Chen, H. C.; Church, D.; Clamp, M.; Copley, R. R.; Doerks, T.; Eddy, S. R.; Eichler, E. E.; Furey, T. S.; Galagan, J.; Gilbert, J. G.; Harmon, C.; Hayashizaki, Y.; Haussler, D.; Hermjakob, H.; Hokamp, K.; Jang, W.; Johnson, L. S.; Jones, T. A.; Kasif, S.; Kaspryzk, A.; Kennedy, S.; Kent, W. J.; Kitts, P.; Koonin, E. V.; Korf, I.; Kulp, D.; Lancet, D.; Lowe, T. M.; McLysaght, A.; Mikkelsen, T.; Moran, J. V.; Mulder, N.; Pollara, V. J.; Ponting, C. P.; Schuler, G.; Schultz, J.; Slater, G.; Smit, A. F.; Stupka, E.; Szustakowski, J.; Thierry-Mieg, D.; Thierry-Mieg, J.; Wagner, L.; Wallis, J.; Wheeler, R.; Williams, A.; Wolf, Y. I.; Wolfe, K. H.; Yang, S. P.; Yeh, R. F.; Collins, F.; Guyer, M. S.; Peterson, J.; Felsenfeld, A.; Wetterstrand, K. A.; Patrinos, A.; Morgan, M. J.; Szustakowki, J.; de Jong, P.; Catanese, J. J.; Osoegawa, K.; Shizuya, H.; Choi, S.; Chen, Y. J. Nature 2001, 409, 860-921.
(22) Eddy, S. R. Bioinformatics 1998, 14, 755-763.
(23) Altschul, S. F.; Madden, T. L.; Schaeffer, A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Nucleic Acids Res. 1997, 25, 3389-3402.
(24) Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G. Nat. Genet. 2000, 25, 25-29.
(25) Pruitt, K. D.; Maglott, D. R. Nucleic Acids Res. 2001, 29, 137-140.
(26) Andreini, C.; Banci, L.; Bertini, I.; Elmi, S.; Rosato, A. J. Proteome Res. 2005, 4, 881-888.
(27) Swingle, M. R.; Honkanen, R. E.; Ciszak, E. M. J. Biol. Chem. 2004, 279, 33992-33999.
(28) Ullrich, V.; Namgaladze, D.; Frein, D. Toxicol. Lett. 2003, 139, 107-110.
(29) Namgaladze, D.; Hofer, H. W.; Ullrich, V. J. Biol. Chem. 2002, 277, 5962-5969.
(30) Hofmann, I.; Mucke, N.; Reed, J.; Herrmann, H.; Langowski, J. Eur. J. Biochem. 2000, 267, 4381-4389.
(31) Lee, M. S.; Gippert, G. P.; Soman, K. V.; Case, D. A.; Wright, P. E. Science 1989, 245, 635-637.
(32) Coleman, J. E. Curr. Opin. Chem. Biol. 1998, 2, 222-234.
(33) Nanao, M. H.; Zhou, W.; Pfaffinger, P. J.; Choe, S. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 8670-8675.
(34) Bixby, K. A.; Nanao, M. H.; Shen, N. V.; Kreusch, A.; Bellamy, H.; Pfaffinger, P. J.; Choe, S. Nat. Struct. Biol. 1999, 6, 38-43.
(35) Krishna, S. S.; Majumdar, A.; Grishin, N. V. Nucleic Acids Res. 2003, 31, 532-550.
(36) Tupler, R.; Perini, G.; Green, M. R. Nature 2001, 409, 832-833.
(37) Banci, L.; Rosato, A. Acc. Chem. Res. 2003, 36, 215-221.
(38) Outten, C. E.; O’Halloran, T. V. Science 2001, 292, 2488-2492.
(39) Schmidt, M.; Meier, B.; Parak, F. J. Biol. Inorg. Chem. 1996, 1, 532-531.
