The Ubiquitin Superfamily: Members, Features, and Phylogenies

Christopher N. Larsen* and Hailin Wang

Cognia Corporation, 117 East 55th Street, New York, New York 10022

Received March 26, 2002

The ubiquitin superfamily is a rich repository of small, conserved, functionally unique, and important proteins. Its member proteins fold simply and similarly, with kinetic and thermodynamic ease (Sorenson, J. M.; Head-Gordon, T. Toward minimalist models of larger proteins: A ubiquitin-like protein. Proteins 2002, 46, 368-379). They have been implicated in numerous cancers, neurodegenerations, inflammations, and various disorders affecting signal transduction or protein half-life. These proteins serve the cell generally as portable recognition tags with distinct intracellular roles; indeed, tagging with small protein modifiers has become a new hallmark of post-translational modifications and other signal transduction phenomena (Finley, D. J. Signal transduction. An alternative to destruction. Nature 2001, 412, 283, 285-286). Because many ubiquitin-like proteins bear similarities in sequence, structure, and function, we gathered protein sequences containing the ubiquitin domain from public databases and created a highly granular and defined protein catabolism database to catalog, summarize, reference, and relate them to their targets and specific ligases (to be described elsewhere). In this paper, we present a compilation of proteins possessing the ubiquitin domain, which constitutes the first and most important part of our database content. We searched available organismal proteomes for sequence-related members of the ubiquitin superfamily and here present over 200 proteins possessing this domain. These proteins were organized phylogenetically and functionally, thereby defining several new families. To our knowledge, this is the most complete assemblage of ubiquitin domains to date.

Keywords: ubiquitin • SUMO-1 • NEDD8 • proteolysis • post-translational modification • signal transduction • catabolism • database

* To whom correspondence should be addressed. Tel: 212-331-7840. Fax: 801-459-8850. E-mail: [email protected]. 10.1021/pr025522n CCC: $22.00 © 2002 American Chemical Society. Journal of Proteome Research 2002, 1, 411-419. Published on Web 06/11/2002.

Background

Ubiquitin, a 76-residue polypeptide present universally in eukaryotes, is widely known as a post-translational tag used to signal a protein’s hydrolytic destruction.¹⁻³ Other functions have recently emerged for ubiquitin as well, depending on its differential internal isopeptide linkages.⁴ In addition, several ubiquitin-like proteins have been discovered from genome-sequencing efforts, other structural studies, and genetic screens. These new data show that the ubiquitin domain is an adaptable, transposable genetic element that has been appended to other genes and utilized for many different cellular functions, depending on the ubiquitin-like protein’s identity, subcellular location, and method of covalent attachment.

The post-translational ligation of proteins to members of the ubiquitin superfamily can signal many different fates for the target protein. Among ubiquitin-like proteins, the recent focus has been primarily on the ligation of ubiquitin, Nedd8,⁵ SUMO-1,⁶ and ISG15.⁷ Post-translational protein modification with these tags serves different functions. Ubiquitin itself has previously been predicted to serve as a chaperonin-like folding or expression aid during translation, providing a nucleating folding template or perhaps a domain to impart thermodynamic stability. The domain is extremely robust, and in the case of ubiquitin is capable of being purified natively from perchloric acid precipitations, heat treatments, alcohol denaturations, and organic solvent concentrations, while having various pH-dependent aqueous solubilities on the order of tens of milligrams/milliliter. We expect that some of these interesting characteristics will be shared by other members of the superfamily of proteins that house the domain.

Because ubiquitin is one of the most highly conserved proteins known in eukaryotes,⁸ and because Nedd8, SUMO-1, and ISG15 (three ubiquitin paralogs) bear a strong sequence homology to ubiquitin, we hypothesized that exhaustive sequence-related database queries would be useful in uncovering new ubiquitin-like candidates for post-translational protein modifications. We therefore surveyed the public databases for related proteins. Parenthetical searches of structural databases⁹ revealed structurally similar proteins that were unrelated by sequence homology, but which converged adaptively toward the thermodynamically stable polypeptide backbone path of the “beta-grasp”¹⁰ structural fold. This study will not address these proteins (examples include IgG, Ubx, FERM, RBD, molybdopterin synthase¹¹), since none of them has been found ligated to other proteins.

Much anecdotal and serendipitous evidence of ubiquitin-like proteins has been uncovered since the first description of a ubiquitin-like protein was reported in 1987, as the result of an anti-ubiquitin immunoblot cross-reactive band.¹² Newer genomics efforts have uncovered raw sequence data, implying many ubiquitin-like proteins, and so our search of public databases for entries possessing similarity to ubiquitin revealed many related proteins. Regardless of the results of sequence-based approaches to outlining a ubiquitin superfamily, treatment of these data should occur within the understanding of the ubiquitin biochemistry itself; raw informatics tools without biochemical context are not as useful in uncovering new, potentially functionally related proteins. This report analyzes the superfamily with regard to function, because much of this knowledge is readily extractable from sequence data (see below).

Table 1. Organization of the PCDB Contentᵃ

group  name                    identity
1      ubiquitin superfamily   Ub domains and tags used to modify other proteins
2      conjugation factors     enzymes responsible for the attachment of group 1 proteins
3      deconjugation factors   enzymes which remove group 1 proteins
4      degradation machinery   proteins involved in hydrolyzing ubiquitin-like conjugates
5      accessory factors       proteins necessary for the action of the extended ubiquitin system
6      conventional proteases  other enzymes required for hydrolytic destruction of proteins
7      substrates              proteins ligated to ubiquitin domains of group 1

ᵃ A classification hierarchy of the extended ubiquitin system was generated with regard to molecular function. The seven organized groups are shown above; hundreds of subcategories within each group are not shown and will be reported elsewhere on release of the database. Group 1 is the subject of this study. Group 2 includes E1, E2 (ubiquitin conjugating enzymes), E3s, and other factors that catalyze the attachment of group 1 proteins. Group 3 includes proteases such as deubiquitinating enzymes, deneddylation proteases, etc. Group 4 encompasses the 20S proteasome and its regulators such as the 19S/PA700 cap, 11S/PA28, and others. Group 5 includes transport factors, chaperonins, and other factors required to deliver modification substrates to ligation machinery. Group 6 involves classical proteases. Group 7 identifies the substrates of the modification of proteins by group 1 modifiers.

Several features of the ubiquitin superfamily made it both important and amenable to this type of bioinformatic summation. First, ubiquitin and its homologues are small; ubiquitin itself has only 76 residues. This makes multiple sequence alignments relatively easy to compute and view; thus, hundreds of orthologs can be analyzed in parallel. In other unrelated proteins having longer amino acid sequences and a multiplicity of domains, BLAST¹³ searches and other homology-driven approaches are diluted and confounded by foreign sequences that distract the investigator from the domain in question. Ironically, for this reason the small size of the ubiquitin domain has made some sequences elusive to database searches when the domain is fused to other proteins (described below). These factors necessitate a method focused on exactly the domain in question. Second, the extensive biochemical outline of ubiquitin conjugation enzymology has made it possible to devise other genetic screens and functional assays for ubiquitin-like conjugation pathways and all the members of those families. Finally, the conservation of the ubiquitin domain has made identification of new members of the superfamily relatively straightforward, once they are finally fished out of the noise of an entire genome.
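The dilution effect described above can be illustrated with simple arithmetic. The sketch below assumes a deliberately crude score, the fraction of identical residues over the length being compared, which is not how BLAST scores alignments; the 500-residue host length is a hypothetical value chosen only for illustration.

```python
# Human ubiquitin, 76 residues (the search standard used in this study).
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQL"
    "EDGRTLSDYNIQKESTLHLVLRLRGG"
)


def identity_fraction(matches: int, compared_length: int) -> float:
    """Fraction of identical positions over the length being compared."""
    return matches / compared_length


domain_len = len(UBIQUITIN)   # 76 residues
host_len = 500                # hypothetical length of a fusion protein

# A perfectly conserved domain scored on its own...
domain_only = identity_fraction(domain_len, domain_len)
# ...versus the same 76 matches averaged over the whole fusion protein.
full_length = identity_fraction(domain_len, host_len)

print(f"domain only: {domain_only:.1%}")   # 100.0%
print(f"full length: {full_length:.1%}")   # 15.2%
```

Even a fully conserved domain contributes only a small fraction of identity when it is averaged over a long, unrelated host sequence, which is the motivation for restricting the comparison to the domain itself.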
One important feature of the ubiquitin superfamily involves the domain’s frequent genetic fusion to other translated sequences. The fusion of ubiquitin domains made the identification of proteins with this sequence feature difficult, since the homology score was diluted by the unrelated parts of the fusion protein. We therefore developed an iterative survey and refined it to include only the ubiquitin-like sequences themselves. After BLAST searches using mammalian ubiquitin, ClustalW¹⁴ alignments and tree dendrograms were used to define potential groups. Consensi of these groups (outgroups, operational taxonomic units, or OTUs¹⁵) were generated, and newly identified sequences were used to re-BLAST the databases, defining newly related proteins once more. Each search iteratively utilized the data from sequences identified before, until finally a complete and stable picture of the superfamily was formed. The families were grouped and cropped. Here, we report the identification and analysis of 22 families of ubiquitin-like proteins, with emphasis on the ubiquitin domain present in all of them.

Cognia is an early stage bioinformatics company whose first in-house offering will be a database of protein catabolism machinery (PCDB).¹⁶ PCDB’s content will be focused primarily on the functions of proteins that specify modifications of ubiquitin-dependent protein turnover. PCDB will also encompass all types of protein down-regulation, the ubiquitin system, its substrates, ubiquitin-like proteins, and other proteolytic cascades that govern drug target half-lives. As a general marketer and developer of bioinformatics products, Cognia creates expert-curated relational databases and tools based on primary literature and proprietary high-throughput data.
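The iterative survey described in this section amounts to repeating a search until no new sequences appear. The following is a minimal sketch of that loop and not the authors' pipeline: `search` is a hypothetical stand-in for one BLAST round plus domain extraction, and the toy lookup table (with made-up hit relationships) replaces a real database.

```python
from typing import Callable, Iterable, Set


def iterate_until_stable(
    seed: str,
    search: Callable[[str], Iterable[str]],
    max_rounds: int = 20,
) -> Set[str]:
    """Re-query with every newly found sequence until the set stops growing."""
    found = {seed}
    frontier = {seed}
    for _ in range(max_rounds):
        new_hits = set()
        for query in frontier:
            new_hits.update(search(query))
        new_hits -= found          # keep only sequences not seen before
        if not new_hits:           # stable picture: nothing new this round
            break
        found |= new_hits
        frontier = new_hits        # the next round queries only the newcomers
    return found


# Toy "database": each name maps to the hits a search with it would return.
# These relationships are illustrative only, not real BLAST results.
TOY_HITS = {
    "Ub": ["Nedd8", "ISG15"],
    "Nedd8": ["Ub", "SUMO"],
    "ISG15": ["Ub"],
    "SUMO": ["Nedd8"],
}


def toy_search(query: str):
    return TOY_HITS.get(query, [])


members = iterate_until_stable("Ub", toy_search)
print(sorted(members))
```

In this toy case "SUMO" is never returned by the seed query and is reached only through "Nedd8", which is the kind of distant member that a single-pass search would miss.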

Results and Discussion

Construction of our Protein Catabolism Database identified seven rational categories of proteins for its content. These categories are shown in Table 1. PCDB will house protein, complex, and interaction data regarding the relevant machinery of these systems. It will focus on the post-translational modification reactions that commit a protein to turnover, ubiquitylation and deubiquitylation enzymes, the proteasome, and all the paralogous ubiquitin-like pathways that have impinged on the ubiquitin system in form, homology, enzymology, metabolism, and function.

Our initial analysis of nonredundant proteome databases for the presence of ubiquitin domains revealed over 1000 potential ubiquitin homologues. With ubiquitin as the initial database search query, the returned protein sequences were themselves used as queries to search again iteratively, via a remote BLAST protocol. After several rounds, we aligned the resulting proteins using ClustalW, extracted the ubiquitin domains from each amino acid sequence, and realigned the domains. The resulting single-domain alignment was used to generate a phylogenetic tree of the superfamily, using Treeview. From this analysis we identified 25 groups of proteins containing a ubiquitin-like sequence domain. Here, we define the ubiquitin domain functionally as those amino acids within the alignment between residue Q2 of mammalian ubiquitin (MQIFVKTL...) and the free C-terminus (LRLRGG). Because of the large number of proteins bearing homology to ubiquitin, we restricted our survey to those required to fully define the family groupings. A linear phenogram defining these families is shown in Figure 1. While Figure 1 is intended to show the different families, Figure 1S (Supporting Information) shows the tree dendrogram in full, and Figure 2 shows interesting or confounded portions of the entire tree in Figure 1S (Supporting Information). Each figure is discussed in detail below.

Figure 1. Members of the ubiquitin superfamily. (A) Linear phenogram resulting from study of the ubiquitin domains only. Proteins closely related to human ubiquitin are shown at the top, while less related members are at the bottom. Lower bar: 10% identity. Horizontal line distances between branch points reflect the degree of homology. Vertical bars or distances reflect the number of members found within public databases. The prototypic protein name is shown to the right of the groups, where the relevant sequences are circumscribed with a vertical bar. (B) Block diagrams of the protein consensi, using average eukaryotic transcript lengths for accurate proportions. Key: gray, ubiquitin domain; white, unrelated sequence extracted from the alignment analysis.

Members of the Ubiquitin Superfamily. We define the “ubiquitin superfamily” as those proteins composed entirely of a single ubiquitin domain, those with multiple Ub domains, ubiquitin-like proteins with more than 10% identity to mammalian ubiquitin, and proteins from other parts of eukaryotic metabolism that utilize the domain internally or as part of a processed proprotein where the domain is proteolytically removed. Whether the domain is ligated or not is also addressed. Not surprisingly, the two most ubiquitin-like proteins at the top of Figure 1, Nedd8¹⁷ and ISG15,¹⁸ have already been shown to be ligated to other proteins, as is ubiquitin. The ligation targets of these families are different, though, reflecting diverged and independent enzymatic conjugation and deconjugation machinery within the cell. For clarity, ubiquitin itself is presented in Figure 1 as a single domain, even though it is naturally present within eukaryotic genomes as three different types of linear genetic fusion, either to itself (Ub-c) or to ribosomal proteins (Ub-a/b).

Figure 2. Close-up of some important families. (A) Ubiquitin and Nedd8. Because the radial phylogram is uninformative in regions of the highest homology, we have represented them here as a linear phenogram, where each protein is given its own line. The radial representation is inadequate due to the extreme similarity between proteins. (B) SUMO, (C) USPs, (D) Rad23, (E) Ubiquilin, (F) Parkin and GDX, (G) BAT3/Scythe and Dsk2. These groups were extracted and enlarged because of their disease importance, clustering, or viewability concerns. The entire radial phylogram with all the domains can be easily viewed in Figure 1S (Supporting Information).

Interestingly, many of the most evolutionarily conserved ubiquitin-like domains in our analysis are present as uncleavable fusions to other proteins, such as in parkin and ubiquilin, two proteins that may be important in understanding neuronal disorders. The function of the ubiquitin domains within these fusion proteins is unknown. Ubiquilin may contain multiple ubiquitin-like domains and is implicated in Alzheimer’s pathogenesis.¹⁹ In Rad23,²⁰ parkin,²¹ and USP14,²² the freely soluble ubiquitin domain can be removed from the fusion partner and studied independently. Unlike the translation products of the bona fide ubiquitin genes, most other ubiquitin-like fusion proteins are not separated physiologically from their partner, the exception being Ubl-S30, a rat Fau/MNSF protein, which like the ubiquitin proproteins is a ribosomal fusion protein that is separated post-translationally by an endoproteolytic cleavage event.²³ Other ubiquitin-like families possessing more than one ubiquitin domain, such as UCRP/ISG15,²⁴ are not yet known to be cleaved between the ubiquitin domains by domain-specific endopeptidases. Distinct deconjugating activities for the whole protein exist within the cell for each of these, though. Ubiquitin can be removed from peptides or proteins via either ubiquitin C-terminal hydrolases²⁵ or ubiquitin-specific proteases,²⁶ Nedd8 can be removed from its Cullin substrates by the COP9 signalosome,²⁷ ISG15 is removed by USP18,²⁸ and a family of SUMO-1 proteases has been identified.²⁹ In general, each of the different conjugation families also possesses its own conjugation enzymes, which utilize parallel pathways. The enzymes in these conjugation pathways have been identified for ubiquitin,³⁰ SUMO-1,³¹ SUMO-2 and -3,³² and APG8 and -12³³ (see below). Because their conjugating and deconjugating activities may not be specific to homologous enzyme families, assignment of roles within evolutionarily related proteins is not possible by inspection of the sequence alone. For example, SUMO-1 peptide conjugates can be cleaved by UCH activities and conjugated by UBC9, previously thought to be used in ubiquitin conjugation, so we predict that the enzymes that process these tags may come from any number of distinct protein families within the extended ubiquitin system. Similarly, other processing activities are likely to come from within the already described families of proteins that were assumed to act solely on ubiquitin.

Table 2. Features of Ubiquitin Domainsᵃ

protein family  pseudonyms        % UbL  % cons  term  class  conserved lysines      functional notes
ubiquitin                         100    99      GG    I      6, 27, 29, 48, 63      protein degradation
Nedd8           Rub-1             49     79      GG    I      4, 6, 11, 27, 33, 48   Cullin modification
Bat3            Scythe            41     93      PQ    II     22, 39, 56, 70, 80     apoptosis
ubiquilin       Plic-1            40     72      NR    II     62, 82                 neuronal inclusions
ISG15           UCRP              34     65      GG    I      29                     interferon induced
Dsk2            Chap1             33     37      NR    II     NA                     spindle pole body
Parkin          Ariadne           32     92      RK    II     27, 48                 ubiquitin ligase
Rad23           HHR23             27     43      KA    II     28                     UV resistance
GDX             UbL4              27     44      EK    II     NA                     ?
Fat10                             26     51      KP    I      70                     cell cycle checkpoint
Fau             MNSF, Fub1, UbL   25     57      GG    II     NA                     immune regulation
elongin B                         22     67      AD    II     28                     DNA replication
Hub1            Rub2, UbL5        20     82      YY    I      12, 13, 28, 29, 46     cell polarity
HCG1            UbL3              18     57      LP    II     NA                     ?
URM1                              18     52      GG    I      NA                     ?
USP             UBP6, TGT6        18     52      CG    II     30                     deubiquitination
APG12                             18     50      WG    I      NA                     autophagy
OAS                               17     54      KK    II     240, 242, 249          2′-5′-oligoA synthetase
APG8                              14     55      DD    II     NA                     autophagy
SUMO            SMT3x, Sentrin    12     80      GG    I      25, 46                 localizations
elongin C                         11     43      FL    II     71                     DNA replication
Bag                               10     14      QG    II     NA                     anti-apoptotic

ᵃ Representative members of the ubiquitin superfamily are displayed in column 1. Column 3 (% UbL) defines the percent identity to mammalian ubiquitin. Column 4 (% cons) is the percent identity to the human representative and is used as a measure of conservation. Column 5 (term) identifies the C-terminal residues present at the end of the aligned ubiquitin domain in the human copy. Small C-terminal residues (e.g., GG) are characteristic of the possible reversal of ubiquitin-like ligation by proteases such as the ubiquitin-specific proteases (USP). This feature defines the two classes of ubiquitin-like domains (I, II), column 6, which refer to whether the protein tag is post-translationally ligated (class I) or not (class II) to target proteins. Column 7 shows the completely conserved lysines that are expected to be used in isopeptide bonds if the ubiquitin domain is built into homopolymers. Heteropolymers of mixtures of ubiquitin-like domains have not been described. Column 8 describes some cellular processes in which the superfamily member has a role.

Analysis of the bottom portion of Figure 1 is difficult. This portion comprises those proteins with around 10% identity to the mammalian ubiquitins, which are identical to one another and were used as the standard for this study. In the lower region of Figure 1, the distance (time) since branching divergence is notably large, as reflected in the long branches and low homology. Arguably, these less related proteins were identified only artifactually, through the robust capability of ClustalW to assemble juxtapositions between two amino acids in an alignment. Thus, they represent only double the background level of randomly aligned proteins and so mark the rational cutoff for our analysis. They may be recognized only as evolutionary cousins, arising from pathways utilizing similar conjugation enzymology (APG, URM1). The reasons for a disrupted region of ubiquitin homology within the OAS, HCG-1, elongin, or the various BAG families are unknown. In passing, we note a stronger homology of the OAS proteins to ISG15.
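The argument that a 10% cutoff sits at about double the level of randomly aligned proteins can be made concrete under a simplifying assumption. If the residues at two aligned positions were drawn independently from the same composition, the expected identity would be the sum of the squared residue frequencies; for a uniform 20-letter alphabet this is 5%. The uniform composition and the ungapped, position-by-position comparison are assumptions of this sketch, not figures from the study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expected_random_identity(freqs: dict) -> float:
    """Chance that two independently drawn residues are identical."""
    return sum(p * p for p in freqs.values())


# Assumption: every amino acid is equally likely.
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
baseline = expected_random_identity(uniform)
cutoff = 0.10   # identity cutoff used in this study

print(f"random baseline:   {baseline:.3f}")
print(f"cutoff / baseline: {cutoff / baseline:.1f}")
```

Real proteomes have uneven amino acid compositions, which raises the baseline somewhat, so the factor of two should be read as approximate.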
Finally, we identified several unique flaviviral ubiquitin domains as well. These enormous proteins form a class of proteins involved in mammalian gastrointestinal upset.³⁴ Perhaps the signal transduction properties of this domain have been co-opted for other purposes by these adventitious viruses, just as other cellular polymerases have been modified by retro-, papilloma-, and herpesviruses. In addition to viral ubiquitin, it is remarkable that ubiquitin domains are found within other parts of the ubiquitin system itself! Within the eukaryotic PCDB classification hierarchy (Table 1), the deconjugation activities of group 3 include ubiquitin domains in the ubiquitin-specific proteases. Here, the domain may serve only to target the hydrolase to the 26S proteasome or other ubiquitin receptors, but the function is unknown. PCDB’s group 2, the conjugation machinery, also contains a family with ubiquitin domains. Parkin is a ubiquitin ligase responsible for specific features of Parkinson’s disease and possesses a ubiquitin-like region of unknown function.

Features of Ubiquitin-Like Proteins. The ubiquitin domain is present almost exclusively within proteins as an N-terminal fusion. Only within putative viral protein products, and the loosely related BAG family of proteins, does the domain occur in the interior of the fusion protein as an insertion, and we find that these alignments imply fractured domains, interrupted “intron-style” by other parts of the host sequence. In most instances, the ubiquitin domain is presumably appended structurally to the host fusion protein as a stand-alone folding unit, separate from the body of the rest of the protein. The domain can be removed with interesting effect.³⁵ Nearby sequence elements may affect the solvent-accessible parts of the ubiquitin-like protein, or may alter its folding path, but we consider it to be an independent structural element, responding to structural evolutionary pressures on its own. We note a broad variability of function attributed to the various members of the superfamily. Table 2 shows more defined features and functional relations within the ubiquitin superfamily members.
The family members in the table are ordered from top to bottom by their identity to mammalian ubiquitin (% UbL). Our cutoff in this study was at 10% identity. In terms of conservation (% cons), there is no correlation between the similarity to ubiquitin and the similarity within the family itself. This implies that the importance of the domain is unrelated to its resemblance to ubiquitin. In fact, several proteins of notable importance have little resemblance to ubiquitin and are nearly as well conserved. We predict that all domains ending with a glycine, except those of the UBP and BAG families, will be found attached to other proteins in the cell, and subsequently removed by deconjugation activities. The small glycine is necessary if deconjugating protease active sites are to overcome steric problems in accessing the isopeptide bond of the target protein’s Nε-lysine linkage. In the case of human UBP6, the ubiquitin-like domain is present (oddly) as a fusion to another protein that has a ubiquitin binding site internally. One candidate for conjugation revealed by this study, then, is Fau, or MNSF. In the rat, this protein is in fact present as an α-linked proprotein, which is removed from the S30 fusion, though it is not processed by UCH-L1 or UCH-L3 (Larsen, C. N. Unpublished results). Whether the resulting free Ubl domain is proteolyzed or attached to other proteins remains a mystery. Only one protein without the Gly-Gly C-terminus has been found post-translationally attached to other proteins. This is the yeast Hub1.³⁶ We do not predict that the attachment of Hub1 through a YY-isopeptide bond is cleavable, but if so it would require unique chymotrypsin-like hydrophobic elements in the protease binding sites. Other deconjugating activities such as isopeptidase T³⁷ or the mammalian UCHs³⁸ have a strict requirement for the small residues at this terminus, and binding affinities are reduced by several orders of magnitude if even an alanine is substituted at these positions.
Conceivably, a point mutation in the binding site of one of the UBPs would be sufficient to change this hydrolytic specificity, and thus, the Hub1-specific protease, if it exists, could be hiding in the relatively uncharacterized UBP family.

Many of the ubiquitin-like domains possess completely conserved lysines. This implies that they may be ligatable into polymeric chains, as is ubiquitin through its Lys 6, 29, 48, and 63 side chains.³⁹ Amide polymer chemistry requires two chemical functionalities, one being the C-terminal carboxylate, the other being a single lysine amino group. Additionally, these proteins may also serve as nuclei for the growing chain of traditional polyubiquitin chains. Candidates for ubiquitin domain nucleation/polymerization include Ub, SUMO2 and -3, Nedd8, Bat3, Ubiquilin, ISG15, Parkin, FAT10, and others shown. Branching of the polymer requires two or more conserved lysines, and these can be found in the table as well. Examples of “branched chain” ubiquitin polymers have not yet been described in the literature. Of particular interest in this regard is the recent work of Peng and Gygi, in which 6-His-tagged ubiquitin polymers are used to purify conjugates from the yeast proteome. After the conjugates are separated by cation-exchange chromatography, reverse-phase nanoscale HPLC, and tandem mass spectrometry, the sequences of thousands of isopeptide-linked peptides are revealed, with their exact linkage position in the protein. Ubiquitin “signature sequences” reveal the presence of variable isopeptide linkages that could indicate branching. This method will prove highly useful in the search for conjugates of any of the other ubiquitin-like modifications we postulate here.
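The two predictions in this section (conjugation inferred from a C-terminal glycine, and chain formation inferred from conserved lysines) can be restated as simple filters over Table 2. The sketch below is only an illustration: the values are transcribed from the extracted Table 2, and the assignment of each value to a family row is an assumption wherever the flattened table is ambiguous.

```python
# (family, C-terminal residues, class, conserved lysines) as read from Table 2.
TABLE2 = [
    ("ubiquitin", "GG", "I", [6, 27, 29, 48, 63]),
    ("Nedd8", "GG", "I", [4, 6, 11, 27, 33, 48]),
    ("Bat3", "PQ", "II", [22, 39, 56, 70, 80]),
    ("ubiquilin", "NR", "II", [62, 82]),
    ("ISG15", "GG", "I", [29]),
    ("Dsk2", "NR", "II", []),
    ("Parkin", "RK", "II", [27, 48]),
    ("Rad23", "KA", "II", [28]),
    ("GDX", "EK", "II", []),
    ("Fat10", "KP", "I", [70]),
    ("Fau", "GG", "II", []),
    ("elongin B", "AD", "II", [28]),
    ("Hub1", "YY", "I", [12, 13, 28, 29, 46]),
    ("HCG1", "LP", "II", []),
    ("URM1", "GG", "I", []),
    ("USP", "CG", "II", [30]),
    ("APG12", "WG", "I", []),
    ("OAS", "KK", "II", [240, 242, 249]),
    ("APG8", "DD", "II", []),
    ("SUMO", "GG", "I", [25, 46]),
    ("elongin C", "FL", "II", [71]),
    ("Bag", "QG", "II", []),
]
GLYCINE_EXCEPTIONS = {"USP", "Bag"}  # UBP and BAG families, excluded in the text

# Prediction from the text: a C-terminal glycine implies conjugation.
predicted = [
    family
    for family, term, _, _ in TABLE2
    if term.endswith("G") and family not in GLYCINE_EXCEPTIONS
]
class_one = [family for family, _, cls, _ in TABLE2 if cls == "I"]
new_candidates = [family for family in predicted if family not in class_one]

# One conserved lysine permits a linear chain; two or more permit branching.
chain_capable = [family for family, _, _, lys in TABLE2 if len(lys) >= 1]
branch_capable = [family for family, _, _, lys in TABLE2 if len(lys) >= 2]

print("predicted but not yet class I:", new_candidates)
print("chain-capable:", chain_capable)
print("branch-capable:", branch_capable)
```

With this transcription, the only glycine-terminated family not already listed as class I is Fau, which agrees with the candidate named in the text.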

Conclusions

In conclusion, this study identifies 22 separate families of ubiquitin-like proteins. Having varying function, ligatability, effect, and result on the signal transduction pathways, they represent an interesting evolutionary, structural, and functional group of protein domains for study. Even the most well studied of these proteins (Ub, Nedd8, SUMO-1, ISG15, and Hub1) have functions that are not well understood. Many of the domains reported here, such as GDX, HCG-1, OAS, and Bag1, have not previously been described in any detail as ubiquitin-like proteins. The functions of these domains span structural roles, signal transduction effects, complex activation, and cell development. Our study addresses for the first time all of the related members of the ubiquitin superfamily from a protein informatics view. Knowing the members of this family has already shed light on the methods used to study every ubiquitin-like protein that was discovered after ubiquitin, and, hopefully, these and other methods can be used on all the proteins described herein as well.

Experimental Methods

Protein sequences were retrieved from public databases via the Internet. Protein databases queried included Swissprot,⁴⁰ PIR,⁴¹ and Genbank⁴² translations, among others. We also included all sequences derived from the three-dimensional structures of the Brookhaven Protein Data Bank, the PRF, the yeast (Saccharomyces cerevisiae) genomic CDS translations, the Berkeley/Celera Drosophila genome project, and the patent division of NCBI GenPept. Although Escherichia coli sequences were included in the search, they were screened only incidentally. No bacterial sequences bore enough homology to appear in the study; thus, all proteins reported were eukaryotic. Ubiquitin, a protein conserved completely in mammals, was used as the standard in this study. This protein sequence of 76 amino acids was used to query NCBI with BLAST searches, using the default search parameters. Proteins revealed by these searches were inspected down to a “hit” score of 40, and homologous ubiquitin domains present were extracted from the protein sequence for use in BLASTing the database once again for relevant hits. All returned sequences were then related back to ubiquitin for homology by multiple sequence alignment at the Baylor center, using ClustalW with default parameters. After many iterations, proteins having less than 10% identity within the ubiquitin-like domain were discarded from analysis and considered to be spurious artifacts of the alignment process. Extraction of ubiquitin domains from all returned proteins was followed by compilation of these proteins in spreadsheets (Table 3). A text document containing all domain sequences was generated in FASTA format on a 900 MHz PowerMac G4 using SimpleText, and the resulting file was used to compute a single, global alignment with a locally compiled ClustalW, version 1.5. Analysis of 255 candidate sequences with regard to each other (255 × 255 alignments) took 14 min, or approximately 4.79 teraflop.
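The domain extraction and the 10% cutoff described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the toy sequences are hypothetical, and percent identity is defined here as identical residues divided by the number of ungapped reference positions, which is only one of several possible conventions.

```python
def domain_slice(ref_aligned: str, first_res: int, last_res: int) -> slice:
    """Alignment columns spanning residues first_res..last_res (1-based)
    of the gapped reference row."""
    residue = 0
    start = stop = None
    for col, ch in enumerate(ref_aligned):
        if ch == "-":
            continue
        residue += 1
        if residue == first_res:
            start = col
        if residue == last_res:
            stop = col + 1
    if start is None or stop is None:
        raise ValueError("reference row is shorter than the requested range")
    return slice(start, stop)


def percent_identity(ref: str, other: str) -> float:
    """Identical residues as a percentage of ungapped reference positions."""
    ref_positions = [(a, b) for a, b in zip(ref, other) if a != "-"]
    matches = sum(1 for a, b in ref_positions if a == b)
    return 100.0 * matches / len(ref_positions)


# Toy alignment (hypothetical sequences, not data from this study).
alignment = {
    "ref":   "MQIF--VKTLRGG",
    "close": "MQIYAAVKTLRGG",
    "far":   "--PWAADEHSW--",
}

# Keep the columns from residue 2 of the reference to its C-terminus,
# mirroring the Q2-to-C-terminus definition of the domain.
cols = domain_slice(alignment["ref"], first_res=2, last_res=11)
domains = {name: row[cols] for name, row in alignment.items()}

scores = {name: percent_identity(domains["ref"], d) for name, d in domains.items()}
kept = [name for name, score in scores.items() if score >= 10.0]

print(scores)
print("kept:", kept)
```

Slicing every row by the reference's columns keeps the domains in register, so the cropped rows can be realigned or scored directly.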
The Clustal output was generated as the following files: Superfamily.msf, Superfamily.aln, and Superfamily.dnd, denoting the multiple sequence format file, the raw alignment, and the dendrogram, respectively, which were used to generate graphical views of the process. Approximately one-fifth of the candidate proteins were discarded as irrelevant. We used Treeview⁴³ to generate phylogenies of the ubiquitin OTUs. Radial, linear, and branched phenograms, cladograms, and phylograms were created and inspected, and we chose linear phenograms as being the most informative for this analysis. Output of the phenogram was coupled to block

Table 3. Protein Sequences Used in This Studyᵃ

protein

Ub Ub Ub Ub Ub Ub Ub Ub Ub Ub Ub Ub Ub Ub NEDD8 NEDD8 Rub1/NEDD8 NEDD8 UCRP UCRP UCRP UBL-4/GdX UBl4 GDX parkin I parkin II parkin III parkin Dsk2 Dsk2 BAT3 HLA-B3 BAT3 Scythe ubiquilin-1 ubiquilin-2 ubiquilin-3 HHR23a Rad23b Rad23 Rad23 Rad23b MHR23a DHR23 Hub1 HUB1 UBL5 FAT10 FAT10 FAT10 FAT10 UBP6 Usp14 Ubpc TGT6 USP14 USP14 USP7 USP6

species

accession no.

protein

species

ubiquitin Homo sapiens BAA23632 Ub Plasmodium falciparum Arabidopsis thaliana NM_116090 Ub Aglaothamnion neglectum Rattus norvegicus BAA03983 Ub Cavia porcellus Drosophila melanogaster AAA29001 Ub Ovis aries Bos taurus Q28169 Ub Artemia franciscana Bombyx mori AB021972 Ub Oncorhynchus mykiss Saccharomyces cerevisiae Q07188 Ub Mus musculus Schistosoma mansoni AAD02414 Ub Cricetulus griseus G. cydonium S43306 Ub Gallus gallus Zea mays Q41751 Ub Mus musculus Pisum sativa AAD03344 Ub Caenorhabditis elegans Suberites domuncula CAA76578 Ub Xenopus laevis Glycine max P03993 Ub Saccharomyces cerevisiae Nicotiniana sylvestris S28420 neural precursor cell expressed developmentally and down-regulated 8 Homo sapiens Q15843 NEDD8 Caenorhabditis elegans Mus musculus P29595 NEDD8 Arabidopsis thaliana Saccharomyces cerevisiae S51867 NEDD8 Schizosaccharomyces pombe Drosophila melanogaster CG10679 NEDD8 Bos taurus interferon-stimulated gene product 15/ubiquitin cross-reactive protein Homo sapiens P05161 UCRP Bos taurus Ovis aries AF152103 UCRP Carassius auratus Mus musculus Q64339 genomic DNA-X Homo sapiens P11441 GDX-l Mus musculus Homo sapiens AK000405 GDX-like Saccharomyces cerevisiae Mus musculus P21126 parkin Homo sapiens XP_011437 parkin Mus musculus Homo sapiens NP_054642 parkin2 Mus musculus Homo sapiens NP_054643 parkin Mus musculus Rattus norvegicus NP_064478 Dsk2 Saccharomyces cerevisiae NP_014003 CHAP1-dsk2 Homo sapiens Schizosaccharomyces pombe T38404 HLA-B-associated transcript 3 Homo sapiens XP_004176 Bat3/scythe Drosophila melanogaster Rattus norvegicus NP_446061 ubl-misc Arabidopsis thaliana Mus musculus AAC82479 BAT2 Homo sapiens Xenopus laevis T30561 ubiquilin Homo sapiens NP_038466 ubiquilin Mus musculus Homo sapiens XP_010227 ubiquilin Bos taurus Homo sapiens XP_006071 Rad23 Homo sapiens XP_005553 Rad23 Schizosaccharomyces pombe Homo sapiens XP_009054 Rad23 Lycopersicon esculentum Saccharomyces cerevisiae NP_010877 Rad23-II Daucus carota Arabidopsis thaliana 
NP_198663 Rad23-I Daucus carota Mus musculus NP_033037 Rad23 Oryza sativa Mus musculus NP_033036 rcb Dictyostelium discoideum Drosophila melanogaster AAF59352 homology to ubiquitin-1 Saccharomyces cerevisiae NP_014430 Hub1 Arabidopsis thaliana Schizosaccharomyces pombe T40200 Hub1 Drosophila melanogaster Homo sapiens NP_077268.1 Hub1 Ceanorhabditis elegans FAT10 Homo sapiens AF123050_1 UBlike border virus Mus musculus AAG27477 Ubl-polyprotein reindeer virus Rattus norvegicus CAC42833 UBlike cow virus Arabidopsis thaliana S55243 Ubl Pestivirus ubiquitin-specific proteases Saccharomyces cerevisiae NP_116665 UCH Coccidioides immitis Drosophila melanogaster AAF52908 UBP6 Oryza sativa Schizosaccharomyces pombe Q92353 UBP6 Aspergillus niger Oryctolagus cuniculus P40826 TGT Caenorhabditis elegans Mus musculus BAB27544 unknown 3 Arabidopsis thaliana Homo sapiens NP_005142 unknown 2 Trypanosome brucei Arabidopsis thaliana NP_566680 unknown 1 Drosophila melanogaster Arabidopsis thaliana AF302660_1

accession no.

Q26029 P42740 BAA11842 AAG48523 CAA52417 AAK51460 NP_035794 Q60507 P79781 S12583 P14792 AAA49978 U74318

T22249 AC007654 NC_003423 AAF73911 AAB57687 AAF17609

AK006408 CAA88151

AAG13891 AAG13892 BAA82404

AF189009_1

CG7546 NM_124899 P48634

NP_062382 AAK61367

CAA21170 CAB65692 CAA72742 CAA72741 AAB65841 AAD17913

NP_190104.1 AE003790 U88173 AAB60887 unpublished AAC77906 G567767 AF288062_1 unpublished unpublished T19227 NP_180571 unpublished AAD34759

Journal of Proteome Research • Vol. 1, No. 5, 2002 417

research articles

Larsen and Wang

Table 3 (Continued) protein

S30 fusion MNSF-β.UBL-S30 cDNA FUB1 UblS30/cDNA FAU Ubl Fau-alt4 SUMO-1 SUMO-1 SUMO-1 SUMO-1 SUMO-1 SUMO-1 SUMO-1 SUMO-1 SUMO-1 URM1 URM1 Urm1 Ubl-3 HCG-1 HCG-1l APG12 APG12 APG12

elongin B elongin B elongin B-like elongin B APG8 AUT2/APG8 TRIP14 OAS p54_OASL OAS_A Bag1 Bag1 Bag1 variant? Bag4 Bag5 Bag1a Bag1b Bag1

species

accession no.

protein

Fau AF400225_1 Fau-alt3 P35545 fau-alt2 I48346 Fau P35545 S30 fusion A47416 arsenite resistance protein A47416 Fau AAB52915 Ubl-S30/fau AAF54550 small ubiquitin-related modifiers Homo sapiens AAC50996 SMT3a Schizosaccharomyces cervisiae O13351 SMT3b Xenopus laevis CAB09801 SMT3H2 Mus musculus BAB30417 SMT3H1 Cervus nippon AF242526 SMT3b Onchorhynchus mykiss AB036430 SMT3b Drosophila melanogaster LD07775 SMT3 Bos taurus U89439 SMT3 Caenorhabditis elegans P55853 SMT3 ubiquitin-related modifier-1 Mus musculus BAB31673 URM1 Neurospora crassa CAC28714 URM1 Homo sapiens NP_112176 URM1 human chorionic gonadotrophin-like Homo sapiens XP_007136 HCG-1l Homo sapiens AAD02323 Hcg1l Mus musculus AAD02325 Hcg1l autophagy-12 Homo sapiens AAH12266 APG12 Saccharomyces cerevisiae NP_009776 APG12 Mus musculus BAB62092 APG12 elongin B/C Drosophila melanogaster AAG22163 elongin C Homo sapiens NP_009039 elongin B Homo sapiens BC013306 elongin B/RNA pol II/p18 Caenorhabditis briggsae AC084453 elongin B autophagy-8 Homo sapiens CAC43939 APG8-like Saccharomyces cerevisiae YNL223W APG8-like 2′-5′ oligoadenylate synthetase Homo sapiens Q15646 OAS_B Mus musculus BAB26655 OAS3 Mus musculus Q9Z2F2 OAS1 Gallus gallus BAB19016 OAS_B Bcl2-associated athanogenes Mus musculus BC003722 Bag1 Homo sapiens XP_036229 F57B10.11 Homo sapiens AF241726 H14N18.1 Homo sapiens NP_004865 Bag3 Homo sapiens NP_004864 Bag2 Schizosaccharomyces pombe CAA19031 F12F1.7 Schizosaccharomyces pombe T40519 Bag-1 Toxoplasma gondii S69797 Spodoptera frugiperda Mus musculus Arabidopsis thaliana Mus musculus Rattus norvegicus Rattus rattus Sus scrofa Drosophila melanogaster

species

accession no.

Drosophila melanogaster Drosophila melanogaster Drosophila melanogaster Ictalurus punctatus Cricetulus griseus Caenorhabditis elegans Homo sapiens

AAF54552 AAF54551 AAF54549 AF402841_1 Q60435 AAB37076 JC1278

Homo sapiens Bos taurus Homo sapiens Mus musculus Rattus norvegicus Sus scrofa Arabidopsis thaliana Cicer arietinum Oryza sativa

P55854 P55855 NP_008868 L008868 L79949 L77617 P55852 T09529 P55857

Saccharomyces cerevisiae NP_012258 Sus scrofa AAK40623 Schizosaccharomyces pombe CAB94946 Drosophila melanogaster Arabidopsis thaliana Caenorhabditis elegans

AF044219 NP_177910 unpublished

Schizosaccharomyces pombe T50108 Drosophila melanogaster AE003542 AE002602 Caenorhabditis elegans B0336.8 Homo sapiens Caenorhabditis elegans Mus musculus Rattus norvegicus

NP_005639 AF326942_1 AAH09104 NP_112391

Schizosaccharomyces pombe CAC00556 Homo sapiens T12492 Gallus gallus Homo sapiens Sus scrofa Gallus gallus

BAA19575 BAB18647 CAA12397 BAA19575

Brassica napus Caenorhabditis elegans Caenorhabditis elegans Homo sapiens Homo sapiens Arabidopsis thaliana Homo sapiens

AAA32985 T32746 T33308 O95817 O95816 AAC17606 XP_036229

a The accession number represents the NCBI Entrez searchable term. Discarded sequences are not shown. Where two or more names exist in the literature for the same polypeptide sequence, the protein name is standardized to that of its group. Subclassified blocks of sequences in the table are grouped according to their position in Figure 1, as defined by one taxonomic group per vertical bar.
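Because the footnote describes each accession number as an NCBI Entrez searchable term, a reader may want to look entries up programmatically. The following is a minimal sketch, not part of the original study: it assumes the present-day NCBI E-utilities `efetch` endpoint, and the function name `efetch_fasta_url` is hypothetical. It only builds the request URL and performs no network access. Some table entries are nucleotide records or gene identifiers (for example NM_116090 or CG10679) and others are marked "unpublished", so the default `db="protein"` will not suit every row.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities efetch endpoint (not used in the paper).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "protein") -> str:
    """Build an efetch URL requesting the FASTA record for one accession.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    accession = accession.strip()
    # Table entries marked "unpublished" have no searchable term.
    if not accession or accession.lower() == "unpublished":
        raise ValueError("no searchable accession for this entry")
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH_URL}?{query}"


if __name__ == "__main__":
    # Two accessions taken from Table 3 (human NEDD8 and human UCRP).
    for acc in ("Q15843", "P05161"):
        print(efetch_fasta_url(acc))
```

The returned URL can be passed to any HTTP client. Nucleotide accessions from the table would need `db="nucleotide"` (older documentation uses `nuccore`), which is why the database is left as a parameter.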

diagrams of the proteins using Canvas 7 (Deneba Corp., USA), and tables depicting other features of the families were appended as aids to understanding. The output of this study is being used to populate the first subgroup of PCDB proteins involved in the extended ubiquitin system within our proprietary SQL/Java database, now being developed for release in early 2003.

Supporting Information Available: Radial phylogram of the ubiquitin superfamily. This material is available free of charge via the Internet at http://pubs.acs.org.


References

(1) Sorenson, J. M.; Head-Gordon, T. Toward minimalist models of larger proteins: A ubiquitin-like protein. Proteins 2002, 46, 368-379.
(2) Finley, D. J. Signal transduction. An alternative to destruction. Nature 2001, 412, 283, 285-286.
(3) Gregori, L.; Poosch, M. S.; Cousins, G.; Chau, V. A uniform isopeptide-linked multiubiquitin chain is sufficient to target substrate for degradation in ubiquitin-mediated proteolysis. J. Biol. Chem. 1990, 265, 8354-8357.
(4) Hofmann, R. M.; Pickart, C. M. In vitro assembly and recognition of Lys-63 polyubiquitin chains. J. Biol. Chem. 2001, 276, 27936-27943.


(5) Kurz, T.; Pintard, L.; Willis, J. H.; Hamill, D. R.; Gonczy, P.; Peter, M.; Bowerman, B. Cytoskeletal regulation by the nedd8 ubiquitin-like protein modification pathway. Science 2002, 295, 294-298.
(6) Hochstrasser, M. SP-RING for SUMO: New functions bloom for a ubiquitin-like protein. Cell 2001, 107, 5-8.
(7) Malakhov, O. A.; Malakhov, M. P.; Hetherington, C. J.; Zhang, D. E. Lipopolysaccharide activates the expression of ISG15-specific protease UBP43 via interferon regulatory factor 3. J. Biol. Chem. 2002, in press.
(8) Redman, K. L.; Rechsteiner, M. Extended reading frame of a ubiquitin gene encodes a stable, conserved, basic protein. J. Biol. Chem. 1988, 263, 4926-4931.
(9) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Databank. Nucleic Acids Res. 2000, 28, 235-242.
(10) Haan, C.; Is’harc, H.; Hermanns, H. M.; Schmitz-Van De Leur, H.; Kerr, I. M.; Heinrich, P. C.; Grotzinger, J.; Behrmann, I. Mapping of a region within the N terminus of Jak1 involved in cytokine receptor interaction. J. Biol. Chem. 2001, 276, 37451-37458.
(11) Rudolph, M. J.; Wuebbens, M. M.; Rajagopalan, K. V.; Schindelin, H. Crystal structure of molybdopterin synthase and its evolutionary relationship to ubiquitin activation. Nat. Struct. Biol. 2001, 8, 42-46.
(12) Haas, A. L.; Ahrens, P.; Bright, P. M.; Ankel, H. Interferon induces a 15-kilodalton protein exhibiting marked homology to ubiquitin. J. Biol. Chem. 1987, 262, 11315-11323.
(13) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403-410.
(14) Thompson, J. D.; Higgins, D. G.; Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22, 4673-4680.
(15) Clewley, J. P. A user’s guide to producing and interpreting tree diagrams in taxonomy and phylogenetics. Communic. Dis. Public Health 1998, 1, 64-66.
(16) Cognia is a registered trademark of the Cognia Corporation. February 2002 URL: http://www.cognia.com.
(17) Freed, E.; Lacey, K. R.; Huie, P.; Lyapina, S. A.; Deshaies, R. J.; Stearns, T.; Jackson, P. K. Components of an SCF ubiquitin ligase localize to the centrosome and regulate the centrosome duplication cycle. Genes Dev. 1999, 13, 2242-2257.
(18) Hamerman, J. A.; Hayashi, F.; Schroeder, L. A.; Gygi, S. P.; Haas, A. L.; Hampson, L.; Coughlin, P.; Aebersold, R.; Aderem, A. Serpin 2a is induced in activated macrophages and conjugates to a ubiquitin homolog. J. Immunol. 2002, 168, 2415-2423.
(19) Mah, A. L.; Perry, G.; Smith, M. A.; Monteiro, M. J. Identification of ubiquilin, a novel presenilin interactor that increases presenilin protein accumulation. J. Cell Biol. 2000, 151, 847-862.
(20) Schauber, C.; Chen, L.; Tongaonkar, P.; Vega, I.; Lambertson, D.; Potts, W.; Madura, K. Rad23 links DNA repair to the ubiquitin/proteasome pathway. Nature 1998, 391, 715.
(21) Mizuno, Y.; Hattori, N.; Mori, H.; Suzuki, T.; Tanaka, K. Parkin and Parkinson’s disease. Curr. Opin. Neurol. 2001, 4, 477-482.
(22) Borodovsky, A.; Kessler, B. M.; Casagrande, R.; Overkleeft, H. S.; Wilkinson, K. D.; Ploegh, H. L. A novel active site-directed probe specific for deubiquitylating enzymes reveals proteasome association of USP14. EMBO J. 2001, 20, 5187-5196.
(23) Olvera, J.; Wool, I. G. The carboxyl extension of a ubiquitin-like protein is rat ribosomal protein S30. J. Biol. Chem. 1993, 268, 17967-17974.
(24) Potter, J. L.; Narasimhan, J.; Mende-Mueller, L.; Haas, A. L. Precursor processing of pro-ISG15/UCRP, an interferon-beta-induced ubiquitin-like protein. J. Biol. Chem. 1999, 274, 25061-25068.

(25) Larsen, C. N.; Krantz, B. A.; Wilkinson, K. D. Substrate specificity of deubiquitinating enzymes, ubiquitin C-terminal hydrolases. Biochemistry 1998, 37, 3358-3368.
(26) Lin, H.; Yin, L.; Reid, J.; Wilkinson, K. D.; Wing, S. S. Divergent N-terminal sequences of a deubiquitinating enzyme modulate substrate specificity. J. Biol. Chem. 2001, 276, 20357-20363.
(27) Lyapina, S.; Cope, G.; Shevchenko, A.; Serino, G.; Tsuge, T.; Zhou, C.; Wolf, D. A.; Wei, N.; Shevchenko, A.; Deshaies, R. J. Promotion of NEDD-CUL1 conjugate cleavage by COP9 signalosome. Science 2001, 292, 1382-1385.
(28) Malakhov, M. P.; Malakhova, O. A.; Kim, K. I.; Ritchie, K. J.; Zhang, D. E. UBP43 (USP18) specifically removes ISG15 from conjugated proteins. J. Biol. Chem. 2002, 277, 9976-9981.
(29) Gong, L.; Kamitani, T.; Millas, S.; Yeh, E. T. Identification of a novel isopeptidase with dual specificity for ubiquitin- and NEDD8-conjugated proteins. J. Biol. Chem. 2000, 275, 14212-14216.
(30) Hershko, A.; Ciechanover, A. The ubiquitin system. Annu. Rev. Biochem. 1998, 67, 425-479.
(31) Lin, D.; Tatham, M. H.; Yu, B.; Kim, S.; Hay, R. T.; Chen, Y. Identification of a substrate recognition site on Ubc9. J. Biol. Chem. 2002, in press.
(32) Tatham, M. H.; Jaffray, E.; Vaughan, O. A.; Desterro, J. M.; Botting, C. H.; Naismith, J. H.; Hay, R. T. Polymeric chains of SUMO-2 and SUMO-3 are conjugated to protein substrates by SAE1/SAE2 and Ubc9. J. Biol. Chem. 2001, 276, 35368-35374.
(33) Ohsumi, Y. Molecular mechanism of autophagy in yeast, Saccharomyces cerevisiae. Philos. Trans. R. Soc. London B: Biol. Sci. 1999, 354, 1577-1580; discussion 1580-1581.
(34) Becher, P.; Orlich, M.; Thiel, H. J. Ribosomal S27a coding sequences upstream of ubiquitin coding sequences in the genome of a pestivirus. J. Virol. 1989, 72, 8697-8704.
(35) Chen, L.; Shinde, U.; Ortolan, T. G.; Madura, K. Ubiquitin-associated (UBA) domains in Rad23 bind ubiquitin and promote inhibition of multi-ubiquitin chain assembly. EMBO Rep. 2001, 2, 933-938.
(36) Dittmar, G. A. G.; Wilkinson, C. R. M.; Jedrzejewski, P. T.; Finley, D. J. Role of a Ubiquitin-Like Modification in Polarized Morphogenesis. Science 2002 (#1069989 in press).
(37) Wilkinson, K. D.; Tashayev, V. L.; O’Connor, L. B.; Larsen, C. N.; Kasperek, E.; Pickart, C. M. Metabolism of the polyubiquitin degradation signal, structure, mechanism, and role of isopeptidase T. Biochemistry 1995, 34, 14535-14546.
(38) Johnston, S. C.; Riddle, S. M.; Cohen, R. E.; Hill, C. P. Structural basis for the specificity of ubiquitin C-terminal hydrolases. EMBO J. 1999, 18, 3877-3887.
(39) Pickart, C. M. Mechanisms underlying ubiquitination. Annu. Rev. Biochem. 2001, 70, 503-533.
(40) Gasteiger, E.; Jung, E.; Bairoch, A. SWISS-PROT, Connecting biological knowledge via a protein database. Curr. Issues Mol. Biol. 2001, 3, 47-55.
(41) Wu, C. H.; Huang, H.; Arminski, L.; Castro-Alvear, J.; Chen, Y.; Hu, Z. Z.; Ledley, R. S.; Lewis, K. C.; Mewes, H. W.; Orcutt, B. C.; Suzek, B. E.; Tsugita, A.; Vinayaka, C. R.; Yeh, L. S. L.; Zhang, J.; Barker, W. C. The protein informatics resource, an integrated public resource of functional annotation of proteins. Nucleic Acids Res. 2002, 30, 35-37.
(42) Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Rapp, B. A.; Wheeler, D. L. GenBank. Nucleic Acids Res. 2002, 30, 17-20.
(43) Page, R. D. M. TREEVIEW, An application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 1996, 12, 357-358.

PR025522N
