The Functions of Sco Proteins from Genome-Based Analysis

Lucia Banci, Ivano Bertini,* Gabriele Cavallaro, and Antonio Rosato

Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy, and Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy

Received October 12, 2006

Sco proteins are widespread in eukaryotes as well as in many prokaryotes. The 3D structures of representatives from human, yeast, and Bacillus subtilis have been determined, revealing a thioredoxin-like fold. Sco proteins have been implicated mainly as copper transporters involved in the assembly of the CuA cofactor of cytochrome c oxidase. Several mutations identified in humans lead to defective cytochrome c oxidase formation and thus to fatal illnesses. However, the physiological function of Sco proteins appears to go beyond assembly of the CuA cofactor. An extensive analysis of completely sequenced prokaryotic genomes reveals that 18% of them encode either Sco proteins without any CuA-containing protein, or vice versa. In addition, in several cases multiple Sco-encoding genes occur even though only a single potential Sco target is encoded in the genome. Genomic context analysis points to a more general role for Sco proteins in copper transport, including transport to copper enzymes lacking a CuA cofactor. To gain further insight into the possible role of Sco in the assembly of other cofactors, we also searched for Cox11 proteins, which are important for CuB biosynthesis. We propose a general framework for the action of Sco proteins, based on the hypothesis that they can couple metal transport with thiol/disulfide-based oxidoreductase activity, as well as select between these two cellular functions. This model reconciles the variety of experimental observations made on these proteins over the years and can serve as a basis for further studies.

Keywords: Sco • cytochrome c oxidase assembly • CuA • copper transport • redox protection

* To whom correspondence should be addressed. Prof. Ivano Bertini, Magnetic Resonance Center, University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino (Italy). Tel.: +39 055 4574272. Fax: +39 055 4574271. E-mail: [email protected].

Journal of Proteome Research 2007, 6, 1568-1579

Published on Web 02/15/2007

10.1021/pr060538p CCC: $37.00 © 2007 American Chemical Society

Introduction

Sco proteins were first identified in yeast as gene products essential for the accumulation of mitochondrially synthesized cytochrome c oxidase (COX).1 The hypothesis that Sco proteins could play a role in copper delivery to COX was formulated in 1996 by Tzagoloff and co-workers, based on the observation that overexpression of yeast Sco1 rescued the respiratory deficiency of yeast mutants lacking the copper metallochaperone Cox17.2 Subsequent studies reinforced the view that Sco proteins receive copper ions from Cox17 and then deliver them to COX subunit II (COX2), thereby achieving assembly of the CuA site.3-5 Indeed, it has been proposed that the in vivo functionality of yeast Sco1 depends on copper(I) binding through two cysteines in a conserved CXXXC motif and a conserved histidine.6 Despite considerable efforts to integrate the large amount of experimental data obtained in recent years into a consistent picture, the mechanism by which Sco would accomplish copper delivery to COX in mitochondria remains obscure in many respects. Humans have two Sco proteins, both of which are essential, because mutations in either one lead to tissue-specific COX deficiencies associated with different, fatal clinical phenotypes.7-13 In 2004, Shoubridge and co-workers proposed that the human Sco proteins, which they suggested function as homodimers, have non-overlapping, cooperative functions in COX assembly.14 Recently, a Sco protein has been identified in mice and humans as the mediator of p53 regulation of mitochondrial respiration, providing a possible explanation for the metabolic switch from aerobiosis to glycolysis exhibited by cancer cells (a phenomenon known as the Warburg effect).15

Sco proteins are also found in prokaryotic organisms.16 It is generally assumed that the role of Sco proteins in COX maturation is conserved between prokaryotes and eukaryotes by virtue of the similarity between mitochondrial and prokaryotic respiratory chains,17,18 which ultimately reflects the endosymbiotic bacterial origin of mitochondria.19 A considerable amount of data on bacterial Sco proteins is available for Bacillus subtilis, whose single Sco homologue (called YpmQ or BsSco) was the first structurally characterized member of this class of proteins, showing a thioredoxin fold.20 The 3D structures of the human21,22 and yeast23 homologues are also available. Deletion of the ypmQ gene in B. subtilis results in COX deficiency without affecting the activity of the alternative respiratory enzyme menaquinol oxidase, which contains a CuB but not a CuA center.24 It was thus proposed that YpmQ, like its mitochondrial counterparts, is involved in the assembly of the COX CuA site.24 The analogy of YpmQ to eukaryotic Sco proteins was reinforced by the characterization of its metal-binding properties.24-26 The prokaryotic Sco proteins known for the longest time are those from Rhodobacter sphaeroides (called PrrC)27 and Rhodobacter capsulatus (called SenC).28 PrrC has been proposed to act as a signal mediator between the cbb3 cytochrome c oxidase and the sensor kinase PrrB (also called RegB), which, together with the response regulator PrrA (or RegA), forms a two-component, phosphorylation-dependent system regulating the transcription of photosynthesis genes in response to changes in O2 tension.29,30 In this model, PrrC would not be involved in COX assembly. However, more recent results do not agree well with this model.31 Studies on SenC support a role in copper delivery to the cbb3 oxidase, even though this oxidase does not contain a CuA site.32 It was argued that the alterations in photosynthesis gene expression observed in Rhodobacter species carrying Sco mutations are actually an indirect effect of decreased cytochrome c oxidase activity.32 However, Sco homologues are not required for the assembly of the cbb3 oxidase in Neisseria gonorrhoeae and Neisseria meningitidis, two pathogens lacking a CuA-containing COX.33 On the basis of the observation that sco mutants of Neisseria species are more sensitive than wild-type strains in oxidative killing assays, it was proposed that Sco proteins in these bacteria are involved in protecting periplasmic proteins from oxidative damage.33 A similar behavior has also been described for senC mutants of R. capsulatus.34 In this work, we have performed a comprehensive search for Sco proteins encoded in prokaryotic genomes and analyzed them in terms of (i) sequence features and (ii) genomic context, with the aim of obtaining confident predictions of their function in different bacteria and archaea.
Our predictions are discussed in light of the various roles previously proposed for these proteins, which are summarized in this introduction. To investigate possible functional relationships, we also examined the occurrence of Cox11 proteins. We propose a general model for Sco function based on the integration of a metal ion into a redox-active, thioredoxin-related domain. The coupling of redox and metal-binding properties potentially enables Sco proteins to fulfill a relatively wide range of purposes, depending on the cellular process in which they act. This framework may reconcile the diverse facts and descriptions that make up current knowledge on Sco, providing hints for understanding their mode of action not only in prokaryotes but also in eukaryotes.

Methods

We searched for genes encoding Sco proteins in all 311 prokaryotic genomes (285 from Bacteria and 26 from Archaea) that were annotated as complete at NCBI (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi) in March 2006. Initially, we used the HMMER program (http://hmmer.janelia.org/) to search the NCBI RefSeq sequence database35 for matches to the "Sco1-SenC" hidden Markov model (HMM) taken from the Pfam database36 (http://www.sanger.ac.uk/Software/Pfam/). We applied a permissive threshold on the expectation value (E-value ≤ 10) to minimize the risk of missing distant homologues, which yielded an ensemble of 4766 potential hits. Of these hits, we considered as true those assigned by the COGnitor program37,38 to COG1999 (which comprises the Sco proteins in the COG database39), whereas we excluded those assigned to any other COG. The COG (Clusters of Orthologous Groups) database is a manually curated collection of homologous protein ensembles. In this way, we selected 195 proteins and discarded 3440, while 1131 hits remained ambiguous; that is, they were not assigned to any COG. We then discarded the ambiguous hits that HMMER identified with E-value ≥ 1 and that lacked the CXXXC sequence pattern, thus leaving out 962 of them. We analyzed the remaining 169 ambiguous hits with the BLink tool at NCBI (http://www.ncbi.nlm.nih.gov/sutils/static/blinkhelp.html), which provides precomputed results of BLASTP40 searches against the nonredundant (nr) protein database. We accepted as true those hits whose best three BLASTP matches were either proteins already identified as Sco in this search or proteins encoded in noncomplete genomes and annotated as Sco. We thus selected a further 51 hits, bringing the total to 246, and discarded 118. Subsequently, we identified an additional 8 Sco proteins through genomic context analysis (see below), which revealed their clear homology to Sco proteins already included in our ensemble. We thus concluded our search with a total of 254 Sco proteins. We verified that this ensemble included all proteins encoded in the 311 genomes considered that belong either to COG1999 or to the SCO1-SenC Pfam family.

We applied a procedure analogous to that described above to search for COX2 genes in the same genomes. A HMMER search for matches to the Pfam HMM designated as COX2 resulted in 1883 potential hits with E-value ≤ 10. We filtered out 1247 hits that the COGnitor program assigned to any COG other than COG1622 (which comprises COX2 in the COG database), leaving 636 hits either assigned to COG1622 (313 cases) or not assigned (323 cases).
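The Sco hit-triage rules described above can be summarized as a small decision function. The sketch below is illustrative only: the `Hit` record, its field names, and the boolean summary of the three best BLink/BLASTP matches are hypothetical stand-ins for HMMER, COGnitor, and BLink output (not their real formats), and the final rescue of 8 proteins through genomic context is not modeled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Two cysteines separated by any three residues (CXXXC).
CXXXC = re.compile(r"C[A-Z]{3}C")


@dataclass
class Hit:
    """Hypothetical summary of one HMMER match to the SCO1-SenC HMM."""
    protein_id: str
    evalue: float
    cog: Optional[str]      # COGnitor assignment; None means unassigned (ambiguous)
    sequence: str
    # For each of the best three BLASTP matches: True if it is a known/annotated Sco.
    top3_blast_is_sco: List[bool] = field(default_factory=list)


def triage(hit: Hit) -> str:
    """Return 'accept' or 'reject' following the filtering steps in the Methods."""
    if hit.evalue > 10:
        return "reject"                 # outside the permissive E-value <= 10 window
    if hit.cog == "COG1999":
        return "accept"                 # assigned to the Sco COG
    if hit.cog is not None:
        return "reject"                 # assigned to some other COG
    # Ambiguous hit: discard weak matches (E-value >= 1) that also lack CXXXC.
    if hit.evalue >= 1 and CXXXC.search(hit.sequence) is None:
        return "reject"
    # Otherwise require all three best BLASTP matches to be Sco proteins.
    if len(hit.top3_blast_is_sco) == 3 and all(hit.top3_blast_is_sco):
        return "accept"
    return "reject"


hits = [
    Hit("a", 1e-20, "COG1999", "MKCPDVCPE"),
    Hit("b", 0.5, None, "MKKLLA", [True, True, True]),
    Hit("c", 3.0, None, "MKKLLA", [True, True, True]),
    Hit("d", 1e-5, "COG0526", "MKCPDVCPE"),   # any COG other than COG1999
]
print([triage(h) for h in hits])  # ['accept', 'accept', 'reject', 'reject']
```

Encoding each rule as an early return keeps the order of the filters explicit, which matters here: the CXXXC check applies only to unassigned hits, never to those already placed in a COG.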
Since COG1622 (like the COX2 Pfam family) contains, in addition to the CuA-containing subunits of heme-copper oxidases, subunits of these enzymes that lack a CuA center (quinol oxidases18), we filtered the COG1622-assigned hits by requiring the CXXXCXXXH sequence pattern, which is strongly associated with the presence of a CuA-binding site (http://www.expasy.org/cgi-bin/prosite-search-ac?PDOC00075), thereby selecting 197 of these 313 hits. We applied the same criterion to the nonassigned hits, discarding 302 of them and selecting 21. Finally, we analyzed these 21 hits using BLink as above, selecting 19 of them and discarding two that we recognized as nitrous oxide reductases. Altogether, we identified a total of 216 COX2-encoding genes. We used Pfam searches coupled with the requirement of a heme-binding CXXCH motif to detect COX2-cytochrome c gene fusions among these sequences (76 instances). The search for COX2 genes described above was also used to identify genes encoding nitrous oxide reductase (NosZ, COG4263 in the COG database), because the C-terminal domain of NosZ is homologous to the COX2 domain and contains a CuA site.41 We thus repeated the procedure used for COX2, but selecting hits assigned to COG4263 rather than to COG1622, which resulted in a total of 29 NosZ-encoding genes. Finally, we used the same approach to identify Cox11 genes, obtaining an initial ensemble of 1205 matches to the Pfam "CtaG_Cox11" HMM. We selected 69 hits assigned by COGnitor to COG3175, plus one (out of 230 nonassigned hits) on the basis of BLink results; in total, we identified 70 Cox11 proteins. Prokaryotic genomes in which no Sco, COX2, NosZ, or Cox11 proteins were identified are listed in Supporting Information Table S1.

Journal of Proteome Research • Vol. 6, No. 4, 2007
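The two sequence-motif filters applied to COX2 candidates can be written as regular expressions. This is a simplified sketch assuming plain one-letter amino acid strings: it tests only the CXXXCXXXH and CXXCH patterns named in the text (not the full PROSITE entry), and the function name `screen_cox2_candidate` is hypothetical. In the actual procedure, fusion detection also relied on Pfam searches, and nonassigned hits were further checked with BLink.

```python
import re
from typing import Optional, Tuple

# CXXXCXXXH: pattern associated with a CuA-binding site (simplified).
CUA_MOTIF = re.compile(r"C[A-Z]{3}C[A-Z]{3}H")
# CXXCH: heme c attachment motif, used here to flag possible COX2-cytochrome c fusions.
HEME_C_MOTIF = re.compile(r"C[A-Z]{2}CH")


def screen_cox2_candidate(cog: Optional[str], sequence: str) -> Tuple[bool, bool]:
    """Return (keep_as_cox2, possible_cytochrome_c_fusion) for one COX2 HMM hit."""
    if cog not in ("COG1622", None):
        return False, False            # assigned to a different COG
    if CUA_MOTIF.search(sequence) is None:
        return False, False            # no CuA motif, e.g. a quinol oxidase subunit II
    return True, HEME_C_MOTIF.search(sequence) is not None


print(screen_cox2_candidate("COG1622", "AAHCEEICGVNHSF"))   # (True, False)
print(screen_cox2_candidate(None, "AAHCEEICGVNHSFCAACH"))  # (True, True)
print(screen_cox2_candidate("COG1622", "AAHCEEIAGVNHSF"))   # (False, False)
```

The same screen, with `"COG4263"` in place of `"COG1622"`, would mirror the NosZ selection described above.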

For each Sco protein sequence identified, we inferred the Sco domain boundaries from the start and end points of the alignment of the SCO1-SenC HMM on that sequence returned by HMMER. The occurrence and position of transmembrane helices were predicted using the TMHMM program42 (http://www.cbs.dtu.dk/services/TMHMM-2.0/). We used the HMMALIGN tool of HMMER to align the Sco domain sequences to the SCO1-SenC HMM, and the HMMBUILD tool of the same package to build a profile HMM from the resulting multiple alignment. This profile HMM was visualized with the HMMlogo program.43 Amino acid residues occurring at a given position with probability >0.5 were ranked as highly conserved. We used the NACCESS program44 to calculate residue-specific solvent accessibility in the structure of B. subtilis Sco (PDB code 1ON4),20 and we used a 50% threshold to define exposed residues. We used the CLUSTALW program45 to calculate pairwise sequence alignments and to construct a dendrogram from the HMMALIGN multiple alignment. This dendrogram was displayed as a tree using the DRAWTREE program of the PHYLIP package.46 We manually analyzed the local neighborhood of Sco genes using the SHOPS Web tool47 (http://www.bioinformatics.med.uu.nl/shops/). SHOPS predicts operon structure based on a log-likelihood analysis of intergenic distances.48 For organisms with complete genomes not included in the SHOPS database, we directly analyzed the table of protein-coding genes available at RefSeq, and we considered as Sco gene neighbors those genes (i) encoded on the same DNA strand and (ii) located no more than 300 bp from the Sco gene or from any other neighbor. To describe the genomic neighborhood of Sco proteins, we generally followed COG-based annotation.
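For genomes outside the SHOPS database, the neighbor criterion (same strand, at most 300 bp from the Sco gene or from another neighbor) amounts to growing a run of adjacent same-strand genes outward from the Sco gene. A minimal sketch, assuming a hypothetical `Gene` record with 1-based inclusive coordinates and measuring the gap as the number of bases between consecutive genes; the exact distance convention used in the study is not stated, so this is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical gene record (1-based, inclusive coordinates)."""
    locus: str
    start: int
    end: int
    strand: str  # '+' or '-'


def sco_neighbors(genes: List[Gene], sco_locus: str, max_gap: int = 300) -> List[str]:
    """Loci of the same-strand run around the Sco gene, chained by gaps <= max_gap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.locus == sco_locus)
    strand = ordered[idx].strand

    def linked(left: Gene, right: Gene) -> bool:
        gap = right.start - left.end - 1      # bases between the genes (negative if overlapping)
        return left.strand == strand and right.strand == strand and gap <= max_gap

    lo = idx
    while lo > 0 and linked(ordered[lo - 1], ordered[lo]):
        lo -= 1                               # extend toward lower coordinates
    hi = idx
    while hi < len(ordered) - 1 and linked(ordered[hi], ordered[hi + 1]):
        hi += 1                               # extend toward higher coordinates
    return [g.locus for g in ordered[lo:hi + 1]]


genes = [
    Gene("g1", 1, 900, "+"),
    Gene("g2", 1000, 1800, "+"),
    Gene("sco", 1900, 2500, "+"),
    Gene("g4", 2700, 3300, "+"),
    Gene("g5", 3400, 4000, "-"),   # opposite strand: stops the run
]
print(sco_neighbors(genes, "sco"))  # ['g1', 'g2', 'sco', 'g4']
```

Because each step only compares adjacent genes, a neighbor can lie well beyond 300 bp from the Sco gene itself as long as every link in the chain satisfies the cutoff. This sketch ignores wrap-around on circular chromosomes.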

Results and Discussion

Sequence Features and Distribution across Species of Sco Proteins. The search procedure described in the Methods section identified 254 Sco proteins in the 311 prokaryotic genomes searched (285 from Bacteria and 26 from Archaea). The complete list of these proteins and related information is given in Supporting Information Table S2. All the proteins retrieved contain a single Sco domain, in a few cases together with other protein domains. The majority of the proteins are predicted to contain at least one membrane-spanning helix, commonly located in the amino-terminal region (see Supporting Information Table S2). Such N-terminal tails have very different lengths in human Sco1 and Sco2, but not in the two yeast proteins. It has been argued that the tails may be important in determining specialized functions for the human Sco proteins.14 In particular, these transmembrane regions may critically affect the monomer/dimer equilibria of both human proteins, as well as the formation of hetero-oligomers.14 No available data support the existence of similar phenomena in prokaryotic systems. The average pairwise identity of the amino acid sequences of Sco domains is 21 ± 9%, indicating significant sequence variability across this protein family. Nonetheless, by building a profile HMM from the multiple alignment of the 254 domain sequences to the Pfam SCO1-SenC HMM (available in Supporting Information Table S3), we could identify 27 highly conserved residues (Figure 1A). These include, as expected, the two cysteines of the CXXXC motif and one histidine (at position 121 in Figure 1A), which are involved in copper binding.6,22 The highly conserved residues also include two aspartates in a DXXXD motif, which has been described in a previous analysis of proteins involved in COX assembly.49 Ten of these 27 residues are exposed to the solvent. Mapping these solvent-accessible residues onto the three-dimensional structure of B. subtilis Sco (Figure 1B) reveals that, except for two glycines in loops, they are concentrated on the side of the protein surface that comprises the CXXXC motif. It is thus likely that Sco interacts with its protein partners through this side, and that this region encompasses the main determinants of Sco function. Close to the CXXXC cysteines there is a hydrophobic groove formed mostly by aromatic residues,50 three of which are highly conserved (at positions 27, 76, and 80 in Figure 1A). These residues may be involved in intermolecular stacking interactions with partner proteins, but we also envisage the possibility that they are part of an electron-transfer pathway from/to the CXXXC site.

Table 1 and Table S1 (Supporting Information) summarize the results of our search for Sco-encoding genes in prokaryotic genomes. Sco proteins are present in 128 organisms covering a large variety of species from both Bacteria and Archaea. These organisms most often have a single Sco gene (63 cases), but can have up to seven. On the other hand, 183 organisms do not contain Sco homologues, which are entirely absent from six main prokaryotic groups (Table 1 and Table S1 (Supporting Information)). Since the classical view is that Sco is involved in the assembly of COX, and of the CuA center in particular, we also searched the same prokaryotic genomes for COX2 genes, which encode the CuA-containing subunit II of COX. The results of this search (Table 1) provide a broad overview of the correlation between the occurrences of Sco and COX2 genes. We described this relationship by classifying the organisms according to the number of Sco and COX2 genes identified in their genomes (Table 2).
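The genome classification used for Table 2, and the presence/absence agreement quoted in the following paragraph, reduce to simple comparisons of per-genome gene counts. The sketch below assumes the counts are available as `(n_sco, n_cox2)` pairs; the function names are hypothetical. The placeholder pattern reproduces only the presence/absence totals implied by the text (128 Sco-containing genomes, 18 of them without COX2, hence 110 with both; 183 Sco-lacking genomes, 38 of them with COX2, hence 145 with neither), with every nonzero count set to 1.

```python
from typing import Iterable, Tuple


def classify_genome(n_sco: int, n_cox2: int) -> str:
    """'regular' if counts are equal (including both zero), else 'defective'/'redundant'."""
    if n_sco == n_cox2:
        return "regular"
    return "defective" if n_sco < n_cox2 else "redundant"


def presence_agreement(counts: Iterable[Tuple[int, int]]) -> float:
    """Fraction of genomes in which Sco and COX2 genes are both present or both absent."""
    pairs = list(counts)
    agree = sum((n_sco > 0) == (n_cox2 > 0) for n_sco, n_cox2 in pairs)
    return agree / len(pairs)


print(classify_genome(0, 0), classify_genome(1, 3), classify_genome(7, 4))
# regular defective redundant

# Presence/absence pattern only (gene counts set to 1 as placeholders):
# 110 both, 145 neither, 38 COX2 only, 18 Sco only -> 311 genomes.
pattern = [(1, 1)] * 110 + [(0, 0)] * 145 + [(0, 1)] * 38 + [(1, 0)] * 18
print(len(pattern), round(presence_agreement(pattern), 2))  # 311 0.82
```

Note that the regular/defective/redundant classes depend on actual gene counts, so they cannot be recovered from this placeholder pattern; only the 82% agreement (255/311) can.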
In this scheme, we defined as 'regular' those genomes that either contain the same number of Sco and COX2 genes or lack both, regardless of any prediction of Sco function. Similarly, we defined as 'defective' and 'redundant' those genomes containing fewer and more Sco genes, respectively, than COX2 genes. Even at this basic level of analysis, the indication emerges that, although Sco and COX2 genes are generally correlated (they are either both present or both absent in 82% of the genomes), there are significant exceptions. For example, 38 organisms have no Sco gene, yet have at least one COX2 gene. In these organisms COX must be assembled without Sco, implying that they have evolved a COX maturation process that does not require Sco. Conversely, we identified 18 organisms that have one or more Sco genes despite lacking COX2. These instances point most clearly to the possibility that, in prokaryotes, Sco can also function outside the process of COX assembly. When an organism encodes multiple Sco proteins, the similarity among their amino acid sequences is generally comparable to that observed across the entire family: on average, the pairwise identity between two Sco domains from the same organism is 24 ± 10%. This suggests that the different Sco proteins expressed by a given organism might be adapted to play specific roles within the cell, rather than serving redundant functions.

The Biological Functions of Sco Proteins. To obtain clues about the functional differentiation of Sco proteins, we analyzed the local neighborhood of Sco-encoding genes in all the prokaryotic genomes containing them. The analysis of the context of Sco genes resulted in the identification of a few common operon types, all occurring in organisms containing


Figure 1. (A) Profile HMM of Sco domain sequences in HMM-logo visualization. Residues ranked as highly conserved by HMMER are indicated by one asterisk if they are not exposed to the solvent, or by two asterisks if they are exposed. (B) Structure of the B. subtilis Sco homologue YpmQ (PDB code 1ON4), showing the positions of the residues that are highly conserved and exposed to the solvent (depicted in blue, in space-filling mode). Except for two glycines (Gly23 and Gly33 in YpmQ numbering), these residues are found on one side of the protein surface, which includes the CXXXC motif and the putative copper-binding histidine (His135 in YpmQ numbering).
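The asterisk annotation in Figure 1A combines two per-position thresholds from the Methods: probability > 0.5 for the most frequent residue in the profile HMM, and a 50% solvent-accessibility cutoff. A minimal sketch, assuming both values are already mapped onto the same alignment positions and that relative accessibility is given in percent (for example, as reported by NACCESS); treating exactly 50% as buried is our assumption, since the text does not say how ties are handled. The function name is hypothetical.

```python
from typing import List, Sequence


def conservation_marks(
    max_prob: Sequence[float],
    rel_access: Sequence[float],
    prob_cutoff: float = 0.5,
    exposed_cutoff: float = 50.0,
) -> List[str]:
    """'' = not highly conserved, '*' = conserved and buried, '**' = conserved and exposed."""
    marks = []
    for p, acc in zip(max_prob, rel_access):
        if p <= prob_cutoff:
            marks.append("")          # most probable residue has probability <= 0.5
        elif acc > exposed_cutoff:
            marks.append("**")        # conserved and solvent exposed
        else:
            marks.append("*")         # conserved but buried
    return marks


print(conservation_marks([0.9, 0.3, 0.7], [80.0, 90.0, 10.0]))  # ['**', '', '*']
```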


Table 1. List of Prokaryotic Genomes Where at Least One Protein among Sco, COX2, NosZ, or Cox11 Homologues Was Identified(a)

genome

domain

group

Sco

COX2

NosZ

caa3

Cox11

Aeropyrum pernix K1 Pyrobaculum aerophilum str. IM2 Sulfolobus acidocaldarius DSM 639 Sulfolobus solfataricus P2 Sulfolobus tokodaii str. 7 Archaeoglobus fulgidus DSM 4304 Haloarcula marismortui ATCC 43049 Halobacterium sp. NRC-1 Natronomonas pharaonis DSM 2160 Picrophilus torridus DSM 9790 Corynebacterium diphtheriae NCTC 13129 Corynebacterium efficiens YS-314 Corynebacterium glutamicum ATCC 13032 (Bielefeld) Corynebacterium glutamicum ATCC 13032 (Kitasato) Corynebacterium jeikeium K411 Frankia sp. CcI3 Leifsonia xyli subsp. xyli str. CTCB07 Mycobacterium avium subsp. paratuberculosis K-10 Mycobacterium bovis AF2122/97 Mycobacterium leprae TN Mycobacterium tuberculosis CDC1551 Mycobacterium tuberculosis H37Rv Nocardia farcinica IFM 10152 Propionibacterium acnes KPA171202 Streptomyces avermitilis MA-4680 Streptomyces coelicolor A3(2) Symbiobacterium thermophilum IAM 14863 Thermobifida fusca YX Tropheryma whipplei str. Twist Tropheryma whipplei TW08/27 Agrobacterium tumefaciens str. C58 (Cereon) Agrobacterium tumefaciens str. C58 (UWash) Anaplasma marginale str. St. Maries Anaplasma phagocytophilum HZ Bartonella henselae str. Houston-1 Bartonella quintana str. Toulouse Bradyrhizobium japonicum USDA 110 Brucella abortus biovar 1 str. 9-941 Brucella melitensis 16M Brucella melitensis biovar Abortus 2308 Brucella suis 1330 Candidatus pelagibacter ubique HTCC1062 Caulobacter crescentus CB15 Ehrlichia canis str. Jake Ehrlichia chaffeensis str. Arkansas Ehrlichia ruminantium str. Gardel Ehrlichia ruminantium str. Welgevonden (UPret) Ehrlichia ruminantium str. Welgevonden (CIRAD) Erythrobacter litoralis HTCC2594 Gluconobacter oxydans 621H Magnetospirillum magneticum AMB-1 Mesorhizobium loti MAFF303099 Neorickettsia sennetsu str. 
Miyayama Nitrobacter winogradskyi Nb-255 Novosphingobium aromaticivorans DSM 12444 Rhizobium etli CFN 42 Rhodobacter sphaeroides 2.4.1 Rhodopseudomonas palustris CGA009 Rhodopseudomonas palustris HaA2 Rhodospirillum rubrum ATCC 11170 Rickettsia conorii str. Malish 7 Rickettsia felis URRWXCal2 Rickettsia prowazekii str. Madrid E Rickettsia typhi str. Wilmington Silicibacter pomeroyi DSS-3 Sinorhizobium meliloti 1021 Wolbachia endosymbiont of Drosophila melanogaster Wolbachia endosymbiont strain TRS of Brugia malayi Aquifex aeolicus VF5 Salinibacter ruber DSM 13855 Azoarcus sp. EbN1 Bordetella bronchiseptica RB50 Bordetella parapertussis 12822 Bordetella pertussis Tohama I

Archaea Archaea Archaea Archaea Archaea Archaea Archaea Archaea Archaea Archaea Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria

Crenarchaeota Crenarchaeota Crenarchaeota Crenarchaeota Crenarchaeota Euryarchaeota Euryarchaeota Euryarchaeota Euryarchaeota Euryarchaeota Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria R-Proteobacteria Aquificae Bacteroidetes/Chlorobi β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria

2 2 None None None None 1 1 2 None None None None None None 2 None None None None None None None None 1 1 1 1 None None 1 1 1 1 1 1 1 1 1 1 1 None 2 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 1 1 2 2 2 1 4 1 1 1 3 2 4 3 3 3

2 2 1 1 2 2 4 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 None None 3 1 1 1 1 1 1 1 1 1 1 1 1 None 3 3 1 3 1 4 2 1 2 None 1 1 1 None 2 3 1 1 2 2 4 1 1 1

None None None None None None 1 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 1 1 1 1 1 None None None None None None None None None 1 None None None None None None 1 1 None None None None None 1 1 None None None 1 1 None None None

None None None None None None None None None None None None None None None None None None None None None None None None None None 1 None None None None None None None None None None None None None None None None None None None None None None None None 1 None None None 1 1 None None None None None None None None 1 None None None 1 1 1 1 1

None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 1 1 1 1 None None 1 None 1 None 1 1 1 1 1 1 1 1 1 None 1 1 1 2 1 1 1 1 1 None 1 1 1 None 1 1 1 1 None None 1 None None None

Table 1 (Continued)

genome

domain

group

Sco

COX2

NosZ

caa3

Cox11

Burkholderia mallei ATCC 23344 Burkholderia pseudomallei 1710b Burkholderia pseudomallei K96243 Burkholderia sp. 383 Burkholderia thailandensis E264 Chromobacterium violaceum ATCC 12472 Dechloromonas aromatica RCB Neisseria gonorrhoeae FA 1090 Neisseria meningitidis MC58 Neisseria meningitidis Z2491 Nitrosomonas europaea ATCC 19718 Nitrosospira multiformis ATCC 25196 Ralstonia eutropha JMP134 Ralstonia solanacearum GMI1000 Thiobacillus denitrificans ATCC 25259 Anabaena variabilis ATCC 29413 Cyanobacteria bacterium Yellowstone A-Prime Cyanobacteria bacterium Yellowstone B-Prime Gloeobacter violaceus PCC 7421 Nostoc sp. PCC 7120 Prochlorococcus marinus str. MIT 9312 Prochlorococcus marinus str. MIT 9313 Prochlorococcus marinus str. NATL2A Prochlorococcus marinus subsp. marinus str. CCMP1375 Prochlorococcus marinus subsp. pastoris str. CCMP1986 Synechococcus elongatus PCC 6301 Synechococcus elongatus PCC 7942 Synechococcus sp. CC9605 Synechococcus sp. CC9902 Synechococcus sp. WH 8102 Synechocystis sp. PCC 6803 Thermosynechococcus elongatus BP-1 Deinococcus radiodurans R1 Thermus thermophilus HB27 Thermus thermophilus HB8 Anaeromyxobacter dehalogenans 2CP-C Bdellovibrio bacteriovorus HD100 Desulfovibrio desulfuricans G20 Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough Geobacter metallireducens GS-15 Geobacter sulfurreducens PCA Pelobacter carbinolicus DSM 2380 Campylobacter jejuni RM1221 Campylobacter jejuni subsp. jejuni NCTC 11168 Thiomicrospira denitrificans ATCC 33889 Wolinella succinogenes DSM 1740 Bacillus anthracis str. Ames Bacillus anthracis str. 'Ames Ancestor' Bacillus anthracis str. Sterne Bacillus cereus ATCC 10987 Bacillus cereus ATCC 14579 Bacillus cereus E33L Bacillus clausii KSM-K16 Bacillus halodurans C-125 Bacillus licheniformis ATCC 14580 (Novozymes) Bacillus licheniformis ATCC 14580 (UGoet) Bacillus subtilis subsp. subtilis str. 168 Bacillus thuringiensis serovar konkukian str. 
97-27 Geobacillus kaustophilus HTA426 Oceanobacillus iheyensis HTE831 Colwellia psychrerythraea 34H Hahella chejuensis KCTC 2396 Idiomarina loihiensis L2TR Legionella pneumophila str. Lens Legionella pneumophila str. Paris Legionella pneumophila subsp. pneumophila str. Philadelphia 1 Methylococcus capsulatus str. Bath Nitrosococcus oceani ATCC 19707 Photobacterium profundum SS9 Pseudoalteromonas haloplanktis TAC125 Pseudomonas aeruginosa PAO1 Pseudomonas fluorescens Pf-5 Pseudomonas fluorescens PfO-1

Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria Bacteria

β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria β-Proteobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Cyanobacteria Deinococcus-Thermus Deinococcus-Thermus Deinococcus-Thermus δ-Proteobacteria δ-Proteobacteria δ-Proteobacteria δ-Proteobacteria δ-Proteobacteria δ-Proteobacteria δ-Proteobacteria ε-Proteobacteria ε-Proteobacteria ε-Proteobacteria ε-Proteobacteria Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes Firmicutes γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria γ-Proteobacteria

5 6 7 3 5 3 5 1 1 1 3 5 3 6 2 None None None None None None None None None None None None None None None None None 2 1 1 2 3 1 1 1 4 1 1 1 None None 1 1 1 1 1 1 2 2 1 1 1 1 3 1 2 2 3 2 2 2 3 3 2 4 2 3 6

2 4 4 3 3 1 1 None None None 2 4 4 3 1 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 None None None None 1 1 1 1 1 1 2 2 1 1 1 1 3 2 1 1 1 1 1 1 2 4 2 1 1 1 1

1 1 1 None 1 None 2 None None None None None None 1 1 None None None None None None None None None None None None None None None None None None None None 1 None None None None None None None None 2 1 None None None None None None None None None None None None None None 1 1 None None None None None None 1 None 1 None None

1 2 2 2 1 1 1 None None None None 3 3 2 1 None None None None None None None None None None None None None None None None None None 1 1 2 1 1 1 1 1 1 None None None None 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1

1 1 1 1 1 1 1 None None None 1 1 1 1 1 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 1 None 1 1 1 1 1 1 1 1 1 1 1

Journal of Proteome Research • Vol. 6, No. 4, 2007 1573

research articles

Banci et al.

Table 1. (Continued)

genome                                                          domain    group             Sco  COX2  NosZ  caa3  Cox11
Pseudomonas putida KT2440                                       Bacteria  γ-Proteobacteria  6    1     None  1     1
Pseudomonas syringae pv. phaseolicola 1448A                     Bacteria  γ-Proteobacteria  2    None  None  None  1
Pseudomonas syringae pv. syringae B728a                         Bacteria  γ-Proteobacteria  2    None  None  None  1
Pseudomonas syringae pv. tomato str. DC3000                     Bacteria  γ-Proteobacteria  3    None  None  None  1
Shewanella oneidensis MR-1                                      Bacteria  γ-Proteobacteria  2    1     None  1     1
Thiomicrospira crunogena XCL-2                                  Bacteria  γ-Proteobacteria  1    None  None  None  None
Vibrio cholerae O1 biovar eltor str. N16961                     Bacteria  γ-Proteobacteria  1    None  None  None  None
Vibrio fischeri ES114                                           Bacteria  γ-Proteobacteria  1    None  None  None  None
Vibrio parahaemolyticus RIMD 2210633                            Bacteria  γ-Proteobacteria  2    1     None  1     1
Vibrio vulnificus CMCP6                                         Bacteria  γ-Proteobacteria  2    1     None  1     1
Vibrio vulnificus YJ016                                         Bacteria  γ-Proteobacteria  2    1     None  1     1
Xanthomonas axonopodis pv. citri str. 306                       Bacteria  γ-Proteobacteria  2    1     None  None  1
Xanthomonas campestris pv. campestris str. 8004                 Bacteria  γ-Proteobacteria  3    1     None  None  1
Xanthomonas campestris pv. campestris str. ATCC 33913           Bacteria  γ-Proteobacteria  3    1     None  None  1
Xanthomonas campestris pv. vesicatoria str. 85-10               Bacteria  γ-Proteobacteria  2    1     None  None  1
Xanthomonas oryzae pv. oryzae KACC10331                         Bacteria  γ-Proteobacteria  2    1     None  None  1
Xylella fastidiosa 9a5c                                         Bacteria  γ-Proteobacteria  1    None  None  None  None
Xylella fastidiosa Temecula1                                    Bacteria  γ-Proteobacteria  1    None  None  None  None
Rhodopirellula baltica SH 1                                     Bacteria  Planctomycetes    2    2     None  2     None
Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130  Bacteria  Spirochaetes      3    1     None  1     None
Leptospira interrogans serovar Lai str. 56601                   Bacteria  Spirochaetes      3    1     None  1     None

a For each genome, the table reports: (i) organism name, (ii) domain to which the organism belongs, (iii) group to which the organism belongs, (iv) number of Sco homologues identified, (v) number of CuA-containing subunits II of COX identified (COX2), (vi) number of nitrous oxide reductases (NosZ) identified, (vii) number of caa3 oxidases identified, and (viii) number of Cox11 homologues identified. Genomes where no Sco, COX2, NosZ, or Cox11 proteins were identified are listed in Table S1.
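Several statements in the text are presence/absence patterns over the rows of Table 1, for example Cox11 occurring together with Sco and COX2, or every caa3-encoding genome also encoding Sco. A minimal Python sketch of how such patterns can be tallied is given below. The row dictionaries, field names, and helper functions are illustrative assumptions (this is not the authors' pipeline), and only six rows of the continued Table 1 are used as input, with "None" entries stored as 0.

```python
from collections import Counter

# Gene counts copied from six rows of the continued Table 1.
# "None" in the table is stored as 0; this dictionary layout is an illustrative assumption.
ROWS = {
    "Pseudomonas putida KT2440": {"Sco": 6, "COX2": 1, "caa3": 1, "Cox11": 1},
    "Pseudomonas syringae pv. phaseolicola 1448A": {"Sco": 2, "COX2": 0, "caa3": 0, "Cox11": 1},
    "Pseudomonas syringae pv. syringae B728a": {"Sco": 2, "COX2": 0, "caa3": 0, "Cox11": 1},
    "Pseudomonas syringae pv. tomato str. DC3000": {"Sco": 3, "COX2": 0, "caa3": 0, "Cox11": 1},
    "Vibrio cholerae O1 biovar eltor str. N16961": {"Sco": 1, "COX2": 0, "caa3": 0, "Cox11": 0},
    "Rhodopirellula baltica SH 1": {"Sco": 2, "COX2": 2, "caa3": 2, "Cox11": 0},
}


def presence_pattern(counts, genes=("Sco", "COX2", "Cox11")):
    """Return the genes present (count > 0) in one genome, in the order given by `genes`."""
    return tuple(g for g in genes if counts.get(g, 0) > 0)


def tally_patterns(rows):
    """Count how many genomes share each presence/absence pattern."""
    return Counter(presence_pattern(counts) for counts in rows.values())


def caa3_without_sco(rows):
    """List genomes that encode a caa3 oxidase but no Sco (the text reports none)."""
    return [name for name, c in rows.items() if c.get("caa3", 0) > 0 and c.get("Sco", 0) == 0]


if __name__ == "__main__":
    for pattern, n in sorted(tally_patterns(ROWS).items()):
        print(" + ".join(pattern) or "(none)", n)
    print("caa3 without Sco:", caa3_without_sco(ROWS))
```

On these six rows the three P. syringae strains fall into the Sco + Cox11 (no COX2) pattern discussed in the text; applying the same tally to every row of Table 1 and Table S1 would be the natural way to re-check the counts quoted there.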

Table 2. Genome Classification Based on the Number of Sco and COX2 Genes Identified (see Table 1 and Table S1)a

number of Sco genes   number of COX2 genes                 genome type   minimal correlation   number of genomes
0                     0                                    Regular       Yes                   145
0                     1 or more                            Defective     No                    38
1                     1                                    Regular       Yes                   36
1                     More than 1                          Defective     Yes                   12
More than 1           As many as Sco genes                 Regular       Yes                   12
More than 1           More than Sco genes                  Defective     Yes                   3
1 or more             0                                    Redundant     No                    18
More than 1           1 or more, but less than Sco genes   Redundant     Yes                   47

a A minimal correlation exists when Sco and COX2 genes are either both present or both absent, independently of their number.
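The rules of Table 2 reduce to a comparison of the two gene counts, with the minimal-correlation flag computed separately. The Python sketch below is one compact reading of the table (the function and variable names are hypothetical, and the grouping of rows into three comparisons is an interpretation). As a consistency check it recomputes, from the genome counts in Table 2, the total number of genomes, the number encoding both Sco and COX2, and the fraction of genomes lacking minimal correlation.

```python
from __future__ import annotations


def classify_genome(n_sco: int, n_cox2: int) -> tuple[str, bool]:
    """Return (genome type, minimal correlation) for given Sco and COX2 gene counts."""
    if n_sco < 0 or n_cox2 < 0:
        raise ValueError("gene counts must be non-negative")
    # Minimal correlation: Sco and COX2 genes both present or both absent (footnote a).
    minimal_correlation = (n_sco > 0) == (n_cox2 > 0)
    if n_sco == n_cox2:
        genome_type = "Regular"      # 0/0, 1/1, or as many COX2 as Sco genes
    elif n_cox2 > n_sco:
        genome_type = "Defective"    # fewer Sco than COX2 genes, including no Sco at all
    else:
        genome_type = "Redundant"    # more Sco than COX2 genes, including no COX2 at all
    return genome_type, minimal_correlation


# One representative (Sco, COX2) pair per Table 2 row, with that row's number of genomes.
TABLE2_ROWS = [
    ((0, 0), 145), ((0, 1), 38), ((1, 1), 36), ((1, 2), 12),
    ((2, 2), 12), ((2, 3), 3), ((1, 0), 18), ((3, 1), 47),
]


def summarize(rows=TABLE2_ROWS):
    """Total genomes, genomes with both genes, and fraction lacking minimal correlation."""
    total = sum(n for _, n in rows)
    both = sum(n for (s, c), n in rows if s > 0 and c > 0)
    mismatched = sum(n for (s, c), n in rows if not classify_genome(s, c)[1])
    return total, both, mismatched / total


if __name__ == "__main__":
    total, both, frac = summarize()
    print(f"{total} genomes, {both} with Sco and COX2, {frac:.0%} without minimal correlation")
```

Run as a script, it prints "311 genomes, 110 with Sco and COX2, 18% without minimal correlation", consistent with the 110 genomes encoding both genes cited in the text and the 18% figure given in the abstract.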

a COX2 gene, which provided a classification basis for part of the Sco proteins identified (Table S4 of Supporting Information and Figure 2). We then mapped the operon types onto a dendrogram obtained from the multiple alignment of the amino acid sequences of Sco domains, with the aim of evaluating whether, and to what extent, the different operon types correlate with sequence subgroups in the family (Figure 3). The topology of the tree substantially reflected the operon-based grouping. Note that this tree, hereafter referred to as the ‘Sco tree’, is only a convenient way to represent the similarity among the sequences included in the multiple alignment; it is not a phylogenetic tree and should not be interpreted as such. Most of the identified operons that encode Sco also contain a gene for COX2 (Figure 2). This occurred in 56 of the 110 organisms that contained both Sco and COX2 genes. A first kind of operon of this type was found in β- and γ-proteobacteria. In its most complete form, it encoded (i) subunits I, II, and III of COX; (ii) homologues of the known COX assembly factors Cox10 (corresponding to COG0109 in the COG database), Cox11 (COG3175), Cox15 (COG1612), and Surf1 (COG3346); and (iii) two Sco proteins. Because of the latter feature, we labeled this operon type ‘Double Sco’ (even though in a very few cases the second Sco is not present). The two Sco proteins formed two distinct groups, ‘Double Sco-1’ and ‘Double Sco-2’, owing to their remarkably different primary sequences, with an average pairwise identity of 16 ± 3% (Figure 3). ‘Double Sco-2’ proteins contained (almost) all highly conserved residues. In contrast, the sequences in the ‘Double Sco-1’ group were quite divergent from the typical Sco sequence. Although all the ‘Double Sco-1’ sequences had the CXXXC motif, in most cases they lacked the histidine involved in copper binding. Additionally, they lacked the two highly conserved aspartates of the DXXXD motif. The differences between the sequences of these two Sco proteins suggest that, if both proteins are actually involved in COX assembly, they perform different functions. A second kind of operon containing both COX2 and Sco had only a single Sco protein and was dubbed ‘COX2-associated’. These operons were found in several phylogenetically distant organisms, from both bacteria and archaea, and typically encoded subunits I-IV of COX, a single Sco, and the Cox10 assembly factor (Figure 2). The Sco proteins in these operons have 30 ± 10% average sequence identity and map to a separate branch of the Sco tree (Figure 3). These two observations may indicate that these proteins share a common functional role. A distinctive feature is that nearly all of these proteins have a DXX[D,E] variant of the DXXXD motif. A third kind of operon containing both Sco and COX2 genes was identified in the α-proteobacterium Silicibacter pomeroyi and in the firmicute Geobacillus kaustophilus, as well as in a few β- and γ-proteobacteria that also have a ‘Double Sco’ operon. In this third type, a single Sco is encoded together with subunits I and II of COX and a hypothetical protein (called HP1 hereafter).
The latter is approximately 200 residues long and is predicted by TMHMM to be a transmembrane protein. Notably, the genome of one of the above-mentioned β-proteobacteria, Azoarcus sp. EbN1, contains a further operon where Sco and HP1 are found not with COX, but with a cbb3-type cytochrome c oxidase that lacks a CuA center (Table S4, Supporting Information). A similar gene arrangement also occurs in Ralstonia solanacearum. Regardless of the association with either the aa3 or the cbb3 oxidase, all these Sco proteins have quite similar sequences (average pairwise identity 39 ± 8%); therefore, we grouped them together (‘HP1-associated’ in Figure 2). Such a high similarity suggests that they might have the same function and possibly work together with HP1 in copper delivery to the CuB site, which is present in both aa3 and cbb3 oxidases. As mentioned in the Introduction, this hypothesis has been put forward for R. capsulatus,32 an organism that has a cbb3 but not an aa3 oxidase, and it might partly explain the presence of Sco genes in COX2-lacking genomes. Indeed, 12 of the 18 organisms containing Sco but not COX2 have cbb3 oxidase genes in their genomes.

Figure 2. Scheme depicting the operon types identified for Sco genes. Only genes present in more than half of the genomes where the operon was identified are shown. COX subunits (COX1, COX2, COX3, COX4) are indicated in capital letters, in contrast to COX assembly factors (Cox10, Cox11, Cox15). COG identifiers not cited in the text include COG0117 (pyrimidine deaminase), COG0226 (ABC-type phosphate transport system, periplasmic component), COG0307 (riboflavin synthase alpha chain), COG0688 (phosphatidylserine decarboxylase), COG0760 (parvulin-like peptidyl-prolyl isomerase), COG0785 (cytochrome c biogenesis protein), COG1062 (Zn-dependent alcohol dehydrogenases, class III), COG1262 (uncharacterized conserved protein), COG1451 (predicted metal-dependent hydrolase), COG1985 (pyrimidine reductase, riboflavin biosynthesis), COG2264 (ribosomal protein L11 methylase), COG2983 (uncharacterized conserved protein), and NOG13183 (nonsupervised orthologous group). FixN and FixO are two components of cbb3 oxidase. Hypothetical proteins are indicated as HP1, HP2, and HP3. Other abbreviations are cytc for cytochrome c and CCP for cytochrome c peroxidase. The symbol ‘+’ indicates gene fusion. The scheme does not include Sco found in operons occurring in only one of the prokaryotic genomes analyzed.

Figure 3. Tree-like representation of the dendrogram obtained from the multiple alignment of the amino acid sequences of Sco domains. The mapping of the identified operon types is shown. Sequences from Bacilli include the COG3336-associated sequence from B. halodurans.

To obtain more insight into the possible role of Sco in the assembly of the CuB site, we analyzed the distribution of genes encoding Cox11 proteins, which in eukaryotes are required for
the formation of the CuB center.51,52 We found Cox11 homologues exclusively in the α, β, and γ subgroups of proteobacteria, for a total of 69 organisms. The presence of Cox11 genes was strongly correlated with the presence of both COX2 and Sco genes (Table 1): 65 genomes encode Sco, COX2, and Cox11; 1 (the Candidatus Pelagibacter ubique HTCC1062 genome) encodes COX2 and Cox11 but not Sco; and 3 (the three genomes of Pseudomonas syringae strains) encode Sco and Cox11 but not COX2. These results suggest that only α-, β-, and γ-proteobacteria evolved a eukaryotic-like COX assembly system that employs a Cox11 protein to specifically deliver copper to the CuB site. All these organisms encode a single Cox11 protein except Nitrobacter winogradskyi, whose genome encodes two Cox11 proteins with 100% sequence identity. Cox11 thus appears to be a highly specialized protein, fully devoted to the assembly of the CuB site. All the prokaryotes that encode COX (76 instances) but lack Cox11 must use a different pathway to deliver copper to the CuB site. It is possible that the 39 of them that encode Sco use it to build both the CuA and the CuB centers. This might also be the case for the six α-, β-, and γ-proteobacterial organisms (Table 1) that have both COX and Sco, yet lack Cox11. Because of the high correlation between the presence of COX2 and Cox11, it appears unlikely that the latter has a general role in the assembly of the CuB site in other oxidases, such as cbb3 oxidase (note that 12 genomes encode Sco and a cbb3 oxidase, of which only the three P. syringae strains also have Cox11). COX is not the only copper-containing enzyme found encoded in close proximity to Sco proteins. We identified several operons containing genes encoding Sco and other copper enzymes, although these cases were usually restricted to a few species.
A relatively common operon of this type, which we identified in 12 different species, encodes Sco and multicopper blue proteins (MCBPs). MCBPs are a large family of multidomain enzymes, varying in domain organization and function, that rely on copper for their catalytic activity; they include nitrite reductases (generally designated NirK proteins) and the diverse group of multicopper oxidases (MCOs).53 Sco proteins found in operons with MCBPs were included in the ‘MCBP-associated’ group, regardless of the specific type of MCBP. However, they mapped to different positions in the Sco tree depending on the type of the associated enzyme, suggesting that their sequences are specifically adapted to interact with each MCBP type. In particular, the nine Sco proteins encoded with NirK proteins (average sequence identity >40%) cluster closely together in the tree but are separated from those encoded with other MCBPs. In all but one Burkholderia species, we identified operons where Sco is encoded together with copper enzymes related to galactose oxidase. Galactose oxidase catalyzes the oxidation of primary alcohols to aldehydes and contains an unusual active site in which a copper ion is coordinated by a protein radical.54 This radical-copper motif is shared by a growing family of enzymes that catalyze distinct reactions and are collectively called radical-copper oxidases (RCOs).55 As observed for MCBP-associated Sco proteins, Sco proteins associated with different types of RCOs map to different branches of the Sco tree, reflecting the sequence specificity needed for the interaction with each partner. In Magnetospirillum magneticum, a Sco protein is encoded together with a nitrous oxide reductase (NosZ), an enzyme that catalyzes the reduction of nitrous oxide to dinitrogen and contains six copper ions.41,56 The NosZ C-terminal domain is homologous to the COX2 domain and contains a CuA site.41 Therefore, if Sco delivers copper to COX2, it is reasonable to expect that it might provide the same service to NosZ.57 We identified NosZ proteins in 27 organisms, and 25 of them also encode both Sco and COX (Table 1).
In addition, all of them except Salinibacter ruber also encode, in the same operon as NosZ, a proposed copper chaperone called NosL. Neither NosL nor Sco is essential for the assembly of NosZ,58 which opens the possibility that in vivo they may substitute for each other, depending on environmental conditions.
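Group statistics such as the average sequence identity quoted above for the NirK-associated Sco proteins (>40%) summarize the identities between all pairs of aligned Sco domains in a group. A minimal Python sketch of one common way to compute such a mean ± standard deviation is shown below. It assumes that the sequences are rows of a multiple alignment of equal length with "-" for gaps and that identity is counted only over columns where neither sequence has a gap; the paper does not state its exact convention, so the gap handling and the use of the population standard deviation are assumptions, and the toy sequences are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean, pstdev


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences over columns without a gap in either."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def identity_summary(aligned: list[str]) -> tuple[float, float]:
    """Mean and population standard deviation of all pairwise identities in a group."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    values = [pairwise_identity(a, b) for a, b in combinations(aligned, 2)]
    return mean(values), pstdev(values)


if __name__ == "__main__":
    # Hypothetical toy alignment, not real Sco sequences.
    group = ["CPDVCP", "CPEVCP", "CADV-P"]
    m, sd = identity_summary(group)
    print(f"average pairwise identity {m:.0f} ± {sd:.0f}%")
```

Counting identity over ungapped columns is only one of several conventions (dividing by the length of the shorter sequence is another), and the choice can shift the reported averages noticeably for divergent groups.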

The essential link between Sco proteins and copper emerges from their association not only with copper enzymes, but also with other copper-binding proteins. A widespread operon encodes Sco together with a protein (corresponding to COG2847 in the COG database) that our group has structurally characterized and shown to be a copper-binding protein.59 A few organisms, such as Campylobacter jejuni, P. syringae, and Vibrio cholerae, do not have COX2 genes, but their genomes encode Sco and COG2847, suggesting that the two proteins may be jointly involved in copper trafficking also outside the process of COX assembly. We included all Sco proteins from Bacilli in the ‘Bacilli’ group because of their high sequence similarity (average pairwise identity 43 ± 16%). Bacillus halodurans contains a Sco protein encoded in an operon together with a CtaG homologue (COG3336). The B. subtilis CtaG protein has been shown to be essential for the assembly of caa3 oxidase.60 An association between Sco and CtaG homologues was detected in several species, also in close proximity to ‘COG2847-associated’ operons. The latter gene clusters may further indicate a role for CtaG in copper transport across the membrane, since these proteins are predicted to be mostly transmembrane. Finally, we detected two instances of fusions between Sco and cytochrome c domains. The association between Sco and cytochrome c is likely to be much tighter than these sporadic cases suggest. In fact, the large majority of operons encoding Sco together with a copper enzyme also include a cytochrome c domain, which is most commonly fused to the enzyme. In particular, we identified cytochrome c genes (i) in 32 of 39 ‘Double Sco’ and 14 of 19 ‘COX2-associated’ operons, where the cytochrome c domain is almost invariably fused to COX2; (ii) in all ‘MCBP-associated’ operons, where it is fused either to Sco or to the enzyme; and (iii) in 6 of 8 ‘RCO-associated’ operons, where the cytochrome c domain is encoded by a distinct gene.
Additionally, cytochrome c genes are present in ‘HP1-associated’ operons encoding cbb3 oxidases, which in turn contain cytochrome c subunits. The fusion between COX2 and cytochrome c observed in most ‘Double Sco’ and ‘COX2-associated’ operons is the hallmark of cytochrome c oxidases of the caa3 type. The species distribution of caa3 oxidases revealed that the association of these enzymes with Sco is tighter than that of CuA-containing COX in general. Indeed, all the organisms that encode a caa3 oxidase also encode at least one Sco. In contrast, as already mentioned, there are organisms that encode non-caa3 COX but lack Sco, and organisms that encode Sco but lack any COX, even of the caa3 type. Therefore, one hypothesis, still to be confirmed experimentally, is that Sco is necessary for the assembly of caa3 oxidases. More generally, the presence of cytochrome c genes in different Sco operons suggests that cytochrome c might be crucial for Sco functioning in a variety of contexts.

A General Model for Sco Function. As described, the genomic context of Sco genes is remarkably varied, indicating that Sco proteins in prokaryotes can take part in different biological processes. The classical view of Sco as a factor in copper incorporation during the assembly of COX, derived mainly from studies on the eukaryotic homologues, is clearly supported by the wide species distribution of operons encoding Sco and COX together, although a number of bacteria can carry out COX maturation without Sco. In addition, the identification of operons that encode Sco with copper enzymes other than COX supports the contention that Sco proteins can deliver copper to a variety of such enzymes. Different operons generally correspond to distinct sequence subgroups. However, highly conserved residues are shared by the large majority of sequences, suggesting that other residues tune the functionality of Sco proteins around a generally conserved mode of action.

Figure 4. Scheme depicting the two possible mechanisms discussed in the text for Sco functioning as a redox-dependent copper transporter. In the first mechanism (left panel), copper release is dependent on the oxidation of the cysteine thiols of the CXXXC motif to the disulfide form, but not on the oxidation state of the metal. In the second mechanism (right panel), copper is more easily oxidized than are the CXXXC cysteines (which may still bind the metal after its oxidation, as indicated by dashed lines in the intermediate picture), and is eventually released in its oxidized form, leaving the cysteines in the reduced state.

Here, we suggest a speculative model for Sco functioning that may explain the different roles of these proteins as functional variations on a general theme based on the insertion of a metal within a redox-active protein framework. The basic assumption of our model is that Sco proteins evolved from proteins endowed with thiol/disulfide oxidoreductase activity. This assumption relies on the long-recognized similarity of Sco proteins to peroxiredoxins and thiol/disulfide oxidoreductases, in terms of both sequence16 and structure.20 Furthermore, the available experimental data for the B. subtilis and R. sphaeroides homologues indicate that the two cysteines of the CXXXC motif can actually undergo reversible oxidation.50,61 As mentioned, the CXXXC motif is also able to bind a copper(I) ion. This dual character might allow Sco proteins to function as redox-dependent metal transporters through a mechanism known to operate in other proteins:62,63 the metal is coordinated by the cysteines of the CXXXC motif in the reduced state and is released upon oxidation of the cysteine thiols to the disulfide form (Figure 4, left panel). An alternative mechanism is possible if the metal itself is a redox-active species, for example, copper. In this case, the metal ion might be oxidized in preference to the cysteines, keeping them in the reduced state and available for thiol/disulfide exchange reactions (Figure 4, right panel).
This mechanism may require the protein to rearrange the coordination sphere of the metal upon oxidation of the latter, and it does not exclude that the oxidized metal is eventually transferred to protein partners that have a higher affinity than Sco for the oxidized metal ion and a lower affinity for the reduced metal. The experimental observation that Sco proteins can bind Cu2+ as well as Cu+ ions22,26,64 suggests that this mechanism may indeed be active. In both mechanisms, the function of Sco is potentially twofold, because it can carry out metal transfer together with redox protection. These two roles may indeed be combined in Sco homologues that deliver copper to the CuA center of COX, since in that case metal transfer may occur through a thiol/disulfide exchange reaction between the CXXXC motif and the CuA cysteines, which would thus be reduced to permit metal coordination. Other Sco homologues, however, may have only one of the two functions. Sco proteins associated with non-COX copper enzymes could function as pure copper chaperones, because these enzymes do not use cysteine pairs for copper coordination. Still, copper transfer might involve oxidation of either the metal or the CXXXC cysteines by a third protein, such as cytochrome c. It is possible that metal transfer dependent on the oxidation state of the metal is also the working mechanism for Sco homologues that have lost either or both cysteines of the CXXXC motif (Table S2, Supporting Information) and use other amino acid side chains to coordinate the metal (not necessarily only copper). Note that a third possible mode of action for some Sco proteins could be simply to transfer copper(I) to the binding site on a partner protein, without any redox chemistry taking place, as observed for other copper chaperones.65 Within the present model, Sco proteins are able to sense the redox state of their cellular environment through both the CXXXC cysteines and the metal. They are thus intrinsically redox sensors and can convey a signal by releasing the metal. If the metal is transferred to a regulatory protein whose activity depends on metal binding, the signal conveyed by Sco will ultimately affect the processes controlled by that protein.
Indeed, the activity of the sensor kinase RegB, which is involved in regulating a variety of metabolic processes in R. capsulatus, is metal-dependent, and the metal is most likely copper.66 Within this framework, sequence variations can cause fine structural changes and thus influence the reduction potentials of the CXXXC cysteines and of the metal, thereby determining the specific mechanism of action and the overall physiological role of Sco proteins.
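The choice between the two mechanisms of Figure 4 hinges on which redox center of Sco, the CXXXC dithiol or the bound copper, gives up electrons first. As a hedged, purely thermodynamic illustration using standard electrochemistry (the paper reports no potentials, and the symbols below are introduced only for this sketch: n is the number of electrons transferred, F the Faraday constant, E° a standard reduction potential), the couple with the lower reduction potential is the more easily oxidized:

```latex
\begin{aligned}
\Delta G^{\circ} &= -nF\,\bigl(E^{\circ}_{\mathrm{acceptor}} - E^{\circ}_{\mathrm{donor}}\bigr),\\[4pt]
E^{\circ}_{\mathrm{Cu^{2+}/Cu^{+}}} < E^{\circ}_{\mathrm{RSSR/2RSH}}
  &\;\Rightarrow\; \text{copper is oxidized first (Figure 4, right panel)},\\
E^{\circ}_{\mathrm{RSSR/2RSH}} < E^{\circ}_{\mathrm{Cu^{2+}/Cu^{+}}}
  &\;\Rightarrow\; \text{the CXXXC thiols are oxidized first (Figure 4, left panel)}.
\end{aligned}
```

This comparison ignores kinetics, the different electron stoichiometry of the two couples, and the fact that copper binding itself shifts both potentials, so it should be read only as a first approximation of how sequence-dependent changes in the potentials could switch Sco between the two modes.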

Supporting Information Available: Table S1, list of prokaryotic genomes where no Sco, COX2, NosZ, or Cox11 proteins were identified. For each genome, the table reports: (i) genome name, (ii) domain to which the organism belongs, and (iii) group to which the organism belongs. Table S2, list of the 254 Sco proteins identified in prokaryotic genomes. For each protein, the table reports: (i) GI code, (ii) available annotation, (iii) genome where the protein was identified, (iv) residue length of the protein, (v) and (vi) residue length and range of the Sco domain as inferred by HMMER, (vii) residue ranges of transmembrane regions as predicted by TMHMM, (viii) HMMER E-value of sequence matching to the Pfam SCO1-SenC HMM, (ix) COG assignment yielded by COGnitor, (x) occurrence and variants of the CXXXC pattern in the sequence, and (xi) additional notes regarding the identification, if needed. Table S3, multiple sequence alignment of Sco domains obtained by aligning the sequences to the Pfam SCO1-SenC HMM with the HMMALIGN program of HMMER. Table S4, composition of the groups based on the genomic context analysis of Sco. For each Sco, the table reports: (i) GI code, (ii) genome where the gene was identified, and (iii) additional notes, if needed. This material is available free of charge via the Internet at http://pubs.acs.org.
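Column (x) of Table S2 records occurrences of the CXXXC pattern, and the text also distinguishes the DXXXD motif from its DXX[D,E] variant. A minimal Python sketch of such a pattern scan with regular expressions follows. It reads the motifs in PROSITE-like fashion (X = any residue), reports 1-based start positions including overlapping matches, and uses an invented example fragment; it is not the procedure used to build Table S2, which relied on HMMER alignments to the Pfam SCO1-SenC HMM, and a real screen would also check that a hit falls within the Sco domain.

```python
from __future__ import annotations

import re

# PROSITE-like reading of the motifs named in the text: "X" stands for any residue.
MOTIFS = {
    "CXXXC": "C...C",
    "DXXXD": "D...D",
    "DXX[D,E]": "D..[DE]",
}


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 1-based start positions of every motif match, overlapping matches included."""
    seq = sequence.upper()
    hits: dict[str, list[int]] = {}
    for name, regex in MOTIFS.items():
        # Wrapping the pattern in a zero-width lookahead lets re.finditer report overlaps.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={regex})", seq)]
    return hits


if __name__ == "__main__":
    # Hypothetical fragment, not a real Sco sequence.
    print(find_motifs("MKGCPDVCPTLDAKDE"))
    # {'CXXXC': [4], 'DXXXD': [], 'DXX[D,E]': [12]}
```

The lookahead matters for cysteine-rich stretches: in a sequence such as CAAACAAAC, a plain non-overlapping search would report only the first CXXXC, whereas this version reports both.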

References
(1) Schulze, M.; Rodel, G. Mol. Gen. Genet. 1988, 211, 492-498.
(2) Glerum, D. M.; Shtanko, A.; Tzagoloff, A. J. Biol. Chem. 1996, 271, 20531-20535.
(3) Cobine, P. A.; Pierrel, F.; Winge, D. R. Biochim. Biophys. Acta 2006, 1763, 759-772.
(4) Herrmann, J. M.; Funes, S. Gene 2005, 354, 43-52.
(5) Lode, A.; Kuschel, M.; Paret, C.; Rodel, G. FEBS Lett. 2000, 485, 19-24.
(6) Nittis, T.; George, G. N.; Winge, D. R. J. Biol. Chem. 2001, 276, 42520-42526.
(7) Jaksch, M.; Ogilvie, I.; Yao, J.; Kortenhaus, G.; Bresser, H. G.; Gerbitz, K. D.; Shoubridge, E. A. Hum. Mol. Genet. 2000, 9, 795-801.
(8) Leary, S. C.; Mattman, A.; Wai, T.; Koehn, D. C.; Clarke, L. A.; Chan, S.; Lomax, B.; Eydoux, P.; Vallance, H. D.; Shoubridge, E. A. Mol. Genet. Metab. 2006, 89, 129-133.
(9) Papadopoulou, L. C.; Sue, C. M.; Davidson, M. M.; Tanji, K.; Nishino, I.; Sadlock, J. E.; Krishna, S.; Walker, W.; Selby, J.; Glerum, D. M.; Coster, R. V.; Lyon, G.; Scalais, E.; Lebel, R.; Kaplan, P.; Shanske, S.; De, V. D.; Bonilla, E.; Hirano, M.; DiMauro, S.; Schon, E. A. Nat. Genet. 1999, 23, 333-337.
(10) Salviati, L.; Sacconi, S.; Rasalan, M. M.; Kronn, D. F.; Braun, A.; Canoll, P.; Davidson, M.; Shanske, S.; Bonilla, E.; Hays, A. P.; Schon, E. A.; DiMauro, S. Arch. Neurol. 2002, 59, 862-865.
(11) Shoubridge, E. A. Am. J. Med. Genet. 2001, 106, 46-52.
(12) Tarnopolsky, M. A.; Bourgeois, J. M.; Fu, M. H.; Kataeva, G.; Shah, J.; Simon, D. K.; Mahoney, D.; Johns, D.; MacKay, N.; Robinson, B. H. Am. J. Med. Genet. A 2004, 125, 310-314.
(13) Valnot, I.; Osmond, S.; Gigarel, N.; Mehaye, B.; Amiel, J.; Cormier-Daire, V.; Munnich, A.; Bonnefont, J. P.; Rustin, P.; Rotig, A. Am. J. Hum. Genet. 2000, 67, 1104-1109.
(14) Leary, S. C.; Kaufman, B. A.; Pellecchia, G.; Guercin, G. H.; Mattman, A.; Jaksch, M.; Shoubridge, E. A. Hum. Mol. Genet. 2004, 13, 1839-1848.
(15) Matoba, S.; Kang, J. G.; Patino, W. D.; Wragg, A.; Boehm, M.; Gavrilova, O.; Hurley, P. J.; Bunz, F.; Hwang, P. M. Science 2006, 312, 1650-1653.
(16) Chinenov, Y. V. J. Mol. Med. 2000, 78, 239-242.
(17) Garcia-Horsman, J. A.; Barquera, B.; Rumbley, J.; Ma, J.; Gennis, R. B. J. Bacteriol. 1994, 176, 5587-5600.
(18) Pereira, M. M.; Santana, M.; Teixeira, M. Biochim. Biophys. Acta 2001, 1505, 185-208.
(19) Gray, M. W.; Burger, G.; Lang, B. F. Science 1999, 283, 1476-1481.
(20) Balatri, E.; Banci, L.; Bertini, I.; Cantini, F.; Ciofi-Baffoni, S. Structure 2003, 11, 1431-1443.
(21) Williams, J. C.; Sue, C.; Banting, G. S.; Yang, H.; Glerum, D. M.; Hendrickson, W. A.; Schon, E. A. J. Biol. Chem. 2005, 280, 15202-15211.
(22) Banci, L.; Bertini, I.; Calderone, V.; Ciofi-Baffoni, S.; Mangani, S.; Martinelli, M.; Palumaa, P.; Wang, S. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 8595-8600.
(23) Abajian, C.; Rosenzweig, A. C. J. Biol. Inorg. Chem. 2006, 11, 459-466.
(24) Mattatall, N. R.; Jazairi, J.; Hill, B. C. J. Biol. Chem. 2000, 275, 28802-28809.
(25) Andruzzi, L.; Nakano, M.; Nilges, M. J.; Blackburn, N. J. J. Am. Chem. Soc. 2005, 127, 16548-16558.
(26) Imriskova-Sosova, I.; Andrews, D.; Yam, K.; Davidson, D.; Yachnin, Y.; Hill, B. C. Biochemistry 2006, 44, 16949-16956.
(27) Eraso, J. M.; Kaplan, S. J. Bacteriol. 1995, 177, 2695-2706.
(28) Buggy, J.; Bauer, C. E. J. Bacteriol. 1995, 177, 6958-6965.
(29) Eraso, J. M.; Kaplan, S. Biochemistry 2000, 39, 2052-2062.
(30) Oh, J. I.; Kaplan, S. Mol. Microbiol. 2001, 39, 1116-1123.
(31) Oh, J. I.; Ko, I. J.; Kaplan, S. Biochemistry 2004, 43, 7915-7923.
(32) Swem, D. L.; Swem, L. R.; Setterdahl, A.; Bauer, C. E. J. Bacteriol. 2005, 187, 8081-8087.
(33) Seib, K. L.; Jennings, M. P.; McEwan, A. G. FEBS Lett. 2003, 546, 411-415.
(34) Borsetti, F.; Tremaroli, V.; Michelacci, F.; Borghese, R.; Winterstein, C.; Daldal, F.; Zannoni, D. Res. Microbiol. 2005, 156, 807-813.
(35) Pruitt, K. D.; Tatusova, T.; Maglott, D. R. Nucleic Acids Res. 2005, 33, D501-D504.

(36) Finn, R. D.; Mistry, J.; Schuster-Bockler, B.; Griffiths-Jones, S.; Hollich, V.; Lassmann, T.; Moxon, S.; Marshall, M.; Khanna, A.; Durbin, R.; Eddy, S. R.; Sonnhammer, E. L.; Bateman, A. Nucleic Acids Res. 2006, 34, D247-D251.
(37) Tatusov, R. L.; Galperin, M. Y.; Natale, D. A.; Koonin, E. V. Nucleic Acids Res. 2000, 28, 33-36.
(38) Tatusov, R. L.; Natale, D. A.; Garkavtsev, I. V.; Tatusova, T. A.; Shankavaram, U. T.; Rao, B. S.; Kiryutin, B.; Galperin, M. Y.; Fedorova, R. D.; Koonin, E. V. Nucleic Acids Res. 2001, 29, 22-28.
(39) Tatusov, R. L.; Koonin, E. V.; Lipman, D. J. Science 1997, 278, 631-637.
(40) Altschul, S. F.; Madden, T. L.; Schaeffer, A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Nucleic Acids Res. 1997, 25, 3389-3402.
(41) Brown, K.; Tegoni, M.; Prudencio, M.; Pereira, A. S.; Besson, S.; Moura, J. J.; Moura, I.; Cambillau, C. Nat. Struct. Biol. 2000, 7, 191-195.
(42) Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. L. J. Mol. Biol. 2001, 305, 567-580.
(43) Schuster-Bockler, B.; Schultz, J.; Rahmann, S. BMC Bioinf. 2004, 5, 7.
(44) Hubbard, S. J.; Thornton, J. M. NACCESS Computer Program; Department of Biochemistry and Molecular Biology, University College London: London, England, 1993.
(45) Thompson, J. D.; Higgins, D. G.; Gibson, T. J. Nucleic Acids Res. 1994, 22, 4673-4680.
(46) Felsenstein, J. Cladistics 1989, 5, 164-166.
(47) van Bakel, H.; Huynen, M.; Wijmenga, C. Bioinformatics 2004, 20, 2644-2655.
(48) Moreno-Hagelsieb, G.; Collado-Vides, J. Bioinformatics 2002, 18 Suppl. 1, S329-S336.
(49) Arnesano, F.; Banci, L.; Bertini, I.; Martinelli, M. J. Proteome Res. 2005, 4, 63-70.
(50) Ye, Q.; Imriskova-Sosova, I.; Hill, B. C.; Jia, Z. Biochemistry 2005, 44, 2934-2942.

(51) Hiser, L.; Di Valentin, M.; Hamer, A. G.; Hosler, J. P. J. Biol. Chem. 2000, 275, 619-623.
(52) Banci, L.; Bertini, I.; Cantini, F.; Ciofi-Baffoni, S.; Gonnelli, L.; Mangani, S. J. Biol. Chem. 2004, 279, 34833-34839.
(53) Nakamura, K.; Go, N. Cell. Mol. Life Sci. 2005, 62, 2050-2066.
(54) Ito, N.; Phillips, S. E.; Stevens, C.; Ogel, Z. B.; McPherson, M. J.; Keen, J. N.; Yadav, K. D.; Knowles, P. F. Nature 1991, 350, 87-90.
(55) Whittaker, J. W. Arch. Biochem. Biophys. 2005, 433, 227-239.
(56) Brown, K.; Djinovic-Carugo, K.; Haltia, T.; Cabrito, I.; Saraste, M.; Moura, J. J.; Moura, I.; Tegoni, M.; Cambillau, C. J. Biol. Chem. 2000, 275, 41133-41136.
(57) Wunsch, P.; Herb, M.; Wieland, H.; Schiek, U. M.; Zumft, W. G. J. Bacteriol. 2003, 185, 887-896.
(58) Zumft, W. G. J. Mol. Microbiol. Biotechnol. 2005, 10, 154-166.
(59) Banci, L.; Bertini, I.; Ciofi-Baffoni, S.; Katsari, E.; Katsaros, N.; Kubicek, K.; Mangani, S. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 3994-3999.
(60) Bengtsson, J.; Von Wachenfeldt, C.; Winstedt, L.; Nygaard, P.; Hederstedt, L. Microbiology 2004, 150, 415-425.
(61) McEwan, A. G.; Lewin, A.; Davy, S. L.; Boetzel, R.; Leech, A.; Walker, D.; Wood, T.; Moore, G. R. FEBS Lett. 2002, 518, 10-16.
(62) Graf, P. C.; Jakob, U. Cell. Mol. Life Sci. 2002, 59, 1624-1631.
(63) Maret, W. Biochemistry 2004, 43, 3301-3309.
(64) Horng, Y. C.; Leary, S. C.; Cobine, P. A.; Young, F. B.; George, G. N.; Shoubridge, E. A.; Winge, D. R. J. Biol. Chem. 2005, 280, 34113-34122.
(65) Banci, L.; Rosato, A. Acc. Chem. Res. 2003, 36, 215-221.
(66) Swem, L. R.; Kraft, B. J.; Swem, D. L.; Setterdahl, A. T.; Masuda, S.; Knaff, D. B.; Zaleski, J. M.; Bauer, C. E. EMBO J. 2003, 22, 4699-4708.

PR060538P
