Determination and Comparison of the Francisella tularensis subsp. novicida U112 Proteome to Other Bacterial Proteomes

Laurence Rohmer,*,† Tina Guina,⊥ Jinzhi Chen,∥ Byron Gallis,∥ Greg K. Taylor,∥ Scott A. Shaffer,∥ Samuel I. Miller,†,‡,§ Mitchell J. Brittnacher,† and David R. Goodlett∥

Department of Genome Sciences, Microbiology, Medicine, Medicinal Chemistry, and Department of Pediatrics, Division of Infectious Diseases, University of Washington, Seattle, Washington 98195

Received November 15, 2007

The proteins expressed by Francisella tularensis subsp. novicida U112 grown to midexponential phase were surveyed by nanoLC-tandem mass spectrometry (LC-MS/MS). To improve annotation of the genome and to develop a technology for high-throughput analysis of the Francisella proteome under multiple conditions, we sought to establish a fast and simple analysis that minimizes the false discovery rate. Our survey detected expression of 63.0% of the predicted proteome during stable growth in rich medium (data available at www.francisella.org). On the basis of the detection of essential proteins, we estimated coverage to be approximately 80% of the actual expressed proteome, which suggests that no less than 70% of the proteins could be expressed in this condition. This analysis revealed two previously unidentified protein-coding open reading frames and validated 50% of the proteins annotated as hypothetical. On the basis of the results of the screen for essential proteins, not all expressed proteins make a measurable contribution to F.t. novicida growth in this condition. Comparison of this protein profile with previously published profiles suggested that genome size and the number of genes involved in regulation have little effect on the number of proteins expressed in a given stable condition.

Keywords: bacterial proteome • tandem mass spectrometry • LC-MS/MS • Sequest criteria • Francisella tularensis

* Corresponding author: Laurence Rohmer, University of Washington, Department of Genome Sciences, 1959 NE Pacific St, Campus Box 357710, Seattle, WA 98195. Phone: 206-897-1720. Fax: 206-616-5109. Email: lrohmer@u.washington.edu.
† Department of Genome Sciences, University of Washington.
⊥ Department of Pediatrics, University of Washington.
∥ Department of Medicinal Chemistry, University of Washington.
‡ Department of Microbiology, University of Washington.
§ Department of Medicine, University of Washington.

Journal of Proteome Research 2008, 7, 2016–2024. Published on Web 04/02/2008. DOI: 10.1021/pr700760z. © 2008 American Chemical Society.

1. Introduction

Bacterial proteome profiles have been used for many purposes: drug and vaccine development,1,2 visualization of the metabolic networks of bacteria grown under relevant conditions,3–7 and improvement of genome annotations based on bioinformatic predictions. For example, high-throughput proteomics has validated hypothetical proteins,3,8–13 validated or corrected translation start sites,14,15 validated operons,16 and uncovered coding regions not predicted by computational genome analysis.4 High-throughput proteomics has been greatly facilitated by multidimensional liquid chromatography coupled with tandem mass spectrometry, and its use to complement genome annotation is likely to become widespread. Systems biology, which benefits greatly from data integration, also offers multiple applications for high-throughput proteomics. The strategies a cell uses to regulate distinct processes, and the relationships between processes such as transcription and translation, can be investigated with data generated by high-throughput proteomics. For example, analysis of the proteome of Desulfovibrio vulgaris showed that mRNA and protein abundances were correlated to some extent but modulated by numerous sequence features: codon and amino acid composition, stop codon context, and ribosome binding site.17,18 The proteome of Shewanella oneidensis shed light on the extent of post-translational modification taking place in the cell, an important layer of cellular regulation.14

Francisella tularensis is a highly infectious pathogen that could potentially be used as a bioweapon; drugs and vaccines against it are therefore needed. Proteomics is an important tool for this purpose, in part because it can identify proteins that are specific to virulent strains. The proteomes of F. tularensis strains showing various degrees of virulence have been compared using 2D gels,2,19,20 and significant differences in protein expression levels were observed between these strains. 2D gels were also used to investigate Francisella proteomes in the context of infection, using either bacteria retrieved directly from the spleens of infected mice21 or bacteria grown in media mimicking aspects of the host environment, such as oxidative stress and iron depletion.22,23 A novel MS-based proteomic method has also been used to identify proteins that could play a role in infection, by comparing protein expression levels in wild-type Francisella tularensis subsp. novicida and in a strain lacking a functional MglA,24 a regulator that contributes extensively to the virulence of F. tularensis subspecies. These proteomic results, combined with a study of protein immunoreactivity, are useful for establishing a list of candidates that could be combined in a component vaccine.1,25,26

This study was designed to implement a technology pipeline for comprehensive surveys of Francisella proteomes under relevant conditions, using multidimensional liquid chromatography coupled with tandem mass spectrometry and gas-phase fractionation.27 We explored strategies to optimize the yield and analytical quality of the tandem mass spectra. These results were used to refine the genome annotation, in particular to validate hypothetical proteins, and to provide insight into the relationship between the regulation of protein expression and diverse factors such as genome size and the ability of the organism to respond to its environment. As a simple starting point, we used the nonvirulent strain U112, whose genome sequence is available, grown to midlog phase in a "non-challenging" rich medium.

2. Experimental Section

2.1. F.t. novicida Growth and Protein Sample Preparation. F.t. novicida U112 was obtained from Francis Nano (University of Victoria, Victoria, Canada). Fifteen individual cultures of F.t. novicida were grown at 37 °C in rich medium: tryptic soy broth (Difco, Detroit, MI) supplemented with 0.1% cysteine (TSB-C). Bacterial cells were collected during the midlogarithmic phase of growth, when the optical density at 600 nm of all cultures was 0.6–0.7, which corresponds to ∼6 × 10^8 cfu/mL for each culture. Cells were collected by sedimentation at 10 000g for 15 min at 4 °C, resuspended in 1/100 volume of ice-cold 50 mM Tris, pH 8.3, and stored at −80 °C until further processing.

2.1.2. Preparation of Whole Cell Extracts. Frozen cell pellets were thawed and lysed by sonication in an ice-water bath. Unbroken cells were removed by sedimentation at 5000g for 15 min at 4 °C, and the extract was stored at −80 °C.

2.1.3. Separation of Soluble and Insoluble Fractions from a Whole Cell Lysate. Sonicated whole cell lysates were sedimented at 120 000g for 2 h at 4 °C to separate a soluble fraction from an insoluble, membrane-enriched protein fraction. Insoluble proteins were homogenized in ice-cold 50 mM Tris, pH 8.3, and frozen.

2.1.4. Ammonium Sulfate Fractionation of Whole Cell Francisella Lysates. The starting material was 2 mg of lysate protein in 500 µL of PBS. Ammonium sulfate was added to 30% saturation, and the sample was vortexed gently to dissolve the salt and placed on ice for 30 min. The lysate was then centrifuged at 10 000g for 30 min at 4 °C. The supernatant was removed, ammonium sulfate was added to 50% saturation, and the process was repeated. Ammonium sulfate was then added successively to 70% and 90% saturation, and the sample was treated as above. After centrifugation, each of the 30%, 50%, 70%, and 90% protein pellets was dissolved in 50 mM ammonium bicarbonate and frozen.

2.1.5. Fractionation of Inner and Outer Membranes.
Inner and outer membranes were prepared from 2 L of log-phase Francisella culture by the method of Osborn and colleagues.28

2.1.6. Protein Assay and Proteolytic Digestion of Protein Fractions. The protein concentration of each fraction was determined with the Bradford assay (Pierce, Rockford, IL). All protein fractions were digested with sequencing-grade trypsin (Promega, Madison, WI) or endoproteinase Glu-C (Roche, Indianapolis, IN) at a 50:1 protein-to-enzyme ratio, as described in detail.29

2.2. Mass Spectrometric Peptide Analysis. Peptide digests of each cellular fraction (triplicates of whole-cell, membrane, and soluble fractions of F.t. novicida wild-type and mglA mutant) were analyzed in quadruplicate by microcapillary HPLC electrospray ionization mass spectrometry (LC-ESI-MS) on a linear ion trap–Fourier transform ion cyclotron resonance mass spectrometer (LTQ-FT; Thermo Electron, San Jose, CA). The high-pressure liquid chromatography (HPLC) system (Paradigm MS4; Michrom Bioresources, Auburn, CA) was configured as described30 with minor modifications. Briefly, 5 µg of protein digest was loaded onto a precolumn (100 µm i.d., packed with 1.5 cm of 200 Å pore Magic C18AQ beads; Michrom Bioresources, CA) and washed with solvent A (5% acetonitrile, 0.1% formic acid) at a flow rate of 50 µL/min for 5 min. Peptides were then eluted to the mass spectrometer through an analytical column (75 µm i.d., packed with 11 cm of 100 Å Magic C18AQ; Michrom Bioresources) at ∼200 nL/min with a linear gradient of 5–35% solvent B (100% acetonitrile, 0.1% formic acid) over 60 min. The LTQ-FT was operated in data-dependent mode, alternating between MS and MS/MS acquisition. Precursor ion (MS) scans over m/z 400–1800 were acquired in the FT-ICR cell, while the five most intense ions were sequentially isolated and subjected to collision-induced dissociation (CID) in the linear ion trap. The mass spectrometric conditions were optimized using a tuning solution composed of caffeine (Sigma, St. Louis, MO), MRFA (Bachem, King of Prussia, PA), and Ultramark 1621 (Lancaster Synthesis, Windham, NH). For MS, the resolution of the ICR cell was set to 100 000 at m/z 400. For MS/MS, the normalized collision energy was set to 30%.
In addition, samples were analyzed by gas-phase fractionation (GPF),31 which improved peptide identification by restricting data-dependent precursor selection to one of four m/z ranges (600–1000, 900–1250, 1200–1650, and 1500–2000), each acquired in a successive run.

2.3. Estimation of the Peptide FDR. SEQUEST provides two quality scores for each identified peptide: the cross-correlation score (X-corr), a measure of the similarity between the spectrum and the predicted matching peptide, and the delta-correlation score (D-corr), a measure of the difference in X-corr between the two best peptide hits in the database. The peptide mass tolerance in the SEQUEST parameter file was set to 2.1 Da, and no amino acid modifications were considered. Spectra were searched against a comprehensive database of possible peptides generated by SEQUEST from the translation of the coding regions of F.t. subsp. novicida and the six-frame translation of all intergenic regions, and against a decoy database consisting of the genome database with each sequence reversed. For given X-corr and D-corr thresholds in each experiment and for a given peptide ion charge (charges >3 were not considered), the peptide false discovery rate (FDR) was estimated as follows:

Estimated peptide FDR (%) = (number of peptides identified from the reverse database / number of peptides identified from the forward database) × 100

If the X-corr and D-corr thresholds selected for an experiment yield at most 1 peptide from the reverse database per 1000 peptides from the forward database, the estimated FDR is about 0.1%. When multiple X-corr and D-corr thresholds resulted in a similar FDR, the thresholds that identified the highest number of peptides were selected.

Table 1. Number of Proteins Identified Relative to the Various Filters Applied for the Identification of Peptides

                     Before filtering with enzyme cleavage sites    After filtering with enzyme cleavage sites
Estimated peptide    Proteins identified    Proteins identified     Proteins identified    Proteins identified
FDR (%)              (forward database)     (reverse database)      (forward database)     (reverse database)
1.00                 1209                   275                     1164                   0
0.50                 1158                   118                     1137                   0
0.10                 1091                   21                      1083                   0
0.02                 1025                   5                       1024                   0
0.01                 1026                   1                       1024                   0

2.4. Codon Adaptation Index. The Codon Adaptation Index (CAI) was calculated with the EMBOSS package.32 First, the 200 most highly expressed proteins (ranked by number of peptide hits) were used to generate a codon usage table with the application cusp. This table was then used to calculate the CAI of each gene of the F.t. novicida genome with the application CAI.

2.5. Identification of Ribosome Binding Sites. To identify likely ribosome binding site (RBS) sequences, a position-specific weight matrix model was constructed from 96 published RBS sequences of Legionella pneumophila Lens, the closest annotated genome at the time. This model was then used to identify potential RBSs 10–13 bp upstream of the start sites predicted by Glimmer 2.0. The highest-scoring sequences found in the Francisella genome were then used to construct a new position-specific weight matrix model. The entire U112 genome sequence was screened with this model to find all potential RBSs, regardless of their location relative to the predicted start sites in U112.

2.6. Assignment to COG Categories. The proteins of the genome were assigned to COG categories based on their homology to the models of the preformatted COG matrices database, assessed with the rps-blast algorithm and the CDD database.33 Only the first hit was considered, and it was retained only when its e-value was below 0.1.

2.7. Protein Cellular Localization Prediction. The cellular localization of proteins was predicted with PSORTb v2.0.34
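The CAI in Section 2.4 was computed with EMBOSS (cusp, then CAI). For readers unfamiliar with the index, the following Python sketch shows the usual calculation; it is not the EMBOSS code. Each codon's relative adaptiveness w is its count in a reference set (here, the coding sequences of the most highly expressed proteins) divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of w over its codons. Excluding stop codons, Met and Trp, and substituting a count of 0.5 for codons absent from the reference set are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def codons(seq: str) -> List[str]:
    """Split a coding sequence into in-frame codons (trailing bases ignored)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference: Iterable[str], pseudocount: float = 0.5) -> Dict[str, float]:
    """w(codon) = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for seq in reference for c in codons(seq) if c in CODON_TABLE)
    families: Dict[str, List[str]] = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # skip stop codons and single-codon amino acids
            families.setdefault(aa, []).append(codon)
    weights: Dict[str, float] = {}
    for family in families.values():
        observed = {c: max(counts[c], pseudocount) for c in family}
        top = max(observed.values())
        for c in family:
            weights[c] = observed[c] / top
    return weights


def cai(seq: str, weights: Dict[str, float]) -> float:
    """Geometric mean of w over the gene's scorable codons (NaN if none)."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        return float("nan")
    return math.exp(sum(logs) / len(logs))
```

Averaging logarithms rather than multiplying the w values directly avoids numerical underflow on long genes.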

3. Results and Discussion

3.1. High-Throughput Survey of the Expressed Proteins of F.t. novicida U112 Grown to the Exponential Phase.

3.1.1. Experimental Setup. Proteins were extracted from 15 independent cultures of bacteria grown to midlogarithmic phase in the standard artificial medium at 37 °C. To maximize proteome coverage, proteins were fractionated following several different strategies (Experimental Section). For the same reason, 14 of the samples were digested with trypsin and one with Glu-C. Peptide tandem mass spectra were acquired by nanoLC electrospray ionization tandem mass spectrometry (nLC-ESI-MS/MS) on a linear ion trap–Fourier transform ion cyclotron resonance mass spectrometer (LTQ-FT-ICR-MS). Each of the 15 samples was analyzed in quadruplicate, with gas-phase fractionation used to increase identifications.

3.1.2. Identification of Peptides and Estimation of the False Discovery Rate (FDR). Multiple approaches have been used to identify peptides from tandem mass spectra (for a review, see ref 35). Because we were interested in validating hypothetical proteins and identifying undetected coding regions, we wanted a method that does not rely on other types of genomic information (and is hence unbiased with regard to coding-region properties). We therefore estimated the FDR by comparing the number of peptides identified from a search of the genome database with the number identified from a search of a decoy database, as described by Peng and colleagues.36 The genome database was made from the translation of the coding regions of F.t. subsp. novicida predicted by annotation and the six-frame translation of all intergenic regions (4099 protein sequences in total). These sequences were reversed to form the decoy database. Tandem mass spectra were matched to peptide sequences using SEQUEST (http://fields.scripps.edu/sequest/index.html); hereafter, confident SEQUEST hits are referred to as identified peptides. SEQUEST provides two scores for each spectrum analyzed, X-corr and D-corr; generally, the higher the scores, the more reliable the peptide assignment. For given X-corr and D-corr thresholds, the FDR is estimated as (peptides identified in the decoy database)/(peptides identified in the genome database) × 100. The FDR must be determined independently for each of the 15 experiments; hence, for a given FDR estimate, the X-corr and D-corr thresholds differ between experiments. Across the 15 experiments, over 3 × 10^6 spectra were examined. The total number of identified peptides ranged from 3.0 × 10^5 at an FDR of 0.01% to 3.5 × 10^5 at an FDR of 0.1% and 4.3 × 10^5 at an FDR of 1%. To put these identifications in perspective, we compared them with the results of PeptideProphet from the ProteinProphet package.30 PeptideProphet identified between 20% fewer and 26% more peptides than decoy database searching (average, 15% more; standard deviation, 11.5%) at an FDR of 0.1%. On average, PeptideProphet identified 90% of the spectra identified by decoy database searching.
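The per-experiment threshold selection described above can be sketched as a grid search. This minimal Python sketch assumes each peptide-spectrum match is available as a record with its X-corr, D-corr, charge, and a flag for decoy (reversed-database) hits; the record layout, the threshold grids, and the function names are hypothetical. It also simplifies the text's "similar FDR" rule to "estimated FDR at or below a target", keeping the threshold pair that retains the most forward-database peptides. It would be run once per experiment and charge state.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PSM:
    """One SEQUEST peptide-spectrum match (hypothetical record layout)."""
    xcorr: float
    dcorr: float
    charge: int
    is_decoy: bool  # True if the hit came from the reversed (decoy) database


def estimated_fdr(n_reverse: int, n_forward: int) -> float:
    """Estimated peptide FDR (%) = reverse hits / forward hits x 100."""
    if n_forward == 0:
        return float("inf")
    return 100.0 * n_reverse / n_forward


def select_thresholds(
    psms: Sequence[PSM],
    charge: int,
    target_fdr: float,
    xcorr_grid: Iterable[float],
    dcorr_grid: Iterable[float],
) -> Optional[Tuple[float, float, int, float]]:
    """Grid-search (X-corr, D-corr) cut-offs for one experiment and charge state.

    Returns (xcorr_min, dcorr_min, n_forward, fdr) for the pair that keeps the
    most forward-database peptides while its estimated FDR stays at or below
    target_fdr, or None if no pair qualifies.
    """
    hits = [p for p in psms if p.charge == charge]
    dcorr_values = list(dcorr_grid)  # reused for every X-corr value
    best = None
    for x in xcorr_grid:
        for d in dcorr_values:
            passing = [p for p in hits if p.xcorr >= x and p.dcorr >= d]
            n_rev = sum(p.is_decoy for p in passing)
            n_fwd = len(passing) - n_rev
            fdr = estimated_fdr(n_rev, n_fwd)
            if fdr <= target_fdr and (best is None or n_fwd > best[2]):
                best = (x, d, n_fwd, fdr)
    return best
```

Because each experiment has its own score distributions, calling the function separately per experiment reproduces the point made above: the same target FDR generally maps to different X-corr and D-corr cut-offs.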
Because the PeptideProphet and decoy database results were similar, and because we were concerned about using genomic information to identify proteins, we chose the decoy database approach.

3.1.3. Identification of 1083 Expressed Proteins Using FDR = 0.1% and a Cleavage Site Filter. Following a conservative approach, we considered a protein identified, and thus expressed, if at least two occurrences of a matching peptide were identified from the forward database: either the same peptide in two different fractions or two different peptides, possibly in the same fraction. Using this selection criterion, we identified the expressed proteins at several peptide FDRs (Table 1, before filtering with enzyme cleavage sites). Among the decoy-database peptides that met the X-corr and D-corr threshold requirements, several hit the same decoy protein sequence. Even at an FDR of ∼0.1%, 21 protein sequences in the decoy database were matched by several hits (up to 12). Thus, in this analysis, a peptide FDR of ∼0.1% corresponds to an approximate FDR of ∼2% for the expressed proteins, consistent with the 21 decoy versus 1091 forward protein identifications in Table 1.
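The two-occurrence rule and the resulting protein-level FDR can be sketched as follows. In this minimal Python sketch, peptide hits are assumed to be (protein, peptide, fraction) tuples, and counting distinct (peptide, fraction) pairs is our reading of the criterion (so repeated spectra of one peptide in one fraction count once); the data layout and function names are hypothetical. Applied to the Table 1 counts at a peptide FDR of 0.1%, the same ratio gives 100 × 21/1091 ≈ 1.9%.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, str]  # (protein_id, peptide_sequence, fraction_id)


def identified_proteins(hits: Iterable[Hit], min_occurrences: int = 2) -> Set[str]:
    """Proteins with at least min_occurrences distinct (peptide, fraction) pairs.

    Two occurrences can be the same peptide in two fractions or two different
    peptides, possibly in the same fraction; repeated spectra of one peptide in
    one fraction count only once.
    """
    occurrences: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for protein, peptide, fraction in hits:
        occurrences[protein].add((peptide, fraction))
    return {prot for prot, occ in occurrences.items() if len(occ) >= min_occurrences}


def protein_fdr(forward_hits: Iterable[Hit], decoy_hits: Iterable[Hit]) -> float:
    """Approximate protein FDR (%) = decoy proteins / forward proteins x 100."""
    n_forward = len(identified_proteins(forward_hits))
    n_decoy = len(identified_proteins(decoy_hits))
    return 100.0 * n_decoy / n_forward if n_forward else float("inf")
```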


Figure 1. Percentage of the 395 proteins essential for the survival of U112 under similar conditions whose expression was detected in the present survey, shown by predicted cellular localization.

Ultimately, the tolerance for false positives depends on the end use of the data. With a protein FDR of ∼2%, we may falsely predict the expression of up to 82 of the 4099 regions in the database. Since one of the goals of this study is to validate annotation based on bioinformatic predictions, we sought to lower the estimated FDR of expressed proteins by considering only peptides containing at least one cleavage site (lysine or arginine for trypsin; glutamic or aspartic acid for Glu-C). With this filter, no protein sequence in the decoy database was matched by more than one spectrum (Table 1). All told, very few peptides were filtered out (584 of 3.53 × 10^5 at an FDR of 0.1%). A peptide FDR of ∼1% yields 81 (7%) more expressed proteins than an FDR of ∼0.1%. However, that difference is accounted for by only 259 of the 8 × 10^4 (0.3%) additional peptide hits; the other additional peptides identified at an FDR of ∼1% lie in regions already determined to be coding at an FDR of ∼0.1%. Because we sought to validate bioinformatic predictions, we judged that an FDR of ∼1% might not be stringent enough. At an FDR of ∼1%, about 4000 false identifications are expected, and among them about 480 peptides with a cleavage site, because lysine and arginine (the trypsin cleavage sites) represent nearly 12% of the amino acids in the database. Seeking unarguable protein identifications rather than ones that could result from a sampling bias, we therefore chose to use the data generated with a peptide FDR of ∼0.1%, which yielded 3.53 × 10^5 identified peptides after filtering. These peptide hits led to the detection of 1083 expressed proteins (listed in Supplemental Table 1 in Supporting Information; details are provided in the F.t. novicida U112 genome browser at www.francisella.org), which represent 63.0% of the predicted proteome.

3.2. Analysis of the Expressed Proteome Identified in the Survey.

3.2.1. Assessing the Coverage of the Experiment Reveals a Bias against Membrane Proteins. The analysis of the peptides described above determined that 63.0% of the predicted proteome of F.t. novicida was expressed in rich conditions. It is not clear, however, whether this 63% represents all the expressed proteins or whether a portion of them went undetected. We therefore attempted to estimate the coverage of the experiment. Of the 59 predicted ribosomal proteins, 55 were detected by this assay (93%). As observed by others, the undetected ribosomal proteins were very small, from 37 to 66 amino acids long. Such small proteins generally have fewer proteolytic cleavage sites, and hence yield fewer peptides, than larger proteins, which biases the method against their detection.4 To investigate the experimental coverage further, we examined the expression of the 395 proteins essential for the survival of U112 under similar conditions, as determined by testing an insertion mutant library.37 Because these proteins are essential, we expected all of them to be expressed, but we could detect only 77.8% of them (Figure 1). There appears to be an experimental bias against membrane proteins: while 88% of the essential proteins predicted by PSORTb34 to localize in the cytoplasm are detected, only 44% of those predicted to localize in the inner membrane are. Like the cytoplasmic proteins, the essential proteins predicted to localize in the periplasm, in the outer membrane, or extracellularly are well detected (100%). This bias has already been observed in similar high-throughput studies.4,5 If the coverage of the essential genes reflects the overall coverage of the expressed proteome, the experimental coverage could be approximately 80%. However, the coverage of proteins that do not play an essential role in the bacteria may be lower. Our experiment detected the expression of 63.0% of the predicted proteome (1083 proteins). If these 1083 proteins represent at most 80% of the proteins actually expressed, the proteins actually expressed would account for at least 63.0%/0.80 ≈ 79% of the proteins predicted by the annotation, that is, over 70%.

3.2.2. The Survey Validates 250 of the 500 Genes Encoding Hypothetical Proteins (50.0%) and Identifies Two New Potentially Coding ORFs. With the exception of two open reading frames (see below), all peptides matched a predicted coding sequence. This suggests that the software used to predict the coding regions (Glimmer v2.0) was extremely efficient on this A-T rich genome. This is not unexpected, since stop codons are dense in the noncoding sequences, with one every 70 bp on average. Reliable peptides were detected for 50% of the genes encoding hypothetical proteins. The proportion of expressed novel hypothetical proteins (i.e., those with no homologues in other bacteria) is notably different from that of expressed conserved hypothetical proteins: 58.7% of the conserved hypothetical proteins are expressed, a proportion close to the 63.0% of the whole proteome, but only 46.2% of the novel hypothetical proteins are. Possibly, the gene prediction software predicted coding sequences in regions where there are none. Alternatively, these novel hypothetical proteins may play a restricted and targeted role in the life cycle and may hence be less likely to be expressed in standard rich medium.

Some of the peptides in the survey match two previously unidentified coding regions that are not part of the published annotation (NC_008601). We offer here several arguments suggesting that these peptides (and their parent proteins) are not false positives. In the first region (NC_008601: 485459–485707), three distinct peptides were identified across the 15 experiments (25 hits in eight experiments). Furthermore, transcription of this region was demonstrated using a transcriptional fusion generated by transposon insertion (Dr. Lawrence Gallagher, personal communication). Directly upstream of this region, a 34-codon CDS is predicted in another frame on the same strand, with a very well-defined ribosomal binding site (NC_008601: 485416–485517). Peptides observed from this region could therefore result from a translational frameshift, perhaps occurring by chance, spanning the two ORFs. A similar hypothesis has already been put forward to explain similar observations in S. oneidensis.14 The same translational frameshift hypothesis could be made for the other region (NC_008601: 15587–15462), for which 20 hits were observed in four distinct experiments. The translation of this region could be due to a possible ribosomal binding site detected closely upstream, which could initially control the translation of a small ORF in another frame from 15 630 to 15 635. The function of these regions remains unknown; they do not share homology with any protein sequence currently available in public databases.

3.2.3. Little Evidence of a Genetic Signature in Expressed Gene Sequences. We have determined that at least 63.0% of the proteome is expressed in rich medium. As illustrated by the rather regular alternation between expressed (white) and nonexpressed (gray) proteins in Supplemental Table 1 in Supporting Information, no large region of the genome appears to be silent. Detecting more subtle differences is unreliable because of the experimental bias against membrane and small proteins. We could identify no correlation between the transcriptional direction of a gene and its expression (corr = 0.007). To assess whether the expressed portion of the proteome exhibits typical genomic features that distinguish it from the nonexpressed portion, we investigated the codon usage of the expressed (408) and nonexpressed (120) cytoplasmic proteins, as well as their predicted Shine-Dalgarno sequences. The analysis was restricted to the cytoplasmic proteins because the coverage of the expressed proteome is expected to be higher for this cellular fraction than overall. The genome of F. tularensis encodes 30 tRNAs. They cover all amino acids, but with only 30 tRNAs for 61 sense codons, many codons are not perfectly represented. To test whether codon composition differs between expressed and nonexpressed proteins, we calculated the Codon Adaptation Index (CAI), which measures synonymous codon usage bias. No significant difference was found between the CAI distributions of expressed cytoplasmic proteins (mean CAI, 0.693; standard deviation, 0.037) and nonexpressed cytoplasmic proteins (mean CAI, 0.682; standard deviation, 0.046).

In the genome of U112, 848 ribosomal binding sites (RBS) were identified. Among the multiple variants, the most recurrent is based on AGGA[GA][AT]. Of the 528 predicted cytoplasmic proteins, 287 (54.3%) were encoded by a gene with an obvious RBS. The presence of an RBS may differ between expressed and nonexpressed genes: an RBS is detected for 60% of the expressed genes but for only 35% of the nonexpressed genes. We investigated whether the specific sequence of the RBS was important for the control of translation by examining the distribution of RBS sequence variants among expressed and nonexpressed genes. The numbers of expressed and nonexpressed proteins under the control of a given RBS sequence variant are only partially correlated (r = 0.69), which suggests that some RBS sequences may favor expression over others. In conclusion, while the RBS is not a decisive factor in the expression of a gene, it may play a role in the regulation of protein translation in F. tularensis. If these findings are accurate, they suggest that translation is controlled, to some extent, at the initiation step, whereas no control could be detected at the elongation step. This contrasts with findings in other bacterial species, in which elongation seemed to play a more important role than initiation.14,18,35 Each bacterial species may therefore have developed its own distinct control strategy.
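The RBS results above rest on the position-specific weight matrix (PWM) screen of Section 2.5. The following Python sketch illustrates how a log-odds PWM can be built from aligned sites and used to score windows upstream of a start codon. It is illustrative only: the toy training sites merely match the reported AGGA[GA][AT] pattern, and the A-T rich background frequencies, the 0.5 pseudocount, the convention that the 10–13 bp offset is measured from the first base of the site, and all function names are assumptions. Only the given strand is scanned.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

BASES = "ACGT"


def build_pwm(sites: Sequence[str], background: Dict[str, float],
              pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Log2-odds weight matrix from equal-length, aligned binding sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], kmer: str) -> float:
    """Sum of per-position log-odds weights for one window."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def best_upstream_site(genome: str, start: int, pwm: List[Dict[str, float]],
                       min_up: int = 10, max_up: int = 13) -> Optional[Tuple[float, int, str]]:
    """Best (score, position, sequence) among windows whose first base lies
    min_up..max_up bp upstream of the start codon at 0-based index `start`."""
    genome = genome.upper()
    width = len(pwm)
    best = None
    for upstream in range(min_up, max_up + 1):
        pos = start - upstream
        if pos < 0 or pos + width > len(genome):
            continue
        kmer = genome[pos:pos + width]
        if any(b not in BASES for b in kmer):
            continue  # skip windows with ambiguous bases
        s = score(pwm, kmer)
        if best is None or s > best[0]:
            best = (s, pos, kmer)
    return best
```

In a genome-wide screen such as the one in Section 2.5, the same scoring function would be slid along both strands and windows above a chosen score cut-off reported; that cut-off is not given in the text.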
Alternatively, another reason could explain that the codon usage appears different in these bacteria between expressed and nonexpressed proteins. For example, genes that have been acquired recently by horizontal transfer (likely to differ in codon usage) may have been fixed in a genome because they provide the ability to adapt to a specific environment;38 hence, they may not be expressed in the standard conditions in which the proteome surveys were performed. 3.2.4. Distinct Types of Biological Processes Show Different Protein Profiles. When possible, predicted proteins were assigned to COG categories (Experimental Section). The proportion of expressed proteins in each COG category is shown in Figure 2a. Because of the survey’s bias against membrane proteins, these numbers may not accurately reflect the proportion of proteins expressed in each category. For example, in the category ‘Amino acid transport and metabolism’, the proteins involved in the transport of the amino acids are likely to be localized in the membrane, while the biosynthesis of amino acids likely involves cytoplasmic proteins. Considering the bias against membrane proteins observed in this study and by others using the same or similar techniques, the COG categories may not be the best fit to identify a significant difference in the number of proteins expressed across different functions. During the annotation of the genome of U112, we had created a distinct classification and genes were manually assigned to categories.39 This classification attempted to differentiate between functions that are involved in core cellular mechanisms such as metabolisms, and functions expected to be involved in the interaction with or in response to the environment such as cell motility, transport, and signal transduction.39 Figure 2b shows the distribution of expressed

Baseline Proteome of Francisella tularensis subsp. novicida U112

research articles

Figure 2. (a) Proportion of proteins for which expression is observed in each COG functional category. Dark gray columns represent the proportion of predicted cytoplasmic proteins for which expression is detected; light gray columns represent the proportion of expressed proteins across all cellular locations. The COG categories are E, Amino acid transport and metabolism; G, Carbohydrate transport and metabolism; D, Cell division and chromosome partitioning; M, Cell envelope biogenesis, outer membrane; N, Cell motility and secretion; H, Coenzyme metabolism; V, Defense mechanisms; L, DNA replication, recombination, and repair; C, Energy production and conversion; S, Function unknown; R, General function prediction only; P, Inorganic ion transport and metabolism; U, Intracellular trafficking and secretion; I, Lipid metabolism; F, Nucleotide transport and metabolism; O, Post-translational modification, protein turnover, chaperones; A, RNA processing and modification; Q, Secondary metabolites biosynthesis, transport, and catabolism; T, Signal transduction mechanisms; K, Transcription; J, Translation, ribosomal structure and biogenesis. (b) Proportion of expressed cytoplasmic proteins in the custom functional categories. Categories involved in the interaction of the bacteria with the environment are shown in blue (E01, cell wall/LPS/capsule; E02, motility, attachment and secretion structure; E03, signal transduction and regulation; U01, hypothetical; U02, uncharacterized function). Categories involved in metabolism are shown in yellow (M01, amino acid metabolism; M02, carbohydrate metabolism; M03, cell cycle; M04, cofactors, prosthetic groups, electron carriers metabolism; M05, DNA replication, recombination, modification and repair; M06, energy metabolism; M07, fatty acids and lipids metabolism; M08, nucleotides and nucleosides metabolism; M09, other metabolism; M10, post-translational modification, protein turnover, chaperones; M11, transcription; M12, translation, ribosomal structure and biogenesis).

Journal of Proteome Research • Vol. 7, No. 5, 2008 2021
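The two series in Figure 2a differ only in their denominator: all predicted proteins in a category versus only those predicted to be cytoplasmic. The minimal Python sketch below shows that tally under stated assumptions: the `Protein` record, its field names, the PSORTb-style location label `"Cytoplasmic"`, and the toy entries are all hypothetical, and each protein is simply flagged as expressed or not rather than re-derived from spectra and Sequest criteria.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """One predicted protein (hypothetical record layout, for illustration only)."""
    locus: str
    cog: str          # single-letter COG category, e.g. "E" or "J"
    location: str     # predicted subcellular location, e.g. "Cytoplasmic"
    expressed: bool   # True if the protein was detected in the survey


def proportion_expressed(proteins, cytoplasmic_only=False):
    """Return {COG category: fraction of its proteins detected as expressed}.

    With cytoplasmic_only=True, proteins not predicted to be cytoplasmic are
    dropped from both numerator and denominator (the dark gray series).
    """
    totals = defaultdict(int)
    detected = defaultdict(int)
    for p in proteins:
        if cytoplasmic_only and p.location != "Cytoplasmic":
            continue
        totals[p.cog] += 1
        if p.expressed:
            detected[p.cog] += 1
    return {cog: detected[cog] / totals[cog] for cog in sorted(totals)}


# Hypothetical toy data: membrane proteins are left undetected to mimic the survey bias.
proteins = [
    Protein("P1", "E", "Cytoplasmic", True),
    Protein("P2", "E", "CytoplasmicMembrane", False),
    Protein("P3", "E", "Cytoplasmic", True),
    Protein("P4", "J", "Cytoplasmic", True),
    Protein("P5", "J", "Cytoplasmic", True),
    Protein("P6", "T", "CytoplasmicMembrane", False),
    Protein("P7", "T", "Cytoplasmic", False),
]

all_locations = proportion_expressed(proteins)
cytoplasmic = proportion_expressed(proteins, cytoplasmic_only=True)
print(all_locations)
print(cytoplasmic)
```

In the toy data, category E drops from 1.0 (cytoplasmic only) to about 0.67 (all locations) once an undetected membrane protein enters the denominator, which is the kind of distortion the membrane bias can introduce into the light gray series.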


Rohmer et al.

Table 2. Comparison of the Genome and Expressed Proteome of Seven Selected Organisms

| bacterial species | total number of protein-coding genes (a) | number of proteins expressed | percentage of proteins expressed | number of genes in signal transduction (COG) (a) | proportion of genes in signal transduction | genome size | source |
|---|---|---|---|---|---|---|---|
| Mycoplasma pneumoniae | 689 | 557 | 81.00% | 6 | 0.87% | 0.81 Mb | 11 |
| Haemophilus influenzae | 1657 | 1055 | 63.67% | 37 | 2.23% | 1.83 Mb | 4 |
| Francisella tularensis | 1720 | 1083 | 62.97% | 49 | 2.85% | 1.91 Mb | this work |
| Desulfovibrio vulgaris | 3379 | 796 | 23.56% | 256 | 7.58% | 3.57 Mb | 16 |
| Escherichia coli | 4254 | 1147 | 26.96% | 158 | 3.71% | 4.64 Mb | 5 |
| Shewanella oneidensis | 4318 | 797 | 18.46% | 243 | 5.63% | 4.97 Mb | 9 |
| Rhodopseudomonas palustris | 4836 | 915 (b) | 18.92% | 240 | 4.96% | 5.46 Mb | 6 |

(a) From the NCBI Web site.

(b) Average of highly similar numbers obtained under different physiological conditions.

cytoplasmic proteins in these manually curated categories. We excluded the ‘transport’ category, because only seven of its 198 proteins were predicted to be cytoplasmic, and the “mobile elements” category, because it contained only nine proteins. The great majority of proteins involved in central metabolism were expressed (over 80% of the proteins in these categories). Two categories with genes taking part in core cellular mechanisms show a lower proportion of expressed proteins: “cofactors, prosthetic groups, electron carriers metabolism” (79%) and “DNA replication, recombination, modification and repair” (73%). This may be because these two categories are the most likely to include genes that respond to changes in the environment (for example, to stress resulting in DNA damage). In contrast, categories not involved in core mechanisms contain a smaller proportion of expressed proteins, between 46% and 80%. The category at the upper end of this range, “cell wall/LPS/capsule” (80%), is known to also contain genes involved in core mechanisms such as cell structure. Interestingly, the category most involved in interaction with the environment, “signal transduction and transcriptional regulation”, has the lowest proportion of expressed proteins (46%). A difference between core metabolism and response to the environment is not surprising, but this high-throughput approach offers a global view of the relative extent to which these distinct functions are regulated during the bacterial life cycle.

3.3. Increase of Genome Size and Number of Genes Involved in Response to Changes May Not Result in the Increase of the Number of Proteins Expressed in a Stable Growth Condition.
While the number of genes in many protein superfamilies is either constant or linearly associated with genome size, superfamilies involved in signal transduction and regulation (e.g., transcriptional regulators, two-component systems) seem to have a quadratic relationship with genome size.40,41 Hence, as genome size increases, the ratio of the total number of genes to the number of genes involved in signal transduction/regulation decreases. This observation could lead to two alternative implications regarding how the regulation of protein expression is managed in bacterial cells. First, the proportion of genes regulated by a single regulator decreases as genome size increases. Second, a “core” set of genes is regulated by a similar number of regulators in all bacteria, and the additional genes are regulated in smaller sets by the additional regulators, in response to specific signals from the environment. To discern which alternative better reflects reality, we investigated the proportion of expressed proteins in bacteria of different genome sizes grown under conditions similar to those of our survey (stable, standard laboratory conditions). If the number of expressed proteins increases proportionally to genome size, then the number of regulators controlling the genes expressed in one stable condition also increases. If the number of expressed proteins remains stable regardless of genome size, the results of this experimental approach would support the hypothesis that a large set of genes involved in core cellular mechanisms is regulated by few regulators, whereas the functions involved in response to the environment are controlled in smaller gene sets. For comparison, we used the results of the present survey and of similar surveys of six other organisms (Table 2). These surveys were chosen because they were the closest to ours in terms of methodology and growth conditions (standard laboratory conditions). Most of these publications also reported surveys for other conditions. Even though other very valuable surveys of bacterial proteomes have been published, we avoided using them when we could not be certain we were comparing similar data sets, for example, when we were not able to infer the number of proteins expressed in the standard condition from the numbers given for pooled conditions.3,42,43 Although this is necessarily an approximation, we assumed that all data sets were generated with similar efficiency and suffered from similar biases, some of which were reported in many of the publications, and could thus be compared at a very broad level. On this basis, we found that no matter how many genes were predicted in the genome investigated, the number of proteins detected stayed on the order of 1000 (557 to 1147; Table 2), even though the number of predicted protein-coding genes varied about 7-fold (689 to 4836). Several arguments suggest that this stable number does not result merely from technical limitations. First, protein surveys of more complex organisms using the same methodology yield more proteins, as observed by others36,44 and by us with the same laboratory equipment. In addition, the authors of the compared surveys reported similar coverage of the expressed proteome, most commonly assessed by the number of ribosomal proteins identified.
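As a rough, illustrative check of the scaling argument on the seven genomes of Table 2 alone (not a substitute for the larger analyses cited40,41), one can fit a power law S ≈ c·N^k between the number of COG signal-transduction genes S and the number of protein-coding genes N by least squares on log-transformed values. A slope k near 1 would indicate proportional growth, whereas k near 2 matches the quadratic trend; with k = 2, the ratio N/S falls roughly as 1/N. The Python sketch below is an assumption-laden illustration, and the helper name `scaling_exponent` is hypothetical.

```python
import math

# Counts copied from Table 2: (total protein-coding genes, COG signal-transduction genes).
TABLE2 = {
    "Mycoplasma pneumoniae": (689, 6),
    "Haemophilus influenzae": (1657, 37),
    "Francisella tularensis": (1720, 49),
    "Desulfovibrio vulgaris": (3379, 256),
    "Escherichia coli": (4254, 158),
    "Shewanella oneidensis": (4318, 243),
    "Rhodopseudomonas palustris": (4836, 240),
}


def scaling_exponent(pairs):
    """Least-squares slope of log(signal-transduction genes) versus log(total genes).

    The slope is the exponent k of a power law S = c * N**k fitted in log-log space.
    """
    pairs = list(pairs)
    xs = [math.log(total) for total, _ in pairs]
    ys = [math.log(st) for _, st in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


exponent = scaling_exponent(TABLE2.values())
print(f"fitted exponent: {exponent:.2f}")
```

With the Table 2 counts the fitted exponent comes out at roughly 1.9. Seven points spanning less than one order of magnitude in genome size are a weak basis for estimating an exponent, so this result should be read as consistent with the quadratic relationship, not as confirming it.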
Not surprisingly, if the number of expressed proteins does not change, the proportion of expressed genes is negatively correlated with the number of genes in the COG category “signal transduction” (r = -0.96). The COG classification system makes it difficult to compare the number of transcriptional regulators, because they are assigned to the same category as the genes encoding the transcriptional machinery. However, the relationship between genome size and the number of transcriptional regulators is very similar to that of signal transduction genes.40 This suggests that the increase in signal transduction genes, and by analogy in transcriptional regulators, results in few or no additional proteins being expressed in a given stable environment. This could support the hypothesis that a significant portion of the central cell functions is controlled by very few regulators, and that all environment-specific functions are tightly controlled by a series of regulators, each of which targets only a small number of genes.
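The correlation quoted above can be recomputed from Table 2. The Python sketch below assumes the two variables are the percentage of proteins expressed and the number of genes in the COG signal-transduction category; the text does not state exactly which variables or rounding were used, so this is an illustration rather than the authors' calculation. `pearson_r` is a hypothetical helper written out for transparency (Python 3.10+ also provides `statistics.correlation`).

```python
import math

# (percentage of proteins expressed, COG signal-transduction genes), copied from Table 2.
TABLE2 = [
    (81.00, 6),    # Mycoplasma pneumoniae
    (63.67, 37),   # Haemophilus influenzae
    (62.97, 49),   # Francisella tularensis
    (23.56, 256),  # Desulfovibrio vulgaris
    (26.96, 158),  # Escherichia coli
    (18.46, 243),  # Shewanella oneidensis
    (18.92, 240),  # Rhodopseudomonas palustris
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


pct_expressed = [pct for pct, _ in TABLE2]
st_genes = [st for _, st in TABLE2]
r = pearson_r(pct_expressed, st_genes)
print(f"r = {r:.3f}")
```

With these values the sketch gives r of about -0.97, close to the reported -0.96. With only seven organisms, a single outlying data set could shift the coefficient noticeably.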


4. Concluding Remarks

This paper presents the protein profile of F.t. subsp. novicida grown under standard rich-medium conditions. This allowed a partial validation of the genome annotation and offered insight into the biochemical functions used by this organism during growth in rich medium. Most proteins involved in core metabolism were expressed, whereas far fewer proteins playing a role in the response to, or interaction with, the environment were observed. Proteins are the end products of gene expression and thus are significant indicators of gene regulation. When the relationship between genome size and the number of proteins expressed in a stable condition was explored in various bacteria, including our data, the number of proteins expressed appeared to remain similar, independent of genome size and of the number of genes involved in regulation. These observations may contribute to the investigation of regulatory networks in proteobacteria.

Acknowledgment. This study was funded by the NIAID award for the Northwest RCE (NWRCE), grant U54AI057141. The authors thank Dr. Brook Nunn for help in manuscript preparation.


Supporting Information Available: Supplemental Table 1: List of proteins encoded in the genome of F.t. novicida U112, sorted by genome location, with the total number of hits (number of spectra) matching each protein over the 15 experiments. Proteins for which expression was not observed are shaded in gray. The functional categories were developed and assigned during the annotation of the U112 genome. This material is available free of charge via the Internet at http://pubs.acs.org.


References

(1) Janovska, S.; Pavkova, I.; Hubalek, M.; Lenco, J.; Macela, A.; Stulik, J. Identification of immunoreactive antigens in membrane proteins enriched fraction from Francisella tularensis LVS. Immunol. Lett. 2007, 108 (2), 151–9.

(2) Pavkova, I.; Reichelova, M.; Larsson, P.; Hubalek, M.; Vackova, J.; Forsberg, A.; Stulik, J. Comparative proteome analysis of fractions enriched for membrane-associated proteins from Francisella tularensis subsp. tularensis and F. tularensis subsp. holarctica strains. J. Proteome Res. 2006, 5 (11), 3125–34.

(3) Wang, R.; Prince, J. T.; Marcotte, E. M. Mass spectrometry of the M. smegmatis proteome: protein expression levels correlate with function, operons, and codon bias. Genome Res. 2005, 15 (8), 1118–26.

(4) Kolker, E.; Purvine, S.; Galperin, M. Y.; Stolyar, S.; Goodlett, D. R.; Nesvizhskii, A. I.; Keller, A.; Xie, T.; Eng, J. K.; Yi, E.; Hood, L.; Picone, A. F.; Cherny, T.; Tjaden, B. C.; Siegel, A. F.; Reilly, T. J.; Makarova, K. S.; Palsson, B. O.; Smith, A. L. Initial proteome analysis of model microorganism Haemophilus influenzae strain Rd KW20. J. Bacteriol. 2003, 185 (15), 4593–602.

(5) Corbin, R. W.; Paliy, O.; Yang, F.; Shabanowitz, J.; Platt, M.; Lyons, C. E., Jr.; Root, K.; McAuliffe, J.; Jordan, M. I.; Kustu, S.; Soupene, E.; Hunt, D. F. Toward a protein profile of Escherichia coli: comparison to its transcription profile. Proc. Natl. Acad. Sci. U.S.A. 2003, 100 (16), 9232–7.

(6) VerBerkmoes, N. C.; Shah, M. B.; Lankford, P. K.; Pelletier, D. A.; Strader, M. B.; Tabb, D. L.; McDonald, W. H.; Barton, J. W.; Hurst, G. B.; Hauser, L.; Davison, B. H.; Beatty, J. T.; Harwood, C. S.; Tabita, F. R.; Hettich, R. L.; Larimer, F. W. Determination and comparison of the baseline proteomes of the versatile microbe Rhodopseudomonas palustris under its major metabolic states. J. Proteome Res. 2006, 5 (2), 287–98.

(7) Eymann, C.; Dreisbach, A.; Albrecht, D.; Bernhardt, J.; Becher, D.; Gentner, S.; Tam le, T.; Buttner, K.; Buurman, G.; Scharf, C.; Venz, S.; Volker, U.; Hecker, M. A comprehensive proteome map of growing Bacillus subtilis cells. Proteomics 2004, 4 (10), 2849–76.

(8) Tanner, S.; Shen, Z.; Ng, J.; Florea, L.; Guigo, R.; Briggs, S. P.; Bafna, V. Improving gene annotation using peptide mass spectrometry. Genome Res. 2007, 17 (2), 231–9.

(9) Elias, D. A.; Monroe, M. E.; Smith, R. D.; Fredrickson, J. K.; Lipton, M. S. Confirmation of the expression of a large set of conserved hypothetical proteins in Shewanella oneidensis MR-1. J. Microbiol. Methods 2006, 66 (2), 223–33.

(10) Kolker, E.; Picone, A. F.; Galperin, M. Y.; Romine, M. F.; Higdon, R.; Makarova, K. S.; Kolker, N.; Anderson, G. A.; Qiu, X.; Auberry, K. J.; Babnigg, G.; Beliaev, A. S.; Edlefsen, P.; Elias, D. A.; Gorby, Y. A.; Holzman, T.; Klappenbach, J. A.; Konstantinidis, K. T.; Land, M. L.; Lipton, M. S.; McCue, L. A.; Monroe, M.; Pasa-Tolic, L.; Pinchuk, G.; Purvine, S.; Serres, M. H.; Tsapin, S.; Zakrajsek, B. A.; Zhu, W.; Zhou, J.; Larimer, F. W.; Lawrence, C. E.; Riley, M.; Collart, F. R.; Yates, J. R., III; Smith, R. D.; Giometti, C. S.; Nealson, K. H.; Fredrickson, J. K.; Tiedje, J. M. Global profiling of Shewanella oneidensis MR-1: expression of hypothetical genes and improved functional annotations. Proc. Natl. Acad. Sci. U.S.A. 2005, 102 (6), 2099–104.

(11) Jaffe, J. D.; Berg, H. C.; Church, G. M. Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 2004, 4 (1), 59–77.

(12) Zhang, W.; Culley, D. E.; Gritsenko, M. A.; Moore, R. J.; Nie, L.; Scholten, J. C.; Petritis, K.; Strittmatter, E. F.; Camp, D. G., II; Smith, R. D.; Brockman, F. J. LC-MS/MS based proteomic analysis and functional inference of hypothetical proteins in Desulfovibrio vulgaris. Biochem. Biophys. Res. Commun. 2006, 349 (4), 1412–9.

(13) Romine, M. F.; Elias, D. A.; Monroe, M. E.; Auberry, K.; Fang, R.; Fredrickson, J. K.; Anderson, G. A.; Smith, R. D.; Lipton, M. S. Validation of Shewanella oneidensis MR-1 small proteins by AMT tag-based proteome analysis. OMICS 2004, 8 (3), 239–54.

(14) Gupta, N.; Tanner, S.; Jaitly, N.; Adkins, J. N.; Lipton, M.; Edwards, R.; Romine, M.; Osterman, A.; Bafna, V.; Smith, R. D.; Pevzner, P. A. Whole proteome analysis of post-translational modifications: Applications of mass-spectrometry for proteogenomic annotation. Genome Res. 2007, 17 (9), 1362–77.

(15) Rison, S. C.; Mattow, J.; Jungblut, P. R.; Stoker, N. G. Experimental determination of translational starts using peptide mass mapping and tandem mass spectrometry within the proteome of Mycobacterium tuberculosis. Microbiology 2007, 153 (Pt 2), 521–8.

(16) Zhang, W.; Gritsenko, M. A.; Moore, R. J.; Culley, D. E.; Nie, L.; Petritis, K.; Strittmatter, E. F.; Camp, D. G., 2nd; Smith, R. D.; Brockman, F. J. A proteomic view of Desulfovibrio vulgaris metabolism as determined by liquid chromatography coupled with tandem mass spectrometry. Proteomics 2006, 6 (15), 4286–99.

(17) Nie, L.; Wu, G.; Brockman, F. J.; Zhang, W. Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins. Bioinformatics 2006, 22 (13), 1641–7.

(18) Nie, L.; Wu, G.; Zhang, W. Correlation of mRNA expression and protein abundance affected by multiple sequence features related to translational efficiency in Desulfovibrio vulgaris: a quantitative analysis. Genetics 2006, 174 (4), 2229–43.

(19) Hubalek, M.; Hernychova, L.; Brychta, M.; Lenco, J.; Zechovska, J.; Stulik, J. Comparative proteome analysis of cellular proteins extracted from highly virulent Francisella tularensis ssp. tularensis and less virulent F. tularensis ssp. holarctica and F. tularensis ssp. mediaasiatica. Proteomics 2004, 4 (10), 3048–60.

(20) Twine, S. M.; Mykytczuk, N. C.; Petit, M.; Tremblay, T. L.; Lanthier, P.; Conlan, J. W.; Kelly, J. F. Francisella tularensis proteome: low levels of ASB-14 facilitate the visualization of membrane proteins in total protein extracts. J. Proteome Res. 2005, 4 (5), 1848–54.

(21) Twine, S. M.; Mykytczuk, N. C.; Petit, M. D.; Shen, H.; Sjostedt, A.; Wayne Conlan, J.; Kelly, J. F. In vivo proteomic analysis of the intracellular bacterial pathogen, Francisella tularensis, isolated from mouse spleen. Biochem. Biophys. Res. Commun. 2006, 345 (4), 1621–33.

(22) Lenco, J.; Pavkova, I.; Hubalek, M.; Stulik, J. Insights into the oxidative stress response in Francisella tularensis LVS and its mutant DeltaiglC1 + 2 by proteomics analysis. FEMS Microbiol. Lett. 2005, 246 (1), 47–54.

(23) Lenco, J.; Hubalek, M.; Larsson, P.; Fucikova, A.; Brychta, M.; Macela, A.; Stulik, J. Proteomics analysis of the Francisella tularensis LVS response to iron restriction: induction of the F. tularensis pathogenicity island proteins IglABC. FEMS Microbiol. Lett. 2007, 269 (1), 11–21.

(24) Guina, T.; Radulovic, D.; Bahrami, A. J.; Bolton, D. L.; Rohmer, L.; Jones-Isaac, K. A.; Chen, J.; Gallagher, L. A.; Gallis, B.; Ryu, S.; Taylor, G. K.; Brittnacher, M. J.; Manoil, C.; Goodlett, D. R. MglA regulates Francisella tularensis subsp. novicida (Francisella novicida) response to starvation and oxidative stress. J. Bacteriol. 2007, 189 (18), 6580–6.

(25) Havlasova, J.; Hernychova, L.; Halada, P.; Pellantova, V.; Krejsek, J.; Stulik, J.; Macela, A.; Jungblut, P. R.; Larsson, P.; Forsman, M. Mapping of immunoreactive antigens of Francisella tularensis live vaccine strain. Proteomics 2002, 2 (7), 857–67.

(26) Eyles, J. E.; Unal, B.; Hartley, M. G.; Newstead, S. L.; Flick-Smith, H.; Prior, J. L.; Oyston, P. C.; Randall, A.; Mu, Y.; Hirst, S.; Molina, D. M.; Davies, D. H.; Milne, T.; Griffin, K. F.; Baldi, P.; Titball, R. W.; Felgner, P. L. Immunodominant Francisella tularensis antigens identified using proteome microarray. Crown Copyright 2007 Dstl. Proteomics 2007, 7 (13), 2172–83.

(27) Yi, E. C.; Marelli, M.; Lee, H.; Purvine, S. O.; Aebersold, R.; Aitchison, J. D.; Goodlett, D. R. Approaching complete peroxisome characterization by gas-phase fractionation. Electrophoresis 2002, 23 (18), 3205–16.

(28) Osborn, M. J.; Gander, J. E.; Parisi, E.; Carson, J. Mechanism of assembly of the outer membrane of Salmonella typhimurium. Isolation and characterization of cytoplasmic and outer membrane. J. Biol. Chem. 1972, 247 (12), 3962–72.

(29) Nunn, B. L.; Shaffer, S. A.; Scherl, A.; Gallis, B.; Wu, M.; Miller, S. I.; Goodlett, D. R. Comparison of a Salmonella typhimurium proteome defined by shotgun proteomics directly on an LTQ-FT and by proteome pre-fractionation on an LCQ-DUO. Briefings Funct. Genomics Proteomics 2006, 5 (2), 154–68.

(30) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383–92.

(31) Scherl, A.; Shaffer, S. A.; Taylor, G. K.; Kulasekara, H. D.; Miller, S. I.; Goodlett, D. R. Genome-specific gas-phase fractionation strategy for improved shotgun proteomic profiling of proteotypic peptides. Anal. Chem. 2008, 80 (4), 1182–91.

(32) Rice, P.; Longden, I.; Bleasby, A. EMBOSS: the European molecular biology open software suite. Trends Genet. 2000, 16 (6), 276–7.

(33) Marchler-Bauer, A.; Anderson, J. B.; DeWeese-Scott, C.; Fedorova, N. D.; Geer, L. Y.; He, S.; Hurwitz, D. I.; Jackson, J. D.; Jacobs, A. R.; Lanczycki, C. J.; Liebert, C. A.; Liu, C.; Madej, T.; Marchler, G. H.; Mazumder, R.; Nikolskaya, A. N.; Panchenko, A. R.; Rao, B. S.; Shoemaker, B. A.; Simonyan, V.; Song, J. S.; Thiessen, P. A.; Vasudevan, S.; Wang, Y.; Yamashita, R. A.; Yin, J. J.; Bryant, S. H. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 2003, 31 (1), 383–7.

(34) Gardy, J. L.; Laird, M. R.; Chen, F.; Rey, S.; Walsh, C. J.; Ester, M.; Brinkman, F. S. PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 2005, 21 (5), 617–23.

(35) Marcotte, E. M. How do shotgun proteomics algorithms identify proteins? Nat. Biotechnol. 2007, 25 (7), 755–7.

(36) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2003, 2 (1), 43–50.

(37) Gallagher, L. A.; Ramage, E.; Jacobs, M. A.; Kaul, R.; Brittnacher, M.; Manoil, C. A comprehensive transposon mutant library of Francisella novicida, a bioweapon surrogate. Proc. Natl. Acad. Sci. U.S.A. 2007, 104 (3), 1009–14.

(38) Koonin, E. V.; Makarova, K. S.; Aravind, L. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 2001, 55, 709–42.

(39) Rohmer, L.; Fong, C.; Abmayr, S.; Wasnick, M.; Larson Freeman, T. J.; Radey, M.; Guina, T.; Svensson, K.; Hayden, H. S.; Jacobs, M.; Gallagher, L. A.; Manoil, C.; Ernst, R. K.; Drees, B.; Buckley, D.; Haugen, E.; Bovee, D.; Zhou, Y.; Chang, J.; Levy, R.; Lim, R.; Gillett, W.; Guenthener, D.; Kang, A.; Shaffer, S. A.; Taylor, G.; Chen, J.; Gallis, B.; D’Argenio, D. A.; Forsman, M.; Olson, M. V.; Goodlett, D. R.; Kaul, R.; Miller, S. I.; Brittnacher, M. J. Comparison of Francisella tularensis genomes reveals evolutionary events associated with the emergence of human pathogenic strains. Genome Biol. 2007, 8 (6), R102.

(40) Ranea, J. A.; Buchan, D. W.; Thornton, J. M.; Orengo, C. A. Evolution of protein superfamilies and bacterial genome size. J. Mol. Biol. 2004, 336 (4), 871–87.

(41) van Nimwegen, E. Scaling laws in the functional content of genomes. Trends Genet. 2003, 19 (9), 479–84.

(42) Lipton, M. S.; Pasa-Tolic, L.; Anderson, G. A.; Anderson, D. J.; Auberry, D. L.; Battista, J. R.; Daly, M. J.; Fredrickson, J.; Hixson, K. K.; Kostandarithes, H.; Masselon, C.; Markillie, L. M.; Moore, R. J.; Romine, M. F.; Shen, Y.; Stritmatter, E.; Tolic, N.; Udseth, H. R.; Venkateswaran, A.; Wong, K. K.; Zhao, R.; Smith, R. D. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl. Acad. Sci. U.S.A. 2002, 99 (17), 11049–54.

(43) Guina, T.; Wu, M.; Miller, S. I.; Purvine, S. O.; Yi, E. C.; Eng, J.; Goodlett, D. R.; Aebersold, R.; Ernst, R. K.; Lee, K. A. Proteomic analysis of Pseudomonas aeruginosa grown under magnesium limitation. J. Am. Soc. Mass Spectrom. 2003, 14 (7), 742–51.

(44) Desiere, F.; Deutsch, E. W.; Nesvizhskii, A. I.; Mallick, P.; King, N. L.; Eng, J. K.; Aderem, A.; Boyle, R.; Brunner, E.; Donohoe, S.; Fausto, N.; Hafen, E.; Hood, L.; Katze, M. G.; Kennedy, K. A.; Kregenow, F.; Lee, H.; Lin, B.; Martin, D.; Ranish, J. A.; Rawlings, D. J.; Samelson, L. E.; Shiio, Y.; Watts, J. D.; Wollscheid, B.; Wright, M. E.; Yan, W.; Yang, L.; Yi, E. C.; Zhang, H.; Aebersold, R. Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol. 2005, 6 (1), R9.

PR700760Z