Top-Down Proteomic Analysis of the Soluble Sub-Proteome of the Obligate Thermophile, Geobacillus thermoleovorans T80: Insights into Its Cellular Processes

Robert Leslie James Graham,* Catherine E. Pollock, Nigel G. Ternan, and Geoff McMullan

School of Biomedical Sciences, University of Ulster, Coleraine, County Londonderry, BT52 1SA, United Kingdom

Received December 15, 2005

We report the first analysis of the soluble sub-proteome of the obligate thermophile, Geobacillus thermoleovorans T80, utilizing a robust multidimensional protein identification protocol. A total of 1336 proteins were initially identified utilizing automated MS/MS identification software. Intensive manual curation resulted in a final list containing a total of 294 unique proteins. Physicochemical characterization and functional classification of the soluble sub-proteome were carried out. The strategy has allowed us to gain an insight into the cellular processes of this obligate thermophile, identifying a variety of proteins known to play a role in stress response. Included within these were a number of sigma factors such as σA that initiate transcription of the heat shock operons controlled by the HrcA-CIRCE complex within Gram-positive bacteria. In addition, it has enabled us to assign a degree of functionality to 29 out of 36 gene products detected in this study that were hitherto described as being only hypothetical conserved proteins.

Keywords: proteomics • 2D LC-MS/MS • Geobacillus thermoleovorans • thermophile

* To whom correspondence should be addressed. Tel: +44(0)2870 323227. Fax: +44(0)2870 324965. E-mail: [email protected].

Journal of Proteome Research 2006, 5, 822-828. Published on Web 02/24/2006. 10.1021/pr0504642 CCC: $33.50 © 2006 American Chemical Society

Introduction

The gathering of genomic information from a plethora of bacterial and archaeal species continues apace. One group of bacteria particularly well represented is the Bacillaceae, with the Comprehensive Microbial Resource database at The Institute for Genomic Research (www.tigr.org) giving details on the completed genome sequencing projects for 20 members of this family, including: Bacillus anthracis (9 strains); B. cereus (3 strains); B. licheniformis (2 strains); B. subtilis; and B. thuringiensis konkukian. While these bacterial species represent a number of organisms of major importance in medical and food research,1,2 it is perhaps within the genomes of extremophilic bacilli that discoveries of significance for biotechnology and evolutionary theory will be made. Genome sequences have been completed by both commercial and government agencies within Japan for alkaliphilic and halophilic organisms represented by Oceanobacillus iheyensis HTE831, B. clausii KSMK16, and B. halodurans C-125.3,4 Of particular interest to us, however, are the thermophilic bacilli, represented by organisms such as Geobacillus kaustophilus HTA426.5

Certain thermophilic aerobic spore-forming bacteria with growth optima in the range 45 to 70 °C were previously classified into the genera Alicyclobacillus, Brevibacillus, Aneurinibacillus, Sulfobacillus, Thermoactinomyces, and Thermobacillus.6-8 Molecular analysis, however, showed that the majority of such thermophilic bacteria described in the literature belonged to the genus Bacillus genetic groups 1 and 5.9,10 Subsequently, group 5 isolates were found to be a phenotypically and phylogenetically coherent group of thermophilic bacilli with a high 16S rDNA sequence similarity (98.5-99.2%).11 As a consequence of this work, the thermophilic bacteria belonging to Bacillus genetic group 5 were reclassified in 2001 as being members of Geobacillus gen. nov., meaning earth or soil Bacillus, with the well-known Geobacillus (Bacillus) stearothermophilus being assigned as the type strain.11

Thermophilic bacilli, including Geobacillus spp., are widely distributed and have been successfully isolated from all continents where geothermal areas occur.12 Geobacilli are also isolated from shallow marine hot springs and from deep sea hydrothermal vents, with Maugeri et al.13 describing the isolation of three novel halotolerant and thermophilic Geobacillus strains from three separate shallow marine vents off the Eolian Islands, Italy. High-temperature oilfields have also yielded strains of Geobacillus, with Nazina et al.11,14 reporting two novel species, G. subterraneus and G. uzenensis, isolated from the Uzen oilfield in Kazakhstan. In addition, Geobacillus spp. have been recovered from artificial hot environments such as hot water pipelines, heat exchangers, waste treatment plants, burning coal refuse piles, and bioremediation biopiles.15,16 Whereas most work has concentrated upon the recovery of Geobacillus isolates from natural and artificial high-temperature biotopes, our attention has focused upon strains of Geobacillus readily isolated from temperate soil environments.17,18

Industrial interest in Geobacillus species has arisen from their potential applications in biotechnological processes, for example as sources of various thermostable enzymes, such as proteases,19 amylases,20 lipases,21 and pullulanases.22 Geobacillus spp. also have potential for the generation of products for industrial uses such as exopolysaccharides.23 In addition, two strains of G. thermoleovorans have been described as producing large bacteriocins that exhibited a lytic activity toward other strains of G. thermoleovorans and also a range of bacteria of medical importance including Salmonella typhimurium.24 A variety of potential environmental biotechnology applications involving Geobacillus spp. have been described, which is perhaps unsurprising given the seemingly ubiquitous ability of Geobacillus spp. to metabolize hydrocarbons.25 Our group has identified two novel applications for Geobacillus spp.: first, in herbicide metabolism, making them potential sources of genes for use in agricultural biotechnology, and second, in the ability to disrupt quorum sensing in certain Gram-negative bacteria.16,26

With the prevalence of genomic data for Bacillus and related genera, it has become possible to adopt a systems biology approach to understand how these organisms adapt to extreme environmental conditions. Takami et al.4 employed a comparative genomic analysis of the alkaliphilic and extremely halotolerant O. iheyensis, the alkaliphilic and moderately halotolerant B. halodurans, and the neutrophilic and moderately halotolerant B. subtilis. This approach allowed the identification of a number of candidate genes of importance in adaptation to highly alkaline environments. One problem with reliance on purely genome-based data, however, is that those organisms whose genomes have been sequenced to date contain large portions of predicted protein-coding regions encoding polypeptides of unknown biological function. This is exemplified by the recent description of G. kaustophilus HTA426, 31% of whose genes are annotated as encoding hypothetical conserved proteins.5 The challenge for researchers is to build a comprehensive cellular map that includes genome sequence and proteomic data, enabling the assignment of molecular and cellular functions to the thousands of these predicted gene products.27 To meet this challenge, there must be systematic analysis of the changes in the protein content of any cellular system concurrent with any genomic investigation.28

We now report for the first time the analysis of the soluble sub-proteome of the obligate thermophile, G. thermoleovorans T80, utilizing a robust multidimensional protein identification protocol. To date, no global proteomic analysis has been carried out on any species of Geobacillus. While 2D-PAGE has been utilized to separate complex protein mixtures from G. stearothermophilus, protein identification has focused upon specific enzymes of interest in stress response such as superoxide dismutase29 and peroxiredoxin.30 The strategy that we describe utilizes for the first time a gel-free approach, allowing us to gain an insight into the cellular processes of this obligate thermophile. In addition, it has enabled us to assign a degree of functionality to gene products hitherto described only as being hypothetical conserved proteins.

Experimental Procedures

Reagents. All reagents were purchased from Sigma-Aldrich (Poole, UK) with the exception of mass spectrometry grade water and acetonitrile, which were purchased from Romil (Cambridge, UK), and trypsin, which was purchased from Promega (Southampton, UK).

Cell Culture and Growth Conditions. G. thermoleovorans T80 was maintained at 60 °C as previously described by Obojska et al.16 Routine growth of the organism involved the inoculation of nutrient broth (50 mL in 250 mL Erlenmeyer flasks) with a loop of fresh, actively growing (16 h) culture from agar plates. Flasks were incubated aerobically at 60 °C with orbital shaking at 200 rpm in an Innova 4230 refrigerated incubator shaker (New Brunswick Scientific, NJ). Growth was monitored by the increase in culture attenuance at 600 nm.

Protein Extraction and Quantification. G. thermoleovorans T80 cultures were harvested in the mid-log phase (D600 = 0.8) of growth by centrifugation at 9000 × g for 10 min at 3-5 °C. The cell pellet was weighed and resuspended in 10 mM PBS (pH 7.8) at a ratio of 1 g cells to 2 mL buffer. The cells were then broken mechanically using an MSK cell homogenizer (B. Braun Biotech, Shwarzenberger, Germany) using a modification of the method of Ternan and McMullan.31 Briefly, an equal volume of 0.1 mm diameter glass beads was added to the cell suspension in the homogenizer chamber, and disruption was carried out for a total time of 2 min (10 s homogenization with 2 min cooling), following which the homogenate was pipetted into a centrifuge tube. The glass beads were washed with 5 mL PBS and the washings added to the centrifuge tube. The soluble proteome fraction was isolated by centrifugation of the homogenate at 25 000 × g for 30 min at 3-5 °C (Beckman J2-HS, Beckman Instruments, CA) followed by ultracentrifugation at 150 000 × g for 2 h at 3-5 °C (Beckman L8-M, Beckman Instruments, CA) to sediment the insoluble fraction. The supernatant was decanted and stored frozen in 1 mL aliquots at -70 °C until required. The total soluble protein content was measured using the Bradford assay.32

Strong Anion Exchange Perfusion Chromatography. An aliquot of the soluble protein fraction (8 mg) was loaded onto a POROS HQ/20 4.6 mm × 100 mm (1.662 mL column volume) strong anion exchange (SAX) column (Applied Biosystems, Foster City, USA) connected to a Biocad Vision workstation (Applied Biosystems/MDS SCIEX, Toronto, Canada). Buffers used for protein elution were Buffer A (50 mM Tris-HCl, pH 8.0) and Buffer B (50 mM Tris-HCl, 1 M NaCl, pH 8.0). Protein elution was performed using a gradient of 0-100% Buffer B over 20 column volumes, at a flow rate of 5 mL/min, with a further 10 column volumes of 100% Buffer B. Fractions (1 mL) were collected using an AFC 2000 automated fraction collector and then concentrated and desalted using 3 kDa Amicon microcon filters (Millipore Corporation, Bedford, MA) as per the manufacturer’s instructions.

Tryptic Digestion. Trypsin (2 µg, Promega, Southampton, UK) in 50 mM NH4HCO3, pH 7.8, was added to 100 µL of the desalted samples and incubated overnight at 37 °C, following which the reactions were frozen at -70 °C until required.

Liquid Chromatography-Mass Spectrometric (LC-MS) Analysis. Mass spectrometry was performed using a 3200 Q-TRAP Hybrid ESI Quadrupole linear ion trap mass spectrometer, ESI-Q-q-Q linear ion trap-MS/MS (Applied Biosystems/MDS SCIEX, Toronto, Canada) with a nanospray interface, coupled with an online Ultimate 3000 nanoflow liquid chromatography system (Dionex/LC Packings, Amsterdam, The Netherlands). A µ-Precolumn cartridge (300 µm × 5 mm, 5 µm particle size) was placed prior to the C18 capillary column (75 µm × 15 cm, 3 µm particle size) to enable desalting and filtering. Both columns consisted of the reverse phase material PepMAP 100 C18, silica-based, with a 100 Å pore size (Dionex/LC Packings). The buffers used in the gradient were Buffer A (0.1% formic acid in 2% acetonitrile) and Buffer B (0.1% formic acid in 80% acetonitrile). The nanoLC gradient was 75 min in length: 0-55% B in 50 min, hold at 55% B for 10 min, 10 min at 90% B followed by 5 min at 100% A. The flow rate of the gradient was 300 nL/min. The detector mass range was set at 400-1800 m/z. MS data acquisition was performed in positive ion mode. During MS acquisition, peptides with 2+ and 3+ charge states were selected for fragmentation.

Database Searching and Protein Identification. Protein identification was carried out using an internal MASCOT server (version 1.9; Matrix Science, London, UK) searching against the MSDB database (latest version at the time of processing). Peptide tolerance was set at ±1.2 Da with MS/MS tolerance set at ±0.6 Da and the search set to allow for 1 missed cleavage. Only identifications with a MASCOT MOWSE score ≥ 49 were regarded as significant hits regardless of the number of peptides.

Figure 1. Protein distribution in each collected fraction from strong anion exchange chromatography (0 to 1 M NaCl) with respect to pI.

Figure 2. Two-dimensional visualization of the predicted molecular mass and pI of proteins identified within the G. thermoleovorans T80 soluble sub-proteome.
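The acquisition settings above (a 400-1800 m/z detector range, with only 2+ and 3+ precursors selected) together bound the peptide masses that can be fragmented. The following is a minimal sketch of that arithmetic, assuming simple protonation and monoisotopic neutral masses; the function names are illustrative and are not part of the instrument software.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton (charge carrier in positive ion mode)


def precursor_mz(neutral_mass, charge):
    """m/z of a peptide of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_window(charge, mz_min=400.0, mz_max=1800.0):
    """Neutral-mass range (Da) whose ions of this charge fall inside the m/z range."""
    return charge * (mz_min - PROTON_MASS), charge * (mz_max - PROTON_MASS)


def is_selectable(neutral_mass, charges=(2, 3), mz_min=400.0, mz_max=1800.0):
    """True if the peptide is inside the detector range in at least one selected charge state."""
    return any(mz_min <= precursor_mz(neutral_mass, z) <= mz_max for z in charges)


if __name__ == "__main__":
    for z in (2, 3):
        low, high = neutral_mass_window(z)
        print(f"{z}+ precursors: {low:.1f} to {high:.1f} Da")
```

Under these assumptions, doubly charged precursors cover neutral masses of roughly 798 to 3598 Da and triply charged precursors roughly 1197 to 5397 Da, so very short tryptic peptides fall outside the selection window.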

Results and Discussion

Comprehensive Analysis of G. thermoleovorans T80 Soluble Sub-Proteome. In this study we report the first proteomic analysis of the thermophilic bacterium G. thermoleovorans T80; using our robust multidimensional protein identification system, 294 proteins from within the soluble sub-proteome were identified. This expressed gene product subset represents an estimated 8% of the total proteome, employing data publicly available for G. kaustophilus HTA426, the closest phylogenetically related organism to G. thermoleovorans T80.25 No data are currently available in the literature on the expected distribution of proteins within sub-proteomic fractions of Geobacillus spp. As a benchmark, however, analysis of the model Gram-positive bacterium B. subtilis has shown that when grown aerobically under optimal conditions, some 693 soluble proteins were identified within its soluble sub-proteome, equivalent to only 17% of the predicted proteome.33 The veracity of our multidimensional protein identification approach was supported by the identification of proteins with a broad pI range in each of the 50 fractions obtained (Figure 1). Due to the complexity of peptide mixtures within each fraction, the separation capabilities of LC-MS systems are often exceeded. This, coupled with the limitations of data-dependent acquisition for the selection of peptides for MS/MS, requires that samples be run more than once.34,35 Thus, we analyzed the same peptide fractions three separate times, increasing our overall protein identification by 32%. This is in close agreement with the recent work on Sulfolobus solfataricus P2 by Chong and Wright,36 who increased their overall protein coverage by 40% when samples were run in triplicate. In the current study, our multidimensional protein identification protocol initially provided a total of 1336 positive identifications. Intensive manual curation (Table 1), which involved filtering out any proteins identified more than once and removing both redundant peptides from any single protein and obvious false positives, resulted in a final list containing a total of 294 unique proteins. The average number of peptides per protein was 4 and the average MOWSE score was 196.

Table 1. Analysis of Total Time Required for Multidimensional Protein Separation, Protein Identification and Manual Data Curation of the Soluble Sub-Proteome of G. thermoleovorans T80

workflow                                  value
no. injections                            3
no. fractions collected per injection     50
total no. proteins                        294
labor time (hours)                        240
MS data analysis (hours)                  720
total curation time (hours)               960

Characterization of G. thermoleovorans T80 Soluble Sub-Proteome. The protein subset identified in this study (see supplementary data) contained a wide range of proteins with respect, for example, to the physicochemical properties of pI and molecular mass (Mr) (Figure 2). This 2D visualization showed that the smallest protein identified was 50S ribosomal protein L30 (Mr = 7049 Da), and the largest was DNA-directed RNA polymerase β subunit (Mr = 134796 Da). The most acidic protein identified was inositol-monophosphate dehydrogenase (pI = 3.13), whereas the most basic was a histone-like protein (pI = 12.04). Analysis of the origin of the proteins identified in this study bears out the phylogenetic relationships for the Geobacilli as predicted by 16S rDNA sequence analysis.25 Of the 294 proteins identified, 84% had a closest match with the equivalent data for the predicted protein coding sequences from G. kaustophilus HTA426. Proteins from other Geobacilli accounted for 6% of the total number of proteins identified, a further 2% came from the Bacillus species, with the remainder being distributed among other bacterial species. Of the 294 proteins detected in this study, functional roles for 258 proteins (88%) were known or could be predicted from database analysis.
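The predicted molecular masses quoted above are derived from protein sequences. As a rough illustration only, the sketch below sums standard average residue masses and adds one water for the free termini; it is an assumed, simplified calculation (no modifications, no signal-peptide processing) and not the tool used in the study. The function name is hypothetical, and pI prediction is deliberately left out because it depends on the pKa set chosen.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # Da, added once for the N-terminal H and C-terminal OH


def predicted_mass(sequence):
    """Average molecular mass (Da) of an unmodified protein sequence."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    # Toy dipeptide, used only to show the call; not a protein from the study.
    print(f"{predicted_mass('GA'):.2f} Da")
```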
Proteins within this soluble sub-proteome were assigned to functional categories utilizing the SubtiList database (www.pasteur.fr/Bio/SubtiList) as previously described by Takami et al.5 Table 2 shows that the largest category (12.2%) was classified as Similar to Unknown Proteins (hypothetical conserved proteins). The remaining proteins were distributed among other functional categories. A comparison of functional annotation with the equivalent data for the predicted protein coding sequences from the G. kaustophilus HTA426 genome confirmed that hypothetical conserved proteins, belonging to the Similar to Unknown Proteins functional category, were the most prevalent (31%).

Table 2. Functional Categorization of Proteins Identified within the Soluble Sub-Proteome of G. thermoleovorans T80

no. = no. of proteins in the current study; study (%) and genome (%) = protein distribution (%) in the current study and in the G. kaustophilus annotated genome

no.     study (%)   genome (%)  functional category
4       1.4         1.5         cell wall
10      3.4         7.3         transport, binding proteins and lipoproteins
6       2           0.7         sensors (signal transduction)
10      3.4         2.4         membrane bioenergetics (electron transport and ATP synthase)
1       0.3         0.4         protein secretion
4       1.4         0.5         cell division
2       0.7         2.5         sporulation
17      5.8         5.1         specific pathways
11      3.7         0.9         main glycolytic pathway
17      5.8         0.5         TCA cycle
29      9.9         5.4         metabolism of amino acids and related molecules
24      8.2         2           metabolism of nucleotides and nucleic acids
22      7.5         3.1         metabolism of coenzymes and prosthetic groups
12      4.1         2.7         metabolism of lipids
1       0.3         0.2         metabolism of phosphate
2       0.7         0.1         metabolism of sulfur
4       1.4         0.3         DNA packaging and segregation
6       2           0.2         RNA synthesis, elongation
8       2.7         0.4         RNA synthesis, initiation
5       1.7         3.8         RNA synthesis, regulation
3       1           0.1         RNA synthesis, termination
26      8.8         1.7         protein biosynthesis, ribosomal proteins
9       3.1         0.7         protein synthesis, aminoacyl-tRNA synthetases
7       2.4         0.1         protein biosynthesis elongation
2       0.7         0.7         protein modification
6       2           0.2         protein folding
2       0.7         1           adaptation to atypical conditions
8       2.7         0.9         detoxification
36      12.2        31          similar to unknown proteins

Hypothetical Conserved Proteins. As we move toward a systems biology understanding of what is occurring within an organism, it is essential to know how the structure and function of proteins allow them to contribute to cellular processes. As with many bacterial genomes, Takami et al. (2004)5 reported that a significant percentage of the genome (31%) of G. kaustophilus HTA426 was designated as genes encoding hypothetical conserved proteins. The absence of any functional property for nearly one-third of an organism’s predicted protein complement represents a considerable obstacle in our efforts to fully understand how these obligate thermophiles function. Examination of the data from our present study identified 36 proteins (35 of which were best matches to G. kaustophilus) annotated as hypothetical conserved proteins (see Supporting Information). The identification of such proteins within the cell extract of G. thermoleovorans T80 establishes the biological functionality of these ‘hypothetical’ predicted protein coding sequences, elegantly demonstrating the potential of proteomic investigations to substantiate in silico experimentation. Having established the presence of such proteins within G. thermoleovorans T80, and wishing to understand how they contribute to functional processes, we further examined them using NCBI BLASTp (www.ncbi.nlm.nih.gov/BLAST/). Such a bioinformatics approach allows for the identification of conserved domains within protein sequences,37 thereby enabling a degree of functionality to be inferred. Using this methodology allowed us to assign putative functions to 29 of the 35 (83%) proteins (Table 3), thus giving us a deeper understanding of the processes occurring within the cell at the time of sampling.

Table 3. Functional Description of Hypothetical Conserved Proteins from G. thermoleovorans T80 Based on Conserved Domain Analysis

NCBI-CDD   possible function
25140      family of unknown function (DUF1028)
26374      M42 glutamyl aminopeptidase
11079      Cellulase M and related proteins (COG1363)
28636      Molybdopterin-binding oxidoreductase-like domains (MopB-3)
28632      Molybdopterin-binding oxidoreductase-like domains (MopB1)
27899      fructose-1,6-bisphosphatase
25332      Appr-1"-p processing enzyme
17193      Pirin-like protein
23950      Pirin_C, Pirin C-terminal region
17072      uncharacterized protein family
25381      Pyridine nucleotide-disulfide oxidoreductase
10363      Thioredoxin reductase
None       hypothetical conserved protein
27909      Rhodanese Homology Domain (RHOD)
25604      Metallo-beta-lactamase superfamily
None       hypothetical conserved protein
14888      GAF, domain present in phytochromes and cGMP-specific phosphodiesterase
25932      GatB/Yqey domain
26118      domain of unknown function (DUF299)
10491      2-methylthioadenine synthetase
25845      Cobalamin adenosyltransferase
5390       Tetratricopeptide repeat domain containing protein
11846      uncharacterized conserved protein
2429       CbiX
25140      family of unknown function
11175      predicted kinase related to dihydroxyacetone kinase
8349       uncharacterized ACR, YfiH family
None       hypothetical conserved protein
None       hypothetical conserved protein
10465      predicted hydrolase of the metallo-beta-lactamase superfamily
None       hypothetical conserved protein
25558      Acetyltransferase (GNAT) family
3341       2′,5′ RNA ligase family
None       hypothetical conserved protein
7651       Glycosyl transferases group 1
16458      DinB family. DNA damage-inducible (din) genes
16458      DinB family. DNA damage-inducible (din) genes
9584       domain of unknown function (DUF370)
23308      predicted metal-dependent hydrolase
25281      NTPase/HAM1

code: Q5KZL3_GEOKA, Q5KWD8_GEOKA, Q5KU67_GEOKA, Q5KUG8_GEOKA, Q5KUT6_GEOKA, Q5KVA7_GEOKA, Q5KVD6_GEOKA, Q5KVI1_GEOKA, Q5KVZ8_GEOKA, Q5KW10_GEOKA, Q5KW29_GEOKA, Q5KW38_GEOKA, Q5KW51_GEOKA, Q5KX05_GEOKA, Q5KX17_GEOKA, Q5KX01_GEOKA, Q5KXP4_GEOKA, Q5KXV6_GEOKA, Q5KYZ7_GEOKA, Q5KZL3_GEOKA, Q5L0R2_GEOKA, Q5L0W5_GEOKA, Q5L118_GEOKA, Q5L141_GEOKA, Q5L142_GEOKA, Q5L1B5_GEOKA, Q5L1D1_GEOKA, Q5L1N1_GEOKA, Q5L2X7_GEOKA, Q5L334_GEOKA, Q5L3J2_GEOKA, BAD74901, BAD75451, BAD76776, BAD76950

False Positive Proteins. Currently, data from large-scale proteomic studies are evaluated and validated by automated MS data interpretation algorithms. While this has had a significant impact on proteomics, there is currently no generally accepted standard for publication and validation of these MS-based protein identification results.38 One of the challenges facing those involved in shotgun proteomics is the elimination of erroneous data that may lead to the identification of ‘phantom’ proteins within any given sample. Indeed, Gygi’s group has reported the occurrence of false positives within protein identification at a level of 1% for their target database34 during such an investigation. Adopting a responsible approach to the curation of our data, we reduced the initial protein identification list from 1336 positive identifications to a total of 294 unique proteins. Within the proteins identified in the soluble sub-proteome, however, were a number of instances where the same protein was identified by database searching as coming from two separate bacteria. This can be explained by one of two possibilities. First, given the absence of G. thermoleovorans T80 genomic information in any publicly accessible database, the protein from G. thermoleovorans T80 could in fact be a chimeric version containing amino acid sequences from both bacterial proteins. Second, the protein identified as coming from the two distinct bacterial isolates is in fact the same protein, and therefore one of the identified proteins is a false positive.

Table 4 contains eleven proteins from the soluble sub-proteome that are identified by database searching as coming from two separate bacteria. Analysis of the peptides identified by MS/MS showed that 50S ribosomal protein L2, 50S ribosomal protein L3, 30S ribosomal protein S4, elongation factor G and heat shock protein 60, which were identified as coming from both G. kaustophilus and G. stearothermophilus, should in fact have been identified as originating only from G. kaustophilus, because the peptides identified from both sequences were all found within the predicted protein sequence of this bacterium. The same explanation can also be used to assign heat shock protein 10, catalase and acetyl-CoA synthetase to G. kaustophilus. It is, however, possible that our observations may be a result of the state of flux within the nomenclature of Geobacillus species. A recent report suggests that G. kaustophilus, G. thermoleovorans and many isolates of G. stearothermophilus may indeed belong to the same species.39 Succinate dehydrogenase (flavoprotein subunit), identified as coming from both G. kaustophilus and Symbiobacterium thermophilum, and enolase, identified as coming from both G. kaustophilus and B. halodurans, in each case had only one unique peptide that seemingly was not in the G. kaustophilus genome dataset. However, upon investigation it was shown that these unique peptides were in fact present in G. kaustophilus, which proved that the identifications from the other two organisms were indeed false positives. In the case of RNA polymerase sigma factor 70 from Nocardia farcinica and Staphylococcus epidermidis, the peptide sequences identified by MS/MS are distinct for each organism. This evidence suggests that these identifications are not in fact false positives. When these peptides are examined, however, against the sequence of RNA polymerase sigma factor A from G. kaustophilus (the equivalent of RNA polymerase sigma factor 70), all the identified peptides from both Nocardia and Staphylococcus were present in the G. kaustophilus predicted protein sequence. Therefore they are indeed false positives, and it remains unclear why the MS analysis software failed to assign these peptides to G. kaustophilus.

Protein Identification. Using a proteomic approach, one would like to examine and identify all the proteins synthesized within a cell in order to gain a comprehensive understanding of how the cell functions under various conditions. However, at present, this is not practical given the problem of resolving the large numbers of proteins, with widely varying physicochemical properties, present in a cell at any one time. A more pragmatic approach is to examine the main sub-proteomic fractions within the cell, such as acidic/alkaline, membrane-bound and, in our case, the soluble fraction, as in this way the majority of proteins synthesized in the bacterium can be identified. It should also be remembered that analysis of the entire proteome is not analogous to analysis of the entire genome, as under different physiological conditions only certain sets of gene products will be expressed.

From a physiological standpoint, examination of our soluble sub-proteome should present us with two sets of expressed proteins, namely those of growing cells (vegetative) and those of nongrowing cells. In the case of growing cells we can see a predominance of house-keeping proteins (Table 2) required for growth: those involved in protein production and biosynthesis (16%), those with a role in energy production and ATP production (3.4%), glycolysis (3.7%), TCA cycle (5.8%), amino acid synthesis (9.9%) and nucleotide synthesis (8.2%), as well as the translational apparatus of DNA and RNA synthesis (9%). Previous work has shown a positive relationship between the number of peptide hits for a protein and its abundance in the cell.40 This report confirms that the above-described housekeeping proteins are all abundantly expressed in the soluble sub-proteome of G. thermoleovorans T80. Within the sub-proteome, all of the proteins required for DNA-dependent RNA polymerase, the enzyme responsible for all cellular RNA synthesis,41 have been identified. The core consists of αββ′ω subunits and is capable of elongation and termination of transcription. When a further σ factor is bound to the RNA polymerase, it produces a holoenzyme that increases the efficiency of transcription initiation and determines specific promoter recognition.41
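The percentages quoted for the house-keeping categories follow directly from the protein counts in Table 2. The sketch below recomputes them as count / total × 100; the counts are transcribed from Table 2, while the function name and the printed grouping are illustrative choices rather than the authors' procedure.

```python
# Protein counts per SubtiList functional category, transcribed from Table 2.
COUNTS = {
    "cell wall": 4,
    "transport, binding proteins and lipoproteins": 10,
    "sensors (signal transduction)": 6,
    "membrane bioenergetics (electron transport and ATP synthase)": 10,
    "protein secretion": 1,
    "cell division": 4,
    "sporulation": 2,
    "specific pathways": 17,
    "main glycolytic pathway": 11,
    "TCA cycle": 17,
    "metabolism of amino acids and related molecules": 29,
    "metabolism of nucleotides and nucleic acids": 24,
    "metabolism of coenzymes and prosthetic groups": 22,
    "metabolism of lipids": 12,
    "metabolism of phosphate": 1,
    "metabolism of sulfur": 2,
    "DNA packaging and segregation": 4,
    "RNA synthesis, elongation": 6,
    "RNA synthesis, initiation": 8,
    "RNA synthesis, regulation": 5,
    "RNA synthesis, termination": 3,
    "protein biosynthesis, ribosomal proteins": 26,
    "protein synthesis, aminoacyl-tRNA synthetases": 9,
    "protein biosynthesis elongation": 7,
    "protein modification": 2,
    "protein folding": 6,
    "adaptation to atypical conditions": 2,
    "detoxification": 8,
    "similar to unknown proteins": 36,
}


def distribution(counts):
    """Percentage of the total protein count in each category, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    shares = distribution(COUNTS)
    print("total proteins:", sum(COUNTS.values()))
    for name in ("main glycolytic pathway", "TCA cycle", "similar to unknown proteins"):
        print(f"{name}: {shares[name]}%")
```

The counts sum to the 294 curated proteins, which is a quick consistency check on the table.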

Table 4. Proteins within the Soluble Sub-Proteome of G. thermoleovorans T80 Identified as the Same Protein but Originating from Distinct Bacterial Isolates

code            protein                           organism
Q5L3Z4_GEOKA    50S ribosomal protein L2          G. kaustophilus
CAA38737        50S ribosomal protein L2          G. stearothermophilus

Q5L3Z7_GEOKA    50S ribosomal protein L3          G. kaustophilus
S24363          50S ribosomal protein L3          G. stearothermophilus

BAD77087        30S ribosomal protein S4          G. kaustophilus
RS4_BACST       30S ribosomal protein S4          G. stearothermophilus

BAD62690        Translation elongation factor G   G. kaustophilus
Q9F4B2_BACST    Translation elongation factor G   G. stearothermophilus

BAD74534        Heat shock protein 60             G. kaustophilus
Q9EZV4_BACST    Heat shock protein 60             G. stearothermophilus

Q5L3E7_GEOKA    Heat shock protein 10             G. kaustophilus
JC1479          Heat shock protein 10             thermophilic bacterium PS3

Q5KZ91_GEOKA    Catalase                          G. kaustophilus
Q9R7J5_BACST    Catalase                          G. stearothermophilus

Q5KW45_GEOKA    Acetyl-CoA synthetase             G. kaustophilus
Q98JC6_RHILO    Acetyl-CoA synthetase             Mesorhizobium loti

Q5KWH9_GEOKA    succinate dehydrogenase           G. kaustophilus
Q67JJ3_SYMTH    succinate dehydrogenase           Symbiobacterium thermophilum

BAD77339        enolase                           G. kaustophilus
D84094          enolase                           Bacillus halodurans

Q5YT84_NOCFA    RNA polymerase sigma              Nocardia farcinica
Q5HNY7_STAEQ    RNA polymerase sigma              Staphylococcus epidermidis
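The test applied to the paired entries in Table 4 is essentially a containment check: if every peptide matched to either database entry is also found in the predicted G. kaustophilus sequence, the second organism is treated as a false positive. The sketch below shows one way this could be automated, using exact substring matching; the function name, entry labels, sequence and peptides are all made-up placeholders, not data from the study.

```python
def resolve_duplicate(reference_sequence, peptides_by_entry):
    """Check whether all peptides from a pair of database entries map to one reference.

    reference_sequence: predicted protein sequence of the reference organism.
    peptides_by_entry: dict mapping a database entry label to its matched peptides.
    Returns (all_found, missing_peptides).
    """
    pooled = {pep for peptides in peptides_by_entry.values() for pep in peptides}
    missing = sorted(pep for pep in pooled if pep not in reference_sequence)
    return len(missing) == 0, missing


if __name__ == "__main__":
    # Hypothetical reference sequence and peptides, for illustration only.
    reference = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    hits = {
        "entry_A": ["TAYIAK", "QISFVK"],
        "entry_B": ["SHFSR", "LGLIEVQ"],
    }
    all_found, missing = resolve_duplicate(reference, hits)
    print("assign to reference organism only:", all_found, "| unmatched:", missing)
```

Exact matching is a simplification: it does not treat isobaric leucine and isoleucine as equivalent, so a peptide reported as unmatched would still need manual inspection before the chimeric-protein explanation is preferred.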

In the case of nongrowing cells, proteins are produced as a direct result of stress responses such as heat shock, nutrient limitation and oxidative stress. A number of sigma factors, σ37, σW, σ70, and σ43, of which the latter two belong to the σA class, have been identified in our investigation. The σA factor transcribes the heat shock operons controlled by the HrcA-CIRCE complex within Gram-positive bacteria,41 leading to the production of many proteins, including the following proteins identified in our investigation: ClpX; GroEL (heat shock protein 60); GroES (heat shock protein 10); and peptidyl-prolyl cis-trans isomerase. σ37 is responsible for the induction of genes encoding general stress proteins following heat, ethanol, salt or acid stress, or during energy depletion.42 In the sub-proteome, we have also identified histidine protein kinase, stage 0 sporulation protein F and stage V sporulation protein G, all of which are known to be involved in the sporulation process.42 The σW factor, also identified in the sub-proteome, is involved in the detoxification process when cells enter stationary phase, where it initiates transcription of genes encoding various ABC transporters.43 General stress response proteins, such as those involved in the cold-shock response of Gram-positive bacteria, i.e., cold shock protein D, in addition to ribosomal proteins S6 and L7/12 and glyceraldehyde dehydrogenase,42,44 were also identified. Oxidative stress can be considered to be one of the most detrimental stresses to the cell, causing damage at both the molecular and metabolic levels.30 To limit cellular damage, enzymes that scavenge the free radicals responsible are expressed. In our current study, we identified superoxide dismutase, peroxiredoxin, thioredoxin, glutathione peroxidase, ferredoxin, nitrogen fixation protein (NifU protein), and catalase.29,30,42

Conclusion Proteomics is particularly useful for bacterial investigations as a large number of proteins can be identified from various fractionated sub-proteomes. As more proteomic workflows are developed for protein identification, it becomes increasingly important to be able to evaluate the data provided by these systems. On initial inspection of the MS data, 1336 positive identifications were made. Careful manual examination of the data to exclude false positives and duplicate proteins left 294 proteins identified with high MOWSE scores. Researchers in this area may be tempted to avoid the labor-intensive repeat runs and manual curation needed to obtain protein identifications reliable enough to allow elucidation of the biochemistry occurring within a cell. We suggest that, as in many other areas of scientific research, a basic set of criteria is essential for all proteomic analyses. As demonstrated in this study, samples should be analyzed at least in triplicate, and manual curation should be carried out routinely to remove false positives and duplicate identifications. An attempt should also be made to assign some functionality to hypothetical proteins; in that way one can assess the biochemical processes occurring in the cell at that point in time rather than obtaining a list of hypothetical proteins from genomic data.
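The criteria suggested above (replicate analysis, removal of duplicate identifications, and exclusion of low-confidence hits) can be illustrated with a minimal sketch of an automated first pass. It is not the procedure used in this study, where curation was carried out manually. The record fields, the placeholder accessions, the score cutoff of 50, and the requirement that a protein be detected in at least two replicate runs are all assumptions chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Identification(NamedTuple):
    accession: str  # database accession of the matched protein (placeholder values below)
    score: float    # search-engine score, e.g. a MOWSE-type score
    run: int        # index of the replicate run that produced the hit


def curate(hits, min_score=50.0, min_runs=2):
    """Return accessions that pass the score cutoff in at least min_runs runs.

    min_score and min_runs are illustrative assumptions, not values from the study.
    Collecting run indices in a set per accession also collapses duplicate hits.
    """
    runs_seen = defaultdict(set)
    for hit in hits:
        if hit.score >= min_score:
            runs_seen[hit.accession].add(hit.run)
    return sorted(acc for acc, runs in runs_seen.items() if len(runs) >= min_runs)


# Hypothetical hits from three replicate runs.
hits = [
    Identification("P1", 120.0, 1),
    Identification("P1", 95.0, 2),
    Identification("P1", 95.0, 2),  # duplicate identification within run 2
    Identification("P2", 30.0, 1),  # below the assumed score cutoff
    Identification("P2", 28.0, 2),
    Identification("P3", 80.0, 3),  # good score but seen in a single run only
]

curated = curate(hits)
print(curated)
```

Such a filter could only reduce the list ahead of manual inspection; spectra that pass would still need to be examined individually before an identification is accepted.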

Acknowledgment. We would like to thank the Wellcome Trust for funding the vacation studentship of Miss Catherine E. Pollock (VS/05/ULS/A2). We thank Dr. Phil Jackson of Applied Biosystems for technical help and advice. Thanks are given to Mr G.G. Din for valuable discussions of our data.

Journal of Proteome Research • Vol. 5, No. 4, 2006

Supporting Information Available: The master list of proteins identified. This material is available free of charge via the Internet at http://pubs.acs.org.

References
(1) Altermann, E. R.; Altermann, W. M.; Azcarate-Peril, M. A.; Barrangou, R.; Buck, B. L.; McAuliffe, O.; Souther, N.; Dobson, A.; Duong, T.; Callanan, M.; Lick, S.; Hamrick, A.; Cano, R.; Klaenhammer, T. R. PNAS 2005, 102, 3906-3912.
(2) Read, T. D. P.; Tourasse, N.; Baillie, L. W.; Paulsen, I. T.; Nelson, K. E.; Tettelin, H.; Fouts, D. E.; Eisen, J. A.; Gill, E. K.; Okstad, O. A.; Helgason, E.; Rilstone, J.; Wu, M.; Kolonay, J. F.; Beanan, M. J.; Dodson, R. J.; Brinkac, L. M.; Gwinn, M.; DeBoy, R. T.; Madpu, R.; Daugherty, S. C.; Durkin, A. S.; Haft, D. H.; Nelson, W. C.; Peterson, J. D.; Pop, M.; Khouri, H. M.; Radune, D.; Benton, J. L.; Mahamoud, Y.; Jiang, L.; Hance, I. R.; Weidman, J. F.; Berry, K. J.; Plaut, R. D.; Wolf, A. M.; Watkins, K. L.; Nierman, W. C.; Hazen, A.; Cline, R.; Redmond, C.; Thwaite, J. E.; White, O.; Salzberg, S. L.; Thomason, B.; Friedlander, A. M.; Koehler, T. M.; Hanna, P. C.; Kolsto, A. B.; Fraser, C. M. Nature 2003, 423, 81-86.
(3) Takami, H. N., K.; Takaki, Y.; Maeno, G.; Sasaki, R.; Masui, N.; Fuji, F.; Hirama, C.; Nakamura, Y.; Ogasawara, N.; Kuhara, S.; Horikoshi, K. Nucl. Acids Res. 2000, 28, 4317-4331.
(4) Takami, H.; Takaki, Y.; Uchiyama, I. Nucl. Acids Res. 2002, 30, 3927-3935.
(5) Takami, H.; Takaki, Y.; Chee, G. J.; Nishi, S.; Shimamura, S.; Suzuki, H.; Matsui, S.; Uchiyama, I. Nucl. Acids Res. 2004, 32, 6292-6303.
(6) Wisotzkey, J. D.; Jurtshuk, P., Jr.; Fox, G. E.; Deinhard, G.; Poralla, K. Int. J. Syst. Bacteriol. 1992, 42, 263-269.
(7) Dufresne, S.; Bousquet, J.; Boissinot, M.; Guay, R. Int. J. Syst. Bacteriol. 1996, 46, 1056-1064.
(8) Touzel, J. P.; O’Donohue, M.; Debeire, P.; Samain, E.; Breton, C. Int. J. Syst. Evol. Microbiol. 2002, 50, 315-320.
(9) Ash, C.; Farrow, J. A. E.; Wallbanks, S.; Collins, M. D. Lett. Appl. Microbiol. 1991, 13, 202-206.
(10) Rainey, F. A.; Fritze, D.; Stackebrandt, E. FEMS Microbiol. Lett. 1994, 115, 205-211.
(11) Nazina, T. N.; Tourova, T. P.; Poltaraus, A. B.; Novikova, E. V.; Ivanova, A. E.; Grigoryan, A. A.; Lysenko, A. M.; Belyaev, S. S. Microbiology 2000, 69, 96-102.
(12) Sharp, R. J.; Riley, P. W.; White, D. In Thermophilic Bacilli; Kristjansson, J. K., Ed.; CRC Press: New York, 1992; pp 19-50.
(13) Maugeri, T. L.; Gugliandolo, C.; Caccamo, D.; Stackebrandt, E. Syst. Appl. Microbiol. 2002, 25, 450-455.
(14) Nazina, T. N.; Tourova, T. P.; Poltaraus, A. B.; Novikova, E. V.; Grigoryan, A. A.; Ivanova, A. E.; Lysenko, A. M.; Petrunyaka, V. V.; Osipov, G. A.; Belyaev, S. S.; Ivanov, M. V. Int. J. Syst. Evol. Microbiol. 2001, 51, 433-446.
(15) Maugeri, T. L.; Gugliandolo, C.; Caccamo, D.; Stackebrandt, E. Syst. Appl. Microbiol. 2002, 24, 572-587.
(16) Obojska, A.; Ternan, N. G.; Lejczak, B.; Kafarski, P.; McMullan, G. Appl. Environ. Microbiol. 2002, 68, 2081-2084.
(17) Marchant, R.; Banat, I. M.; Rahman, T. J.; Berzano, M. Trends Microbiol. 2001, 10, 120-121.

(18) Marchant, R.; Banat, I. M.; Rahman, T. J.; Berzano, M. Environ. Microbiol. 2002, 4, 595-602.
(19) Sookkheo, B.; Sinchaikul, S.; Phutrakul, S.; Chen, S.-T. Protein Expres. Purif. 2000, 20, 142-151.
(20) Uma Maheswar Rao, J. L.; Satyanarayana, T. Lett. Appl. Microbiol. 2003, 36, 191-196.
(21) Lee, D.-W.; Kim, H.-W.; Lee, K.-W.; Kim, B.-C.; Choe, E.-A.; Lee, H. S.; Kim, D.-S.; Pyun, Y. R. Enzyme Microb. Technol. 2001, 29, 363-371.
(22) Ben Messaoud, E.; Ben Ammar, Y.; Mellouli, L.; Bejar, S. Enzyme Microb. Technol. 2002, 31, 827-832.
(23) Schiano Moriello, V.; Lama, L.; Poli, A.; Gugliandolo, C.; Maugeri, T. L.; Gamacorta, A.; Nicolaus, B. J. Ind. Microbiol. Biotechnol. 2003, 30, 95-101.
(24) Novotny, J. F.; Perry, J. J. Appl. Environ. Microbiol. 1992, 58, 2393-2396.
(25) Bustard, M. T.; Whiting, S.; Cowan, D. A.; Wright, P. C. Extremophiles 2002, 6, 319-323.
(26) McMullan, G.; Christie, J. M.; Rahman, T. J.; Banat, I. M.; Ternan, N. G.; Marchant, R. Biochem. Soc. Trans. 2004, 32, 214-217.
(27) Bader, G. D.; Heilbut, A.; Andrews, B.; Tyers, M.; Hughes, T.; Boone, C. Trends Cell Biol. 2003, 13, 344-356.
(28) Kim, H.; Page, G. P.; Barnes, S. Nutrition 2004, 20, 155-165.
(29) Sookkheo, B.; Sinchaikul, S.; Thannan, H.; Thongprasong, O.; Phutrakul, S.; Chen, S. T. Proteomics 2002, 2, 1311-1315.
(30) Topanurak, S.; Sinchaikul, S.; Phutrakul, S.; Sookkheo, B.; Chen, S. T. Proteomics 2005, 5, 3722-3730.
(31) Ternan, N. G.; McMullan, G. Biochem. Biophys. Res. Commun. 2002, 290, 802-805.
(32) Bradford, M. M. Anal. Biochem. 1976, 72, 248-254.
(33) Eymann, C.; Dreisbach, A.; Albrecht, D.; Bernhardt, J.; Becher, D.; Gentner, S.; Tam, L. T.; Büttner, K.; Buurman, G.; Scharf, G.; Venz, S.; Völker, U.; Hecker, M. Proteomics 2004, 4, 2849-2876.
(34) Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P. Nat. Methods 2005, 2, 667-675.
(35) Schaefer, H.; Chervet, J. P.; Bunse, C.; Joppich, C.; Meyer, H. E.; Marcus, K. Proteomics 2004, 4, 2541-2544.
(36) Chong, P. K.; Wright, P. C. J. Proteome Res. 2005, 4, 1789-1798.
(37) Marchler-Bauer, A.; Bryant, S. H. Nucl. Acids Res. 2004, 32, W327-331.
(38) Chamrad, D.; Meyer, H. E. Nat. Methods 2005, 2, 647-648.
(39) Zeigler, D. R. Int. J. Syst. Evol. Microbiol. 2005, 55, 1171-1179.
(40) Mawuenyega, K. G.; Kaji, H.; Yamauchi, Y.; Shinkawa, T.; Saito, H.; Taoka, M.; Takahashi, N.; Isobe, T. J. Proteome Res. 2003, 2, 23-35.
(41) Rosen, R.; Ron, E. Z. Mass Spectrom. Rev. 2002, 21, 244-265.
(42) Storz, G.; Hengge-Aronis, R. Bacterial Stress Responses; American Society for Microbiology Press: Washington, 2000.
(43) Huang, X.; Gaballa, A.; Cao, M.; Helmann, J. D. Mol. Microbiol. 1999, 31, 361-371.
(44) Sinchaikul, S.; Sookkheo, B.; Phutrakul, S.; Pan, F. M.; Chen, S. T. Proteomics 2002, 2, 1316-1324.

PR0504642