A Combined Shotgun and Multidimensional ... - ACS Publications

Aug 3, 2006 - A Combined Shotgun and Multidimensional Proteomic Analysis of the Insoluble Subproteome of the Obligate Thermophile, Geobacillus ...
A Combined Shotgun and Multidimensional Proteomic Analysis of the Insoluble Subproteome of the Obligate Thermophile, Geobacillus thermoleovorans T80

Robert Leslie James Graham,*,† S. Naomi O’Loughlin,† Catherine E. Pollock,† Nigel G. Ternan,† D. Brent Weatherly,‡ Philip J. Jackson,§ Rick L. Tarleton,‡ and Geoff McMullan†

School of Biomedical Sciences, University of Ulster, Coleraine, County Londonderry BT52 1SA, United Kingdom; The Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia 30605; and Applied Biosystems, Lingley House, 120 Birchwood Boulevard, Warrington WA3 7QH, United Kingdom

Received May 22, 2006

To further our understanding of the biology of the thermophilic bacterium Geobacillus thermoleovorans T80, we now report the first proteomic analysis of the insoluble subproteome of this isolate. A combination of shotgun and multidimensional methodologies was used, and a total of 8628 peptides was initially identified by automated MS/MS identification software. Curation of these peptides led to a final list of 184 positive protein identifications. The proteins in this insoluble subproteome were functionally classified and characterized with respect to their physicochemical properties. Of the 15 hypothetical conserved proteins identified, we have assigned a function to all but four. A total of 31 proteins were predicted to possess signal peptides. In silico investigation of these proteins allowed us to identify four of the five bacterial classes of signal peptide, namely, (i) twin-arginine translocation, (ii) Sec-type, (iii) lipoprotein, and (iv) ABC transport. In addition, a number of proteins were identified that are known to be involved in the transport of compatible solutes, which are important in microbial stress responses.

Keywords: proteomics • shotgun • 1D LC-MS/MS • Geobacillus thermoleovorans • thermophile • membrane • insoluble

* Address for manuscript correspondence: Dr. Robert Graham, School of Biomedical Science, University of Ulster, Cromore Road, Coleraine, Co. Londonderry, BT52 1SA, U.K. E-mail: [email protected]. Telephone: +44(0)2870 323227. Fax: +44(0)2870 324375.
† University of Ulster. ‡ University of Georgia. § Applied Biosystems.
DOI: 10.1021/pr0602444. © 2006 American Chemical Society. Journal of Proteome Research 2006, 5, 2465-2473 (Vol. 5, No. 9; research articles). Published on Web 08/03/2006.

Introduction

Proteomic analysis aims to systematically identify and assess the entire protein complement of a given cell and thus to allow a greater understanding of how protein networks enable cells to function under various conditions.1,2 However, unless the proteome is effectively fractionated into less complex subproteomes, for example, the cytosolic and insoluble fractions, only the most abundant proteins will be identified. Even after an initial reduction in sample complexity, further protein separation is recommended to achieve deeper penetration into an organism’s proteome. A variety of separation methodologies are currently employed, including one-dimensional (1-DE) and two-dimensional (2-DE) gel electrophoresis; these are powerful tools, especially in comparative physiological studies, and are the methods of choice for many researchers. However, these methodologies have inherent limitations that leave certain categories of proteins unattainable, such as those at the extremes of pI and molecular mass and highly hydrophobic proteins.3,4 Membrane proteins are typically highly hydrophobic, which makes them difficult to solubilize with traditional proteomic preparative techniques. As a result of these intrinsic problems, alternative techniques must be employed to identify membrane proteins. The establishment of gel-free proteomics has allowed researchers to complement robust gel-based techniques and to detect such ‘problematic’ proteins more efficiently.

Complete genomic sequences are now available for a large number of prokaryotic and eukaryotic organisms. The Bacillaceae are particularly well represented among the genomes sequenced to date, with the Comprehensive Microbial Resource database at The Institute for Genomic Research (www.tigr.org) containing sequences for 23 members of this family, including Bacillus anthracis (9 strains), Bacillus cereus (3 strains), Bacillus licheniformis (2 strains), Bacillus subtilis, and Bacillus thuringiensis subsp. konkukian. These species are of major importance in both medical and food research;5,6 however, it is perhaps within the genomes of extremophilic bacilli that discoveries of evolutionary and biotechnological significance will be made. Japanese research organizations have sequenced the genomes of the alkaliphilic and halophilic organisms Oceanobacillus iheyensis HTE831, Bacillus clausii KSM-K16, and Bacillus halodurans C-125.7,8 Of particular interest to us, however, are the thermophilic bacilli, represented by organisms such as Geobacillus kaustophilus HTA426.9 Members of a number of Bacillaceae genera were known to be thermophilic, with growth optima in the range 45-70 °C;10-12 however, molecular phylogenetic analysis showed that the majority of such thermophilic bacteria belonged to Bacillus genetic groups 1 and 5.13,14 Group 5 isolates were subsequently identified as a phenotypically and phylogenetically coherent group with high 16S rDNA sequence similarity (98.5-99.2%)15 and, as a consequence, were reclassified in 2001 as members of Geobacillus gen. nov. (meaning earth or soil Bacillus), with Geobacillus (Bacillus) stearothermophilus assigned as the type species.15

As a result of the widespread availability of genomic data for Bacillus and related genera, it is now possible to take a systems biology approach to understanding how these organisms adapt to extreme environmental conditions. Takami et al.8 carried out a comparative genomic analysis of the alkaliphilic and extremely halotolerant O. iheyensis, the alkaliphilic and moderately halotolerant B. halodurans, and the neutrophilic and moderately halotolerant B. subtilis, which identified a number of candidate genes of importance in adaptation to highly alkaline environments.
To meet the challenges of systems biology, however, there must be a comprehensive analysis of individual organisms which links data obtained from various genome-wide approaches with that generated from full-scale proteomic investigations.1 We previously reported the first global proteomic analysis of the soluble subproteome of the highly thermophilic aerobic eubacterium, Geobacillus thermoleovorans T80, in which we employed a robust multidimensional protein identification protocol.16 The identification and assignment of biological function to some 287 proteins allowed us to gain insight into cellular processes within the cytosol of this bacterium. For example, we identified a number of sigma factors, such as σA, that initiate transcription of the heat shock operons controlled by the HrcA-CIRCE complex within Gram-positive bacteria. To further our understanding of the biology of G. thermoleovorans, we now report the first comprehensive analysis of its insoluble subproteome. The strategy that we describe utilizes bottom-up and top-down gel-free and gel-based methodological approaches to deal with sample complexity. This has allowed the identification of membrane-associated proteins, secreted proteins, and those integral to the membrane, with functionalities including transport, osmoregulation, and heat shock response.

Experimental Procedures

Reagents. All reagents were purchased from Sigma-Aldrich (Poole, U.K.) with the exception of mass spectrometry grade water and acetonitrile, which were purchased from Romil (Cambridge, U.K.); trypsin, which was purchased from Promega (Southampton, U.K.); and gel electrophoresis equipment and reagents, which were purchased from Invitrogen (Renfrewshire, U.K.).

Cell Culture and Growth Conditions. G. thermoleovorans T80 was maintained at 60 °C as previously described by Obojska et al.17 Routine growth of the organism involved inoculating nutrient broth (50 mL in 250 mL Erlenmeyer flasks) with a loop of fresh, actively growing (16 h) culture from nutrient agar plates. Flasks were incubated aerobically at 60 °C with orbital shaking at 200 rpm in an Innova 4230 refrigerated incubator shaker (New Brunswick Scientific, NJ). Growth was monitored by the increase in culture attenuance at 600 nm.

Protein Extraction. G. thermoleovorans T80 cultures were harvested in the mid-log phase of growth (D600 = 0.8) by centrifugation at 9000g for 10 min at 3-5 °C. The cell pellet was weighed and resuspended in 10 mM PBS (pH 7.8) at a ratio of 1 g of cells to 2 mL of buffer. The cells were then broken mechanically with an MSK cell homogenizer (B. Braun Biotech, Shwarzenberger, Germany) using the method of Graham et al.16 Unbroken cells and cellular debris were removed by centrifugation of the homogenate at 25 000g for 30 min at 3-5 °C (Beckman J2-HS, Beckman Instruments, CA). The supernatant was then centrifuged at 150 000g for 2 h at 3-5 °C (Beckman L8-M, Beckman Instruments, CA) to obtain a pelleted insoluble fraction, which was stored frozen at -70 °C until required.

Solubilization and Quantification of the Pellet Proteins. The pellet was solubilized by sonication in 1 mL of resolubilization buffer containing 7 M guanidine hydrochloride, 500 mM Tris-HCl (pH 8.0), and 10 mM EDTA as described by Mawuenyega et al.18 The protein concentration, measured using the Bradford assay,19 was 9 mg/mL. Dithiothreitol (final concentration 10 mM) was added, and the solution was incubated in the dark at room temperature for 5 min before iodoacetamide was added (final concentration 10 mM), followed by a further 5 min incubation in the dark at room temperature. The sample was then stored frozen at -70 °C in 100 µL aliquots until required.

Shotgun Tryptic Digestion. Trypsin (4 µg; Promega, Southampton, U.K.) in 50 mM NH4HCO3, pH 7.8, was added to 100 µL of sample and incubated overnight at 37 °C, after which the reactions were frozen at -70 °C until required.

One-Dimensional Gel Electrophoresis (1-DE). An aliquot of the resolubilized pellet protein sample was diluted 10-fold with deionized water; 10 µL of this diluted sample was added to 10 µL of Tris-Glycine SDS sample loading buffer (Invitrogen, Renfrewshire, U.K.) and boiled for 5 min. The sample (20 µL) was loaded onto a 1 mm thick NuPAGE 4-12% Bis-Tris gel (Invitrogen, Renfrewshire, U.K.). SeeBlue Plus 2 (Invitrogen, Renfrewshire, U.K.) was used as a protein molecular mass marker. The gel was electrophoresed using MES SDS running buffer in an X-Cell II mini gel system (Invitrogen, Renfrewshire, U.K.) at 200 V, 120 mA, 25 W per gel for 35 min. Proteins were visualized using SimplyBlue SafeStain (Invitrogen, Renfrewshire, U.K.). The entire lane was excised from the gel and cut into seven fractions based on molecular mass, as depicted in Figure 2.

In-Gel Tryptic Digestion. Excised gel fractions were washed for 30 min in 200 mM NH4HCO3, pH 7.8, at 37 °C. These fractions were then dehydrated by incubation for 30 min in 200 mM NH4HCO3, pH 7.8/MeCN (4:6 v/v) at 37 °C, followed by rehydration for 30 min in 50 mM NH4HCO3, pH 7.8, at 37 °C. After incubation in 100% acetonitrile for 2 min, 0.1 µg of trypsin in 50 mM NH4HCO3, pH 7.8, was added to each sample, which was then incubated overnight at 37 °C. The supernatant was subsequently recovered into microcentrifuge tubes, and a second peptide extraction from the gel pieces was carried out (0.1% TFA in 60% acetonitrile for 5 min). Peptide-containing liquid fractions were pooled, dried under vacuum, and resuspended in 20 µL of 0.1% formic acid in 2% acetonitrile before storage at -70 °C until required.

Liquid Chromatography-Mass Spectrometric Analysis (LC-MS). Mass spectrometry was performed using a 3200 Q-TRAP hybrid ESI quadrupole linear ion trap mass spectrometer (ESI-Q-q-Q linear ion trap-MS/MS; Applied Biosystems/MDS SCIEX, Toronto, Canada) with a nanospray interface, coupled online with an Ultimate 3000 nanoflow liquid chromatography system (Dionex/LC Packings, Amsterdam, The Netherlands). A µ-Precolumn Cartridge (300 µm × 5 mm, 5 µm particle size) was placed before the C18 capillary column (75 µm × 150 mm, 3 µm particle size) for desalting and filtering. Both columns contained the reversed-phase material PepMap 100 (C18, silica-based) with a 100 Å pore size (Dionex/LC Packings). The elution buffers used in the gradient were Buffer A (0.1% formic acid in 2% acetonitrile) and Buffer B (0.1% formic acid in 80% acetonitrile). For shotgun analysis, the nanoLC gradient was 180 min in length: 0-55% B in 150 min, 15 min at 90% B, followed by 15 min at 100% A. In the multidimensional analysis protocol, the nanoLC gradient was 60 min in length: 0-55% B in 45 min, 10 min at 90% B, followed by 5 min at 100% A. The flow rate for both approaches was 300 nL/min. The detector mass range was set at 400-1800 m/z. MS data were acquired in positive ion mode, and peptides with 2+ and 3+ charge states were selected for fragmentation.

Database Searching, Protein Identification, and PROVALT Analysis. Protein identification was carried out using an in-house MASCOT server (version 1.9; Matrix Science, London, U.K.) searching against the MSDB database (latest version at the time of processing). Peptide tolerance was set at ±1.2 Da and MS/MS tolerance at ±0.6 Da, and the search allowed for 1 missed cleavage.
Only identifications with a MASCOT MOWSE score > 43 were regarded as significant hits, regardless of the number of peptides. To expedite curation of the protein list from MASCOT, the result files were reanalyzed against the MSDB database using the heuristic protein validation tool PROVALT.20 This automated program takes large proteomic MS data sets comprising multiple MASCOT result files and identifies matching peptides; redundant peptides are removed, and related peptides are grouped together with their predicted matching protein, which dramatically shortens this part of the curation process. For identification purposes, the minimum peptide length was set at 6 amino acids, the minimum peptide MOWSE score at 25, and the minimum high-quality peptide MOWSE score at 49. Again, only identifications with scores > 43 were regarded as significant hits, regardless of the number of peptides.

Bioinformatics. PSORTb version 2.0.4 (http://www.psort.org/psortb/index.html)21 was used to predict bacterial protein subcellular localization. SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/)22 was used to predict the presence and location of signal peptide cleavage sites in the amino acid sequences of classically secreted proteins. SecretomeP 2.0 (http://www.cbs.dtu.dk/services/SecretomeP/)23 was used to predict nonclassical (that is, not signal-peptide-triggered) protein secretion.
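The PROVALT curation rules described under Database Searching can be illustrated in code. The following is a minimal, hypothetical sketch of PROVALT-style filtering, not PROVALT's actual implementation: the `PeptideHit` record, the `curate` function, and the toy hits are invented for illustration; only the length and score thresholds are taken from the text. Because the text does not say how PROVALT combines peptide scores into a protein score, the sketch assumes that a protein's score is its best peptide score. Under that assumption a single strong peptide is sufficient, in the spirit of "regardless of the number of peptides". The high-quality peptide threshold of 49 is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass

# Thresholds quoted in the Methods text.
MIN_PEPTIDE_LENGTH = 6   # residues
MIN_PEPTIDE_SCORE = 25   # minimum peptide MOWSE score
MIN_PROTEIN_SCORE = 43   # protein identifications must score > 43


@dataclass(frozen=True)
class PeptideHit:
    """One peptide-to-protein match read from a MASCOT result file (hypothetical record)."""
    sequence: str
    score: float
    protein: str


def curate(hits):
    """Hypothetical PROVALT-style curation; not PROVALT's actual algorithm."""
    # 1. Discard peptides that are too short or score too low.
    passing = [h for h in hits
               if len(h.sequence) >= MIN_PEPTIDE_LENGTH
               and h.score >= MIN_PEPTIDE_SCORE]

    # 2. Collapse redundant hits: the same peptide matched to the same protein
    #    in several result files is kept once, with its best score.
    best = {}
    for h in passing:
        key = (h.protein, h.sequence)
        if key not in best or h.score > best[key].score:
            best[key] = h

    # 3. Group the unique peptides under their predicted protein.
    by_protein = defaultdict(list)
    for h in best.values():
        by_protein[h.protein].append(h)

    # 4. Assumed protein score = best peptide score, so a single strong
    #    peptide suffices ("regardless of the number of peptides").
    return {protein: sorted(h.sequence for h in peptides)
            for protein, peptides in by_protein.items()
            if max(h.score for h in peptides) > MIN_PROTEIN_SCORE}


hits = [
    PeptideHit("AGLVEK", 50, "P1"),  # same peptide reported in two files
    PeptideHit("AGLVEK", 55, "P1"),
    PeptideHit("LSK", 80, "P1"),     # too short: discarded
    PeptideHit("VVDLLK", 30, "P2"),  # passes peptide filter, protein score too low
    PeptideHit("TTPEQR", 20, "P3"),  # below the peptide score threshold
]
print(curate(hits))  # {'P1': ['AGLVEK']}
```

Using the maximum rather than a sum keeps weak peptides from accumulating into a "significant" protein. A summed score would be an equally plausible assumption and would change which proteins pass the filter.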

Results and Discussion

Comprehensive Analysis of the G. thermoleovorans T80 Insoluble Subproteome. In this study, we report the first proteomic analysis of the insoluble subproteome of the thermophilic bacterium G. thermoleovorans T80. Using a combination of gel-free, bottom-up shotgun analysis and top-down, multidimensional, gel-based and gel-free analysis, we identified a total of 184 unique proteins from the insoluble subproteome. Based on publicly available data for G. kaustophilus HTA426, the closest phylogenetically related organism to G. thermoleovorans T80,9,15 this subset of expressed gene products represents an estimated 5% of the total proteome. As a benchmark, a recent study of the insoluble proteome of the model Gram-positive microorganism B. subtilis identified 637 proteins, representing 16.5% of its total predicted proteome.24 Comparison of the soluble16 and insoluble proteomic fractions of G. thermoleovorans T80 showed that 65 proteins were common to both, whereas a further 119 proteins were unique to the insoluble subproteome. To date, therefore, we have identified a total of 413 unique proteins from G. thermoleovorans T80, representing some 11% of the predicted proteome of G. kaustophilus HTA426.

In the current study, our combined approach initially identified 8628 peptides using automated MS/MS analysis software. Automated curation of this list by the heuristic bioinformatic tool PROVALT20 reduced it to 884 peptides. Manual curation of this list produced a final list of 709 unique peptides, which led to the positive identification of 184 unique proteins. Manual curation involved discarding peptides identified more than once, removing redundant peptides from any single protein, and removing obvious false positives. The average number of peptides per protein was 3, and the average MOWSE score was 171.

Characterization of the G. thermoleovorans T80 Insoluble Subproteome. 1. Shotgun Analysis.
Bottom-up shotgun investigation of the insoluble subproteome, followed by automated MS/MS software analysis of the data, resulted in 2632 peptide identifications. PROVALT curation reduced this to 250 peptides, and manual curation led to a final list of 212 unique peptides. These 212 peptides corresponded to a final list of 85 proteins (see Supporting Information) spanning a wide range of pI and molecular mass (Mr). A 2D visualization (Figure 1) showed that the smallest protein identified was the 50S ribosomal protein L29 (Mr = 7796 Da) and the largest was the DNA-directed RNA polymerase β′ subunit (Mr = 134 880 Da). The most acidic protein identified was the conserved protein YqhY (pI = 4.75), while the most basic was the 50S ribosomal protein L20 (pI = 11.58).

Figure 1. Two-dimensional visualization of the predicted molecular mass and pI of proteins identified within the G. thermoleovorans T80 insoluble subproteome by shotgun analysis.

2. Multidimensional Analysis. In this top-down multidimensional analysis, the insoluble subproteome was first separated by 1-DE (NuPAGE). The resulting gel was then cut into seven fractions based on the SeeBlue Plus 2 molecular mass markers. Each gel fraction was trypsinized, and the extracted peptides were separated on a reversed-phase C18 column over a 90 min period before being introduced into the mass spectrometer.1,25,26 Automated MS/MS software analysis gave an initial identification of 5996 peptides. PROVALT curation reduced this number to 634 peptide identifications, and manual curation further reduced the list to 497 uniquely identified peptides, from which a total of 157 proteins was identified. Figure 2A shows the distribution, by molecular mass, of proteins identified within the seven fractions of the NuPAGE gel, while Figure 2B shows a theoretical 2D visualization of the proteins identified by this method. Once again, the identified proteins were widely distributed with respect to pI and Mr. The smallest protein identified was the 50S ribosomal protein L30 (Mr = 7055 Da), and the largest was a hypothetical protein (Mr = 123 180 Da). The most acidic protein identified was a hypothetical conserved protein (pI = 4.61), while the most basic was the 50S ribosomal protein L35 (pI = 11.92).

In total, therefore, the two distinct methodologies of shotgun and multidimensional analysis identified 184 proteins from the insoluble subproteome. Of these, 58 proteins were identified by both methods, 27 only by the shotgun method, and 99 only by multidimensional analysis. As can be seen from the 2D visualizations (Figure 1; Figure 2B), the distributions of pI and Mr for proteins identified by the two methodologies were comparable. In addition, most proteins identified within each NuPAGE gel piece corresponded to the molecular mass range excised (Figure 2A). As expected, the multidimensional approach identified the largest number of proteins. However, the shotgun analysis identified 27 proteins, representing 12% of the total identified in this study, that were not found in the multidimensional analysis. Ideally, therefore, shotgun and multidimensional approaches should be combined to achieve the widest possible coverage of the proteome.

As detailed previously, because the automated selection of peptides for MS/MS during ESI-MS/MS analysis is partly a matter of chance, replicate injections of the same sample are essential to increase overall protein coverage.16,27 In the current study, all samples were analyzed in triplicate.
For the shotgun analysis, this approach increased overall protein coverage by 42% (Figure 3A), and for the multidimensional analysis, as typified by fraction 6, it increased overall protein coverage by 41% (Figure 3B). Such increases are within the range previously described in the literature.16,27

Figure 2. (A) Protein distribution within the G. thermoleovorans T80 insoluble subproteome following 1-DE. Seven fractions were excised from the gel according to the range of molecular weight shown. (B) Two-dimensional visualization of the predicted molecular mass and pI of proteins identified within the G. thermoleovorans T80 insoluble subproteome by multidimensional analysis.

Analysis of the origin of the proteins identified in this study once again confirmed the phylogenetic relationship of the Geobacilli predicted by 16S rDNA sequence analysis.15 The use of G. kaustophilus HTA426 as a model organism for comparison with G. thermoleovorans T80 has been further justified by the recent work of Zeigler,28 who suggested that these two organisms are genetically extremely similar. Of the 184 proteins identified, 84% had their closest match among the predicted protein-coding sequences of other Geobacilli. Proteins from Bacillus spp. accounted for a further 6.5%, with the remainder distributed among other Gram-positive bacterial species.

Functional roles for 169 of the 184 proteins identified (92%) were known or could be predicted from database analysis. As in our previous investigation,16 proteins within this insoluble subproteome were assigned to functional categories using the SubtiList database (www.pasteur.fr/Bio/SubtiList). Table 1 shows the distribution of proteins identified by shotgun analysis, by multidimensional analysis, and by both analyses combined, together with the corresponding distributions for the soluble fraction16 and the annotated G. kaustophilus genome.9 Within the combined list, the largest number of proteins (32%) fell into the functional category ‘Protein Biosynthesis’, followed by ‘TCA cycle proteins’ (8.7%), ‘Similar to Unknown Proteins’ (hypothetical conserved proteins) (8.2%), and ‘Transport proteins, binding proteins and lipoproteins’ (7.1%). The remaining proteins were distributed among other functional categories.
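The combined list used for Table 1 is the union of the two per-method protein lists. The counts reported in this section are consistent with that reading. The sketch below checks the reported numbers. The protein IDs are hypothetical placeholders sized to match the reported counts; they are not real identifications, and the function names are illustrative. One detail is worth noting. At every stage, the combined peptide counts equal the plain sums of the per-method counts, whereas the combined protein count (184) is a union in which the 58 proteins found by both methods are counted once.

```python
# Per-method and combined counts as reported in the text.
PEPTIDE_FUNNEL = {
    # stage: (shotgun, multidimensional, combined)
    "automated MS/MS identifications": (2632, 5996, 8628),
    "after PROVALT curation": (250, 634, 884),
    "after manual curation": (212, 497, 709),
}


def mismatched_stages(funnel):
    """Stages whose per-method counts do not add up to the combined count."""
    return [stage for stage, (sg, md, combined) in funnel.items()
            if sg + md != combined]


def overlap(shotgun_ids, multidim_ids):
    """Sizes of the shared, method-specific and combined protein sets."""
    a, b = set(shotgun_ids), set(multidim_ids)
    return {
        "both": len(a & b),
        "shotgun_only": len(a - b),
        "multidim_only": len(b - a),
        "combined": len(a | b),  # shared proteins are counted once
    }


# Hypothetical placeholder IDs sized to match the reported protein counts:
# 85 shotgun proteins, 157 multidimensional proteins, 58 found by both.
shotgun = [f"prot{i}" for i in range(85)]        # prot0 .. prot84
multidim = [f"prot{i}" for i in range(27, 184)]  # prot27 .. prot183

print(mismatched_stages(PEPTIDE_FUNNEL))  # []
print(overlap(shotgun, multidim))
# {'both': 58, 'shotgun_only': 27, 'multidim_only': 99, 'combined': 184}
```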


Figure 3. Multiple (three-run) analysis of the insoluble subproteome. (A) Shotgun workflow and (B) multidimensional workflow (results for fraction 6).
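The gain from replicate injections shown in Figure 3 can be viewed as the growth of the cumulative set of distinct identifications across runs. The text does not define its coverage metric, so the following minimal sketch assumes that "increase in overall protein coverage" means the size of the union after the last run relative to the first run. The run contents are invented protein IDs, not data from this study, and the function names are illustrative.

```python
def cumulative_coverage(runs):
    """Number of distinct IDs seen after each successive replicate run."""
    seen = set()
    counts = []
    for run in runs:
        seen |= set(run)
        counts.append(len(seen))
    return counts


def percent_gain(runs):
    """Relative increase of the final union over the first run, in percent."""
    counts = cumulative_coverage(runs)
    return 100.0 * (counts[-1] - counts[0]) / counts[0]


# Hypothetical replicate injections: abundant proteins (A, B) recur in every
# run, while lower-abundance ones are sampled somewhat differently each time.
runs = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "F"},
    {"A", "B", "D", "G"},
]
print(cumulative_coverage(runs))  # [5, 6, 7]
print(percent_gain(runs))         # 40.0
```

Gains of the size reported for Figure 3 arise because each injection selects a partly different set of precursor ions for MS/MS. The union therefore keeps growing, with diminishing returns, as runs are added.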

Recently, Bunai and Yamane24 reviewed the effectiveness and limitations of 2-DE in bacterial membrane protein proteomics. The limitations they identified included the inability of 1-DE or 2-DE to analyze very large or very small, acidic, basic, or highly hydrophobic proteins. Our current study, which used a combination of gel-free and 1-DE methods for proteome fractionation, largely concurs with these observations. One exception is that our proteomic workflow appeared to enrich for a significant number of basic proteins with pI in excess of 10. In addition, it was clear that gel-free and gel-based proteomic techniques identified distinct proteins. We therefore conclude that the techniques we employed should be seen as complementary to, rather than a replacement for, 2-DE proteomics.

Protein Identification. Within any proteomic investigation, one seeks to examine and identify all cellular proteins synthesized under a given condition. At present, however, this is not practical, given the technical problems associated with resolving large numbers of proteins with widely varying physicochemical properties that may be present in a given sample. A more pragmatic approach is to examine the major subproteomic fractions within the cell, such as the soluble fraction, the acidic/alkaline fraction or, as in the present study, the insoluble fraction. Because this decreases the complexity of the sample, a larger number of the proteins synthesized under a given condition can be expected to be identified.16
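The enrichment of basic proteins (pI > 10) noted above reflects sequence composition. The most basic proteins found by both workflows were 50S ribosomal proteins (L20, pI 11.58; L35, pI 11.92). The text does not state which tool produced the predicted pI values shown in Figures 1 and 2B. The following is therefore only an illustrative sketch of how a theoretical pI is commonly computed: the Henderson-Hasselbalch net charge is summed over the ionizable groups, and the pH of zero net charge is found by bisection. The pKa table is an approximate, textbook-style set chosen for illustration. Dedicated tools use other pKa sets, so their absolute values will differ.

```python
# Approximate pKa values for ionizable groups (an assumed, illustrative set).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus of each kind
    positive = sum(counts[g] / (1 + 10 ** (ph - pka))
                   for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph))
                   for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection for the pH at which the net charge is zero.

    Net charge falls monotonically with pH, so the root is unique.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("AAAA"), 2))  # 6.1 (termini only)
print(isoelectric_point("KKKKKKKK") > 10)   # True: lysine-rich, basic
print(isoelectric_point("DDDDEEEE") < 4.5)  # True: acid-rich, acidic
```

Proteins rich in lysine and arginine, such as ribosomal proteins, stay positively charged until well above pH 10. This is why they cluster at the basic edge of the 2D visualizations.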

From a physiological standpoint, one would expect the proteome of actively growing cells to contain high levels of proteins involved in housekeeping processes. In this study, we identified numerous proteins involved in core metabolic processes, including glycolysis, the TCA cycle, and the pentose phosphate pathway. Additionally, a number of stress response proteins that we previously identified in the soluble fraction16 were also present, including GroEL, GroES, superoxide dismutase, nitrogen fixation protein (NifU), and catalase. By far the largest category of proteins identified within the insoluble subproteome, however, was that of proteins involved in Protein Biosynthesis (32%), including the ribosomal proteins. As can be seen from the protein master lists (Supporting Information), 19 (90%) of the 21 predicted 30S ribosomal proteins9 were identified, with the shotgun analysis on its own identifying 15 (71%). Of the 35 predicted 50S ribosomal proteins,9 26 (74%) were identified in this study, with the shotgun analysis on its own identifying 20 (57%). Given that the insoluble subproteome will invariably contain membrane-associated as well as integral membrane proteins, this finding is not unexpected; indeed, membrane-associated ribosomes have been observed in B. subtilis.29 The identification of proteins from the functional categories ‘Transport, binding proteins and lipoproteins’ (7.1%) and ‘Membrane bioenergetics’ (6.5%) is likewise unsurprising, because these proteins would be expected to be membrane-associated or integral to the membrane in order to carry out their biological functions. The insoluble subproteome contained a number of proteins (Q5L380_GEOKA, Q5L381_GEOKA, and Q837Z9_ENTFA) that are predicted to be involved in the transport of the compatible solutes glycine betaine, carnitine, and choline.
Accumulation of such compounds has been reported to occur in conjunction with stress imposed by salinity or heat shock.30 In addition, we detected a maltose/maltodextrin transport system protein (Q5L241_GEOKA). Homologues of this protein have been implicated in the accumulation of compatible solutes such as trehalose in other thermophiles, such as Thermococcus and Thermoanaerobacter.31 Although thermophilic and hyperthermophilic bacteria and archaea usually accumulate novel compatible solutes, it is likely that within the Geobacilli these conventional compatible solutes play some role in thermophily, possibly by protecting proteins against thermal denaturation.

Table 1. Functional Categorization of Proteins Identified within the Insoluble Subproteome of Geobacillus thermoleovorans T80 (values are protein distribution, %)

Functional category | Multidimensional analysis | Shotgun analysis | Combined analysis | Soluble fraction analysis (ref 16) | G. kaustophilus annotated genome (ref 9)
cell wall | 1.3 | 0 | 1.1 | 1.4 | 1.5
transport, binding proteins, and lipoproteins | 7.6 | 3.5 | 7.1 | 3.4 | 7.3
sensors (signal transduction) | 0 | 0 | 0 | 2 | 0.7
membrane bioenergetics | 7 | 4.7 | 6.5 | 3.4 | 2.4
mobility and chemotaxis | 0.6 | 3.5 | 1.6 | 0 | 1.4
protein secretion | 0.6 | 0 | 0.5 | 0 | 0.4
cell division | 1.3 | 0 | 1.1 | 1.4 | 0.5
sporulation | 0.6 | 2.4 | 1.1 | 0.7 | 3.1
specific pathways | 5.1 | 2.4 | 4.3 | 5.8 | 5.1
main glycolytic pathway | 3.2 | 4.7 | 3.3 | 3.7 | 0.9
TCA cycle | 7 | 9.4 | 8.7 | 5.8 | 0.5
metabolism of amino acids and related molecules | 7 | 1.2 | 6 | 9.9 | 5.4
metabolism of nucleotides and nucleic acids | 1.9 | 2.4 | 2.2 | 8.2 | 2
metabolism of lipids | 1.9 | 1.2 | 2.2 | 4.1 | 2.8
metabolism of coenzymes and prosthetic groups | 1.9 | 3.5 | 2.2 | 7.5 | 3.1
metabolism of phosphate | 0 | 0 | 0 | 0.3 | 0.2
metabolism of sulfur | 0 | 0 | 0 | 0.7 | 0.1
DNA replication | 0.6 | 0 | 0.5 | 0 | 0.5
DNA recombination | 0.6 | 2.4 | 1.1 | 0 | 0.5
DNA packaging and segregation | 0.6 | 0 | 0.5 | 1.4 | 0.3
RNA synthesis | 2.5 | 2.4 | 2.7 | 7.4 | 4.5
RNA modification | 0.6 | 1.2 | 0.5 | 0 | 0.9
protein biosynthesis | 35 | 43.6 | 32.6 | 14.3 | 2.9
protein modification | 0.6 | 0 | 0.5 | 0.7 | 0.7
protein folding | 1.9 | 4.7 | 2.2 | 2 | 0.2
adaptation to atypical conditions | 0.6 | 1.2 | 1.1 | 0.7 | 1
detoxification | 1.3 | 3.5 | 2.2 | 2.7 | 0.9
miscellaneous | 1.3 | 0 | 1.1 | 0 | 0.5
similar to unknown proteins | 8.3 | 2.4 | 8.2 | 12.2 | 31

Figure 4. Overview of identified proteins from the insoluble subproteome of G. thermoleovorans T80. Cellular localization was predicted using PSORTb v2.0.4,21 SignalP v3.0,22 and SecretomeP v2.0.23

Table 2. Proteins Identified within the Insoluble Subproteome of G. thermoleovorans T80 with Predicted Export Signals(a)
(a) Putative signal peptides were predicted as described by Tjalsma et al.32,34 The hydrophobic H-domain is colored gray. The signal peptide cleavage sites are the last three amino acid residues and are in bold and underlined. For lipoproteins (L), the lipobox is italicized and bold. Potential Tat pathway signal peptides (T) and those with only a twin-arginine motif (TB) are represented by bold amino acids enclosed within | |. S indicates proteins likely to be secreted via the Sec pathway. A indicates proteins identified as ABC transporters. Twin-glycine motifs are indicated as GG, and positively charged amino acids are shown as (K, R).

Subcellular Protein Localization. Subcellular localization prediction tools are important because they allow researchers to identify which proteins are retained within cells and which are exported. They also have potential commercial applications in identifying possible diagnostic and therapeutic targets, and they allow inferences to be made about protein function.21 In the current study, several bioinformatics tools were used, including PSORTb,21 SignalP,22 and SecretomeP.23 These tools attempt to assign a subcellular location to each protein by predicting known motifs or cleavage sites, using a variety of computational algorithms and networks that analyze amino acid composition. All 184 proteins identified in this study were analyzed with these tools and assigned a cellular localization, as shown in Figure 4. The proteins predicted to possess an N-terminal signal peptide were grouped by predicted functional category. They were then further analyzed for the presence of lipobox, RR-motif, and signal peptide cleavage sites so that, where possible, each could be assigned to a particular secretion pathway32 (Table 2). Of the eight proteins functionally classified as ABC transporters, only one, an oligopeptide ABC transporter (Q5L1T4_GEOKA), conformed to the architecture expected for such a protein. This architecture is a leader peptide with N- and C-domains and no intervening hydrophobic domain, together with a double-glycine motif N-terminal to the signal peptide cleavage site. Notably, all of the other such transport proteins contained a significant hydrophobic domain between the N- and C-domains of the predicted signal peptide, as well as a number of other motifs usually associated with the twin-arginine translocation (Tat) or Sec secretion pathways. After detailed analysis, it was discovered that none of the 31 proteins contained any C-terminal cell wall anchor motifs commonly

research articles

Graham et al.

Table 3. Functional Analysis of Hypothetical Conserved Proteins from G. thermoleovorans T80 Based on Conserved Domain Analysis code

NCBI-CDD

possible function

Q5KZJ5_GEOKA Q5KUG8_GEOKA Q5L380_GEOKA Q5KUU3_GEOKA Q5KWR5_GEOKA Q5KZL3_GEOKA Q5KZW8_GEOKA Q5L076_GEOKA Q5L0R3_GEOKA Q5L312_GEOKA Q8EMU2_OCEIH G84095 Q8EUM1_MYCPE H64245 Q65ET7_BACLD

31754 30132 43985 none 32047 46158 42398 none 43699 48214 none 31670 30880 none 31891 46494

EmrA multidrug resistance efflux pump (COG1566) fructose-1,6-bisphosphatase, glpX-encoded substrate binding domain of ABC-type glycine betaine transport system conserved protein of unknown function preprotein translocase subunit YajC (COG1862) family of unknown function (DUF1028) FeoB, ferrous iron transport protein B protein of unknown function domain of unknown function (DUF322) band_7_2, a subgroup of the band 7 domain of flotillin (reggie) like proteins hypothetical conserved protein uncharacterized protein conserved in bacteria (COG1481) NorM, Na+-driven multidrug efflux pump conserved protein of unknown function FlgJ, muramidase (flagellum-specific) protein of unknown function (DUF1142)

found in Gram-positive bacteria, such as LPxTG or NPQTN.33 This would indicate, as previously reported for B. subtilis, that G. thermoleovorans T80 does not use this cell wall retention mechanism, or that a previously unrecognized cell wall sorting signal is present within these proteins.34 Unexpectedly, a number of ribosomal proteins were identified as having predicted signal peptide sequences and cleavage sites, thus, suggesting that they were secreted proteins. However, sequence analysis of these signal peptides demonstrated that, while they contained motifs resembling those necessary for Tat or Sec transport, they did not in fact have the full N-terminal architecture that would be required to allow us to classify them as secreted proteins. It has been reported that, while the initially developed versions of the bioinformatic tools we used allowed prediction of cellular localization for proteins in Gram-negative bacteria, these methods were however poor when applied to whole proteomes and did not extend to Grampositive bacteria.21 The development of new versions of PSortb and SignalP has sought to overcome these obstacles in proteomic research, and Psortb v2.0 attains a precision of 96% for Gram-positive and Gram-negative bacteria. However, in this study, only 10 of the 31 proteins predicted to contain a signal peptide sequence were identified by Psortb. While SignalP gave a much higher number of signal peptide predictions, within these were a number of ribosomal proteins. This paradox highlights potential problems that may be encountered when using bioinformatic tools and furthermore underscores the necessity for manual interpretation of the biology of such in silico predictions. Hypothetical Conserved Proteins. Nearly one-third of the genome of G. 
kaustophilus HTA426 is designated as encoding hypothetical conserved proteins.9 We previously assigned putative functions to 29 of 36 such gene products using bioinformatics.16 In the current investigation, hypothetical conserved proteins accounted for 8% of the total proteins identified in the insoluble fraction. Having again established the presence of these proteins in G. thermoleovorans T80, as in our previous study, we sought to ascertain how they contribute to functional processes within the cell. Using the bioinformatics tool NCBI BLASTp (www.ncbi.nlm.nih.gov/BLAST/), we were able to assign putative functions to all but four of these proteins (Table 3). The majority of these proteins were found to be involved in the transport of inorganic ions and small molecules across the cell membrane. One of these, the hypothetical conserved protein Q5KZW8_GEOKA, contained a conserved FeoB domain, indicating a role in iron(II) transport. The FeoB family of proteins has been identified in a variety of bacterial genomes. Its members are predicted to be 700-800 amino acids in length and are integral membrane proteins with numerous transmembrane-spanning α-helices.35 The hypothetical conserved protein Q5KZJ5_GEOKA contains a conserved domain suggesting that it is in fact an EmrA multidrug resistance efflux pump. This protein, which is normally anchored in the bacterial membrane by an N-terminal transmembrane region (see Table 2), is reported to confer resistance to carbonylcyanide, nalidixic acid, and a number of other toxic compounds.36 In G. kaustophilus HTA426, the gene encoding this protein (GK1606) lies downstream of a MarR family transcriptional regulator (GK1605), which is likely to be involved in multiple antibiotic resistance via a nonspecific resistance system, and upstream of a multidrug resistance protein (GK1607) of the major facilitator superfamily of integral membrane proteins.9 Although G. thermoleovorans T80 was not subjected to antibiotic stress during our experiments, numerous environmental factors are known to induce the transcription of genes under the control of MarR regulators.36
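Both proteins discussed above are defined by membrane-spanning helices: FeoB family members carry numerous transmembrane α-helices, and EmrA is anchored by an N-terminal transmembrane region. Such segments are classically flagged from sequence alone by hydropathy analysis. The sketch below is not the method used in this study (localization here was predicted with PSortB, SignalP, and SecretomeP); it applies the Kyte-Doolittle sliding-window heuristic, and the 19-residue window, the 1.6 threshold, and the toy sequence are assumptions chosen for illustration only.

```python
"""Flag candidate transmembrane segments with a Kyte-Doolittle hydropathy window.

Illustrative sketch only: the 19-residue window and 1.6 threshold are the
classic Kyte-Doolittle heuristics, not parameters taken from this study.
Assumes sequences use only the 20 standard one-letter amino acid codes.
"""

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def tm_segments(seq: str, window: int = 19,
                threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping above-threshold windows into 0-based half-open spans."""
    spans: list[tuple[int, int]] = []
    for i, mean in enumerate(window_hydropathy(seq, window)):
        if mean < threshold:
            continue
        start, end = i, i + window
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], end)  # extend the current segment
        else:
            spans.append((start, end))       # open a new segment
    return spans


if __name__ == "__main__":
    # Hypothetical toy sequence: charged termini flanking a 21-residue
    # hydrophobic stretch. Not a real G. thermoleovorans protein.
    toy = "MKKRSE" + "LLIVALLAVILGLLVAIVLAL" + "KDERSQKNE"
    print(tm_segments(toy))  # prints [(1, 31)]
```

Because each reported span is the union of every qualifying window, its edges are only approximate: for the toy sequence the span (1, 31) contains the hydrophobic core (positions 6 to 26, 0-based) plus part of the charged flanks. Dedicated topology predictors use considerably more elaborate models than this single-threshold rule.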

Conclusion

By combining gel-based and gel-free proteomic workflows with bioinformatic tools, we were able to identify a number of G. thermoleovorans T80 proteins with predicted signal peptides. Further in silico investigation of these proteins allowed us to identify signal peptide motifs from four of the five major bacterial classes, namely, (i) Tat; (ii) Sec-type; (iii) lipoprotein; and (iv) ABC transport. While a number of these proteins had the classical signal peptide architecture necessary for secretion by these pathways, many in our study appeared to contain atypical architectures, characterized by the absence of some ‘essential’ domain or motif or by the presence of domains or motifs from several bacterial signal peptide classes. This underlines the care needed when dealing with the proteomics of organisms that may possess novel protein-processing strategies, which to date have not been included in the descriptor rules used to train motif search programs such as those used in this study. Again, we see the need for careful analysis of experimental proteomic data, not only in terms of ascertaining the quality and accuracy of protein identification, but also in assigning broad functional categories to these proteins in terms of their cellular localizations and, if appropriate, their secretory export pathway. Only if such a robust and rigorous approach is taken can meaningful inferences be made, and thus, we may gain a deeper understanding of cellular processes.
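The point about atypical architectures, in which one signal peptide carries motifs from more than one export class, can be illustrated with a simple motif screen. The sketch below uses hypothetical helpers (`screen`, `needs_manual_review`); it is not the descriptor logic of PSortB, SignalP, or SecretomeP, nor the manual procedure of Tjalsma et al. The regular expressions are simplified consensus motifs (S/T-R-R-x-F-L-K for Tat, a commonly cited [LVI][ASTVI][GAS]C lipobox, a bare double-glycine pair, and the LPxTG/NPQTN cell wall anchors), and the 40- and 50-residue search regions are assumptions. Sec-type signals are recognized by their N/H/C-domain architecture rather than a single motif, so they are not screened here.

```python
"""Rough screen for export-signal motifs from several bacterial classes.

Illustrative sketch only: the regular expressions are simplified consensus
motifs and the search-region lengths are assumptions, not the rules used
by PSortB, SignalP, SecretomeP, or the manual analysis in this study.
"""
import re

N_REGION = 40  # assumed: signal-peptide motifs are sought in the first 40 residues
C_REGION = 50  # assumed: cell wall anchors are sought in the last 50 residues

N_TERMINAL_MOTIFS = {
    "tat_consensus": re.compile(r"[ST]RR.FLK"),      # S/T-R-R-x-F-L-K
    "twin_arginine": re.compile(r"RR"),              # bare RR pair only
    "lipobox": re.compile(r"[LVI][ASTVI][GAS]C"),    # lipoprotein lipobox
    "double_glycine": re.compile(r"GG"),             # ABC-type leader peptides
}
C_TERMINAL_MOTIFS = {
    "lpxtg_anchor": re.compile(r"LP.TG"),
    "npqtn_anchor": re.compile(r"NPQTN"),
}
# Motif names grouped by export class (Sec-type is architectural, so absent).
EXPORT_CLASSES = {
    "Tat": {"tat_consensus", "twin_arginine"},
    "lipoprotein": {"lipobox"},
    "ABC": {"double_glycine"},
    "cell_wall_anchor": {"lpxtg_anchor", "npqtn_anchor"},
}


def screen(seq: str) -> set[str]:
    """Return the names of all motifs found in the relevant terminal region."""
    seq = seq.upper()
    n_term, c_term = seq[:N_REGION], seq[-C_REGION:]
    hits = {name for name, rx in N_TERMINAL_MOTIFS.items() if rx.search(n_term)}
    hits |= {name for name, rx in C_TERMINAL_MOTIFS.items() if rx.search(c_term)}
    return hits


def needs_manual_review(hits: set[str]) -> bool:
    """True when the hits span more than one export class (mixed architecture)."""
    return sum(1 for names in EXPORT_CLASSES.values() if hits & names) > 1


if __name__ == "__main__":
    # Invented sequence for illustration; not a protein from this study.
    toy = "MSRRSFLKAVAGLLAAGLSLAACGSKE" + "A" * 60
    hits = screen(toy)
    print(sorted(hits))               # ['lipobox', 'tat_consensus', 'twin_arginine']
    print(needs_manual_review(hits))  # True
```

For the invented sequence, the screen reports a Tat consensus and a lipobox in the same N-terminal region, so it is flagged for review: the kind of mixed signal that, as argued above, should be interpreted by hand rather than taken at face value.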

Acknowledgment. We thank the Wellcome Trust for funding the vacation studentship of Miss Catherine E. Pollock (VS/05/ULS/A2). R. L. J. Graham was supported by the Northern Ireland Centre of Excellence in Functional Genomics, with funding from the European Union (EU) Program for Peace and Reconciliation, under the Technology Support for the Knowledge-Based Economy.

Supporting Information Available: The master list of proteins identified and other relevant material. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Romijin, E. P.; Krijgsveld, J.; Heck, A. J. J. Chromatogr., A 2003, 1000, 589-608.
(2) Petricoin, E. F.; Liotta, L. A. J. Nutr. 2003, 133, 2476S-2484S.
(3) Aivaliotis, M.; Corvey, C.; Tsirogianni, I.; Karas, M.; Tsiotis, G. Electrophoresis 2004, 25, 3468-3474.
(4) Hoper, D.; Bernhardt, J.; Hecker, M. Proteomics 2006, 6, 1550-1562.
(5) Altermann, E. R.. W. M.; Azcarate-Peril, M. A.; Barrangou, R.; Buck, B. L.; McAuliffe, O.; Souther, N.; Dobson, A.; Duong, T.; Callanan, M.; Lick, S.; Hamrick, A.; Cano, R.; Klaenhammer, T. R. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 3906-3912.
(6) Read, T. D. P.; Tourasse, N.; Baillie, L. W.; Paulsen, I. T.; Nelson, K. E.; Tettelin, H.; Fouts, D. E.; Eisen, J. A.; Gill, E. K.; Okstad, O. A.; Helgason, E.; Rilstone, J.; Wu, M.; Kolonay, J. F.; Beanan, M. J.; Dodson, R. J.; Brinkac, L. M.; Gwinn, M.; DeBoy, R. T.; Madpu, R.; Daugherty, S. C.; Durkin, A. S.; Haft, D. H.; Nelson, W. C.; Peterson, J. D.; Pop, M.; Khouri, H. M.; Radune, D.; Benton, J. L.; Mahamoud, Y.; Jiang, L.; Hance, I. R.; Weidman, J. F.; Berry, K. J.; Plaut, R. D.; Wolf, A. M.; Watkins, K. L.; Nierman, W. C.; Hazen, A.; Cline, R.; Redmond, C.; Thwaite, J. E.; White, O.; Salzberg, S. L.; Thomason, B.; Friedlander, A. M.; Koehler, T. M.; Hanna, P. C.; Kolsto, A. B.; Fraser, C. M. Nature 2003, 423, 81-86.
(7) Takami, H. N. K.; Takaki, Y.; Maeno, G.; Sasaki, R.; Masui, N.; Fuji, F.; Hirama, C.; Nakamura, Y.; Ogasawara, N.; Kuhara, S.; Horikoshi, K. Nucleic Acids Res. 2000, 28, 4317-4331.
(8) Takami, H.; Takaki, Y.; Uchiyama, I. Nucleic Acids Res. 2002, 30, 3927-3935.
(9) Takami, H.; Takaki, Y.; Chee, G. J.; Nishi, S.; Shimamura, S.; Suzuki, H.; Matsui, S.; Uchiyama, I. Nucleic Acids Res. 2004, 32, 6292-6303.
(10) Wisotzkey, J. D.; Jurtshuk, P., Jr.; Fox, G. E.; Deinhard, G.; Poralla, K. Int. J. Syst. Bacteriol. 1992, 42, 263-269.
(11) Dufresne, S.; Bousquet, J.; Boissinot, M.; Guay, R. Int. J. Syst. Bacteriol. 1996, 46, 1056-1064.
(12) Touzel, J. P.; O’Donohue, M.; Debeire, P.; Samain, E.; Breton, C. Int. J. Syst. Evol. Microbiol. 2002, 50, 315-320.
(13) Ash, C.; Farrow, J. A. E.; Wallbanks, S.; Collins, M. D. Lett. Appl. Microbiol. 1991, 13, 202-206.
(14) Rainey, F. A.; Fritze, D.; Stackebrandt, E. FEMS Microbiol. Lett. 1994, 115, 205-211.
(15) Nazina, T. N.; Tourova, T. P.; Poltaraus, A. B.; Novikova, E. V.; Grigoryan, A. A.; Ivanova, A. E.; Lysenko, A. M.; Petrunyaka, V. V.; Osipov, G. A.; Belyaev, S. S.; Ivanov, M. V. Int. J. Syst. Evol. Microbiol. 2001, 51, 433-446.
(16) Graham, R. L. J.; Pollock, C. E.; Ternan, N. G.; McMullan, G. J. Proteome Res. 2006, 5, 822-828.
(17) Obojska, A.; Ternan, N. G.; Lejczak, B.; Kafarski, P.; McMullan, G. Appl. Environ. Microbiol. 2002, 68, 2081-2084.
(18) Mawuenyega, K. W.; Kaji, H.; Yamauchi, Y.; Shinkawa, T.; Saito, H.; Taoka, M.; Takahashi, N.; Isobe, T. J. Proteome Res. 2003, 2, 23-35.
(19) Bradford, M. M. Anal. Biochem. 1976, 72, 248-254.
(20) Weatherly, D. B.; Atwood, J. A., III; Minning, T. A.; Cavola, C.; Tarleton, R. L.; Orlando, R. Mol. Cell. Proteomics 2005, 4, 762-772.
(21) Gardy, J. L.; Laird, M. R.; Chen, F.; Rey, S.; Walsh, C. J.; Ester, M.; Brinkman, F. S. L. Bioinformatics 2005, 21, 617-623.
(22) Bendtsen, J. D.; Nielsen, H.; von Heijne, G.; Brunak, S. J. Mol. Biol. 2004, 340, 783-795.
(23) Bendtsen, J. D.; Kiemer, L.; Fausbøll, A.; Brunak, S. Microbiology 2005, 5, 58-70.
(24) Bunai, K.; Yamane, K. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2005, 815, 227-236.
(25) Hookeun, L.; Yi, E. C.; Wen, B.; Reily, T. P.; Pohl, L.; Nelson, S.; Aebersold, R.; Goodlett, D. R. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2004, 803, 101-110.
(26) Washburn, M. P.; Yates, J. R., III. Curr. Opin. Microbiol. 2000, 3, 292-297.
(27) Chong, P. K.; Wright, P. C. J. Proteome Res. 2005, 4, 1789-1798.
(28) Zeigler, D. R. Int. J. Syst. Evol. Microbiol. 2005, 55, 1171-1179.
(29) Marty-Mazars, D.; Horiuchi, S.; Tai, P. C.; Davis, B. D. J. Bacteriol. 1983, 154, 1381-1388.
(30) Empadinhas, N.; Albuquerque, L.; Costa, J.; Zinder, S. H.; Santos, M. A. S.; Santos, H.; da Costa, M. S. J. Bacteriol. 2004, 186, 4075-4084.
(31) Lamosa, P.; Martins, L. O.; da Costa, M. S.; Santos, H. Appl. Environ. Microbiol. 1998, 64, 3591-3598.
(32) Tjalsma, H.; Bolhuis, A.; Jongbloed, J. D. H.; Bron, S.; van Dijl, J. M. Microbiol. Mol. Biol. Rev. 2000, 64, 515-547.
(33) Desvaux, M.; Dumas, E.; Chafsey, I.; Hebraud, M. FEMS Microbiol. Lett. 2006, 256, 1-15.
(34) Tjalsma, H.; Antelmann, H.; Jongbloed, J. D. H.; Braun, P. G.; Darmon, E.; Dorenbos, R.; Dubois, J-Y. F.; Westers, H.; Zanen, G.; Quax, W. J.; Kuipers, O. P.; Bron, S.; Hecker, M.; van Dijl, J. M. Microbiol. Mol. Biol. Rev. 2004, 68, 207-233.
(35) Dashper, S. G.; Butler, C. A.; Lissel, J. P.; Paolini, R. A.; Hoffman, B.; Veith, P. D.; O’Brien-Simpson, N. M.; Snelgrove, S. L.; Tsiros, J. T.; Reynolds, E. C. J. Biol. Chem. 2005, 280, 28095-28102.
(36) Lewis, K. Trends Biochem. Sci. 1994, 19, 119-123.

PR0602444
