Identification and Characterization of the Sulfolobus solfataricus P2 Proteome

Poh Kuan Chong and Phillip C. Wright*
Biological and Environmental Systems Group, Department of Chemical and Process Engineering, University of Sheffield, Mappin Street, Sheffield S1 3JD, United Kingdom
Received April 27, 2005

Via combined separation approaches, a total of 1399 proteins were identified, representing 47% of the Sulfolobus solfataricus P2 theoretical proteome. This includes 1323 proteins from the soluble fraction, 44 from the insoluble fraction, and 32 from the extracellular (secreted) fraction. We used conventional two-dimensional gel electrophoresis (2-DE) for the soluble fraction, and shotgun proteomics for all three cell fractions (soluble, insoluble, and secreted). Two gel-based fractionation methods were explored for shotgun proteomics: (i) protein separation by one-dimensional gel electrophoresis (1-DE) followed by peptide fractionation by iso-electric focusing (IEF), and (ii) protein and peptide fractionation both by IEF. The results indicate that the 1D-IEF fractionation workflow with three replicate mass spectrometric analyses gave the best overall result for soluble protein identification. Three injections using LC-ESI-MS/MS increased the number of proteins identified by more than 50%. Protein and peptide fractionation efficiency, together with the filtering criteria, are also discussed.

Keywords: 2-DE • shotgun • LC-MS/MS • multiple injections • pre-fractionation • S. solfataricus

Introduction

The increasing number of genome sequencing projects provides a basis for even greater discovery. Together with the development of post-genomic tools such as proteomics, we are now in a position to piece together the various layers of information, from the transcriptome through the proteome and metabolome, toward the modeling framework implied by systems biology. Rapid technological advances in mass spectrometry have dramatically aided proteomics research, enabling greater breakthroughs in understanding an organism's physiology. Despite these advances, however, a number of hurdles still stand in the way of achieving significant proteome coverage. We address several of these below.

The classic proteomics approach is defined as the combination of two-dimensional gel electrophoresis (2-DE), mass spectrometry, and bioinformatics tools.1 This approach is commonly used to identify soluble,2 membrane3 and secreted proteins.4 The main drawback of 2-DE, however, is that it is still not possible or practical to separate the whole proteome, because of limitations in detecting low-abundance proteins, difficulties in resolving proteins with extreme pIs, and the physicochemical properties of hydrophobic proteins.1 Furthermore, 2-DE is even more limited in analyzing membrane-bound proteins because of their poor solubility.3 Owing to these limitations, shotgun proteomics has been developed as an alternative approach.

Shotgun proteomics can be defined as a 2-D gel-free approach5 for the direct analysis of peptide mixtures, facilitated by multidimensional liquid chromatographic separation prior to mass spectrometric analysis,5-11 although gel-based fractionation11,12 has recently also been employed in shotgun proteomics to simplify the peptide mixture. The shotgun approach has shown some utility for analyzing membrane proteins;6-8 their solubility, however, remains a challenge. Methods directed toward enriching membrane fractions include the use of strong detergents,7 high percentages of organic solvents8 and organic acids6 prior to proteolytic digestion. For example, Washburn et al.6 dissolved yeast membrane proteins in 90% formic acid in the presence of cyanogen bromide before tryptic digestion, and successfully identified 131 integral membrane proteins via Multidimensional Protein Identification Technology (MudPIT), which uses strong cation exchange and reversed-phase chromatography for peptide separation.

The key to better understanding within a systems biology framework using a proteomics approach is to reduce sample complexity prior to mass spectrometry. Pre-fractionation techniques such as multidimensional liquid chromatography5,9,10 using strong cation exchange (SCX) and reversed-phase (RP) columns, gel-based fractionation,11,12 liquid-phase iso-electric focusing,13 and capillary electrophoresis14 have all increased the number of proteins identified. Furthermore, a more than 2-fold increase in proteome coverage has been reported11 when multiple levels of fractionation were carried out, i.e., protein pre-fractionation followed by peptide separation. Various combinations of protein/peptide fractionation have been studied and compared11,12 to determine optimal conditions. To date, however, there is no single ideal orthogonal fractionation combination that covers the whole proteome of an organism, although some general rules are now emerging. For example, Gan et al.11 report that at least two different protein-peptide separations are required to obtain better proteome coverage, with a gel-based IEF-IEF fractionation combination being the better approach in that instance.

The large number of proteins now being identified via shotgun proteomics has given rise to the challenge of eliminating an even greater number of false positives. Many researchers have tackled this by searching different databases and applying various filter criteria.15-17 Single-genome or otherwise smaller databases have also been suggested to minimize false-positive and false-negative identification rates.11,16 Other alternatives include the use of reverse databases9 to eliminate false-positive identifications. Some studies considered only proteins with significant database identification scores9,17 as true hits. According to Zhu et al.,18 only proteins with >2 identified peptides ought to be considered true hits, and proteins identified by a single peptide should be taken into account only if at least 2 MS/MS spectra were found. The criteria suggested in these studies are logical, yet their reliability still needs to be demonstrated more widely.

* To whom correspondence should be addressed. Tel.: +44(0)114 22 7577. Fax: +44(0)114 2227501.
10.1021/pr0501214 CCC: $30.25 © 2005 American Chemical Society
Journal of Proteome Research 2005, 4, 1789-1798. Published on Web 09/07/2005
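The reverse-database strategy mentioned above is commonly used to estimate a false discovery rate: in one common variant, every protein sequence is reversed, spectra are searched against the forward and reversed sequences together, and matches to reversed sequences above a score threshold serve as a proxy for false forward matches. The sketch below illustrates that general target-decoy idea only; it is not the filtering procedure used in this study, and the `Hit` record, both helper functions and all scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One protein-level database match (hypothetical record layout)."""
    protein: str
    score: float
    is_decoy: bool  # True if the match came from the reversed database


def estimate_fdr(hits: List[Hit], threshold: float) -> float:
    """Estimate the FDR among hits scoring at or above `threshold`.

    Each passing decoy hit is counted as evidence of one false forward
    (target) hit, so FDR is approximated as decoys / targets.
    """
    passing = [h for h in hits if h.score >= threshold]
    decoys = sum(1 for h in passing if h.is_decoy)
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


def lowest_threshold(hits: List[Hit], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for t in sorted({h.score for h in hits}):
        if estimate_fdr(hits, t) <= max_fdr:
            return t
    return None


# Invented scores for illustration only; "REV_" entries are decoy matches.
hits = [
    Hit("SSO_A", 45.0, False), Hit("SSO_B", 40.0, False),
    Hit("SSO_C", 38.0, False), Hit("REV_1", 30.0, True),
    Hit("SSO_D", 28.0, False), Hit("REV_2", 20.0, True),
    Hit("SSO_E", 15.0, False),
]
print(estimate_fdr(hits, 30.0))      # 1 decoy against 3 targets
print(lowest_threshold(hits, 0.05))  # 38.0
```

Scanning thresholds from low to high and stopping at the first acceptable one keeps as many forward hits as possible for a chosen error level, which is the trade-off the filter criteria discussed above try to manage.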
We employed both classical 2-DE and gel-based shotgun proteomics approaches to study the proteome of the thermoacidophilic crenarchaeon Sulfolobus solfataricus P2 (genome size 3.0 Mbp; 3001 predicted ORFs).19 These organisms flourish at high temperature and in acidic environments, surviving on sulfur and hydrogen.20 To date, the only shotgun proteomics study in the Archaea has been of Methanococcus jannaschii, in which 963 proteins were identified,18 representing 54% of the predicted proteome (genome size 1.66 Mbp; 1783 predicted ORFs).21 The aim of this study was to achieve maximum proteome coverage (including soluble, insoluble and secreted proteins) in S. solfataricus by comparing two gel-based shotgun proteomics approaches with each other and with the 2-DE approach: 1-DE for protein separation followed by IEF peptide separation (1D-IEF), and IEF for both protein and peptide separation (IEF-IEF). These multidimensional separations were coupled with multiple mass spectrometer injections. In addition, the efficiency of protein and peptide separation was investigated to ensure the reliability of the identifications.

Experimental Procedures

An overview of the experimental workflow is summarized in Figure 1.

Figure 1. Summary of the experimental workflows carried out in this study, including the methods used to identify soluble, insoluble, and secreted proteins.

Microorganism Growth Conditions. Sulfolobus solfataricus P2 was grown in pH 4.0 medium as described by Snijders et al.22 Each culture was grown aerobically at 80 °C in 50 mL of growth medium, supplemented with 4.0 g/L of filter-sterilized glucose and 25 µL of Wolfe's vitamin stock,23 in 250 mL long-neck volumetric flasks placed in a gently shaking horizontal water bath. All chemicals were purchased from Sigma-Aldrich (Gillingham, Dorset, UK) unless otherwise stated.

Protein Extraction and Quantification. S. solfataricus was harvested in mid-exponential phase (OD530 of 1.0) by centrifugation at 5000 × g for 15 min at room temperature. The cell pellet was resuspended in Tris buffer24 consisting of 40 mM Tris-HCl (pH 8.7), 1 mM ascorbic acid, 5 mM MgCl2, 10 mM PVPP, 1 mM DTT, and 5% of a protease inhibitor cocktail. Soluble protein was extracted using liquid nitrogen coupled with mechanical cracking as described by Gan et al.,11 with some modifications: after the soluble proteins were recovered, the remaining pellet was resuspended in Tris buffer, sonicated for 10 min at room temperature, and centrifuged at 21 000 × g for 30 min at 4 °C. This additional step increased the yield of total soluble protein and also served as a washing step prior to insoluble protein extraction. The supernatant containing the soluble proteins was recovered in a micro-centrifuge tube and stored at -20 °C for further analysis. The total protein concentration of the cell extract was measured using the RC DC Protein Quantification Assay (Bio-Rad, Hertfordshire, UK) according to the manufacturer's protocol.

The pellet remaining after liquid nitrogen cracking was weighed, and 300 µg of the pellet was dissolved in 90% formic acid (FA)6 by vortexing, followed by sonication for 15 min at room temperature. The supernatant was recovered by centrifugation at 21 000 × g at 4 °C for 15 min. Cyanogen bromide was added to the sample at a ratio of 5 volumes to 1 weight (cell pellet), and the sample was incubated overnight at 37 °C in the dark. For safety, all cyanogen bromide handling was done in a fume hood. The sample was then dried completely in a vacuum concentrator (Model 5301, Eppendorf, Cambridgeshire, UK) before resuspension in 8 M urea. The sample was reduced with 10 mM DTT in 50 mM ammonium bicarbonate for 1 h at 56 °C, followed by alkylation with 55 mM iodoacetamide in 50 mM ammonium bicarbonate for 30 min in the dark at 37 °C. Prior to tryptic digestion, the sample was diluted with 40 mM ammonium bicarbonate in 9% acetonitrile to give a final urea concentration of less than 1 M. Trypsin was added at a 50:1 (cell pellet:trypsin) weight ratio, and the sample was incubated overnight at 37 °C. The digested peptides were dried by vacuum concentration and resuspended in 0.1% TFA, followed by C18 cleanup as described below.

To obtain the secreted proteins, the medium remaining after cell harvest was recovered for trichloroacetic acid (TCA)/acetone precipitation. At least two volumes of acetone containing 20% v/v TCA were added to the medium, and the mixture was incubated at 4 °C overnight. The precipitated proteins were recovered by centrifugation at 10 000 × g at 4 °C for 30 min and washed with ice-cold acetone before resuspension in 50 mM ammonium bicarbonate. The protein concentration was determined using the RC DC Protein Quantification Assay (Bio-Rad). A 400 µg portion of secreted protein was reduced, alkylated and trypsin-digested in the same way as the insoluble protein fraction. The peptides were dried by vacuum concentration.

2-DE. Prior to IEF, 500 µg of soluble protein (for each strip) was precipitated using TCA/acetone: the protein sample was mixed with two volumes of acetone containing 10% v/v TCA and 1% v/v tributylphosphine (TBP) and incubated overnight at -20 °C. The sample was then centrifuged at 21 000 × g for 30 min at 4 °C and washed with ice-cold acetone prior to resuspension in 50 µL of Destreak Rehydration Buffer (GE Healthcare, Buckinghamshire, UK) containing 0.5% v/v Pharmalyte pH 3-10. IPG strips (pH 3-10 NL; Bio-Rad) were rehydrated with 330 µL of Destreak Rehydration Buffer (GE Healthcare) overnight at room temperature. A cup-loading technique2 was employed for IEF, and the IPG strips were focused using a Protean II IEF Cell (Bio-Rad).
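The focusing program for these strips, given in the next paragraph, is a series of constant-voltage and linear-ramp steps that ends once 40 kVh has accumulated. The sketch below shows how that volt-hour total builds up, assuming ideal linear ramps and no current limiting; the step list is transcribed from the text, while the helper names are hypothetical. Under these idealized assumptions the 40 kVh target is reached after about 17.6 h, somewhat earlier than the 18.5 h reported; in practice a current limit can hold the voltage below its set point during the ramps, which lengthens a real run.

```python
from typing import List, Tuple

# (start volts, end volts, duration in hours); start == end is a
# constant-voltage step, start != end is an ideal linear ramp.
Step = Tuple[float, float, float]

FIXED_STEPS: List[Step] = [
    (50, 50, 9.0),          # low-voltage initial step
    (200, 200, 1.0),
    (200, 1_000, 1.0),      # linear ramp to 1000 V over 1 h
    (1_000, 10_000, 6.0),   # linear ramp to 10 000 V over 6 h
]
FINAL_V = 10_000            # held until the volt-hour target is reached
TARGET_VH = 40_000          # 40 kVh


def step_vh(start_v: float, end_v: float, hours: float) -> float:
    """Volt-hours of one step: mean voltage x duration (exact for a linear ramp)."""
    return (start_v + end_v) / 2 * hours


def ideal_run(steps: List[Step], final_v: float, target_vh: float):
    """Return (Vh before the final hold, hold hours, total hours)."""
    accumulated = sum(step_vh(*s) for s in steps)
    hold_h = max(0.0, (target_vh - accumulated) / final_v)
    total_h = sum(hours for _, _, hours in steps) + hold_h
    return accumulated, hold_h, total_h


ramp_vh, hold_h, total_h = ideal_run(FIXED_STEPS, FINAL_V, TARGET_VH)
print(f"{ramp_vh:.0f} Vh accumulated before the final hold")
print(f"final hold {hold_h:.3f} h, total {total_h:.3f} h")
```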
IEF was run using a “desalting step and progressive voltage” program as described by Joubert-Caron et al.:25 50 V for 9 h, 200 V for 1 h, a linear gradient to 1000 V over 1 h, a linear gradient to 10 000 V over 6 h, and 10 000 V until a total of 40 kVh had been reached, giving a total run time of 18.5 h. After the first dimension, IPG strips were equilibrated in a buffer consisting of 6 M urea, 2% w/v SDS, 0.375 M Tris-HCl (pH 8.8), and 30% v/v glycerol, first with 2% w/v DTT for 15 min (reduction) and then with 2.5% w/v iodoacetamide for another 15 min (alkylation). The second-dimension separation was carried out using a Protean II XL Cell System (Bio-Rad) with 10% polyacrylamide gels. The gels were run at 16 mA/gel for 30 min, followed by 24 mA/gel until the dye front ran out of the gel, giving a total second-dimension run time of approximately 5 h. Gels were stained using Bio-Safe Coomassie Blue (Bio-Rad) and destained in deionized water overnight. Visualization and spot detection were carried out on triplicate gels using a GS-800 Densitometer (Bio-Rad) and PDQuest software v7.2 (Bio-Rad). All visible spots were excised for in-gel tryptic digestion.

In-Gel Tryptic Digestion. Excised gel pieces/strips from 2-DE and from the shotgun approaches (1D-IEF and IEF-IEF methods) were destained twice using 200 mM ammonium bicarbonate in 40% v/v acetonitrile. The gel pieces were dehydrated by incubation with 100% v/v acetonitrile for 10 min at room temperature and further dried (at room temperature) by vacuum concentration for 15 min. Excised strips from the shotgun approaches were reduced and alkylated as described above; this step was not necessary for 2-DE spots, which had already been reduced and alkylated during IPG strip equilibration. Trypsin was then added, with acetonitrile at a final concentration of 10% to increase trypsin activity,26 and the gel pieces were incubated overnight at 37 °C. After overnight incubation, the liquid was recovered into a new centrifuge tube.
Peptides were then extracted twice with 30% v/v acetonitrile in 3.5% v/v formic acid at 37 °C for 15 min, followed by one change of 50% v/v acetonitrile in 5% v/v formic acid for 30 min at 37 °C and one change of 100% v/v acetonitrile for 15 min at 37 °C. All of the recovered peptide-containing liquid was dried by vacuum concentration.

Figure 2. Protein distribution in each fraction. (A) 1D-IEF workflow employing 1-DE for protein separation on a 15% SDS-PAGE mini gel. Six fractions were excised from the gel according to the molecular weight ranges shown. (B) IEF-IEF workflow utilizing IEF for protein separation; six fractions were obtained from 17 cm, pH 3-10 NL IPG strips. Longer cuts were allocated to the basic and acidic ends of the strip, as fewer proteins are expected there based on the S. solfataricus theoretical proteome annotation. The pI range of each fraction was estimated from the 2-DE gel map.

IEF-IEF. This method employed IEF for both protein and peptide fractionation. First, 2 mg of soluble protein was TCA/acetone-precipitated prior to IEF. IEF conditions were identical to those described for 2-DE. After IEF, each strip was cut into 6 fractions as shown in Figure 2(B), and in-gel tryptic digestion was performed on each fraction. After tryptic digestion, the dried peptides were resuspended in 8 M urea and 0.5% v/v Pharmalyte pH 3-10. IPG strips (pH 3-10 NL; Bio-Rad) were rehydrated overnight with 8 M urea prior to IEF of the peptides, using the same IEF conditions as for 2-DE. After peptide IEF, each strip was cut into 5 fractions of equal length, giving a total of 30 fractions. Peptides were eluted from the strips as in the elution step of the in-gel digestion protocol. The recovered eluate was dried by vacuum concentration and resuspended in 0.1% v/v TFA prior to C18 cleanup.

1D-IEF. In this method, soluble proteins were first separated using conventional 1D SDS-PAGE, followed by IEF peptide fractionation. Soluble protein (2 mg) was partially dried by vacuum concentration before adding two volumes of Laemmli Buffer (Bio-Rad). The sample was denatured at 95 °C for 10 min prior to loading onto the gel. The gel measured 7 cm × 7 cm × 1 mm, with a 4% stacking gel on top of the 15% resolving gel. A broad-range protein molecular weight marker (Promega, Southampton, UK) was used as a molecular weight indicator. The gel was run on a mini-Protean III electrophoresis system (Bio-Rad) at 50 V for 1 h, followed by 70 V until the dye front ran out of the gel. Bio-Safe Coomassie Blue (Bio-Rad) was used for staining, and destaining was carried out using deionized water overnight. The mini gel was then cut into 6 fractions as shown in Figure 2(A), and each excised fraction was trypsin-digested. The peptides were further separated by IEF; a total of 30 fractions were collected, and peptides were eluted as described above.

Peptide IEF for Insoluble and Secreted Protein Fractions. The digested, dried peptides from the insoluble and secreted protein fractions were resuspended in 8 M urea, and peptide IEF was carried out as described above. A total of 10 fractions were collected for each of the insoluble and secreted protein fractions, followed by peptide elution and cleanup.

C18 Clean-Up. To remove excess urea from the recovered peptides, a C18 Discovery DSC-18 SPE column (100 mg capacity; Supelco, Sigma) was conditioned twice with 1 volume of 100% methanol and 2 volumes of 0.1% v/v TFA. The sample was then loaded onto the column and washed twice with 1 volume of 0.1% v/v TFA before elution with 1 volume of 0.1% v/v TFA in 80% v/v acetonitrile and 1 volume of 100% v/v methanol. Samples were dried and stored at -20 °C prior to mass spectrometric analysis.

Mass Spectrometric Analysis. Mass spectrometry was performed using a QStar XL hybrid ESI quadrupole time-of-flight tandem mass spectrometer, ESI-qQ-TOF-MS/MS (Applied Biosystems, Framingham, MA, USA; MDS-Sciex, Concord, Ontario, Canada), coupled with an online capillary liquid chromatograph (Famos, Switchos and Ultimate liquid chromatography system from Dionex/LC Packings, Amsterdam, The Netherlands) as described elsewhere.11 The peptide mixture was separated on a PepMap C-18 RP capillary column (LC Packings).
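The acquisition settings given in the next paragraph restrict MS/MS to +2 and +3 precursors within 300-2000 m/z, and the database search treats cysteine as carbamidomethylated and methionine as optionally oxidized. As a sketch of what those settings mean for a given tryptic peptide, the code below computes monoisotopic [M+zH]z+ values from standard residue masses and checks them against the acquisition window. The residue table, the helper names and the example peptide are our own additions for illustration, not part of the instrument software or the search engine.

```python
from typing import Dict, Iterable, Tuple

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # fixed modification applied to every Cys
OXIDATION = 15.994915        # variable modification on Met


def peptide_mass(seq: str, oxidized_met: int = 0) -> float:
    """Neutral monoisotopic mass, with all Cys carbamidomethylated."""
    if oxidized_met > seq.count("M"):
        raise ValueError("more oxidized Met than Met residues")
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    mass += CARBAMIDOMETHYL * seq.count("C")
    return mass + OXIDATION * oxidized_met


def precursor_mz(seq: str, z: int, oxidized_met: int = 0) -> float:
    """m/z of the [M+zH]z+ ion."""
    return (peptide_mass(seq, oxidized_met) + z * PROTON) / z


def selectable_charges(seq: str, charges: Iterable[int] = (2, 3),
                       mz_range: Tuple[float, float] = (300.0, 2000.0)) -> Dict[int, float]:
    """Charge states whose precursor m/z falls inside the acquisition window."""
    lo, hi = mz_range
    return {z: precursor_mz(seq, z) for z in charges
            if lo <= precursor_mz(seq, z) <= hi}


# Only the +2 ion of this short peptide lies inside 300-2000 m/z.
print(selectable_charges("PEPTIDE"))
```

Short peptides tend to drop below the window at +3, while long ones may exceed it at +2, which is one reason a fixed charge-state and m/z filter shapes which peptides are sampled.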
For shotgun proteomics analyses, a 75 min gradient was used, starting with 5% Buffer B (0.1% formic acid in 95% acetonitrile) and 95% Buffer A (0.1% formic acid in 5% acetonitrile) for 5 min, followed by a 30 min ramp from 5% to 40% Buffer B, 5 min at 90% Buffer B, and finally 5 min at 5% Buffer B. For 2-DE spots, a 45 min gradient was used, starting with 5% Buffer B for 3 min, followed by a 30 min ramp from 5% to 55% Buffer B, then 90% Buffer B for 5 min and finally 5% Buffer B for 5 min. Both gradients had a flow rate of 0.3 µL/min. The mass spectrometer acquired data in positive ion mode over a mass range of 300-2000 m/z, and peptides with +2 and +3 charge states were selected for tandem mass spectrometry.

Database Searching. Protein identification was carried out using ProID software v1.0 (Applied Biosystems, MDS-Sciex). The search was performed against the S. solfataricus P2 single-genome database (3001 entries) downloaded from NCBI (March 2004). The search parameters allowed peptide and MS/MS tolerances of up to 1.2 Da and 0.8 Da, respectively; one missed trypsin cleavage; oxidation of methionine; and carbamidomethylation of cysteine. A filter of 80% confidence and a minimum score of 10 was set, following the recommendation of Applied Biosystems. For 2-DE spot analysis, only proteins with >2 peptides were accepted as positive identifications. The criteria used for shotgun analysis are discussed later. For insoluble protein identification, the online version (1.6b10) of Mascot (http://www.matrixscience.com/cgi/search_form.pl) was used, as it allows cyanogen bromide and trypsin to be specified together as cleavage reagents. The search parameters were the same as those used in ProID, except that the MSDB database was searched with the Archaea taxonomy. By default, only proteins with MOWSE scores > 38 were considered significant hits, regardless of the number of peptides.
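Because the insoluble fraction was cleaved with cyanogen bromide before trypsin, its search had to accept both cleavage specificities, with one missed cleavage allowed. The sketch below enumerates the peptides that such a combined specificity predicts for a sequence, using simplified rules that we assume for illustration: cyanogen bromide cuts C-terminal to Met, and trypsin cuts C-terminal to Lys or Arg except before Pro. It ignores the conversion of the C-terminal Met to homoserine or homoserine lactone, which a real search would model as a mass change; the function names and the test sequence are hypothetical.

```python
from typing import List


def cleavage_sites(seq: str) -> List[int]:
    """Cut positions (index of the first residue after each cut).

    Simplified rules assumed for illustration:
      * cyanogen bromide: C-terminal to Met;
      * trypsin: C-terminal to Lys/Arg, unless the next residue is Pro.
    """
    sites = []
    for i in range(len(seq) - 1):
        residue, following = seq[i], seq[i + 1]
        if residue == "M" or (residue in "KR" and following != "P"):
            sites.append(i + 1)
    return sites


def digest(seq: str, missed_cleavages: int = 1) -> List[str]:
    """Peptides from the combined specificity with up to N missed cleavages."""
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # A peptide spanning bounds[i]..bounds[j] contains j - i - 1 missed sites.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


print(digest("AKMRPGKLR", missed_cleavages=0))  # ['AK', 'M', 'RPGK', 'LR']
print(digest("AKMRPGKLR"))                      # adds the one-missed-cleavage peptides
```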

Results and Discussion

Overview. We identified a total of 1399 unique proteins from S. solfataricus P2 in this study. This is significant coverage, comprising almost 50% (46.6%) of the theoretical proteome of S. solfataricus P2. A total of 1323 proteins were identified in the soluble fraction using a combination of 2-DE and shotgun workflows, while 285 and 223 proteins were found in the insoluble and secreted fractions, respectively. Of these, 241 insoluble-fraction and 191 secreted-fraction proteins (data not shown) were also found in the soluble fraction. Carryover of soluble protein into the insoluble fraction may result from insufficient washing prior to cyanogen bromide digestion. After these repeated proteins were removed, only 44 and 32 unique proteins remained in the insoluble and secreted fractions, respectively. Of the proteins identified in the insoluble fraction, 42 can be considered membrane or transmembrane proteins, as high hydrophobicity was predicted for them (data not shown) using Kyte-Doolittle hydropathy plots27 (available online at http://www.bio.davidson.edu/courses/compbio/flc/home.html). The master list of identified proteins and the 2-DE gel map are available on our website (http://www.sheffield.ac.uk/wrightlab/sulfolobus/) and in the Supporting Information.

Taking into account all proteins found in the three cell fractions, proteins were classified into functional categories according to the genome annotation.19 Table 1 (Column 6) shows that the largest group of proteins (37%) was classified as hypothetical, followed by energy metabolism (11%), whereas central intermediary metabolism, proteases and protein modification, cell envelope and membrane, and nucleotides had the smallest percentages (approximately 2% each).
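Column 2 of Table 1 and the 'functionality coverage' defined in its footnote d are simple ratios. The sketch below recomputes Column 2 from the annotated group sizes in Column 1 and applies the same coverage ratio to the overall totals reported in this section; the dictionary and helper names are ours, and only numbers stated in the table or text are used.

```python
from typing import Dict

# Column 1 of Table 1: annotated ORFs per functional group (3001 in total).
ANNOTATED: Dict[str, int] = {
    "IS element": 398,
    "energy metabolism": 244,
    "transport": 173,
    "amino acid biosynthesis": 123,
    "transcription and regulation": 71,
    "cofactor biosynthesis": 84,
    "lipids": 79,
    "cellular processes": 65,
    "translation": 128,
    "replication and repair": 48,
    "central intermediary metabolism": 34,
    "proteases and protein modification": 42,
    "cell envelope and membrane": 44,
    "nucleotides": 44,
    "hypothetical proteins": 1356,
    "uncharacterized and other": 68,
}


def distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of all proteins in each group, in percent to one decimal (Column 2)."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


def coverage(identified: int, annotated: int) -> float:
    """Footnote d: proteins identified as a percentage of those annotated."""
    return 100 * identified / annotated


col2 = distribution(ANNOTATED)
theoretical = sum(ANNOTATED.values())  # 3001 predicted ORFs
print(col2["IS element"], col2["hypothetical proteins"])  # 13.3 45.2
for label, found in [("all fractions", 1399), ("soluble", 1323),
                     ("soluble, shotgun", 1208), ("soluble, 2-DE", 432)]:
    print(f"{label}: {coverage(found, theoretical):.1f}%")
```

The same `coverage` ratio applied per functional group gives Column 7 when the number of identified proteins in each group is known.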
Comparing the distribution of all identified proteins (Table 1, Column 6) with that of the whole annotated proteome (Column 2), we observed a broadly similar pattern, with hypothetical proteins being the largest group. In the genome annotation, the second largest group is IS elements, followed by energy metabolism, but the reverse order was observed in the experimentally identified proteome. Coverage within each functional group was good, especially for amino acid biosynthesis, translation, lipids, and replication and repair, each of which exceeded 70%. Among the largest functional groups, IS elements, transport, and hypothetical proteins had the poorest coverage (less than 40%), whereas the remaining functional groups lay between 50% and 70% coverage. In the insoluble fraction, most proteins were hypothetical (34%) or involved in energy metabolism (23%), and no proteins were identified in five of the functional groups (Table 1, Column 4). In the secreted fraction, hypothetical proteins (56%) remained the largest group, followed by energy metabolism (13%) and transport (13%); 8 of the 16 functional groups had no proteins identified (Table 1, Column 5). Soluble proteins found using 2-DE or shotgun were widely distributed across all of the functional groups. Both methods gave a good protein distribution, yet shotgun achieved better proteome coverage (40%) than 2-DE (14%). Within this coverage, more than 10% of the unique proteins found by 2-DE were also covered by the shotgun approach. For a better understanding of inter- and intracellular activity, a proteomics tool with greater proteome coverage and good protein distribution is essential as we move toward a systems biology understanding of an organism. We propose that the shotgun proteomics approach employed here is an alternative to conventional 2-DE for soluble protein identification. However, various pre-fractionation methods, such as sequential extraction with a series of buffers,24 reversed-phase chromatography28 and liquid-phase iso-electric focusing,29 have also been employed prior to 2-DE to improve proteome coverage.

Table 1. Protein Distribution of Soluble, Insoluble, and Secreted Proteins Identified According to Their Annotated Functionality Groups

                                     Annotated proteome    Soluble fraction      Insoluble     Secreted      Total         Functionality
Column no.                           1         2           3                     4             5             6             7
                                     No. of    Distrib.    Shotgun(a)  2-DE(a)   fraction(b)   fraction(b)   proteins(c)   coverage(c,d)
Category                             proteins  (%)         (%)         (%)       (%)           (%)           (%)           (%)
IS element                           398       13.3        3.9         8.3       0             3.1           6.1           21.9
Energy metabolism                    244       8.1         11.3        16.0      22.7          12.5          11.2          65.2
Transport                            173       5.8         3.9         2.5       9.1           12.5          4.2           34.7
Amino acid biosynthesis              123       4.1         7.8         8.8       2.3           0             6.9           79.7
Transcription and regulation         71        2.4         3.0         2.1       9.1           0             2.9           57.7
Cofactor biosynthesis                84        2.8         4.2         3.0       2.3           0             3.9           66.7
Lipids                               79        2.6         4.9         4.6       6.8           0             4.6           82.3
Cellular processes                   65        2.2         2.6         2.5       4.5           0             2.6           56.9
Translation                          128       4.3         7.6         7.6       0             6.3           6.9           76.6
Replication and repair               48        1.6         2.9         3.5       0             3.1           2.8           83.3
Central intermediary metabolism      34        1.1         1.8         2.1       0             0             1.6           67.6
Proteases and protein modification   42        1.4         1.8         3.2       2.3           0             1.8           61.9
Cell envelope and membrane           44        1.5         2.0         1.2       2.3           3.1           2.0           63.6
Nucleotides                          44        1.5         2.2         2.8       0             3.1           2.1           68.2
Hypothetical proteins                1356      45.2        36.9        28.5      34.1          56.3          37.4          39.3
Uncharacterized and other            68        2.3         3.2         3.2       4.5           0             3.0           63.2
Total proteins identified            3001                  1208        432       44            32                          1399
Total proteome coverage                        100%        40.3%       14.4%     1.5%          1.1%                        46.6%
Combined soluble-fraction coverage (shotgun and 2-DE): 44.1%

(a) Including the repeated soluble proteins found in either shotgun or 2-DE.
(b) Only unique proteins were taken into account (excluding the repeated proteins).
(c) All unique proteins identified in this study (soluble, insoluble, and secreted fractions).
(d) 'Functionality coverage' is defined as the percentage of proteins identified in a particular functional group relative to the theoretical annotated proteome of that group.

2-DE Analysis. Using a conventional 2-DE approach, an average of 450 spots were detected on triplicate gels (see Figure 3); however, only 333 spots were excised for tryptic digestion after detailed analysis. From these excised spots, 432 unique proteins were identified with >2 peptides, giving a ratio of 1.3 proteins per spot. Ideally, each spot on a 2-DE gel corresponds to a single protein, but in this study some proteins have very similar molecular weights and pI values, making them difficult to separate on the gel without resorting to pre-fractionation and/or narrow-range IPG strips. For example, SSO2044 and SSO1907 both have a pI of about 7.0 and a molecular weight of about 46 kDa, and ended up in the same spot on the gel (spot no. 156 in Figure 3). Proteins can also carry over between spots through smearing, but the gel depicted in Figure 3 showed distinct spot features, indicating that smearing was minimal in our hands. A total of 25 excised spots gave no confident protein identification; these spots are shown in bold and underlined in Figure 3. On closer examination, these spots appeared to be of low abundance on the gel. Low-abundance spots had a low signal-to-noise ratio caused by uneven background staining, which may eventually lead to false identification. We also observed that some proteins were identified across several spots.
This finding probably reflects protein isoforms30 and potential co- and post-translational modifications, as described by Harry et al.31 For example, aconitate hydratase (SSO1095) was found in spots 26, 27, 28, 29, 30, 31, and 32. We also observed proteins on the gel in areas not corresponding to their pI range, and approximately 40 of the proteins found had pI > 10 (theoretically outside the range of the gel). At this stage, we have no robust explanation for these findings, which are the subject of further work.

Not all of the proteins found by 2-DE were identified by the shotgun workflows, and vice versa. A total of 115 and 891 proteins identified via 2-DE and shotgun, respectively, were unique (here we define 'unique' as proteins not found by the other method), as some proteins are better resolved by the shotgun approach and others by the 2-DE approach.32 This suggests that the two gel-based proteomics workflows (2-DE and gel-based shotgun proteomics) are complementary. In addition, the pattern of proteins identified via either method was highly similar to the theoretical proteome of S. solfataricus (Figure 4), which suggests that the gel-based proteomics approach yields broad proteome coverage. This agrees with a previous study,11 which showed that different fractionation methods are necessary to cover the whole proteome of an organism, as each method identifies its own unique proteins; for example, a total of 415 Synechocystis sp. PCC 6803 proteins were reported11 to be unique among six different fractionation combinations.

2-DE had a low protein identification throughput, with approximately 25 proteins identified per day (Table 2), compared with an average of 68 proteins per day for the shotgun workflows (1D-IEF and IEF-IEF). The total processing time for the shotgun approach was 13 days per workflow, compared with 17.5 days for the 2-DE analysis.
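The throughput figures quoted above are the reliable unique proteins of each workflow divided by its total processing time (labor plus MS time) in Table 2. A minimal check using only values from that table; the dictionary layout and function name are ours:

```python
from typing import Dict, Tuple

# (reliable unique proteins, total processing time in days), from Table 2.
WORKFLOWS: Dict[str, Tuple[int, float]] = {
    "1D-IEF": (941, 13.0),
    "IEF-IEF": (837, 13.0),
    "2-DE": (432, 17.5),
    "insoluble": (285, 7.0),
    "secreted": (223, 7.0),
}


def throughput(proteins: int, days: float) -> float:
    """Proteins identified per day of combined labor and MS time."""
    return proteins / days


rates = {name: round(throughput(*vals)) for name, vals in WORKFLOWS.items()}
shotgun_mean = (rates["1D-IEF"] + rates["IEF-IEF"]) / 2
print(rates)         # matches the throughput row of Table 2
print(shotgun_mean)  # 68.0
```

The per-day rate hides where the time goes: for 2-DE most of the 17.5 days is MS time (11.5 days), which is why faster spot analysis, as discussed next, would raise its throughput.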
Nevertheless, a higher 2-DE throughput may be achieved if the spots were analyzed using MALDI-MS/MS or ESI-MS/MS coupled with direct chip-based nanospray infusion, using, for example, an Advion Nanomate (Advion Biosciences, Norfolk, UK),33,34 as each spot can then be analyzed within minutes instead of the 45 min required by our standard LC-MS/MS run. Optimization of MS times and LC gradients, and the use of infusion electrospray, are the subject of future work.

Journal of Proteome Research • Vol. 4, No. 5, 2005 1793

Figure 3. 500 µg of soluble proteins were loaded onto a 2-DE gel (10% SDS-PAGE) using a 17 cm, 3-10NL IPG strip. 478 spots were detected and 333 spots were excised. The excised spots are labeled as in the figure, and their corresponding IDs can be obtained online (http://www.sheffield.ac.uk/wrightlab/sulfolobus/) and in the Supporting Information.

Figure 4. Proteins found using: (A) 2-DE gel using a 3-10NL IPG strip on 10% SDS-PAGE; (B) shotgun approaches, i.e., the 1D-IEF and IEF-IEF methods coupled with three injections; (C) the theoretical proteome of P2. The symbols represent proteins found by (+) 2-DE, (×) shotgun and (△) the theoretical annotated proteome.

Shotgun Analysis. We employed two gel-based shotgun proteomics approaches, i.e., 1-DE protein separation followed by IEF peptide separation (1D-IEF), and IEF separation of both the proteins and the peptides (IEF-IEF). With three injections each, the two methods shared 570 proteins, out of a total of 1208 reliable unique protein identifications (Figure 5C). The total number of reliable proteins found in the 1D-IEF workflow (941 proteins) was approximately 12% greater than that found via the IEF-IEF workflow (837 proteins). Each method also contributed distinct proteins (371 for 1D-IEF and 267 for IEF-IEF); here we define ‘distinct’ proteins as those not found in the other workflow. No particular bias or trend in either pI or molecular weight was observed among these proteins. This finding is consistent with the suggestion11 that a 1D-IEF approach is better than IEF-IEF for soluble protein identification.
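The overlap figures in this section follow from inclusion-exclusion on protein counts. The following is a minimal Python sketch that uses the counts reported here, in Figure 5C and in the 2-DE comparison above (115 proteins found only by 2-DE). With real data one would intersect sets of ORF identifiers rather than work from counts; the variable and function names are illustrative.

```python
def union_size(a: int, b: int, common: int) -> int:
    """Inclusion-exclusion for two sets: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    if common > min(a, b):
        raise ValueError("overlap cannot exceed either set size")
    return a + b - common


# Reliable proteins per shotgun workflow and their overlap (Figure 5C)
one_d_ief, ief_ief, shared = 941, 837, 570

shotgun_total = union_size(one_d_ief, ief_ief, shared)
distinct_1d_ief = one_d_ief - shared   # found only by 1D-IEF
distinct_ief_ief = ief_ief - shared    # found only by IEF-IEF
percent_more = 100 * (one_d_ief - ief_ief) / ief_ief

# Combining with 2-DE: 432 proteins, of which 115 were not found by shotgun,
# so the two approaches share 432 - 115 proteins.
two_de, two_de_only = 432, 115
soluble_total = union_size(shotgun_total, two_de, two_de - two_de_only)

print(shotgun_total, distinct_1d_ief, distinct_ief_ief, round(percent_more, 1), soluble_total)
```

The output (1208, 371, 267, 12.4 and 1323) matches the reported shotgun total and the 1323 soluble proteins in Table 2. The derived 2-DE/shotgun overlap of 317 proteins also implies 891 shotgun-only proteins, as reported above.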


Table 2. Total Reliable Unique Proteins Identified in the Soluble, Insoluble and Secreted Fractions, and the Time Required for Each Workflowᵉ

| | 1D-IEF (soluble) | IEF-IEF (soluble) | 2-DE (soluble) | insoluble proteins | secreted proteins |
|---|---|---|---|---|---|
| no. of injections | 1 / 2 / 3 | 1 / 2 / 3 | | | |
| no. of fractions/spots | 30 | 30 | 330 | 10 | 10 |
| possible identifications | 613 / 699 / 730 | 595 / 613 / 640 | n/aᵈ | n/a | 291 |
| unique proteinsᵃ | 529 / 602 / 650 | 530 / 559 / 557 | 432 | 285 | 223 |
| unique proteins in each injection | 471 / 261 / 209 | 467 / 222 / 148 | n/a | n/a | n/a |
| total unique proteins in each method | 941ᵇ | 837ᵇ | 432ᵇ | n/a | n/a |
| total unique proteins | 1323ᶜ (all soluble workflows) | | | 44ᶜ | 32ᶜ |
| labor time (days) | 7 | 7 | 6 | 5 | 5 |
| MS time (days) | 6 | 6 | 11.5 | 2 | 2 |
| total processing time (days) | 13 | 13 | 17.5 | 7 | 7 |
| protein throughput (proteins/day) | 72 | 64 | 25 | 41 | 32 |

ᵃ Reliable proteins based on the multiple-injection analysis.
ᵇ These values do not sum to the total unique proteins, because proteins found by more than one method are counted in each.
ᶜ Excludes proteins already found in another method or fraction.
ᵈ n/a: data not available.
ᵉ Each injection was analyzed in detail only for the soluble fraction, since it had the greatest protein complement.
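The "unique proteins in each injection" row of Table 2 gives the reliable proteins that each successive injection adds. Accumulating this row reproduces the running totals and percentage increments discussed under Multiple Injections and Criteria for Protein Identification. The following is a minimal Python sketch; `cumulative_gains` is an illustrative helper, not a tool used in the study.

```python
from itertools import accumulate

# "Unique proteins in each injection" row of Table 2 (soluble fraction):
# reliable proteins first contributed by injection 1, 2 and 3.
NEW_PER_INJECTION = {
    "1D-IEF": [471, 261, 209],
    "IEF-IEF": [467, 222, 148],
}


def cumulative_gains(new_counts: list[int]) -> tuple[list[int], list[float]]:
    """Running totals, and the % increase each extra injection adds to the running total."""
    totals = list(accumulate(new_counts))
    gains = [100 * (cur - prev) / prev for prev, cur in zip(totals, totals[1:])]
    return totals, gains


for name, counts in NEW_PER_INJECTION.items():
    totals, gains = cumulative_gains(counts)
    print(name, totals, [f"{g:.0f}%" for g in gains])
```

For 1D-IEF this gives totals of 471, 732 and 941 with increments of 55% and 29%. For IEF-IEF it gives 467, 689 and 837 with increments of 48% and 21%. The shrinking increment is what motivates asking how many injections are worthwhile.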

Figure 5. Analysis of multiple (three) injections of the soluble fraction: (A) 1D-IEF workflow and (B) IEF-IEF workflow. (C) Overall total reliable proteins obtained from the soluble fraction using both the 2-DE and shotgun proteomics approaches.
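Figure 5 and the Multiple Injections section describe three quantities for replicate injections: proteins found in all three injections, proteins repeated in at least one other injection, and proteins found in only one injection. The following is a minimal Python sketch of those counts. It assumes that each injection's identifications are available as a set of protein identifiers; the identifiers below are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def injection_overlap(injections: list[set[str]]) -> dict:
    """Summarize how protein identifications overlap across replicate injections."""
    # Number of injections in which each protein was identified
    counts = Counter(pid for inj in injections for pid in set(inj))
    return {
        "total": len(counts),
        "in_all": sum(1 for c in counts.values() if c == len(injections)),
        "repeated": sum(1 for c in counts.values() if c >= 2),
        "only_in": [sum(1 for pid in set(inj) if counts[pid] == 1) for inj in injections],
    }


# Hypothetical protein IDs for three injections of one fraction
inj1 = {"P1", "P2", "P3", "P4"}
inj2 = {"P1", "P2", "P5"}
inj3 = {"P1", "P3", "P6"}
summary = injection_overlap([inj1, inj2, inj3])
print(summary)
print(f"repeated: {100 * summary['repeated'] / summary['total']:.0f}%")
```

With these toy sets, one protein is found in all three injections, three of six proteins (50%) are repeated, and each injection contributes one protein seen nowhere else.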

Using a 1D gel for protein separation improved protein identification compared with the IEF approach. Beausoleil et al.35 reported that 967 unique proteins were identified from a HeLa cell lysate using a 1D gel for protein separation followed by SCX peptide separation. Another study, this time on Synechocystis, reported11 a 50% increase in protein identification when a 1D gel rather than IEF was used for protein separation, with 308 and 204 proteins found via the 1D-SCX and IEF-SCX approaches, respectively. The main advantage of a 1D gel for protein separation is minimal protein loss, as the protein sample readily enters the gel. In contrast, sample loss was observed in IEF protein separation: some proteins were focused out of the IPG strip, and some still remained in the sample cup at the end of the IEF run. Therefore, the 1D gel workflow is a good option for the initial fractionation in shotgun proteomics. This is further supported by the greater peptide coverage of the 1D gel workflow, in which a total of 4608 unique peptides were identified compared with 3562 unique peptides in the IEF workflow (Table 3). For peptide separation in both methods, IEF was used instead of the conventional SCX approach because, according to Gan et al.,11 IEF of peptides followed by reversed-phase chromatography offers better orthogonal separation and better results. Using the IEF-IEF approach, that previous work11 identified 344 unique proteins from Synechocystis, a 69% increase in unique proteins compared with the IEF-SCX workflow. Therefore, IEF of peptides was also employed for the insoluble and secreted protein fractions in this study. As the majority of the proteome was expected to be in the soluble fraction, only peptide separation was employed for the insoluble and secreted protein fractions.

Table 3. Total Unique Peptides Identified for Replicate Injections of the Soluble Protein Samples Obtained from the Shotgun Proteomics Approaches

| | 1D-IEF: 1 injection | 1D-IEF: 2 injections | 1D-IEF: 3 injections | IEF-IEF: 1 injection | IEF-IEF: 2 injections | IEF-IEF: 3 injections |
|---|---|---|---|---|---|---|
| proteins with ≥1 peptide | 613 | 909 | 1101 | 595 | 805 | 896 |
| proteins with ≥2 peptides | 471 | 717 | 932 | 467 | 663 | 822 |
| proteins with ≥3 peptides | 273 | 551 | 786 | 274 | 508 | 612 |
| total peptidesᵃ | 1708 | 3168 | 4608 | 1756 | 2908 | 3562 |

ᵃ Total unique peptides, counting each peptide only once across all injections made.

Fractionation Efficiency. The protein distribution in both workflows was analyzed according to molecular weight (Mw) and pI, as depicted in Figure 2. The 1-DE separation was rather well-defined, as the majority of proteins found in each fraction corresponded to the Mw range excised; this trend is best illustrated in Figure 2(A). However, 332 of the identified proteins were not within the Mw range of the gel slice in which they were found. We suspected that this might be influenced by protein hydrophobicity: proteins found at a lower Mw than expected would be more hydrophobic, and those found at a higher Mw more hydrophilic, than the proteins within the range. However, hydropathy predicted using Kyte-Doolittle hydropathy plots27 (data not shown) did not bear this out. This wide protein distribution might instead result from unavoidable smearing across the gel. IEF protein separation in the IEF-IEF workflow also gave reasonable pI separation; the pI distribution of the identified proteins in each fraction is shown in Figure 2(B). As with 1-DE separation, approximately 334 proteins were found outside the expected range. This is consistent with the 2-DE results reported above, where some proteins had migrated on the gel away from their theoretical pI. The same hydrophobicity analysis described above was carried out.
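The hydrophobicity check described above relied on Kyte-Doolittle hydropathy values.27 The following is a minimal Python sketch of such a check, assuming protein sequences are available as one-letter strings. It computes the grand average of hydropathy (GRAVY) and a sliding-window profile, and compares group means. The function names, the 9-residue window and the example grouping are illustrative choices, not necessarily the settings used in this study, and the example sequences are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _scores(seq: str) -> list[float]:
    # Non-standard or ambiguous residues (e.g. X, B, Z) are skipped
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    scores = _scores(seq)
    if not scores:
        raise ValueError("sequence contains no standard residues")
    return mean(scores)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Sliding-window average, one value per window position (Kyte-Doolittle plot)."""
    scores = _scores(seq)
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]


def mean_gravy_by_group(groups: dict[str, list[str]]) -> dict[str, float]:
    """Mean GRAVY per group, e.g. proteins below or within their expected Mw range."""
    return {name: mean(gravy(s) for s in seqs) for name, seqs in groups.items()}


# Hypothetical sequences grouped by where they were found relative to expectation
groups = {
    "below expected Mw": ["MKLLIVAAF", "GAVLIIL"],
    "within range": ["MDEKRSTQN", "SGKDERK"],
}
print(mean_gravy_by_group(groups))
```

If the hypothesis held, the "below expected Mw" group would show a clearly higher mean GRAVY than the in-range group; the study reports that no such difference was found.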
Nonetheless, this theoretical analysis (data not shown) ruled out a hydrophobicity effect. Post-translational modification may be one of the major causes of this phenomenon, as reported elsewhere;36 however, further investigation is required before drawing any conclusion, and this is the subject of future work. To further reduce sample complexity prior to mass spectrometric analysis, IEF peptide separation was employed after protein fractionation. Peptides were separated according to their isoelectric point (pI), and their pI and Mw were estimated using JVirGel37 (http://www.jvirgel.de/index.html), an online bioinformatics tool. The peptide distribution in each fraction (for both workflows) was broad and poorly defined (refer to Figure 6 in the Supporting Information or our website mentioned above). The wide pI range of the peptides in each strip section suggests peptide diffusion, owing to their low Mw (0.5 to 4.2 kDa in this study). Cutting the strips can cause significant peptide diffusion if it takes more than 10 min;38 however, this was unavoidable, as a longer time was required to remove excess mineral oil from six strips (for one workflow) simultaneously. Despite this, IEF peptide fractionation was sufficient to simplify the peptide mixture prior to mass spectrometric analysis.

Multiple Injections and Criteria for Protein Identification. As detailed by Schaefer et al.,39 replicate injections of the same fraction into an ESI-MS/MS system are essential to ensure reproducibility and reliability of the results. It has been reported that repeatedly injecting the same sample into LC-ESI-MS/MS leads to different proteins being identified;40 this is due to the stochastic automatic selection of peptides for MS/MS analysis. Hence, each sample (soluble, insoluble and secreted fractions) was injected three times into the mass spectrometer, and the results from each injection are summarized in Table 2. The criteria set for multiple injections were as follows: (i) proteins identified with two or more peptides were considered reliable, and (ii) single-peptide protein identifications were also considered reliable if the same protein was found in the same fraction number in another injection. If only a single injection was made, only the first criterion applied. A total of 275 and 279 proteins were found in all three injections in the 1D-IEF and IEF-IEF workflows, respectively (refer to Figure 5). Proteins common between injections (those found at least once in another injection of the same method) were 60% for 1D-IEF and 63% for IEF-IEF. In each injection of either method, at least 10% of the proteins identified were unique (not repeated in any other injection). If the analysis was based on a single injection, i.e., only proteins with two or more peptides were taken into account, 471 and 467 reliable unique proteins were found using 1D-IEF and IEF-IEF, respectively, out of a total of 613 and 595 proteins in the first injection.
When the first injection was instead assessed using the multiple-injection criteria, these values increased to 529 and 530 reliable unique proteins (Table 2). The single-injection values are lower because single-peptide proteins that were repeated in other injections cannot be accepted without those other injections; they are accepted under criterion (ii) in the multiple-injection analysis. With two injections, the total number of reliable unique proteins increased to 732 and 689 for 1D-IEF and IEF-IEF, an increase of 55% and 48% over the first injection. The third injection gave a smaller increase of 29% and 21%, leading to totals of 941 and 837 reliable unique proteins for 1D-IEF and IEF-IEF, respectively. The increment in reliable unique proteins per additional injection is therefore expected to keep decreasing, eventually reaching a point where a further injection is not worthwhile because it would contribute only a small number of new proteins. Here we were limited to three replicate injections by sample and time constraints. However, it may be interesting to further investigate the number of proteins identified versus the number of injections made, to predict the optimum number of injections for coverage of the S. solfataricus proteome, as was done for yeast.41

There was a significant improvement in peptide coverage via replicate injections of the same fractions into LC-ESI-MS/MS. A total of 786 and 612 unique proteins with ≥3 peptides were identified from three injections, compared to 273 and 274 proteins from a single injection in the 1D-IEF and IEF-IEF workflows, respectively, as shown in Table 3. The replicate injections enhanced the total peptide coverage, leading to a ratio of approximately 4 peptides per protein identified (total peptides/total proteins identified) for both shotgun proteomics approaches in the soluble fraction. Although 1D-IEF gave the highest total numbers of proteins and peptides, IEF-IEF showed an advantage in peptide reproducibility, with over 50% of its peptides found in at least 2 replicate injections, compared to 27% for 1D-IEF. Low peptide reproducibility in 1D-IEF may cast some doubt on peptide reliability. However, the increase in the number of peptides identified from the same proteins improved the peptide coverage (number of peptides) per protein and the confidence of the protein identifications. This finding further underlines the importance of replicate injections into LC-ESI-MS/MS for better protein/peptide coverage and reproducibility.
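The two reliability criteria described above can be expressed as a simple filter over identifications. The following is a minimal Python sketch, assuming each identification is recorded as (injection, fraction, protein, peptide count) with peptides counted per fraction. The `Hit` record, the per-fraction peptide count and the example data are hypothetical. The threshold for criterion (i) is taken as two or more peptides, which is consistent with the single-injection counts (471 and 467) matching the ≥2-peptide row of Table 3.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One protein identification in one fraction of one injection (hypothetical schema)."""
    injection: int
    fraction: int
    protein: str
    peptides: int  # number of distinct peptides matched in this fraction


def reliable_proteins(hits: list[Hit]) -> set[str]:
    """Apply the two reliability criteria described in the text.

    (i)  a protein identified with two or more peptides is reliable;
    (ii) a single-peptide protein is reliable if the same protein was also
         identified in the same fraction number of a different injection.
    With only one injection, (ii) can never be satisfied, so only (i) applies.
    """
    injections_seen = {}  # (fraction, protein) -> injections where it was identified
    for h in hits:
        injections_seen.setdefault((h.fraction, h.protein), set()).add(h.injection)

    reliable = set()
    for h in hits:
        repeated_elsewhere = bool(injections_seen[(h.fraction, h.protein)] - {h.injection})
        if h.peptides >= 2 or repeated_elsewhere:
            reliable.add(h.protein)
    return reliable


# Hypothetical identifications from three injections
hits = [
    Hit(1, 5, "P1", 3),  # criterion (i)
    Hit(1, 5, "P2", 1),  # single peptide, also in fraction 5 of injection 2: criterion (ii)
    Hit(2, 5, "P2", 1),
    Hit(1, 7, "P3", 1),  # single peptide, repeated only in a different fraction
    Hit(2, 8, "P3", 1),
    Hit(3, 2, "P4", 1),  # single peptide, seen once
]
print(sorted(reliable_proteins(hits)))                                  # ['P1', 'P2']
print(sorted(reliable_proteins([h for h in hits if h.injection == 1])))  # ['P1']
```

The second call shows why single-injection analysis yields fewer reliable proteins. Without a second injection, the single-peptide protein P2 cannot be confirmed.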

Conclusions There is an ever-increasing pace of development in protein and peptide separation for better coverage of the proteome of an organism. The approach demonstrated here, gel-based shotgun proteomics, is a powerful technique for protein identification, especially the 1D-IEF fractionation workflow used for determining the soluble proteome. Although 2-DE and shotgun proteomics are complementary, the shotgun workflow provides greater throughput (using ESI-MS/MS) in protein identification, at 68 proteins per day versus 25 proteins per day for gels. The shotgun proteomics workflow employed here further demonstrated its capability in identifying proteins in the insoluble and secreted protein fractions. So far in this study, using 2-DE and the shotgun workflows, 1399 proteins were identified from S. solfataricus P2, including 44 insoluble proteins and 32 secreted proteins. This significant proteome coverage was achieved with multiple injections of the same samples, as each injection identified different proteins. These gel-based shotgun approaches also allow a selected region of molecular weight or pI to be studied. This is very helpful, particularly for finding targeted proteins of interest with known molecular weight or pI, and the time for sample preparation can be significantly reduced by focusing on a particular protein fraction. Quantification can be carried out using metabolic labeling22 (with 15N or 13C) or ICAT42 coupled with the gel-based shotgun approaches. With these proteomics data, metabolic pathway reconstruction of S. solfataricus can be attempted, laying a foundation for a systems biology understanding of this organism.

Acknowledgment. This work was funded by the United Kingdom’s Engineering and Physical Sciences Research Council (EPSRC) (GR/S84347/01). P.C.W. also thanks the EPSRC for provision of an Advanced Research Fellowship (GR/A11311/01). P.K.C. would like to thank the University of Sheffield and the Overseas Research Students Awards Scheme (ORS) for scholarships. We acknowledge C. S. Gan for his expertise in shotgun proteomics, Martin Barrios-Llerena for technical support, and Helia Radianingtyas and Adam Burja for critical reading.

Supporting Information Available: The master list of proteins identified and the 2-DE gel mapping. This material is available free of charge via the Internet at http://pubs.acs.org.

References
(1) Beranova-Giorgianni, S. Anal. Chem. 2003, 22, 273-281.
(2) Barry, R. C.; Alsaker, B. L.; Robinson-Cox, J. F.; Dratz, E. A. Electrophoresis 2003, 24, 3390-3404.
(3) Santoni, V.; Molloy, M.; Rabilloud, T. Electrophoresis 2000, 21.
(4) Sergeyenko, T. V.; Los, D. A. FEMS Microbiol. Lett. 2000, 193, 213-216.
(5) Wolters, D. A.; Washburn, M. P.; Yates, J. R., III Anal. Chem. 2001, 73, 5683-5690.
(6) Washburn, M. P.; Wolters, D.; Yates, J. R., III Nat. Biotechnol. 2001, 19, 242-247.
(7) Han, D. K.; Eng, J.; Zhou, H.; Aebersold, R. Nat. Biotechnol. 2002, 19, 946-951.
(8) Blonder, J.; Goshe, M. B.; Moore, R. J.; Pasa-Tolic, L.; Masselon, C. D.; Lipton, M. S.; Smith, R. D. J. Proteome Res. 2002, 1, 351-360.
(9) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 1, 43-50.
(10) Vollmer, M.; Horth, P.; Nagele, E. Anal. Chem. 2004, 17, 5180-5185.
(11) Gan, C. S.; Reardon, K. F.; Wright, P. C. Proteomics 2005, 5, 2468-2478.
(12) Ostrowski, L. E.; Blackburn, K.; Radde, K. M.; Moyer, M. B.; Schlatzer, D. M.; Moseley, A.; Boucher, R. C. Mol. Cell. Proteomics 2002, 1, 451-465.
(13) Cargile, B. J.; Bundy, J. L.; Freeman, T. W.; Stephenson, J. L., Jr. J. Proteome Res. 2004, 3, 112-119.
(14) Figeys, D.; Ducret, A.; Yates, J. R., III; Aebersold, R. Nat. Biotechnol. 1996, 14, 1579-1583.
(15) Resing, K. A.; Meyer-Arendt, K.; Mendoza, A. M.; Aveline-Wolf, L. D.; Jonscher, K. R.; Pierce, K. G.; Old, W. M.; Cheung, H. T.; Russell, S.; Wattawa, J. L.; Goehle, G. R.; Knight, R. D.; Ahn, N. G. Anal. Chem. 2004, 76, 3356-3568.
(16) Cargile, B. J.; Bundy, J. L.; Stephenson, J. L., Jr. J. Proteome Res. 2004, 3, 1082-1085.
(17) Shevchenko, A.; Sunyaev, S.; Loboda, A.; Shevchenko, A.; Bork, P.; Ens, W.; Standing, K. G. Anal. Chem. 2001, 73, 1917-1926.
(18) Zhu, W.; Reich, C. I.; Olsen, G. J.; Giometti, C. S.; Yates, J. R., III J. Proteome Res. 2004, 3, 538-548.
(19) She, Q.; Singh, R. K.; Confalonieri, F.; Zivanovic, Y.; Allard, G.; Awayez, M. J.; Chan-Weiher, C. C.; Clausen, I. G.; Curtis, B. A.; De Moors, A.; Erauso, G.; Fletcher, C.; Gordon, P. M.; Heikamp-de Jong, I.; Jeffries, A. C.; Kozera, C. J.; Medina, N.; Peng, X.; Thi-Ngoc, H. P.; Redder, P.; Schenk, M. E.; Theriault, C.; Tolstrup, N.; Charlebois, R. L.; Doolittle, W. F.; Duguet, M.; Gaasterland, T.; Garrett, R. A.; Ragan, M. A.; Sensen, C. W.; Van der Oost, J. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 7835-7840.
(20) Jonuscheit, M.; Martusewitsch, E.; Stedman, K. M.; Schleper, C. Mol. Microbiol. 2003, 48, 1241-1252.
(21) Bult, C. J.; White, O.; Olsen, G. J.; Zhou, L.; Fleischmann, R. D.; Sutton, G. G.; Blake, J. A.; FitzGerald, L. M.; Clayton, R. A.; Gocayne, J. D.; Kerlavage, A. R.; Dougherty, B. A.; Tomb, J. F.; Adams, M. D.; Reich, C. I.; Overbeek, R.; Kirkness, E. F.; Weinstock, K. G.; Merrick, J. M.; Glodek, A.; Scott, J. L.; Geoghagen, N. S.; Venter, J. C. Science 1996, 273, 1058-1073.
(22) Snijders, A. P. L.; De Vos, M. G. J.; Wright, P. C. J. Proteome Res. 2005, 4, 578-585.
(23) Atlas, R. M. Handbook of Microbiological Media; New York: CRC Press: Boca Raton, 1997.
(24) Chan, L. L.; Lo, S. C.; Hodgkiss, I. J. Proteomics 2002, 2, 1169-1186.
(25) Joubert-Caron, R.; Feuillard, J.; Kohanna, S.; Poirier, F.; Le Caer, J. P.; Schuhmacher, M.; Bornkamm, G. W.; Polack, A.; Caron, M.; Bladier, D.; Raphael, M. Electrophoresis 1999, 20, 1017-1026.
(26) Russell, W. K.; Park, Z. Y.; Russell, D. H. Anal. Chem. 2001, 73, 2682-2685.
(27) Kyte, J.; Doolittle, R. F. J. Mol. Biol. 1982, 157, 105-132.
(28) Badock, V.; Steinhusen, U.; Bommert, K.; Otto, A. Electrophoresis 2001, 22, 2856-2864.
(29) Herbert, B.; Righetti, P. G. Electrophoresis 2000, 21, 3639-3648.
(30) Poland, J.; Böhme, A.; Schubert, K.; Sinha, P. Electrophoresis 2002, 23, 4067-4071.
(31) Harry, J. L.; Wilkins, M. R.; Herbert, B. R.; H., P. N.; Gooley, A. A.; Williams, K. L. Electrophoresis 2000, 21, 1071-1081.
(32) Choe, L. H.; Aggarwal, K.; Franck, Z.; Lee, K. H. Electrophoresis 2005, 26, 2437-2449.
(33) Snijders, A. P. L.; Gan, C. S.; Chong, P. K.; Sterling, A.; Baumert, M.; Jackson, P. J.; Reardon, K. F.; Wright, P. C. Use of Automated Nano-electrospray Tandem Mass Spectrometry for Analysis of Complex Proteomes. Annual Meeting, American Society for Mass Spectrometry 2005, San Antonio, Texas.
(34) Zhang, S.; Van Pelt, C. K. Exp. Rev. Proteomics 2004, 1, 449-468.
(35) Beausoleil, S. A.; Jedrychowski, M.; Schwartz, D.; Elias, J. E.; Villen, J.; Li, J.; Cohn, M. A.; Cantley, L. C.; Gygi, S. P. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 12130-12135.
(36) Zhu, W.; Zhoa, J.; Lubman, D. M.; Miller, F. R.; Barder, T. J. Anal. Chem. 2005, 77, 2745-2755.
(37) Hiller, K.; Schobert, M.; Hundertmark, C.; Jahn, D.; Münch, R. Nucleic Acids Res. 2003, 31, 3862-3865.
(38) Cargile, B. J.; Talley, D. L.; Stephenson, J. L., Jr. Electrophoresis 2004, 25, 936-945.
(39) Schaefer, H.; Chervet, J. P.; Bunse, C.; Joppich, C.; Meyer, H. E.; Marcus, K. Proteomics 2004, 4, 2541-2544.
(40) Spahr, C. S.; Susin, S. A.; Bures, E. J.; Robinson, J. H.; Davis, M. T.; McGinley, M. D.; Kroemer, G.; Patterson, S. D. Electrophoresis 2000, 21, 1635-1650.
(41) Liu, H.; Sadygov, R. G.; Yates, J. R., III Anal. Chem. 2004, 76, 4193-4201.
(42) Li, J.; Steen, H.; Gygi, S. P. Mol. Cell. Proteomics 2003, 2, 1198-1204.
