Global Proteome Discovery Using an Online Three-Dimensional LC-MS/MS

Jing Wei,* Jun Sun, Wen Yu, Arianna Jones, Paul Oeller, Martin Keller, Gary Woodnutt, and Jay M. Short

Diversa Corporation, 4955 Directors Place, San Diego, California 92121

Received December 14, 2004

We have developed a proteomics technology featuring on-line three-dimensional liquid chromatography coupled to tandem mass spectrometry (3D LC-MS/MS). Using 3D LC-MS/MS, the yeast soluble, urea-solubilized peripheral membrane, and SDS-solubilized membrane protein samples collectively yielded 3019 unique yeast protein identifications with an average of 5.5 peptides per protein from the 6300-gene Saccharomyces Genome Database searched with SEQUEST. A single run of the urea-solubilized sample yielded 2255 unique protein identifications, suggesting high peak capacity and resolving power of 3D LC-MS/MS. After precipitation of SDS from the digested membrane protein sample, 3D LC-MS/MS allowed the analysis of membrane proteins. Among 1221 proteins containing two or more predicted transmembrane domains, 495 such proteins were identified. The improved yeast proteome data allowed the mapping of many metabolic pathways and functional categories. The 3D LC-MS/MS technology provides a suitable tool for global proteome discovery.

Keywords: proteomics • liquid chromatography • tandem mass spectrometry • 3D LC-MS/MS • yeast • protein identification • proteome • membrane protein

* To whom correspondence should be addressed. Tel: (858) 526-5201. Fax: (858) 526-5701. E-mail: [email protected].

10.1021/pr0497632 CCC: $30.25 © 2005 American Chemical Society

Journal of Proteome Research 2005, 4, 801-808. Published on Web 04/06/2005

Introduction

Global proteome discovery aims to understand complex biological systems in terms of protein expression, protein function, protein modifications, and protein interactions. It has also been widely used for biomarker discovery. Proteomics faces challenges such as the high complexity of protein species, the large dynamic range of protein levels, the analysis of integral membrane proteins, and the need for high throughput.1-5 It is vital to develop high-resolution, sensitive, and automated proteomic platforms to identify large numbers of proteins from complex biological systems.

Recently, Yates and colleagues developed MudPIT (Multidimensional Protein Identification Technology),5-8 designed to overcome the limitations of traditional two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) coupled to mass spectrometry.9-11 The MudPIT method was capable of detecting proteins of low abundance and of extreme hydrophobicity, pI, and molecular weight.12-18 In a typical MudPIT experiment, the protein sample was digested into a peptide mixture, which was then separated by microcapillary two-dimensional liquid chromatography (2D-LC). Tandem mass spectrometry (MS/MS) data were acquired from the separated peptides and used for database searches to identify peptides and proteins. Using the on-line MudPIT method,8 three protein fractions, termed soluble, lightly washed insoluble, and heavily washed insoluble fractions, at 420, 440, and 490 µg, respectively, yielded 1484 yeast proteins derived from 5540 unique peptide identifications.6 Similarly, an off-line 2D LC-MS/MS approach identified 1504 yeast proteins derived from 7537 unique peptide identifications from 1 mg of whole yeast proteins.3 The off-line 2D LC-MS/MS method separated an acidified peptide mixture by strong cation-exchange (SCX) liquid chromatography into nearly 80 fractions that were subsequently analyzed one at a time by reversed-phase liquid chromatography (RPC) and MS/MS.

These 2D LC-MS/MS-based methods have demonstrated many strengths and much improvement over the traditional 2D-PAGE method and are being widely applied. The off-line method was much more labor intensive and tedious than the on-line method but provided slightly better protein and peptide identifications, mainly because adding organic solvents during the salt gradient improved the efficiency of the SCX elution. In both methods, the use of SCX as the first LC phase was not ideal for the binding of all peptides and required the sample to be desalted first. During their development of the on-line MudPIT,6,8 Wolters and colleagues performed off-line prefractionation of the peptides with a C18 column into five fractions, followed by separate MudPIT analyses of each fraction. In doing so, they identified 607 yeast soluble proteins in total. The off-line prefractionation was very labor intensive. McDonald and colleagues19 placed additional RP material upstream of the SCX in the two-phase MudPIT microcapillary for on-line desalting and improved peptide and protein identifications over the off-line desalting two-phase MudPIT with small-quantity samples. However, the resolving power of these separation techniques as reported can be further improved to better resolve the complexity of biological systems.

To address the aforementioned shortcomings as well as other challenges concerning global proteomics, we developed an online 3D LC-MS/MS system with improved peak capacity and resolution for peptide separation. The 3D LC-MS/MS system generated many more unique peptide and protein identifications than other reported methods. In this paper, we describe the global discovery of the yeast proteome, demonstrating the advantages of the 3D LC-MS/MS proteomics platform.

Experimental Section

Reagents. Endoproteinase Lys-C and recombinant trypsin were purchased from Roche Diagnostics (Indianapolis, IN). Dithiothreitol (DTT) was obtained from Pharmacia Biotech (Uppsala, Sweden). All other chemicals, unless otherwise noted, were obtained from Sigma Chemical (St. Louis, MO).

Preparation of Protein Samples from S. cerevisiae Cells. S. cerevisiae strain S150-2B (MATa, leu2-3, leu2-112, ura3-52, trp1-289, his3-∆1, Gal+) was cultured in YPD medium at 30 °C to log growth phase. The cell pellet was resuspended in TNE extraction buffer (50 mM Tris-HCl, pH 8.0, 100 mM NaCl, 10 mM EDTA, 1 mM DTT, 0.7 µg/mL of pepstatin A, 5 µg/mL of RNase and DNase). Cell lysis was performed by homogenization with glass beads in a Mini-BeadBeater (BioSpec Products, Bartlesville, OK) for six cycles of 1 min breakage and 1 min cooling on ice. The cell lysate was centrifuged at 1000g for 5 min, and the supernatant was subjected to ultracentrifugation at 100 000g for 30 min at 4 °C. The supernatant was collected as the soluble protein sample. The pellet was washed twice with 4 M urea, 100 mM Tris-HCl, pH 8.0, 1 mM DTT buffer to generate the urea-solubilized protein sample. Optionally, the pellet was washed again with 8 M urea, 100 mM Tris-HCl, pH 8.0, 1 mM DTT buffer to yield an alternative urea-solubilized fraction. The remaining pellet was resuspended in 1% SDS, 50 mM Tris-HCl, pH 8.0, 1 mM DTT buffer. After ultracentrifugation, the supernatant was collected as the SDS-solubilized protein sample. Protein concentrations were measured using a BCA protein assay kit (Pierce Biotechnology, Rockford, IL). The soluble protein sample was adjusted to 100 mM Tris-HCl, pH 8.0, 4 M urea alongside the urea-solubilized protein sample; the two mixtures were reduced with 1 mM DTT and alkylated with 0.4 mg/mL of iodoacetamide.
After digestion with endoproteinase Lys-C (1/200 of protein), the protein samples were diluted to 1 M urea and further digested twice with trypsin (1/100 of protein). The second batch of trypsin was added 4 h after the addition of the first batch. The digested proteins were concentrated in a DNA110 SpeedVac (Thermo Savant, Holbrook, NY) if necessary. The SDS-solubilized protein sample was diluted to 0.1% SDS. The proteins were reduced and alkylated and then digested with endoproteinase Lys-C and twice with trypsin. The protein sample was concentrated if necessary. SDS was removed using the SDS-Out Sodium Dodecyl Sulfate Precipitation Kit from Pierce Biotechnology (Rockford, IL). The completion of digestion for each sample was confirmed by SDS-PAGE and silver staining using the SilverQuest Silver Staining Kit from Invitrogen (Carlsbad, CA).

Online 3D LC-MS/MS. The digested protein samples were analyzed by the 3D LC-MS/MS system consisting of an Agilent 1100 series HPLC (Agilent, Palo Alto, CA), a reversed phase 1-strong cation exchange-reversed phase 2 (RP1-SCX-RP2) microcapillary column, and an LCQ Deca XP mass spectrometer equipped with a nanospray source (Thermo Finnigan, San Jose, CA). The RP1-SCX-RP2 column was generated in house using a pressure bomb. The column was constructed from two microcapillaries coupled with an inline microfilter assembly (Upchurch Scientific, Oak Harbor, WA) and packed with three LC phases (Figure 1). The first microcapillary (180 µm i.d. × 365 µm o.d. × 30 cm) was packed with Zorbax SB-C18 reversed-phase material (Agilent, Palo Alto, CA) as RP1. The second microcapillary (100 µm i.d. × 365 µm o.d. × 15 cm) was first packed with 10 cm of Zorbax SB-C18 reversed-phase particles as RP2 and then with 5 cm of polysulfoethyl A strong cation-exchange material (PolyLC Inc., Columbia, MD) as SCX. The column was then connected to the HPLC pump through RP1 and coupled to the LCQ through RP2. Without desalting, 200 µg of peptide mixture from each digested protein sample was loaded directly onto the RP1 region of the microcapillary column using the pressure bomb. The absolute loading capacity of RP1 was tested and found to be up to 1 mg of protein digest. The LC separation was carried out with a three-cycle method in a fully automated manner using four buffer solutions: buffer A (2% ACN/0.1% formic acid), buffer B (80% ACN/0.1% formic acid), buffer C (250 mM ammonium acetate/2% ACN/0.1% formic acid), and buffer D (2 M ammonium acetate/2% ACN/0.1% formic acid). An RP gradient of X(n) to X(n+1)% B over 120 min with a flow rate of 250 nL/min was applied to elute a fraction of the adsorbed peptides from the RP1 region; these peptides were then retained on the SCX phase. Then a salt step of 10 min with a flow rate of 1 µL/min was used to subfractionate peptides from the SCX phase onto the RP2 region. The peptides on RP2 were then separated using the same X(n) to X(n+1)% B RP gradient of 120 min with a flow rate of 150 nL/min. The peptides eluted from RP2 were then directly analyzed by the LCQ mass spectrometer.
When the elution from RP2 was completed, another salt step with an increased salt concentration was applied to transfer another subfraction from the SCX region to RP2, followed by the same X(n) to X(n+1)% B RP gradient for RP2 separation. When a series of salt steps was completed within the initial RP gradient, a higher RP gradient of X(n+1) to X(n+2)% B was applied and the salt steps were repeated with the new RP gradient. The RP cycles were applied in an iterative manner, with the total number of cycles depending on the complexity of the peptides, as shown in Figure 1. For the yeast protein samples, the separation included 5 RP gradients (0-8% B, 8-15% B, 15-30% B, 30-50% B, and 50-100% B), each followed by 12 salt steps (0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, and 2000 mM ammonium acetate). To utilize the gas-phase fractionation power of the instrument, the LCQ mass spectrometer was set to divide the full MS scan into three smaller segments covering a total range of 400-2000 m/z. Each of the smaller MS scans was followed by 4 MS/MS scans of the most intense ions from the preceding MS scan. The typical collision energy for collision-induced dissociation was set to 35% with a 30-ms activation time. Dynamic exclusion was enabled with a repeat count of 1 and a 3-min exclusion duration window.

Figure 1. RP1-SCX-RP2 microcapillary LC column and 3D LC separation elution profiles. (A) Complex peptides in digestion buffer were directly loaded onto the column through RP1. Peptides were fractionated in RP1, subfractionated in SCX, and separated further on RP2, with a combination of RP and salt gradients through the three phases in an iterative process. A voltage of 1.3 kV was applied to the front of RP1 via a liquid-metal interface, producing a stable electrospray at the ion source of the mass spectrometer. (B) Elution profiles of peptide subfractions from the RPC 0-8% B fraction at the zero salt step (left panel, B) and the 150 mM ammonium acetate salt step (right panel, B). Each of the subfractions was separated on RP2 using 0-8% B RPC gradients over 2 h.

Comparison with Existing LC-MS/MS Systems. For comparison purposes, the same digested soluble yeast samples were applied to various existing LC-MS/MS systems. A strong cation exchange phase-reverse phase (SCX-RP) column was constructed with a microcapillary (100 µm i.d. × 365 µm o.d. × 15 cm) first packed with 10 cm of Zorbax SB-C18 and then with 5 cm of polysulfoethyl A SCX material. The yeast samples were desalted off-line with Spec Plus PT C18 solid-phase extraction tips (Ansys Diagnostics, Lake Forest, CA) before being loaded onto the SCX-RP columns using the pressure bomb. The typical two-cycle MudPIT separation method was used in some comparison experiments.6 The peptides were eluted in 14 salt steps (0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 225, 250, 250, and 2000 mM ammonium acetate) with an RP gradient of 0-100% B for 120 min within each salt step. For non-gas-phase separation procedures, the LCQ was set to acquire a full MS scan between 400 and 1400 m/z followed by full MS/MS scans between 400 and 2000 m/z of the top three ions from the preceding MS scan.

Data Analysis. The LC-MS/MS raw data were extracted using the XCalibur Rawfile Converter V 1.0.0a and then searched against the 6300-gene Saccharomyces Genome Database (SGD)20 using the SEQUEST program. The nonspecific cleavage rule was designated during the SEQUEST search. Differential modifications of Met oxidation (+16) and Cys alkylation (+57) were allowed for the database search. The results were filtered using the same set of criteria as previously described6,8 to obtain the peptide identifications. Briefly, all peptide identifications had a ∆Cn > 0.1; peptides with a +1 charge state were fully tryptic with XCorr > 1.9; peptides with a +2 charge state either had XCorr > 3.0 or were fully or partially tryptic with XCorr between 2.2 and 3.0; peptides with a +3 charge state were fully or partially tryptic with XCorr > 3.5. The Portfolio program was used for protein identification and for summarizing the results. For each protein sequence in SGD, the program assembled the peptide identifications that matched a substring of that sequence. A protein identification was established if one or more peptide identifications were matched. A peptide identification was allowed to contribute to multiple protein identifications. The number of distinct peptides and the sequence coverage per protein identification were also calculated. Proteins identified by a single peptide are considered at a reduced level of confidence. These single-peptide protein identifications were included in all results, and their percentages in the total protein identifications were compared among all systems. The cumulative total of protein identifications was calculated as the number of yeast sequences from SGD that were detected in at least one of the protein samples. The predictions of functional categories and subcellular membrane locations came from the MIPS Comprehensive Yeast Genome Database21 and the pathway designations from the KEGG database.22 Both were downloaded from the respective sources and merged into the headers of the gene predictions from SGD.20 The membrane predictions from the Yeast Membrane Protein Library23 were merged based on sequence identity.
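The filtering thresholds and the substring-based protein assembly described under Data Analysis can be summarized in a short Python sketch. It is illustrative only: the `Peptide` record, its `tryptic_ends` field (2 = fully tryptic, 1 = partially tryptic, 0 = nontryptic), and both function names are assumptions introduced here and are not part of SEQUEST or the Portfolio program. The sketch also assumes that the XCorr range of 2.2 to 3.0 is inclusive at both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    tryptic_ends: int  # hypothetical field: 2 = fully, 1 = partially, 0 = nontryptic


def passes_filter(p: Peptide) -> bool:
    """Apply the delta-Cn and XCorr criteria quoted in the text."""
    if p.delta_cn <= 0.1:
        return False
    if p.charge == 1:
        return p.tryptic_ends == 2 and p.xcorr > 1.9
    if p.charge == 2:
        if p.xcorr > 3.0:
            return True
        # Assumed inclusive bounds for the 2.2-3.0 window.
        return p.tryptic_ends >= 1 and 2.2 <= p.xcorr <= 3.0
    if p.charge == 3:
        return p.tryptic_ends >= 1 and p.xcorr > 3.5
    return False


def assemble_proteins(peptides, database):
    """Map each protein ID to the distinct accepted peptides found in its sequence.

    A peptide may support several proteins, and a single matching peptide
    is enough to report a protein, as described in the text.
    """
    accepted = {p.sequence for p in peptides if passes_filter(p)}
    hits = {}
    for protein_id, protein_seq in database.items():
        matched = sorted(s for s in accepted if s in protein_seq)
        if matched:
            hits[protein_id] = matched
    return hits
```

Calling `assemble_proteins` with a list of `Peptide` records and a dictionary of protein sequences returns, for each identified protein, the distinct peptides that support it; the length of each list corresponds to the number of distinct peptides per protein identification.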

Results

Peptide and Protein Identification from 3D LC-MS/MS. Soluble, urea-solubilized, and SDS-solubilized proteins from log-phase yeast cells were independently digested and analyzed using the 3D LC-MS/MS system. The MS/MS data were searched using SEQUEST and summarized using Portfolio. From a single analysis of the soluble protein sample, we identified 5954 unique peptides and 1457 proteins. A single analysis of the urea-solubilized protein sample identified 10 144 unique peptides and 2255 proteins, while analysis of the SDS-solubilized protein sample generated 2875 unique peptides and 805 protein identifications. Cumulatively, 3019 unique proteins with an average of 4.2 peptides per protein identification were identified after merging the overlapping identifications from these three analyses. Duplicate analyses of these three samples yielded similar numbers of peptide and protein identifications but were not used for the cumulative identifications. Within the 3019 protein identifications, 63% of the proteins were identified by two or more peptides. We identified 48% of the total proteins in the yeast database using the on-line 3D LC-MS/MS system, which showed significant improvement in proteome discovery over the existing LC-MS/MS systems.

Table 1. 3D LC-MS/MS Significantly Improved over Existing LC-MS/MS Methods

column configuration     SCX-RP              RP1-SCX-RP2  RP1-SCX-RP2  RP1-SCX-RP2  RP1-SCX-RP2  RP1-SCX-RP2
separation method        two cycle           two cycle    two cycle    two cycle    three cycle  three cycle
gas-phase separation     no                  no           no           yes          no           yes
separation time (h)      30                  30           96           30           96           96
protein identifications  334                 580          583          632          1404         1495
peptides/protein ID      2.5                 3.0          3.1          3.5          4.5          5.5
single peptide ID (%)    72                  59           58           52           48           41
extra                    off-line desalting

Figure 2. Reproducibility of 3D LC separation. Duplicate yeast soluble protein samples were analyzed with identical RP1-SCX-RP2 LC columns on the same instrument with the same 3D LC separation method. (A) The reproducibility of the distinct peptide identifications. From the duplicate analysis, the numbers of distinct peptides from the overlapping protein identifications were plotted. Each point represents an overlapping protein identification. (B) The reproducibility of peptide elution positions as indicated by selected-ion chromatograms. Peptide TAEQLENLNIQDDQK was eluted by 8-15% buffer B and 20% buffer C at 59.90 and 59.10 min during the duplicate analysis, a difference of 0.80 min in retention time (B1, B2). Peptide YVDPNVLPETESLALVIDR was eluted in 15-30% buffer B and 50% buffer C at 78.63 and 80.40 min during the duplicate analysis, a retention time difference of 1.77 min (B3, B4).

Figure 3. Evaluation of LC performance. A total of 306 235 MS/MS spectra acquired from the urea-solubilized protein sample were used. Five reversed-phase gradients (0-8%, 8-15%, 15-30%, 30-50%, and 50-100% buffer B) were each followed by 12 salt steps (a-l corresponding to 0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, and 2000 mM ammonium acetate, respectively). (A) The number of MS/MS spectra in each step. (B) The number of proteins identified in each step.

Improvements from 3D LC-MS/MS. 3D LC-MS/MS integrated several improvements over the existing LC-MS/MS systems to maximize the peptide and protein identifications. We studied the use of the RP1-SCX-RP2 LC column, the application of the three-cycle LC separation method, and the utilization of gas-phase separation in the LCQ mass spectrometer. An equal amount (200 µg) of yeast soluble protein digest was used for each LC-MS/MS experiment. When the 3D LC-MS/MS system was compared to MudPIT, the 3D LC-MS/MS system produced 1495 unique protein identifications, 4.5 times the 334 proteins identified by a typical MudPIT analysis of the same sample (Table 1). The 3D LC-MS/MS system also increased the number of peptides per protein identification (from 2.5 by MudPIT to 5.5) and reduced the percentage of proteins identified by a single peptide (from 72% by MudPIT to 41%), which in turn greatly increased the confidence of the protein identifications.

Under the same two-cycle separation method and other settings, the use of the RP1-SCX-RP2 LC column generated 74% more protein identifications than the SCX-RP LC column (Table 1). The peptide identifications from the RP1-SCX-RP2 column were 111% more than those from the SCX-RP LC column, resulting in more peptides per protein identification and a lower percentage of proteins identified by a single peptide. The samples were desalted off-line before being loaded onto the SCX-RP LC column, whereas RP1 in the RP1-SCX-RP2 LC column functioned as on-line desalting under the two-cycle LC separation method. It is likely that on-line desalting with the RP1-SCX-RP2 LC column avoided the sample loss incurred during off-line desalting for the SCX-RP LC column, thus improving peptide and protein identifications.

Next, samples were loaded onto RP1-SCX-RP2 columns and resolved with either a two-cycle or a three-cycle LC separation method (Table 1). In these experiments, the three-cycle separation method generated 142% more protein identifications and 266% more peptide identifications than the two-cycle separation method. These results suggested that the three-cycle separation method fully utilized the configuration of the RP1-SCX-RP2 LC column and drew extra separation power from RP1 in addition to its on-line desalting function.
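The relative gains quoted for the column and separation-method comparisons follow directly from the protein identification counts in Table 1. The short Python sketch below recomputes them; the dictionary keys and the helper name `percent_more` are labels chosen here for illustration only.

```python
# Protein identifications from Table 1 (200 µg of yeast soluble protein digest per run).
PROTEIN_IDS = {
    "SCX-RP, two cycle": 334,
    "RP1-SCX-RP2, two cycle": 580,
    "RP1-SCX-RP2, three cycle": 1404,
    "RP1-SCX-RP2, three cycle, gas phase": 1495,
}


def percent_more(new: int, old: int) -> int:
    """Percentage increase of `new` over `old`, rounded to the nearest integer."""
    return round(100 * (new - old) / old)


# RP1-SCX-RP2 column versus SCX-RP column, both with the two-cycle method.
column_gain = percent_more(PROTEIN_IDS["RP1-SCX-RP2, two cycle"],
                           PROTEIN_IDS["SCX-RP, two cycle"])

# Three-cycle versus two-cycle method on the RP1-SCX-RP2 column.
cycle_gain = percent_more(PROTEIN_IDS["RP1-SCX-RP2, three cycle"],
                          PROTEIN_IDS["RP1-SCX-RP2, two cycle"])

# Full 3D LC-MS/MS (three cycle with gas-phase fractionation) versus typical MudPIT.
fold_over_mudpit = round(PROTEIN_IDS["RP1-SCX-RP2, three cycle, gas phase"]
                         / PROTEIN_IDS["SCX-RP, two cycle"], 1)

print(column_gain, cycle_gain, fold_over_mudpit)
```

The three values correspond to the 74% column gain, the 142% three-cycle gain, and the 4.5-fold improvement over MudPIT stated in the text.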
The three-cycle separation method took 96 h of separation time, whereas the two-cycle separation method took 30 h. To demonstrate that it was the three-cycle separation method, rather than the separation time, that provided the increased separation power, a 96-h two-cycle separation method with a longer RP gradient after each salt step was applied; it failed to improve the protein and peptide identifications.

Finally, the integration of gas-phase fractionation into the separation methods was tested (Table 1). In the two-cycle and three-cycle separation methods, gas-phase fractionation provided 9% and 7% more protein identifications, respectively. There were 27-30% more peptide identifications generated with the integration of gas-phase fractionation into the separation methods.

This set of experiments indicated that the use of the RP1-SCX-RP2 LC column, the application of the three-cycle separation method, and the utilization of gas-phase fractionation all contributed to the dramatic increases in both protein and peptide identifications using the 3D LC-MS/MS system. Among these three factors, the application of the three-cycle separation method provided the most improvement for the 3D LC-MS/MS system.

Performance of 3D LC-MS/MS. The reproducibility of the 3D LC-MS/MS system was studied by performing duplicate analyses of the same protein sample. Peptide mixtures from 200 µg of digested yeast soluble protein samples were loaded onto identical RP1-SCX-RP2 LC columns and analyzed on the same LC-MS/MS instrumentation with the same three-cycle LC separation method. The duplicate analyses yielded 1317 and 1288 protein identifications, respectively. There were 1010 overlapping protein identifications, representing 76.7% of the identifications from analysis 1 and 78.4% of those from analysis 2. When the distinct peptide numbers for the overlapping protein identifications from the duplicate analysis were plotted, a good linear correlation (r² = 0.95) was observed (Figure 2A). The variation in retention time for the same peptide was less than 3 min across the 4-day analysis period. As shown in Figure 2B, two separate peptides eluted from the duplicate analyses with differences in retention time of 0.80 and 1.77 min, respectively. In general, peptides that eluted earlier had less variation in retention time than those that eluted later, suggesting that cumulative effects contributed to the larger variations in retention time; peptides that did not bind to the SCX phase displayed more variation in retention time than those that bound to the SCX phase.
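The overlap percentages reported for the duplicate analyses can be reproduced from the three counts given in the text. The Python sketch below does so and also shows how the same figures would be derived from two sets of protein identifiers; the function name and the idea of holding identifications as Python sets are illustrative assumptions, not a description of the Portfolio program.

```python
def overlap_percentages(ids_run1: set, ids_run2: set) -> tuple:
    """Return the shared identifications as a percentage of each run."""
    shared = len(ids_run1 & ids_run2)
    return (round(100 * shared / len(ids_run1), 1),
            round(100 * shared / len(ids_run2), 1))


# Counts reported in the text: 1317 and 1288 identifications, 1010 shared.
pct_of_analysis_1 = round(100 * 1010 / 1317, 1)
pct_of_analysis_2 = round(100 * 1010 / 1288, 1)

print(pct_of_analysis_1, pct_of_analysis_2)
```

Because the two analyses identified different totals, the same 1010 shared proteins represent a slightly different fraction of each run, which is why two percentages are reported.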
The chromatography profile of a 3D analysis of the urea-solubilized sample is depicted in Figure 3. Among the total 306 235 MS/MS spectra, the majority were evenly distributed across the separation steps of 0 to 50% buffer B of the RP1 separation, whereas the number of spectra was significantly reduced in the last RP1 fraction (50-100% buffer B). It was also observed that a large number of spectra were generated from the zero-salt subfractions within each RP1 fraction. The number of MS/MS spectra from these steps accounted for 9.8% of the total, indicating that a significant number of peptides did not bind to the SCX resin and underlining the importance of the sample-binding function of RP1.

To examine the efficiency of the 3D LC separation, the degree of redundancy was studied using a subset of data from a complete 3D LC-MS/MS analysis. Less than 4% of the total identified peptides eluted across multiple RP2 separation steps, and these were found mainly across the SCX dimension rather than the RP1 dimension, suggesting more effective separation in the RP1 dimension than in the SCX dimension. Another 4% of the peptides were detected multiple times within a single RP2 separation, accounting for the remaining detection redundancy. Overall, redundancy was limited to a minority (8%) of peptides, which were mainly linked to abundant proteins. Ultimately, the efficient separation of the majority of peptides was achieved through the combination of all three dimensions.

Table 2. Proteins Identified in Metabolic Pathways

metabolic pathway                                       designated  detected  detected/designated (%)
glycolysis/gluconeogenesis                              47          39        83
citrate cycle (TCA cycle)                               32          26        81
pentose phosphate pathway                               28          22        79
fructose and mannose metabolism                         37          24        65
galactose metabolism                                    29          14        48
fatty acid metabolism                                   18          14        78
bile acid biosynthesis                                  22          14        64
ubiquinone biosynthesis                                 18          7         39
oxidative phosphorylation                               65          52        80
ATP synthesis                                           30          27        90
purine metabolism                                       91          65        71
pyrimidine metabolism                                   72          52        72
glutamate metabolism                                    29          20        69
alanine and aspartate metabolism                        29          22        76
glycine, serine and threonine metabolism                44          33        75
valine, leucine and isoleucine biosynthesis             16          14        88
lysine biosynthesis                                     21          12        57
lysine degradation                                      31          19        61
arginine and proline metabolism                         24          17        71
histidine metabolism                                    21          10        48
tyrosine metabolism                                     21          10        48
tryptophan metabolism                                   21          13        62
phenylalanine, tyrosine and tryptophan biosynthesis     23          20        87
selenoamino acid metabolism                             19          13        68
starch and sucrose metabolism                           141         76        54
N-glycan biosynthesis                                   45          32        71
aminosugars metabolism                                  22          12        55
glycerolipid metabolism                                 60          44        73
inositol phosphate metabolism                           105         54        51
glycosylphosphatidylinositol (GPI)-anchor biosynthesis  26          19        73
sphingoglycolipid metabolism                            112         61        54
pyruvate metabolism                                     34          29        85
glyoxylate and dicarboxylate metabolism                 16          9         56
benzoate degradation via CoA ligation                   113         55        49
butanoate metabolism                                    29          17        59
carbon fixation                                         18          15        83
nicotinate and nicotinamide metabolism                  105         53        50
folate biosynthesis                                     21          16        76
aminoacyl-tRNA biosynthesis                             39          28        72
ribosome                                                133         122       92
RNA polymerase                                          29          21        72
transcription factors                                   23          10        43
proteasome                                              32          28        88
MAPK signaling pathway                                  58          33        57
second messenger signaling pathway                      19          9         47
phosphatidylinositol signaling system                   21          9         43
cell cycle                                              87          41        47
ubiquitin-mediated proteolysis                          20          10        50
total                                                   2096        1362      65

Metabolic Pathways. With the increased number of protein identifications, we were able to detect the presence of whole or partial metabolic pathways. Changes in protein levels in pathways could provide vital information for cell biology studies. The KEGG database provided protein designations for 48 metabolic pathways.22 Out of the total 2096 proteins in the 48 metabolic pathways, 1362 proteins were detected by 3D LC-MS/MS (Table 2). All 48 metabolic pathways were detected, with coverage ranging from 39 to 92% and an average of 65%. Among these pathways, the ATP synthesis and the ribosome pathways were covered at 90% and 92%, respectively, and 83% of the enzymes in the glycolysis/gluconeogenesis pathway were detected. Overall, more enzymes located in the main path were identified than those in branch paths.

Table 3. Proteins with Two or More Predicted Transmembrane Domains (TMDs) Detected in SDS-Solubilized and Urea-Solubilized Protein Samples

predicted no. of TMDs  predicted proteins  detected proteins  detected/predicted (%)  detected in SDS- and urea-solubilized samples  SDS and urea/total detected (%)
2                      437                 160                37                      150                                            94
3                      198                 48                 24                      44                                             92
4                      124                 51                 41                      50                                             98
5                      51                  23                 45                      23                                             100
6                      53                  25                 47                      25                                             100
7                      52                  28                 54                      26                                             93
8                      32                  19                 59                      19                                             100
9                      34                  18                 53                      17                                             94
10                     42                  19                 45                      17                                             89
11                     37                  22                 59                      18                                             82
12                     108                 55                 51                      53                                             96
13                     22                  10                 45                      9                                              90
14                     13                  8                  62                      7                                              88
15                     8                   4                  50                      4                                              100
16                     5                   2                  40                      2                                              100
17                     3                   3                  100                     3                                              100
18                     1                   0                  0                       0                                              0
20                     1                   0                  0                       0                                              0
total                  1221                495                41                      467                                            94

Membrane Proteins. It is well-known that integral membrane proteins are tightly bound to membranes and may only be solubilized using various detergents. During sample preparation, 8 M urea was used to extract the peripheral membrane proteins along with some integral membrane proteins, followed by SDS addition to solubilize the remaining integral membrane proteins. The SDS-solubilized membrane proteins were digested in the presence of 0.1% SDS, followed by the partial removal of excess SDS.
By doing so, precipitation of membrane proteins in the absence of detergent was avoided and the level of SDS was reduced to a level tolerated by 3D LC-MS/MS. The SDS-solubilized protein sample provided 805 unique protein identifications from high-quality spectra. The exhibited tolerance to detergents may be attributed to the tight binding of SDS to the RP1 column, allowing the SDS-free

research articles

Global Proteome Discovery

Table 4. Proteins Localized in Subcellular Membranes Detected in SDS-Solubilized and Urea-Solubilized Protein Samples

subcellular membrane location

known proteins

detected proteins

detected/known (%)

detected in SDS- and urea-solubilized protein samples

SDS and urea/total detected (%)

ER membrane Golgi membrane integral membrane mitochondrial inner membrane mitochondrial outer membrane plasma membrane vacuolar membrane total

75 49 15 117 18 146 36 456

59 36 8 80 13 88 27 311

79 73 53 68 72 60 75 68

55 32 8 77 13 86 25 296

93 89 100 96 100 98 93 95

peptides to achieve good separation on the SCX and RP2 column with minimal interference with their ionization. A total of 1593 candidate proteins with two or more predicted TMDs from the Yeast Membrane Protein Library23 were searched against the SGD, generating 1221 unique entries. Of these entries, 495 proteins containing two or more predicted TMDs were identified (Table 3). Although the proportion of membrane proteins identified (41%) was slightly lower than that of all genes identified (48%), it remains the highest percentage of membrane protein identifications reported thus far. Of the 495 proteins, 94% were found in the urea- and SDS-solubilized protein fractions, supporting their predicted transmembrane location. Moreover, 102 of the detected proteins were found only in the 1% SDS-solubilized sample, affirming the utility of a detergent-tolerant system for profiling the membrane proteome.

Additionally, it is well-known that the transmembrane regions of integral membrane proteins present a great challenge to existing technologies, yet they are vital to a better understanding of membrane proteins. The experiments identified 179 peptides from 151 proteins covering part or all of the predicted TMDs of these membrane proteins. Among them, 50 identified peptides covered the whole span of a predicted TMD. The yeast protein YPL131W contained two predicted TMDs that were fully represented by identified peptides, with an overall sequence coverage of 67% (Figure 4A). The yeast mitochondrial phosphate transport protein, YJR077C, contained six predicted TMDs, of which three full TMDs and two partial TMDs were covered by identified peptides (Figure 4B). The overall protein sequence coverage of YJR077C was 73%. The favorable TMD coverage was most likely due to the ability to use SDS to solubilize the integral membrane proteins for 3D LC-MS/MS.

Figure 4. The protein sequence and TMD coverage of the yeast proteins YPL131W (A) and YJR077C (B). Residues that were contained in one or more peptide identifications are highlighted in blue. The predicted TMDs are underlined.

Membrane proteins can also be classified by their location in subcellular membranes, such as plasma, ER, Golgi, mitochondrial, vacuolar, and integral membranes. A library of 456 proteins located in these subcellular membranes was collected from the MIPS Comprehensive Yeast Genome Database.21 Of these, 311 (68%) were detected by 3D LC-MS/MS (Table 4). Among the detected proteins, 95% were found in the urea- and SDS-solubilized protein fractions, consistent with their membrane location.
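Sequence coverage and TMD coverage of the kind reported for YPL131W and YJR077C can be derived by mapping the identified peptides back onto the protein sequence. The following Python sketch illustrates one way to do this. The toy sequence, the peptides, and the TMD boundaries are invented for illustration and are not the actual yeast sequences; the function names are likewise hypothetical.

```python
def covered_positions(protein, peptides):
    """Return the set of 0-based residue indices covered by any peptide."""
    covered = set()
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            covered.update(range(start, start + len(peptide)))
            start = protein.find(peptide, start + 1)
    return covered


def sequence_coverage(protein, peptides):
    """Percentage of residues contained in at least one identified peptide."""
    return 100 * len(covered_positions(protein, peptides)) / len(protein)


def classify_tmds(protein, peptides, tmds):
    """Label each TMD (0-based, half-open interval) as full, partial, or none."""
    covered = covered_positions(protein, peptides)
    labels = []
    for start, end in tmds:
        hits = sum(1 for i in range(start, end) if i in covered)
        if hits == end - start:
            labels.append("full")
        elif hits > 0:
            labels.append("partial")
        else:
            labels.append("none")
    return labels


# Hypothetical 55-residue protein with two 20-residue TMDs (not a real sequence).
TMD1 = "LLVVAAGGLLIIVVAALLGG"
TMD2 = "FFLLAAVVGGILLMMAAVVL"
protein = "MSDKR" + TMD1 + "KDEPSTR" + TMD2 + "KQE"
tmds = [(5, 25), (32, 52)]

# Hypothetical identified peptides: one spans TMD1, one reaches into TMD2.
peptides = ["LLVVAAGGLLIIVVAALLGGK", "DEPSTRFFLLAAVV"]

coverage = sequence_coverage(protein, peptides)
labels = classify_tmds(protein, peptides, tmds)
print(f"sequence coverage: {coverage:.1f}%")
print("TMD coverage:", labels)
```

Exact string matching is used here for simplicity. A real analysis would take peptide positions from the database search output instead of searching for each peptide in the sequence again.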

Discussion

We report here a 3D LC-MS/MS system that significantly improved protein and peptide identifications from the yeast proteome over existing LC-MS/MS systems. It provides a systemic tool for exploring complex biological proteomes. The 3D LC-MS/MS system achieved high peak capacity and resolution by using an improved three-phase RP1-SCX-RP2 LC column, applying a unique three-cycle LC separation method, and utilizing gas-phase fractionation in the mass spectrometer. The RP1-SCX-RP2 column was constructed with three consecutive sections of LC material (Figure 1). The three-cycle LC separation method was developed to accomplish a three-dimensional LC separation. RP1 provided the first dimension of separation as well as sample binding and desalting. The second dimension, SCX, further separated the mixture via increasing salt step gradients. The last dimension, RP2, produced the high-resolution separation of each subfraction from the SCX section. Furthermore, the acquisition method of the LCQ mass spectrometer was optimized to overcome the dynamic range limitations of the ion trap, which has a well-known limit of ion capacity,24 and to improve the quality of the MS/MS spectra.

In the 3D LC-MS/MS system, the function of RP1 was not limited to on-line desalting as in a previous study.19 It contributed to the improvement in resolving power and also increased method robustness beyond the benefits of on-line desalting. Since RP1 served as a trapping column, it was typically replaced after each use, whereas the other two dimensions were reusable. Doing so provided a desirable way to compare related protein mixtures by limiting the variation between the columns used. The column described in this paper was found to efficiently analyze up to 0.4 mg of digested proteins. The capacity of the 3D LC system was flexible in that higher loading requirements could be met by increasing the length of RP1.
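The nesting of the three separation dimensions can be pictured as a run schedule. The Python sketch below assumes that each RP1 elution step is followed by a full series of SCX salt steps, starting with a zero-salt step, and that each salt step ends with one RP2 gradient. This ordering is a simplified reading of the description above. All step values are hypothetical placeholders and are not the conditions used in this work.

```python
from itertools import product

# Hypothetical step values, chosen only to illustrate the nesting.
RP1_ORGANIC_STEPS_PCT = [10, 20, 40]   # assumed RP1 elution steps (% organic)
SCX_SALT_STEPS_MM = [0, 50, 250, 500]  # assumed SCX salt steps (mM), zero salt first


def three_cycle_schedule(rp1_steps, salt_steps):
    """List one RP2 gradient run per (RP1 step, SCX salt step) pair.

    The outer loop walks the RP1 steps (first dimension), the inner loop
    walks the SCX salt steps (second dimension), and every resulting
    subfraction gets its own RP2 gradient (third dimension).
    """
    runs = []
    for number, (organic, salt) in enumerate(product(rp1_steps, salt_steps), start=1):
        runs.append({"run": number, "rp1_organic_pct": organic, "scx_salt_mM": salt})
    return runs


schedule = three_cycle_schedule(RP1_ORGANIC_STEPS_PCT, SCX_SALT_STEPS_MM)
for run in schedule:
    print(run)
```

Under these assumptions the number of RP2 gradients is the product of the RP1 and SCX step counts, which is why a three-dimensional separation takes longer than a two-dimensional one with the same number of salt steps.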
The 3D LC-MS/MS system detected a significant number of peptides that did not bind to the SCX particles (Figure 3, zero salt step a of each RP gradient), which might otherwise have been overlooked in off-line 2D LC systems. Furthermore, the C18 in RP1 bound detergents such as SDS very tightly, allowing the detergent-free peptides to separate in RP2 and enter the instrument. The ability to analyze samples containing SDS or other detergents greatly increases the number of applications possible for LC-MS/MS technology.

The 3D LC-MS/MS system showed significant improvements in peptide and protein identifications but required a longer separation time than a typical MudPIT run. However, simply increasing the separation time of the two-cycle LC method did not improve the LC separation (Table 2). Therefore, 3D LC-MS/MS provides the best tool when maximum discovery is desired, but it will not completely replace existing LC-MS/MS methods. For complex samples such as mammalian proteomes, the high separation power of the 3D LC-MS/MS system is necessary to identify a significant number of proteins. In addition, the 3D LC-MS/MS system can be modified to reduce analysis time for less complex samples.

Membrane proteins are an important area of focus for proteomics, but they have been difficult to analyze because they have to be solubilized with detergents, which severely impair reversed-phase separation. The 3D LC-MS/MS system is well suited to the analysis of these important proteins, mainly because it tolerates detergent-treated samples: the remaining detergent is retained on RP1 while detergent-free peptides are analyzed from SCX and RP2. To illustrate this capability, one 3D LC-MS/MS experiment was performed in which the membrane pellet was washed with only 4 M urea buffer, leaving more proteins in the SDS-solubilized protein sample. This experiment identified 1306 unique proteins, indicating that the SDS-solubilized protein samples were also capable of yielding extensive data using 3D LC-MS/MS.
Among the three protein fractions, the urea-solubilized protein sample consistently yielded the most protein identifications. This is because soluble proteins may associate with membranes through protein modifications and protein-protein interactions and dissociate from the membrane upon urea extraction. Also, integral membrane proteins with lower hydrophobicity can be extracted using a high concentration of urea and thus be present in the urea-solubilized protein sample. Consequently, the urea-solubilized protein sample retained hydrophilic as well as some hydrophobic proteins and, as a result, showed the highest complexity. The urea-solubilized protein sample may also have a reduced dynamic range of protein levels, since most of the abundant soluble proteins were separated into the soluble protein sample. The urea-solubilized sample yielded 2050 identified proteins from a single 3D LC-MS/MS analysis, clearly indicating the suitability of the 3D LC separation for proteomics discovery.

In summary, the 3D LC-MS/MS system demonstrated high separation power and tolerance to detergent. We report here the largest number of protein identifications from the yeast proteome. Protein identification is likely to improve further if the 3D LC is coupled with mass spectrometers that have faster scan rates. The 3D LC-MS/MS system will play important roles in proteome discovery and other proteomics applications. For instance, quantification by either metabolic or chemical/isotope labeling, which would nearly double the complexity of the sample, could be performed without separation prior to analysis. The high separation power would also increase the opportunity for the detection of posttranslational modifications. Finally, applications that require SDS or other detergents, such as the study of protein-protein interactions, protein-lipid interactions, raft proteins, and protein trafficking, would be accessible to global analysis using 3D LC-MS/MS.

Acknowledgment. We thank John R. Yates III, Antonius Koller, David Schieltz, Huijuan Zhang, Timothy Torchia, and Steven Briggs for their critical reading of the manuscript and their valuable suggestions.

References
(1) Hatzimanikatis, V.; Choe, L. H.; Lee, K. H. Biotechnol. Prog. 1999, 15, 312-318.
(2) Corthals, G. L.; Wasinger, V. C.; Hochstrasser, D. F.; Sanchez, J. C. Electrophoresis 2000, 21, 1104-1115.
(3) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43-50.
(4) Aebersold, R.; Mann, M. Nature 2003, 422, 198-207.
(5) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-682.
(6) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-247.
(7) Washburn, M. P.; Ulaszek, R.; Deciu, C.; Schieltz, D. M.; Yates, J. R., III. Anal. Chem. 2002, 74, 1650-1657.
(8) Wolters, D. A.; Washburn, M. P.; Yates, J. R., III. Anal. Chem. 2001, 73, 5683-5690.
(9) Hanash, S. M. Electrophoresis 2000, 21, 1202-1209.
(10) Pandey, A.; Mann, M. Nature 2000, 405, 837-846.
(11) Washburn, M. P.; Yates, J. R., III. Curr. Opin. Microbiol. 2000, 3, 292-297.
(12) Futcher, B.; Latter, G. I.; Monardo, P.; McLaughlin, C. S.; Garrels, J. I. Mol. Cell Biol. 1999, 19, 7357-7368.
(13) Garrels, J. I.; McLaughlin, C. S.; Warner, J. R.; Futcher, B.; Latter, G. I.; Kobayashi, R.; Schwender, B.; Volpe, T.; Anderson, D. S.; Mesquita-Fuentes, R.; Payne, W. E. Electrophoresis 1997, 18, 1347-1360.
(14) Gygi, S. P.; Rochon, Y.; Franza, B. R.; Aebersold, R. Mol. Cell Biol. 1999, 19, 1720-1730.
(15) Gygi, S. P.; Corthals, G. L.; Zhang, Y.; Rochon, Y.; Aebersold, R. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9390-9395.
(16) Perrot, M.; Sagliocco, F.; Mini, T.; Monribot, C.; Schneider, U.; Shevchenko, A.; Mann, M.; Jeno, P.; Boucherie, H. Electrophoresis 1999, 20, 2280-2298.
(17) Regnier, F. E.; Riggs, L.; Zhang, R.; Xiong, L.; Liu, P.; Chakraborty, A.; Seeley, E.; Sioma, C.; Thompson, R. A. J. Mass Spectrom. 2002, 37, 133-145.
(18) Santoni, V.; Molloy, M.; Rabilloud, T. Electrophoresis 2000, 21, 1054-1070.
(19) McDonald, W. H.; Ohi, R.; Miyamoto, D. T.; Mitchison, T. J.; Yates, J. R., III. Int. J. Mass Spectrom. 2002, 219, 245-251.
(20) Dolinski, K.; et al. Saccharomyces Genome Database. ftp://genome-ftp.stanford.edu/pub/yeast/SacchDB/, 10-31-2002.
(21) Mewes, H. W.; Frishman, D.; Guldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Munsterkotter, M.; Rudd, S.; Weil, B. Nucleic Acids Res. 2002, 30, 31-34.
(22) Kanehisa, M.; Goto, S.; Kawashima, S.; Nakaya, A. Nucleic Acids Res. 2002, 30, 42-46.
(23) Ward, J. YMPL: Yeast Membrane Protein Library. http://www.cbs.umn.edu/yeast/, 2003.
(24) Yi, E. C.; Marelli, M.; Lee, H.; Purvine, S. O.; Aebersold, R.; Aitchison, J. D.; Goodlett, D. R. Electrophoresis 2002, 23, 3205-3216.
