
Article pubs.acs.org/jpr

Improve the Coverage for the Analysis of Phosphoproteome of HeLa Cells by a Tandem Digestion Approach

Yangyang Bian,†,‡ Mingliang Ye,*,† Chunxia Song,†,‡ Kai Cheng,†,‡ Chunli Wang,†,‡ Xiaoluan Wei,† Jun Zhu,†,‡ Rui Chen,†,‡ Fangjun Wang,† and Hanfa Zou*,†

† Key Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
‡ Graduate School of Chinese Academy of Science, Beijing 100049, China

Supporting Information

ABSTRACT: Complete coverage of all phosphorylation sites in a proteome is the ultimate goal of large-scale phosphoproteome analysis. However, digestion with trypsin alone cannot cover all phosphorylation sites, because not all tryptic phosphopeptides are detectable by MS. To further increase the phosphoproteome coverage of HeLa cells, we propose a tandem digestion approach that uses two different proteases. By combining the data sets from the first Glu-C digestion and the second trypsin digestion, the tandem digestion approach identified 8062 unique phosphopeptides and 8507 phosphorylation sites in HeLa cells, whereas the conventional trypsin digestion approach identified 3891 unique phosphopeptides and 4647 phosphorylation sites. The phosphorylation sites identified by the two approaches were highly complementary. Combining the two data sets gave a total of 10899 unique phosphopeptides and 11262 phosphorylation sites, corresponding to 3437 unique phosphoproteins with FDR < 1% at the peptide level. We also compared the kinase motifs extracted from the trypsin, Glu-C, and second trypsin digestion data sets. Basophilic motifs were found more frequently in the trypsin and second trypsin digestion data sets, and acidic motifs were found more frequently in the Glu-C digestion data set. These results demonstrate that the tandem digestion approach is a good complement to the conventional trypsin digestion approach for improving the phosphoproteome coverage of HeLa cells.

KEYWORDS: tandem protease digestion, Glu-C, phosphorylation sites, phosphoproteomics, LC−MS/MS, kinase motif



Received: December 11, 2011
Published: April 3, 2012
© 2012 American Chemical Society

INTRODUCTION

Reversible protein phosphorylation is one of the most important posttranslational modifications in eukaryotic cells and is involved in the regulation of a large variety of cellular functions, such as cell cycle regulation, apoptosis, and cell differentiation.1 It is estimated that approximately 30% of all the proteins in a cell are phosphorylated during the life cycle.2,3 Complete coverage of all phosphorylation sites in a proteome is the ultimate goal of large-scale phosphoproteome analysis.4 However, this goal is not likely to be accomplished in the near future because of technical difficulties. Many strategies have been developed to improve the coverage of global phosphoproteome analysis.5−10 In a popular strategy, strong cation exchange chromatography (SCX) is used to fractionate the complex peptide mixture, and immobilized metal ion affinity chromatography (IMAC) is then used to enrich phosphopeptides from each peptide fraction. This strategy was used for large-scale phosphoproteome analysis of mouse liver6 and Drosophila melanogaster embryos,11 which resulted in the identification of 5635 and 13720 phosphorylation sites, respectively. In another strategy, phosphopeptides enriched by IMAC were fractionated by hydrophilic interaction chromatography (HILIC) and then submitted to RPLC−MS/MS analysis, leading to the identification of 8764 unique phosphopeptides from the yeast Saccharomyces cerevisiae.12 Recently, an RP-RP strategy was proposed for global phosphoproteome analysis.13 In this strategy, phosphopeptides enriched from human liver were first fractionated by offline RPLC at high pH, and these fractions were then pooled and submitted to online RPLC−MS/MS. About 10000 phosphorylation sites were identified, which was the largest data set for the human liver phosphoproteome. However, it should be noted that the phosphorylation sites identified by phosphoproteome analysis are far from comprehensive.14 One important reason is that only a single protease, i.e., trypsin, was used to digest the proteome sample in the global phosphoproteome analysis strategies described

dx.doi.org/10.1021/pr300242w | J. Proteome Res. 2012, 11, 2828−2837

Journal of Proteome Research

Article

Figure 1. Schematic diagram of the tandem protease approach for the phosphoproteome analysis of HeLa cells.

above. It has been shown that the use of multiple proteases can remarkably increase the sequence coverage of identified proteins in large-scale proteomics analysis.15 Low-specificity proteases such as elastase, proteinase K, and thermolysin have been used for in-depth mapping of the phosphorylation sites of murine circadian protein period 2; a total of 21 phosphorylation sites were identified from this protein, whereas only 6 sites were identified with trypsin.16 The low-specificity protease elastase has been used together with trypsin for phosphoproteome analysis of mitotic spindle proteins;17 the overlap of phosphorylation sites identified by the two proteases was less than 10%. However, the drawbacks of low-specificity proteases are their long database search times and poor sensitivity, especially when they are used for complex samples. Therefore, specific proteases other than trypsin, such as Glu-C, Lys-C, and Lys-N, have drawn more attention for phosphoproteomics analysis.14,18 Since Lys-C also cleaves proteins after lysine, the peptides generated by Lys-C have some overlap with those generated by trypsin.14 As Glu-C cleaves proteins after glutamic acid (E) or aspartic acid (D), the specificities of trypsin and Glu-C are completely orthogonal. Wang et al.19 used both Glu-C and trypsin for the mapping of phosphorylation sites in human MSK1; the phosphorylation sites identified by Glu-C and trypsin were highly complementary. What is more, most of the phosphorylation sites identified from Glu-C digestion had not been reported previously. Molina et al.18 compared the CID and ETD modes for detection of phosphopeptides generated by trypsin, Lys-C, and Glu-C, and they found that the number of phosphopeptides identified by Glu-C was much lower.

Recently, a sequential digestion approach combining Glu-C and trypsin was reported by Gilmore et al.20 to improve phosphoproteomic coverage. It was demonstrated that the first Glu-C digestion was essential for increasing the coverage of phosphoproteome analysis. However, they mostly focused on the second trypsin digestion and ignored the importance of the phosphorylation sites generated by Glu-C. HeLa cells are a model cell line used in many scientific studies, and the phosphoproteome of HeLa cells has been extensively analyzed. To date, several large-scale HeLa phosphoproteome analyses have been performed.21−24 Among them, the largest data set was published by Olsen et al.,24 who identified 20443 phosphorylation sites in HeLa cells by using trypsin for digestion. Because extensive fractionation methods have already been applied to in-depth phosphoproteome analysis of HeLa cells, further increasing the coverage by using new separation techniques is extremely challenging. In this report, a tandem digestion approach (Figure 1) was used to improve the phosphoproteomic coverage of HeLa cells. We considered the reasons for the poor results of Glu-C digestion and introduced a second trypsin digestion step for Glu-C generated peptides to increase the number of phosphorylation site identifications. The tandem digestion approach produced more phosphorylation sites than the traditional trypsin digestion approach, and the overlap between the two data sets was very low, indicating high complementarity. By combining the tandem digestion approach and the conventional trypsin digestion approach, a total of 11262 phosphorylation sites were identified in HeLa cells. Among them, 6858 phosphorylation sites had a localization probability higher than 99%. We also compared our high-confidence results with the three major phosphorylation site databases of the human phosphoproteome: 867 phosphorylation sites identified by the traditional trypsin digestion approach were not collected in the databases, while 2332 phosphorylation sites from the tandem digestion approach were not collected in the databases.

In the tandem digestion approach, we found a much higher ratio of acidic kinase motifs and no basophilic motifs in the Glu-C digestion data set, while in


6% TFA and mixed with the peptide mixture at a microspheres-to-protein ratio of 10/1 (w/w). After vigorous vibration for 30 min, the mixture was centrifuged and the supernatant was removed. The microspheres were then washed sequentially with washing buffer 1 (50% ACN, 6% TFA, 200 mM NaCl) and washing buffer 2 (30% ACN, 0.1% TFA) for 30 min. The bound phosphopeptides were eluted with 200 μL of 10% ammonia−water under vibration for 15 min, followed by sonication for another 15 min. After centrifugation at 20000g for 3 min, the supernatant was collected and lyophilized to dryness.

the second trypsin digestion data set, we identified the highest ratio of proline-directed motifs.
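The complementarity of the two approaches can be made concrete with the counts reported in the abstract. The sketch below is only an illustration: it assumes that each combined total is the plain set union of the tandem and trypsin data sets, so that the shared identifications follow from inclusion-exclusion. The function names are hypothetical and are not part of the original analysis.

```python
# Counts reported in the abstract (tandem digestion, trypsin digestion, combined).
SITES = {"tandem": 8507, "trypsin": 4647, "combined": 11262}
PEPTIDES = {"tandem": 8062, "trypsin": 3891, "combined": 10899}


def shared_count(counts):
    """Identifications common to both approaches, by inclusion-exclusion.

    Assumes `combined` is the union of the two data sets.
    """
    return counts["tandem"] + counts["trypsin"] - counts["combined"]


def shared_fraction(counts):
    """Shared identifications as a fraction of the combined data set."""
    return shared_count(counts) / counts["combined"]


if __name__ == "__main__":
    for label, counts in (("sites", SITES), ("phosphopeptides", PEPTIDES)):
        print(label, shared_count(counts), f"{shared_fraction(counts):.1%}")
```

Under this assumption, 1892 phosphorylation sites and 1054 unique phosphopeptides would be common to both approaches, that is, roughly 17% and 10% of the respective combined data sets.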



EXPERIMENTAL SECTION

Materials and Chemicals

Dithiothreitol (DTT), iodoacetamide (IAA), ammonium bicarbonate (NH4HCO3), trifluoroacetic acid (TFA), sodium orthovanadate (Na3VO4), sodium fluoride (NaF), and trypsin were all obtained from Sigma Aldrich (St. Louis, MO, USA). EDTA, EGTA, and PMSF were purchased from Amresco (Solon, OH, USA). Formic acid (FA) was bought from Fluka (Buches, Germany), and acetonitrile (ACN, HPLC grade) was purchased from Merck (Darmstadt, Germany). Endoproteinase Glu-C from Staphylococcus aureus V8 was obtained from Roche (Mannheim, Germany). All the water used in the experiments was purified using a Milli-Q system from Millipore Company (Bedford, MA, USA). Daisogel ODS-AQ (3 μm, 12 nm pore) was purchased from DAISO Chemical Co., Ltd. (Osaka, Japan). Magic C18AQ (5 μm, 20 nm pore) was purchased from Michrom BioResources (Auburn, CA, USA). Fused silica capillaries with 75 μm i.d. were purchased from Yongnian Optical Fiber Factory (Hebei, China).

Mass Spectrometric Analysis

A quaternary Surveyor MS pump (Thermo, San Jose, CA, USA) coupled with an LTQ Orbitrap XL mass spectrometer (Thermo) was used. The capillary analytical column (75 μm i.d.) was packed in house with C18 particles (3 μm, 12 nm pore) to a length of 12 cm. Mobile phase A was 0.1% formic acid (v/v) in H2O, and mobile phase B was 0.1% formic acid in ACN. For 1D RPLC−MS/MS, phosphopeptides enriched from the trypsin or Glu-C digest of 250 μg of HeLa cell lysate, or phosphopeptides from the second trypsin digestion, were redissolved in 5 μL of 0.1% formic acid solution, and 1 μL of the sample was then manually loaded onto the analytical column. Gradient elution was performed with 0−25% B in 112 min at a flow rate of ∼200 nL/min. For 2D-RPLC−MS/MS, a 14 cm RP-SCX biphasic column (200 μm i.d.) was prepared as previously described28 and used as both the trapping column and the first-dimension separation column. Phosphopeptides enriched from the Glu-C or trypsin digest of 750 μg of HeLa cell lysate, or from the second trypsin digestion of Glu-C generated peptides, were redissolved in 50 μL of 0.1% formic acid solution and manually loaded onto the biphasic trapping column by air pressure. The phosphopeptides retained on the RP segment were eluted onto the SCX monolithic column of the biphasic column by a 145 min RP gradient nanoflow LC−MS/MS run (0 mM). Then a series of stepwise elutions with salt concentrations of 24, 40, 56, 72, 100, 160, and 500 mM NH4Ac was used to elute phosphopeptides from the SCX column onto the second-dimension C18 separation column. Each elution lasted 10 min and was followed by equilibration with 0.1% formic acid for an additional 15 min. Each elution step was followed by an RPLC−MS/MS analysis with a 97 min gradient from 0 to 25% acetonitrile.
The mass spectrometer was set as follows: ion transfer capillary temperature, 200 °C; spray voltage, 1.8 kV; full MS range, 400−2000. Full mass spectra were acquired in the Orbitrap at a resolution of 60000 with a target ion setting of 10^6. One full MS scan was followed by three MS2 scans and three neutral loss MS3 scans; further details were the same as described by Han et al.7 The dynamic exclusion function was set as follows: repeat count, 2; repeat duration, 30 s; exclusion duration, 60 s.
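As a rough aid to reading the 2D-RPLC−MS/MS schedule described in this section, the instrument time for one sample can be estimated from the stated durations. The sketch below simply sums the 145 min initial RP gradient and, for each of the seven salt steps, the 10 min elution, the 15 min equilibration, and the 97 min gradient. Treating these as the only contributions (no sample loading, column washing, or other overhead) is an assumption, and the constant and function names are illustrative.

```python
# Durations and salt steps as stated in the text; how they add up is an assumption.
SALT_STEPS_MM = [24, 40, 56, 72, 100, 160, 500]  # NH4Ac concentrations, mM
INITIAL_RP_GRADIENT_MIN = 145  # 0 mM run that moves peptides onto the SCX phase
ELUTION_MIN = 10               # each salt elution
EQUILIBRATION_MIN = 15         # 0.1% formic acid after each elution
STEP_GRADIENT_MIN = 97         # RPLC-MS/MS gradient after each salt step


def total_run_minutes():
    """Estimated instrument time for one 2D run, ignoring loading and overhead."""
    per_step = ELUTION_MIN + EQUILIBRATION_MIN + STEP_GRADIENT_MIN
    return INITIAL_RP_GRADIENT_MIN + per_step * len(SALT_STEPS_MM)


if __name__ == "__main__":
    minutes = total_run_minutes()
    print(f"{minutes} min (about {minutes / 60:.1f} h) per sample")
```

With these inputs the estimate comes to 999 min, or a little under 17 h per 2D analysis, compared with the 112 min gradient of a single 1D run.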

Sample Preparation

The HeLa cells were grown in RPMI-1640 supplemented with 10% bovine serum and 100 U/mL of streptomycin and penicillin. The cells were harvested at about 80% density. The cell pellets were gently homogenized in an ice-cold lysis buffer containing 8 M urea, 50 mM Tris-HCl (pH 7.4), 65 mM DTT, 2% protease cocktail (v/v), 1% Triton X-100 (v/v), 1 mM EDTA, 1 mM EGTA, 1 mM PMSF, 1 mM NaF, and 1 mM Na3VO4, sonicated at 400 W for 120 s, and centrifuged at 25000g for 1 h. The supernatant containing the total HeLa cell proteins was precipitated with 5 volumes of ice-cold acetone/ethanol/acetic acid (v/v/v = 50/50/0.1) at −20 °C. The protein precipitate was collected by centrifugation at 14000g for 30 min. The pellet was washed separately with 1 mL of acetone and 75% ethanol, then lyophilized to dryness and stored at −80 °C.

Protein Digestion

The HeLa cell proteins were dissolved in reducing buffer containing 100 mM NH4HCO3 (pH 8.2) and 8 M urea as previously reported by Zhou et al.25 The protein concentration was determined by Bradford assay, and the mixture was reduced with 10 mM DTT at 60 °C for 1 h and then alkylated with 20 mM IAA in the dark at room temperature for 30 min. After that, 100 mM NH4HCO3 buffer (pH 8.2) was added until the protein concentration was about 1 mg/mL. Trypsin digestion was performed at 37 °C overnight with an enzyme-to-protein ratio of 1/25 (w/w). Glu-C digestion was performed at 30 °C for 18 h with an enzyme-to-protein ratio of 1/40 (w/w). For the second trypsin digestion of Glu-C generated peptides, trypsin was added to the Glu-C digest at an enzyme-to-protein ratio of 1/25 (w/w), and the digestion was performed at 37 °C overnight. Each digest was then stored at −80 °C for further analysis.
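The effect of the three digestion schemes on the resulting peptides can be illustrated in silico. The following is a minimal sketch, not the software used in this work. It applies the cleavage rules given for the database search (trypsin, KR/P; Glu-C, E; second trypsin digested sample, KRE) to a made-up sequence. Reading "KRE" as cleavage after K, R, or E with no proline restriction is an assumption, as are the function name and the example sequence.

```python
import re

# Cleavage rules as listed for the database search in this article.
# Each pattern matches the zero-width position where the chain is cut.
RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),      # after K/R, not before P
    "gluc": re.compile(r"(?<=E)"),                 # after E
    "gluc_then_trypsin": re.compile(r"(?<=[KRE])"),  # after K, R, or E (assumed: no P rule)
}


def digest(sequence, enzyme, missed_cleavages=0):
    """Return fully enzymatic peptides with up to `missed_cleavages` missed sites."""
    # Splitting on zero-width matches requires Python 3.7 or later.
    pieces = [p for p in RULES[enzyme].split(sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        last = min(start + missed_cleavages, len(pieces) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


if __name__ == "__main__":
    example = "AKSPEEDRPLSEK"  # hypothetical sequence, for illustration only
    for enzyme in RULES:
        print(enzyme, digest(example, enzyme))
```

On this toy sequence the tryptic peptide spanning the acidic stretch is long, whereas the Glu-C and tandem rules cut it into shorter pieces, which is the kind of difference in peptide populations the tandem approach relies on.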

Database Search and Data Analysis

The peak lists for the MS2 and MS3 spectra were generated by BioWorks (Thermo, version 3.3.1 SP1) with the following parameters: mass range, 600−3500 Da; intensity threshold, 1000; minimum ion count, 10. The spectra were searched using SEQUEST against a composite database containing both the original database (IPI.human.v.3.17, 60234 entries) and its reversed version. The parameters were set as follows: precursor-ion mass tolerance, 10 ppm; fragment-ion mass tolerance, 1 Da; static modification, Cys (+57.0215 Da); dynamic modifications, Met (+15.9949 Da), Ser, Thr, and Tyr

Phosphopeptide Enrichment

The Ti4+-IMAC microspheres were prepared in our lab by the method described by Yu et al.26 The phosphopeptides were enriched with Ti4+-IMAC microspheres following the protocol reported by Zhou et al.27 Briefly, the Ti4+-IMAC microspheres were first suspended in loading buffer containing 80% ACN and


(+79.96633 Da). For the search of MS3 data, additional dynamic modifications were set for water loss on Ser and Thr (−18.0000 Da). Cleavage sites were set according to the manufacturer's guide: trypsin, KR/P; Glu-C, E; second trypsin digested sample, KRE. Enzyme limits were set as fully enzymatic; two, three, and four missed cleavage sites were permitted for trypsin, Glu-C, and second trypsin digested peptides, respectively. A homemade software package named Armone was applied to validate the identifications as reported by Jiang et al.29,30 Herein, Rank’m, ΔCn’m, and Xcorr’s were used as cutoff filters to achieve false discovery rate (FDR)