Multidimensional Strategy for Sensitive Phosphoproteomics

ARTICLE pubs.acs.org/jpr

Multidimensional Strategy for Sensitive Phosphoproteomics Incorporating Protein Prefractionation Combined with SIMAC, HILIC, and TiO2 Chromatography Applied to Proximal EGF Signaling

Kasper Engholm-Keller, Thomas Aarup Hansen, Giuseppe Palmisano, and Martin R. Larsen*

Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230, Odense, Denmark

Supporting Information

ABSTRACT:

Comprehensive enrichment and fractionation are essential to obtain broad coverage of the phosphoproteome. This inevitably leads to sample loss, and thus phosphoproteomics studies are usually performed only on highly abundant samples. Here, we present a comprehensive phosphoproteomics strategy applied to 400 μg of protein from EGF-stimulated HeLa cells. The proteins were separated into membrane and cytoplasmic fractions using sodium carbonate combined with ultracentrifugation. The phosphopeptides were separated into monophosphorylated and multiphosphorylated pools using sequential elution from IMAC (SIMAC), followed by hydrophilic interaction liquid chromatography (HILIC) of the mono- and nonphosphorylated peptides and subsequent titanium dioxide chromatography of the HILIC fractions. This strategy facilitated the identification of >4700 unique phosphopeptides, and 636 phosphosites changed following short-term EGF stimulation, many of which were not previously known to be involved in EGFR signaling. We further compared three different data processing programs and found large differences in their peptide identification rates due to different implementations of recalibration and filtering. Manual validation of a subset of low-scoring peptides identified exclusively by the MaxQuant software revealed a large percentage of false positive identifications. This indicates that, despite highly accurate precursor mass determination, peptides with low fragment ion scores should not automatically be reported in phosphoproteomics studies.

KEYWORDS: Phosphoproteomics, multidimensional phosphopeptide fractionation, EGF signalling, false discovery rate estimation

Received: July 8, 2011
Published: September 29, 2011
© 2011 American Chemical Society
dx.doi.org/10.1021/pr200641x | J. Proteome Res. 2011, 10, 5383–5397

INTRODUCTION

Reversible protein phosphorylation is a fundamental posttranslational modification participating in most cellular processes, including inter- and intracellular signaling, protein synthesis and degradation, and cellular proliferation, metabolism, and apoptosis.1 Analysis of the localization and extent of protein phosphorylation is thus of major scientific interest. The field of phosphoproteomics has evolved dramatically owing to advances in robust phosphopeptide enrichment techniques and in nonhypothesis-driven mass spectrometric phosphopeptide analysis.2,3 Because phosphopeptides are present at substoichiometric levels relative to their nonphosphorylated counterparts, efficient and robust enrichment procedures are needed to increase phosphopeptide coverage from complex mixtures. Such techniques include immobilized metal affinity chromatography (IMAC),4–6 phosphoramidate chemistry,7 and metal oxide chromatography such as titanium dioxide (TiO2) chromatography.8–11 We previously developed TiO2 chromatography into a technique that routinely provides very high sensitivity and selectivity toward phosphopeptides, provided that specific organic acids such as dihydroxybenzoic acid, phthalic acid, or glycolic acid are used in the loading step to outcompete binding of nonphosphorylated peptides.10 However, in most large-scale experiments the samples remain too complex for in-depth characterization even with the fastest available mass spectrometers, especially when purification procedures are combined with quantitation. To overcome this limitation, most large-scale phosphoproteomic experiments include a prefractionation step by liquid chromatography prior to phosphopeptide enrichment and subsequent reversed-phase (RP) LC–MS/MS analysis. Strong cation exchange (SCX) chromatography has been the method of choice in most experiments (e.g., refs 12–17); however, SCX requires desalting prior to MS, which leads to sample loss, and it also has a lower peak capacity.18 McNulty and Annan recently introduced hydrophilic interaction liquid chromatography (HILIC)19 as a promising alternative to SCX for prefractionation in phosphoproteomics.20 This chromatographic technique relies on the partitioning of analytes between a mainly organic mobile phase and a water layer associated with the solid resin. Analytes are retained on the column at high organic solvent concentration and are released by increasing the concentration of water. HILIC has the advantage of being highly orthogonal to the subsequent RP dimension in a typical offline two-dimensional LC–MS setup.21 Furthermore, the HILIC fractionation performed on a TSKGel Amide-80 column had the advantage of employing volatile buffers directly compatible with subsequent IMAC phosphopeptide enrichment, thus providing very high selectivity. However, the fraction of identified multiply phosphorylated peptides was quite low, suggesting that HILIC is not the optimal method for separating multiphosphorylated peptides. We recently introduced a method for separating monophosphorylated from multiphosphorylated peptides, termed SIMAC (sequential elution from IMAC), which increases the number of multiphosphorylated peptides identified from complex mixtures.22 Despite recent developments in robust phosphopeptide enrichment strategies and in advanced instrumentation for MS analysis and phosphopeptide fragmentation,2,3 phosphoproteomics remains far from providing complete coverage of the phosphoproteome of any given cell type.
One emerging concern is the correct assignment of peptide sequences to the increasing number of MS/MS spectra obtained in large-scale studies. Another is the correct assignment of the phosphorylated amino acid within the identified peptide sequence.23–27 Several database search algorithms have been developed to assign peptide sequences to MS/MS spectra.28–32 Many of these algorithms are built on probabilistic scoring systems, which assign each candidate peptide from a theoretical peptide database a significance score reflecting the probability with which it matches a given fragment spectrum. Most studies report peptide identifications from MS/MS data filtered to a 1% false discovery rate (FDR); however, this filtering is not performed the same way in all data processing software. Because of differences in the algorithms and in how the FDR is calculated, the same data set can give different results depending on the program used, which compromises the interpretation of the data and can generate poor hypotheses for future biological experiments. Thus, modern phosphoproteomics faces several challenges, including the choice of phosphopeptide enrichment method, prefractionation to increase coverage, the choice of database search program, and sample availability. The last is very important, as it influences the choice of phosphoproteomics strategy and the number of phosphopeptides one might expect to detect. This applies especially to samples containing limited amounts of protein, which yield fewer and/or lower-quality spectra and thereby fewer peptide identifications. Therefore, most comprehensive phosphoproteomics studies have been carried out using large amounts of cell material,14,15,33 and it was recently stated that large-scale phosphoproteomics on less than 2 mg of protein has had limited success.34
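To make forward-reverse filtering concrete, the following Python sketch estimates the FDR at a score cutoff as the number of reverse (decoy) hits divided by the number of forward (target) hits scoring at or above it, and then picks the lowest cutoff that keeps the estimate at or below 1%. This is one common estimator for concatenated target-decoy databases, assumed here for illustration; it is not necessarily the exact calculation performed by Mascot, MaxQuant, or any other program compared in this study. The function names and example scores are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# A peptide-spectrum match (PSM) as (score, is_decoy); is_decoy is True for hits
# to the reversed sequences of a concatenated forward-reverse database.
PSM = Tuple[float, bool]


def decoy_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Estimated FDR at a cutoff: decoy hits / target hits, counting scores >= threshold."""
    targets = decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        # No target hits pass: FDR is 0 if nothing passes, otherwise unbounded.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def lowest_threshold(psms: Iterable[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score whose estimated FDR is <= max_fdr, or None if none qualifies."""
    psm_list: List[PSM] = list(psms)  # materialize so it can be scanned repeatedly
    for threshold in sorted({score for score, _ in psm_list}):
        if decoy_fdr(psm_list, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    # Invented scores, for illustration only.
    example = [(50, False), (45, False), (40, False), (30, True),
               (25, False), (20, True), (15, False), (10, True)]
    cutoff = lowest_threshold(example, max_fdr=0.01)
    assert cutoff is not None
    print(cutoff, decoy_fdr(example, cutoff))  # prints: 40 0.0
```

The estimated FDR does not necessarily decrease monotonically as the cutoff rises, so this sketch simply returns the first (lowest) passing cutoff; a more careful approach converts the per-cutoff estimates into monotone q-values before filtering.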


To address the issues of sample complexity, sample abundance, and low multiphosphopeptide yield, we present here a comprehensive strategy combining protein separation by solubility with SIMAC to separate monophosphorylated from multiphosphorylated peptides, followed by HILIC fractionation of the non- and monophosphorylated peptides and TiO2 chromatography of the HILIC fractions. Using stable isotope labeling with amino acids in cell culture (SILAC),35,36 this strategy allowed us to identify more than 4700 unique phosphopeptide sequences from 400 μg of protein from EGF-stimulated (5 min) HeLa cells at a false discovery rate of 0.89% using the Mascot database search program. We thereby show the feasibility of performing large-scale quantitative phosphoproteomics on submilligram amounts of protein, an approach that could be applied to cell material of low abundance. The quantitative analysis revealed 636 EGF-regulated phosphorylation sites, including sites on well-known effectors of the EGFR as well as novel EGF-regulated sites on proteins involved in the cell cycle, cellular growth, and differentiation. In addition, we show that the number of phosphopeptides identified from our raw data was similar whether DTASuperCharge or Proteome Discoverer (Thermo Scientific) was used for peak list generation prior to database searching with Mascot. However, when the same raw data were processed and the identified peptides filtered in MaxQuant,37 30% more phosphopeptides were identified at the same estimated FDR. Manual evaluation of a randomly chosen subset of the low-scoring peptides (Mascot ion score below 22) from the MaxQuant identifications showed that >50% of these were false positives. Extrapolating these findings to the entire MaxQuant data set, about 10% of the unique phosphopeptides are false positives, revealing a significant deviation between the estimated FDR and the actual number of false positives.
These results indicate that false discovery rate modeling in the common forward-reverse database search approach becomes unreliable for peptides identified primarily based on precursor mass.
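The extrapolation described above, from a manually validated subset of low-scoring identifications to the full list, can be written as a weighted average of per-stratum false-positive rates. The Python sketch below is an illustration under stated assumptions: the identifications are split into a low-scoring stratum (Mascot ion score below 22) and a high-scoring stratum, the false-positive rate measured on a random subset is taken to hold for its whole stratum, and the high-scoring stratum is assumed error-free unless a rate is supplied. The function names and the counts in the usage example are hypothetical, not values from this study.

```python
def subset_fp_rate(n_checked: int, n_false: int) -> float:
    """False-positive rate observed in a manually validated random subset."""
    if n_checked <= 0 or not 0 <= n_false <= n_checked:
        raise ValueError("need n_checked > 0 and 0 <= n_false <= n_checked")
    return n_false / n_checked


def extrapolated_fp_fraction(n_low: int, n_high: int,
                             fp_rate_low: float, fp_rate_high: float = 0.0) -> float:
    """Expected fraction of false positives in the whole identification list.

    Assumes the subset rate applies to its entire stratum; fp_rate_high defaults
    to 0.0, i.e. the high-scoring identifications are assumed to be correct.
    """
    total = n_low + n_high
    if total <= 0:
        raise ValueError("empty identification list")
    return (n_low * fp_rate_low + n_high * fp_rate_high) / total


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    rate = subset_fp_rate(n_checked=20, n_false=11)  # 0.55
    print(round(extrapolated_fp_fraction(n_low=100, n_high=900, fp_rate_low=rate), 3))  # prints 0.055
```

Because a small validated subset carries considerable sampling uncertainty, a binomial confidence interval on the subset rate is worth reporting alongside the point estimate.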

MATERIALS AND METHODS

GIBCO custom-formulated DMEM (without lysine and arginine) and dialyzed fetal calf serum were from Invitrogen (Taastrup, Denmark). 13C6-Lys and 13C6,15N4-Arg were from Cambridge Isotope Laboratories (Andover, MA). SDS, orthovanadate, and Phosphatase Inhibitor Cocktails 1 and 2 were from Sigma (St. Louis, MO). Complete protease inhibitor without EDTA was from Roche Applied Science (Meylan, France). Benzonase was from Merck (Darmstadt, Germany). The Biochrom 30 amino acid analyzer was from Biochrom (Cambridge, U.K.). Modified trypsin was from Promega (Madison, WI). Lysyl endopeptidase (Lys-C) was from Wako Pure Chemical Industries (Osaka, Japan). The TSKGel Amide-80 HILIC column (2 mm diameter, 3 μm particle size) was from Tosoh Bioscience (Stuttgart, Germany). Poros Oligo R3 reversed-phase material was from PerSeptive Biosystems (Framingham, MA). StageTips were from Thermo Scientific (Odense, Denmark). Glycolic acid, trifluoroacetic acid, and Na2CO3 were from Fluka (St. Louis, MO). Ultrapure water was from an Elga Purelab Ultra water system (Bucks, U.K.). TitanSphere titanium dioxide beads were a kind gift from GL Sciences Inc. (Tokyo, Japan). All other chemicals and solvents were HPLC grade or higher.

Cell Culture, Stimulation and Cell Lysis

HeLa cells were cultured in 5 cm cell culture dishes in custom-formulated high-glucose DMEM with L-glutamine and pyruvate, without Lys and Arg, supplemented with 10% dialyzed fetal calf serum and 48 mg/L 13C6-Lys and 21 mg/L 13C6,15N4-Arg. At 95% confluence, the cells were serum-starved for 18 h before stimulation. The “heavy” cell culture was stimulated with 150 ng/mL EGF in DMEM for 5 min, while the “light” culture was mock-stimulated with DMEM for 5 min. The cultures were then washed with ice-cold phosphate-buffered saline (PBS) and lysed on ice with ice-cold 0.1 M Na2CO3 lysis buffer containing protease inhibitors (Roche Complete EDTA-free) and phosphatase inhibitors (1 mM activated pervanadate and Sigma Phosphatase Inhibitor Cocktails 1 and 2). The “light” and “heavy” lysates were immediately scraped from the plates, transferred to and mixed in a 1.5 mL Eppendorf tube, snap-frozen in liquid nitrogen, and stored at −80 °C until further analysis. After thawing, the lysate was adjusted to 1 mM MgCl2, and 5 μL benzonase was added to decrease the viscosity of the lysate by digesting RNA and DNA. After 15 min incubation on ice, the lysate was transferred to centrifuge tubes and centrifuged at 100 000 × g for 30 min at 4 °C to separate membrane-bound proteins (pellet) from cytosolic proteins (supernatant). The supernatant was removed and acidified to pH ∼4 with 100% formic acid. The pellets were washed by resuspension in lysis buffer and centrifugation (repeated twice) to reduce contamination of the membrane pellet with soluble proteins.

Protein Precipitation and Digestion

The proteins in the supernatant were precipitated by adding trichloroacetic acid (TCA) to a final concentration of 10%, followed by 15 min incubation on ice and 15 min centrifugation at 5000 × g. The pellets containing soluble proteins were washed with ice-cold acetone and lyophilized. The soluble and membrane-associated protein pellets were dissolved in 30 μL of 6 M urea, 2 M thiourea by mechanical disruption using a 1.5 mL Kontes pellet pestle (Fisher Scientific, Slangerup, Denmark), and the protein amounts were determined by amino acid analysis on a Biochrom 30 amino acid analyzer (Biochrom, Cambridge, U.K.) to be ∼300 μg (soluble protein fraction) and ∼100 μg (membrane protein fraction). The two fractions were adjusted to 50 mM ammonium bicarbonate, 1 mM CaCl2, pH 7.8, prior to reduction (20 mM DTT for 45 min at 25 °C) and subsequent alkylation (40 mM iodoacetamide in the dark for 50 min at 25 °C). The membrane fraction was further treated overnight with 0.5 μL sialidase A and 1 μL PNGase F to remove glycan structures, since sialylated glycopeptides also adsorb to TiO2,38 thereby decreasing the phosphopeptide enrichment efficiency. The proteins in each fraction were digested with 0.02 AU Lys-C for 3 h. After digestion, the solution was diluted to a final urea concentration of 1 M, trypsin was added (1:100 w/w), and the samples were incubated overnight at 25 °C. The peptide samples were acidified to 0.1% TFA, and insoluble material (e.g., lipids) was removed by centrifugation at 14 000 × g for 5 min. The supernatants containing tryptic peptides were retained for SIMAC enrichment.

SIMAC Enrichment

The peptide samples were subjected to SIMAC essentially as described previously22 in order to separate multiphosphorylated peptides from mono- and nonphosphorylated peptides. The peptide solutions were diluted in IMAC loading/washing buffer (50% acetonitrile (ACN), 0.1% TFA) and applied to 60 μL PhosSelect IMAC beads (Sigma-Aldrich, St. Louis, MO) in an Eppendorf tube. After incubation for 30 min with rotation, the slurries were loaded onto constricted 200 μL GeLoader tips, and small IMAC columns were generated by applying gentle air pressure. For each solution, the flow-through was collected in a new low-binding Eppendorf tube, and the IMAC columns were washed with 50 μL washing buffer, which was collected together with the flow-through. The IMAC columns were further washed with 20% ACN, 1% TFA to elute monophosphorylated and acidic peptides from the IMAC resin, and this 1% TFA fraction was also collected together with the flow-through. The multiphosphorylated peptides were subsequently eluted with 1% ammonium hydroxide, pH 11.3.

HILIC Fractionation of Mono- and Nonphosphorylated Peptides and Subsequent TiO2 Enrichment

The acidic eluate and flow-through from SIMAC were separated on a TSKGel Amide-80 HILIC HPLC column (length 15 cm, diameter 2 mm, particle size 3 μm) essentially as described previously.20 A total of 31 fractions from the soluble protein fraction and 14 from the membrane protein fraction were collected, adjusted to 80% ACN, 5% TFA, 1 M glycolic acid, and enriched for phosphopeptides using 0.2 mg (soluble protein fractions) or 0.1 mg (membrane protein fractions) of TiO2 per fraction, essentially as described previously,39 except that the enrichment was performed in batch mode. Furthermore, the TiO2 beads were washed with aqueous solvent (0.1% TFA) to elute hydrophilic nonphosphorylated peptides, which tend to bind through HILIC-type interactions. The fractions were subsequently pooled pairwise and re-enriched using half the amount of TiO2 used in the first enrichment.

NanoLC–ESI–MS/MS

A nanoflow HPLC system (EASY-nLC, Thermo Scientific, Odense, Denmark) was used for online reversed-phase chromatographic separation of the phosphopeptide samples prior to nanoESI and MS detection. The peptides were loaded onto an 18 cm fused silica capillary column (100 μm inner diameter) packed with ReproSil-Pur C18 AQ 3 μm reversed-phase material (Dr. Maisch, Germany). The peptides were eluted at 250 nL/min with an increasing concentration of ACN (0–34% buffer B (95% ACN, 0.1% formic acid) in 50 min for the HILIC fractions and 0–34% buffer B in 180 min for the SIMAC pH 11 elution fractions). Purified phosphopeptides were analyzed by automated data-dependent acquisition on an LTQ-Orbitrap XL (hybrid two-dimensional linear quadrupole ion trap–Orbitrap) mass spectrometer (Thermo Scientific, Bremen, Germany). Each MS scan was acquired at a resolution of 30 000 fwhm (at m/z 400) and was followed by 3 MS/MS scans triggered at an intensity of 30 000 using multistage activation (MSA)40 (normalized collision energy 35; MSA on the neutral loss of 98 (HILIC fractions) or of 98 and 196 (SIMAC multiphosphorylated samples) from 2+, 3+, and 4+ ions, depending on the sample). The maximum ion injection time was 500 ms for both MS and MS/MS scans. The automatic gain control (AGC) target value was 1 000 000 for MS scans in the Orbitrap and 40 000 for MS/MS scans in the LTQ. All results presented are from the first replicate unless otherwise noted.

Data Processing and Analysis: DTASuperCharge

Raw data from the LTQ-Orbitrap XL instrument were processed using the extract_msn.exe program (included in Xcalibur 2.0.5, Thermo Scientific) through DTASuperCharge (version 1.31) (http://msquant.sourceforge.net) and converted into Mascot generic format (mgf) peak list files. Deisotoping was performed using the software default settings. The peak lists were searched against a concatenated forward-reverse human IPI protein database (version 3.44) on an in-house Mascot server (version 2.2.03; Matrix Science, London, U.K.). The database searches were performed with carbamidomethyl (C) as a fixed modification and acetylation (protein N-term), oxidation (M), N-terminal pyroglutamate (from Q), and phosphorylation (STY) as variable modifications, together with SILAC labels (Lys +6 and Arg +10) for quantitation. Enzyme specificity was set to trypsin with up to two missed cleavages allowed. The initial search was performed with a 12 ppm precursor ion tolerance and 0.6 Da MS/MS tolerance. The identified peptide masses were recalibrated using the MSRecal Perl script (http://msquant.sourceforge.net), and the data were re-searched with 7 ppm precursor tolerance and 0.6 Da MS/MS tolerance. A cutoff of Mascot score ≥22 and peptide length >5 was chosen to give a false discovery rate of