Online Automated in Vivo Zebrafish Phosphoproteomics: From Large-Scale Analysis Down to a Single Embryo Simone Lemeer,†,‡ Martijn W. H. Pinkse,†,§ Shabaz Mohammed,† Bas van Breukelen,† Jeroen den Hertog,‡ Monique Slijper,† and Albert J. R. Heck*,† Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Sorbonnelaan 16, 3584 CA Utrecht, The Netherlands, and Hubrecht Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands Received October 15, 2007
In the developing embryo, as in many other biological processes, complex signaling pathways are under tight control of reversible phosphorylation, guiding cell proliferation, differentiation, and growth. Therefore the large-scale identification of signaling proteins and their post-translational modifications is crucial to understand the proteome biology of the developing zebrafish embryo. Here, we used an automated, robust, and sensitive online TiO2-based LC-MS/MS setup to enrich for phosphorylated peptides from 1 day old zebrafish embryos. We identified, with high confidence, 1067 endogenous phosphorylation sites in a sample taken from 60 embryos (approximately 180 µg), 321 from 10 embryos, and 47 phosphorylation sites from a single embryo, illustrating the sensitivity of the method. This data set, representing by far the largest for zebrafish, was further exploited by searching for serine/threonine or tyrosine kinase motifs using Scansite. For one-third of the identified phosphopeptides a potential kinase motif could be predicted, where it appeared that Cdk5 kinase, p38MAPK, PKA, and Casein Kinase 2 substrates were the most predominant motifs present, underpinning the importance of these kinases in signaling pathways in embryonic development. The phosphopeptide data set was further interrogated using alignments with phosphopeptides identified in recent large-scale phosphoproteomics screens in human and mouse samples. These alignments revealed conservation of phosphorylation sites in several proteins suggesting preserved function in embryonic development. Keywords: online TiO2 phosphopeptide enrichment • in vivo phosphoproteomics • embryonic development • kinase motifs • conservation of phosphorylation sites
* To whom correspondence should be addressed. Tel, 31-30-2536797; fax, 31-30-2518219; e-mail, [email protected]. † Utrecht University. ‡ Hubrecht Institute. § Present address: Analytical Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands.
10.1021/pr700667w CCC: $40.75 2008 American Chemical Society. Journal of Proteome Research 2008, 7, 1555–1564. Published on Web 02/29/2008
Introduction
Over the past few years, zebrafish has become a well-established model organism for vertebrate development and human disease.1–3 The genome biology of the zebrafish has been interrogated by following gene transcription patterns during various stages of development, and genes crucial for normal development have been identified.4,5 Surprisingly, large-scale proteomic screens have rarely been reported.6–8 The few “proteome biology” studies available, however, clearly show that proteomic analysis of different stages in the embryonic development of the zebrafish provides crucial information on proteins regulated at the translational and post-translational levels, information that is undetectable by gene expression analysis alone.6–8 During embryonic development, protein phosphorylation is one of the most important regulatory events in cells, modulating cellular processes like migration, differentiation, and growth. Improved knowledge about the identity of phosphorylated proteins, their interaction partners,9 and the responsible kinases will improve our understanding of signaling pathways. Mass spectrometry based phosphoproteomic analyses allow the direct site-specific identification of post-translational modifications on endogenously expressed proteins, providing important insights into signal transduction pathways. Methods to improve the large-scale identification of phosphorylation sites have therefore received considerable attention in recent years.10–15 The most successful phosphoproteomic analyses use enrichment procedures prior to analysis by mass spectrometry. Phospho-specific antibodies have been effectively applied to enrich for tyrosine-phosphorylated proteins and peptides.16–20 This approach, however, is not equally suitable for serine- and threonine-phosphorylated peptides and proteins, due to the lack of efficient antibodies. Analytical approaches using affinity purification through immobilized metal affinity chromatography (IMAC), with and without methyl-esterification, have proven quite successful.16,20–22 However, implementation of IMAC is laborious, protocols vary widely, and it requires extensive experience. Automated online IMAC-LC-MS/MS methods have been described, but these require complicated LC setups with multiple pumps and switching valves.17
Titanium dioxide was recently introduced as a more facile alternative to IMAC for effective isolation of phosphorylated peptides from complex mixtures.23,24 An experimentally convenient procedure using self-packed or commercially available TiO2 microcolumns has become very popular for phosphopeptide enrichment.25–28 However, this method lacks reproducibility, and it is less suited for automated large-scale phosphoproteomics analyses. A more comprehensive enrichment strategy includes peptide Strong Cation Exchange (SCX) preseparation, to preconcentrate phosphorylated peptides and reduce sample complexity, followed by TiO2 enrichment, which has proven to be the most successful method to characterize the phosphoproteome.19,29 The recently reported online RP-TiO2-RP-LC-MS/MS method allows automated large-scale phosphoproteomics experiments without extensive manual intervention at high speed and good sensitivity30 and can even be implemented on a microfluidic device.31 This fully automated setup is extremely robust and allows the complete analysis of a sample, since both the flowthrough (i.e., nonphosphorylated fraction) and the phosphorylated fractions are analyzed in a single experiment. Here, we apply the combination of SCX and online RP-TiO2-RP-LC-MS/MS to perform a global analysis of in vivo phosphorylation in embryonic zebrafish, using as starting material 60, 10, or 1 embryo(s) at 24 h post-fertilization (hpf). Our method proved to be highly sensitive, enabling the confident identification of more than 1000 phosphopeptides. Even in extreme circumstances such as the analysis of a single embryo, corresponding to approximately 3 µg of starting material, tens of phosphosites could be assigned. We sampled our large data set for conserved phosphorylation motifs, which provided insights into global kinase activity in 24 hpf zebrafish embryos. 
Using BLAST-like alignments, we evaluated the phosphopeptides in our data set and searched for conserved phosphorylation sites between zebrafish and human/mouse, which provided us with several phosphoproteins that appear to be important for embryonic development and have conserved phosphorylation sites between species.
Methods
Sample Preparation. Embryos (24 hpf) were manually dechorionated and deyolked with deyolking buffer (1/2 Ginzburg Fish Ringer) without calcium.32 Subsequently, embryos were lysed in 8 M urea/25 mM ammonium bicarbonate containing 5 mM sodium phosphate, 1 mM potassium fluoride, 1 mM sodium orthovanadate, pH 8.2, and EDTA-free protease inhibitor cocktail (Sigma). Homogenized lysates were centrifuged at 14 000g to pellet cellular debris. Lys-C (Roche Diagnostics) was added to the lysate, and digestion was performed for 4 h at 37 °C. Samples were reduced with DTT at a final concentration of 2 mM at 56 °C; subsequently, samples were alkylated with iodoacetamide at a final concentration of 4 mM at 20 °C. The sample was diluted to 2 M urea/50 mM ammonium bicarbonate, and trypsin (Roche Diagnostics) was added. Digestion was performed overnight at 37 °C.
Strong Cation Exchange. Strong cation exchange was performed using a Zorbax BioSCX-Series II column (0.8 mm (i.d.) × 50 mm (l); particle size, 3.5 µm), a Famos autosampler (LCpackings, Amsterdam, The Netherlands), a Shimadzu LC9A binary pump, and a SPD-6A UV-detector (Shimadzu, Tokyo, Japan). Prior to SCX chromatography, protein digests were desalted using a small plug of C18 material (3M Empore C18 extraction disk) packed into a GELoader Tip as previously described.33 The eluate was dried completely by vacuum centrifugation and subsequently reconstituted in 20% ACN and 0.05% formic acid. After injection, the first 10 min were run isocratically at 100% solvent A (0.05% formic acid in 8:2 (v/v) water/ACN, pH 3.0), followed by a linear gradient of 1.3%/min solvent B (500 mM NaCl in 0.05% formic acid in 8:2 (v/v) water/ACN, pH 3.0). A total of 25 SCX fractions (1 min each, i.e., 50 µL elution volume) were manually collected and dried in a vacuum centrifuge.
2D Nanoflow HPLC. Nanoflow LC-MS/MS was performed by coupling an Agilent 1100 HPLC system (Agilent Technologies, Waldbronn, Germany) with an Orbitrap mass spectrometer (Thermo Electron, Bremen, Germany). The trap column consists of three separate precolumns: a 30 mm (l) × 100 µm (i.d.) Aqua C18 precolumn, followed by a 5 mm (l) × 100 µm (i.d.) TiO2 precolumn, followed by a 30 mm (l) × 100 µm (i.d.) Aqua precolumn. The ‘sandwich’ precolumn is then coupled with a 200 mm (l) × 50 µm (i.d.) ReproSil-Pur C18-AQ analytical column. Peptides were trapped at 5 µL/min in 100% solvent A (0.1 M acetic acid and 0.13 M formic acid in water) on the first 30 mm C18 trap column. The subsequent H2O/ACN gradient elutes and separates bound peptides using the analytical column at a flow rate of ∼100 nL/min. Phosphorylated peptides will pass through the TiO2 precolumn at this flow rate and bind, as previously described.30,31 All other peptides, with no TiO2 affinity, are chromatographically separated at ∼100 nL/min in a 100-min gradient from 0 to 40% solvent B (80% ACN, 0.1 M acetic acid, and 0.13 M formic acid).
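The SCX gradient program described above (10 min isocratic at 100% solvent A, then a linear ramp of 1.3%/min solvent B containing 500 mM NaCl) can be sketched as a small calculation. This is only an illustration of the programmed gradient: the function names are hypothetical, and the sketch assumes that system delay volume is ignored and that the ramp is capped at 100% B, neither of which is stated in the text.

```python
def percent_b(t_min, hold_min=10.0, slope_pct_per_min=1.3):
    """Programmed percentage of solvent B at t_min minutes after injection.

    Assumes an isocratic hold at 0% B for hold_min minutes, followed by a
    linear ramp that is capped at 100% B (the cap is an assumption).
    """
    if t_min < 0:
        raise ValueError("time must be non-negative")
    ramp = max(0.0, t_min - hold_min) * slope_pct_per_min
    return min(100.0, ramp)


def nacl_mM(t_min, b_stock_mM=500.0):
    """Programmed NaCl concentration (mM), given 500 mM NaCl in solvent B."""
    return b_stock_mM * percent_b(t_min) / 100.0


if __name__ == "__main__":
    for t in (5, 10, 20, 35):
        print(f"t = {t:>2} min: {percent_b(t):5.1f}% B, {nacl_mM(t):6.1f} mM NaCl")
```

Under these assumptions, the salt concentration stays at zero during the hold and then rises by 6.5 mM per minute, which is why phosphopeptides with a low net charge are expected in the early fractions.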
Elution of phosphorylated peptides is achieved by injection of 30 µL of 100 mM ammonium hydrogen bicarbonate, pH 9.0 (adjusted with ammonia), containing 10 mM sodium phosphate, 5 mM sodium orthovanadate, and 1 mM potassium fluoride, followed by an injection of 20 µL of 5% formic acid. During a second H2O/ACN gradient, phosphopeptides are separated using the 200 mm analytical column at ∼100 nL/min in a 100-min gradient from 0 to 40% solvent B (80% ACN, 0.1 M acetic acid, and 0.13 M formic acid). The eluent was sprayed via distal coated emitter tips (New Objective), butt-connected to the analytical column. Between the high-voltage supply of the Orbitrap and the electrospray needle, an additional 33 MΩ resistor was placed to reduce ion current.
Mass Spectrometry. The mass spectrometer was operated in data-dependent mode, automatically switching between MS, MS/MS, and neutral loss driven MS3 acquisition. Full-scan MS spectra (from m/z 300–1500) were acquired in the Orbitrap with a resolution of 60 000 at m/z 400 after accumulation to a target value of 500 000. The three most intense ions at a threshold above 5000 were selected for collision-induced fragmentation in the linear ion trap at a normalized collision energy of 35% after accumulation to a target value of 10 000. The data-dependent neutral loss settings were chosen to trigger an MS3 event after a neutral loss of either 24.5, 32.6, or 49 ± 0.5 m/z units was detected among the 5 most intense fragment ions.
Data Analysis. All MS2 and MS3 spectra were converted to single DTA files using Bioworks 3.2. An in-house developed Perl script was used to assign the original and accurate parent mass to all MS3 spectra, enabling an accurate mass database search. All first and second LC-MS runs of all SCX fractions were searched using an in-house licensed Mascot search engine (Matrix Science) against the Zebrafish IPI database version 3.31 (46028 entries) with carbamidomethyl cysteine as a fixed modification; protein N-acetylation, oxidized methionine, and phosphorylation of serine, threonine, or tyrosine were set as variable modifications. Trypsin was specified as the proteolytic enzyme, and up to two missed cleavages were allowed. The mass tolerance of the precursor ion was set to 5 ppm, and that of fragment ions was set to 0.9 Da. An in-house-developed Java script was used to extract unique phosphorylated peptides from Mascot DAT files. Unless stated otherwise, the Mascot score threshold for phosphopeptide identification was set to 35; multiple identifications of the same peptide, including different charge states, were omitted from the count, resulting in an absolute number of unique phosphopeptides. In-house-developed Perl and Java scripts are available upon request. Scaffold (version Scaffold-01_05_00, Proteome Software, Inc., Portland, OR) was used to validate protein identifications and to present identified proteins as a datafile (made available online). Protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least 2 identified peptides.34 A BLAST tool (http://www.ncbi.nlm.nih.gov/blast/) was used to identify protein names from their sequence.
Scansite Annotation. Phosphorylation motif detection in the phosphopeptides was performed using in-house-developed scripts that extract the phosphopeptides from Mascot result files (DAT files) with a user-defined Mascot score cutoff >35 (corresponding p-value ≤ 0.005). Next, the phosphopeptide sequences were extended with 10 aa at both the N- and C-termini (if possible). Subsequently, this extended peptide set was submitted to the Scansite Parallel Version, with the search stringency set to medium (http://stjuderesearch.org/scansite/ and ref 35), to obtain the possible phosphorylation motifs in these peptides.
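The flank-extension step used before motif scanning (adding up to 10 residues on each side of a phosphopeptide, "if possible") can be sketched as follows. This is not the authors' in-house script; `extend_peptide` and the toy protein sequence are hypothetical, and the sketch assumes the peptide is supplied as a plain sequence with any modification annotation already removed and that only its first occurrence in the protein is used.

```python
def extend_peptide(protein_seq, peptide, flank=10):
    """Return the peptide with up to `flank` residues added on each side.

    Fewer residues are added when the peptide lies near a protein terminus.
    Only the first occurrence of the peptide is considered (an assumption).
    """
    start = protein_seq.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)
    left = max(0, start - flank)
    right = min(len(protein_seq), end + flank)
    return protein_seq[left:right]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real zebrafish protein.
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    print(extend_peptide(protein, "SFVK"))   # internal peptide, full flanks
    print(extend_peptide(protein, "MKTAY"))  # N-terminal peptide, right flank only
```

Clipping at the termini rather than padding keeps the extended sequence a genuine substring of the protein, which matters because motif scoring depends on the real neighbouring residues.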
Cross-Species Phosphopeptide Conservation. A database containing phosphopeptides originating from multiple species was created using data sets obtained from literature: mouse liver36 and human (HeLa).19 All peptides were stored in a FASTA file format. Subsequently, the FASTA file was converted to a BLAST database. By means of an in-house-developed script, all zebrafish phosphopeptides identified were searched against the database with the PSI-BLAST (v 2.2.11)37 algorithm using the following settings: blastpgp, BLOSUM62, Max passes = 10, e-value threshold 1.0, and Expect value (E) = 1.0. All possible hits with their alignments were stored in a flat file and manually curated.
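Storing literature-derived phosphopeptides in FASTA format, as described above, might look like the minimal sketch below. The header naming scheme (`source_index`), the function name, and the example peptides are hypothetical; the subsequent conversion to a BLAST database relies on external BLAST tooling and is not shown.

```python
def peptides_to_fasta(peptides_by_source):
    """Format peptides as FASTA text, one record per unique peptide.

    `peptides_by_source` maps a source label (e.g. a data set name) to an
    iterable of peptide sequences. Duplicates within a source are removed
    and records are numbered in sorted order.
    """
    lines = []
    for source, peptides in peptides_by_source.items():
        for i, pep in enumerate(sorted(set(peptides)), start=1):
            lines.append(f">{source}_{i}")
            lines.append(pep)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Made-up peptides for illustration only.
    example = {
        "mouse_liver": ["SPEPTIDEK", "TLSPEK"],
        "human_HeLa": ["AASPQR", "AASPQR", "GGTPEK"],
    }
    print(peptides_to_fasta(example), end="")
```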
Results
We set out to evaluate the in vivo phosphoproteome of 24 hpf zebrafish embryos using an online automated RP-TiO2-RP-LC-MS/MS setup. A schematic overview of the complete experimental approach is shown in Figure 1A. First, we prepared proteolytic digests of 60, 10, and 1 embryo(s) at 24 hpf, as described in Methods. (An illustrative picture of such an embryo is shown in Figure 1A.) The resulting peptide mixture was first separated and fractionated by strong cation exchange chromatography which, under acidic conditions, causes phosphopeptides to elute earlier than regular tryptic peptides.38 Resulting SCX fractions were subsequently subjected to our online RP-TiO2-RP-LC-MS/MS enrichment setup. The resulting peptides were analyzed by mass spectrometric detection using an LTQ-Orbitrap, whereby the MS/MS spectra were searched against the IPI zebrafish database using Mascot to identify peptide sequences and phosphorylation sites. Phosphopeptide identifications were accepted if the Mascot peptide score was ≥35 and the p-value ≤ 0.005, unless stated otherwise. Applying these thresholds for identification, we were able to identify 1067 phosphorylation sites in 604 proteins from the 60 embryo sample. In this experiment, only 50 phosphopeptides were detected in the flowthrough, revealing the selectivity of the enrichment method. In the 10 embryo data set, we identified 321 phosphopeptides from 231 proteins. Finally, taking just a single embryo, we were still able to identify 47 unique phosphopeptides. (A list of all identified phosphopeptides, including Mascot score, protein name, and IPI accession number, is available as Supplementary Table 1 in Supporting Information. Raw data are available as a Scaffold file at https://bioinformatics.chem.uu.nl/supplementary/lemeer_JPR/ and have been submitted to PRIDE.) Although the majority of the phosphopeptides eluted, as expected, in the earlier SCX fractions, there is a significant contribution from other ‘singly charged’ species such as N-terminally blocked peptides originating from sources such as N-acetylated proteins, making the subsequent TiO2 enrichment crucial for phosphoproteome screens. With our TiO2 setup, hardly any phosphopeptides were detected in the flowthrough fractions, whereas the protein N-acetylated peptides were almost exclusively present in the flowthrough (data not shown). Furthermore, a comparison of phosphopeptides identified in the samples originating from the 60, 10, and 1 embryos was performed. The phosphopeptides identified in the 10 embryo data set show a 70% overlap with the 60 embryo data set. The 1 embryo data set shows a 73% overlap with both the 60 and 10 embryo data sets (Figure 2A).
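The overlap figures quoted above can be read as the percentage of phosphopeptides in the smaller data set that are also present in the larger one, which is how the Discussion phrases the 70% value. A minimal sketch of that calculation is given below; the function name and the toy peptide identifiers are hypothetical and do not come from the data set.

```python
def overlap_percent(smaller, larger):
    """Percentage of unique peptides in `smaller` that also occur in `larger`."""
    smaller_set = set(smaller)
    larger_set = set(larger)
    if not smaller_set:
        raise ValueError("the smaller data set is empty")
    shared = smaller_set & larger_set
    return 100.0 * len(shared) / len(smaller_set)


if __name__ == "__main__":
    # Toy identifiers standing in for unique phosphopeptide sequences.
    ten_embryos = {"pepA", "pepB", "pepC", "pepD"}
    sixty_embryos = {"pepA", "pepB", "pepC", "pepX", "pepY"}
    print(f"{overlap_percent(ten_embryos, sixty_embryos):.0f}% overlap")
```

Because the measure is normalized to the smaller set, it is not symmetric: swapping the two arguments generally gives a different percentage.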
In line with literature reports, we observed an incomplete overlap between our complex samples, primarily caused by the high biological variance and undersampling by the mass spectrometer.39 A ‘proper’ identification of phosphopeptides depends on the quality and uniqueness of the peptide MS/MS spectrum, to a large extent represented by its Mascot score and p-value. With a lower Mascot score threshold, more phosphopeptides are identified, however, with an increased risk of misassignments. We investigated whether our chosen threshold of ≥35 (p ≤ 0.005) for phosphopeptides could be lowered, thus increasing the number of identified phosphopeptides. By lowering the threshold to 30 (p ≤ 0.02), we increased the number of phosphopeptides identified in the 60 embryo sample to 1237, but the overlap with the 10 embryo data set decreased from 70% to 67% (Figure 2A). Further decreasing the Mascot peptide score to 25 (p ≤ 0.05), we increased the number of phosphopeptides identified in the 60 embryo sample to 1613, in the 10 embryos to 482, and in the 1 embryo to 123, but the overlap between the 10 and 60 embryo data sets dramatically decreased to 58% (Figure 2A). The overlap of the 1 embryo with the 10 or 60 embryo data set decreased even more dramatically to only 43% (Figure 2A). On the other hand, increasing the threshold to values above 35 did not increase the overlap significantly. With a Mascot score of 45 and p ≤ 0.0004, the overlap of the 10 embryo sample with the 60 embryo sample is 74% (Figure 2A). To determine the false discovery rate (FDR) in the data sets, we performed a Mascot search against a decoy database. For a Mascot peptide score of 25 (p ≤ 0.05), the FDR was about 2.5%, whereas the false discovery rate dropped to 1.2% for a Mascot score of 30 (p ≤ 0.02) and to 0.6% for a Mascot score of 35 (p ≤ 0.005). The FDR therefore also indicates that a Mascot score of 35 or higher provides reasonably confident phosphorylation site assignments. We further plotted the number of unique identified phosphopeptides against the Mascot peptide score for the 60, 10, and 1 embryo samples (Figure 2B). As can be seen from this graph, the number of identified phosphopeptides in both the 60 embryo and 10 embryo data sets rapidly increases below a Mascot score of 30. However, by increasing the score from 35 to 50, the number of unique identified phosphopeptides decreases only slowly (Figure 2B). Together these results hint that the number of incorrectly annotated phosphopeptides dramatically increases when the Mascot threshold score for phosphopeptides is lowered below 35 (p-value > 0.005); therefore, we took this threshold value for our further analyses. We analyzed three biologically different samples originating from 60, 10, and 1 embryo, respectively. These samples correspond to approximately 180, 30, and 3 µg of starting material. Plotting the amount of protein against the number of unique identified phosphopeptides revealed a nonlinear increase in the number of unique identified phosphopeptides (data not shown). The number of identified phosphopeptides seemed to indicate that a plateau may be reached at higher amounts of starting material, thus highlighting the other great challenge now facing phosphoproteomics: dynamic range. The limited ion capacity of trapping instruments, such as the LTQ-Orbitrap, will override and negate the advantage provided by a higher amount of material.40 SCX-TiO2 phosphopeptide enrichment efficiency has reached such levels that it may now be conceivable that dynamic range issues among solely PTM peptides may hinder further in-depth mining into the phosphoproteome.
Figure 1. Experimental. (A) Lateral view of a 24 hpf embryo. The lysates from 1, 10, or 60 one-day-old embryos were digested, and the resulting peptide mixtures were first separated by strong cation exchange chromatography. Each of the obtained SCX fractions was further analyzed by automated online RP-TiO2-RP-LC-MS/MS. Peptides were analyzed and identified by MS2 and MS3 using an LTQ-Orbitrap mass spectrometer. The phosphopeptide data set was aligned with previous data sets and sequence motifs. (B) Schematic overview of the online automated TiO2 setup. The precolumn consists of three individual precolumns, C18, TiO2, and C18, respectively. (C) Illustrative picture of the ‘sandwich’ precolumn.
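The false discovery rates reported in Results were obtained from a Mascot search against a decoy database, but the exact formula is not given. One common estimate, assumed here purely for illustration, divides the number of decoy hits at or above a score threshold by the number of target hits at or above the same threshold. The function name and all scores below are made up and are not taken from the data set.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy hits >= threshold) / (target hits >= threshold).

    This assumes separate target and decoy searches of equal size; other
    conventions exist (for example, concatenated searches).
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return n_decoy / n_target


if __name__ == "__main__":
    # Toy Mascot-like scores for illustration only.
    targets = [20, 26, 31, 36, 40, 45, 50, 60]
    decoys = [22, 27, 31]
    for cutoff in (25, 30, 35):
        print(f"score >= {cutoff}: estimated FDR = {decoy_fdr(targets, decoys, cutoff):.2f}")
```

In this toy example the estimate falls as the cutoff rises, mirroring the trend reported for score thresholds of 25, 30, and 35.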
Discussion
Here, we probed the in vivo phosphoproteome of 24 hpf zebrafish embryos by RP-TiO2-RP-LC-MS/MS. Using stringent settings, we detected 1067 phosphopeptides in 60 zebrafish embryos, 321 phosphopeptides in 10 embryos, and 47 phosphopeptides in a single embryo. Seventy percent of the phosphopeptides detected in the 10 embryo sample were also detected in the 60 embryo sample. Although this overlap is far from complete, it is in line with previous studies and originates from a combination of biological variation and undersampling of the complex peptide mixtures.
Figure 2. Mascot score dependent overlap. (A) Overlap between the three different phosphopeptide data sets (60 embryos blue, 10 embryos red, 1 embryo green) for a Mascot peptide score ≥25, ≥35, and ≥45. For a Mascot peptide score lower than 35, the number of phosphopeptides rapidly increases, but the overlap between the data sets decreases dramatically. (B) Plot of the number of unique identified phosphopeptides in the 60 and 10 embryo data sets for different Mascot peptide score cutoffs. The plot clearly shows a rapid increase in the number of unique identified phosphopeptides for a Mascot score