Technical Note pubs.acs.org/jpr

Expanding Tandem Mass Spectral Libraries of Phosphorylated Peptides: Advances and Applications

Yingwei Hu† and Henry Lam*,†,‡

†Department of Chemical and Biomolecular Engineering and ‡Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China

ABSTRACT: The identification of phosphorylated proteins remains a challenge in proteomics, partially due to the difficulty in assigning tandem mass (MS/MS) spectra to their originating peptide sequences with correct phosphosite localization. Because of its advantages in efficiency and sensitivity, spectral library searching is a promising alternative to conventional sequence database searching. Our work aims to construct the largest collision-induced dissociation (CID) MS/MS spectral libraries of phosphorylated peptides to date in human (Homo sapiens) and four model organisms (Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, and Mus musculus), to facilitate phosphorylated peptide identification by spectral library searching. We applied state-of-the-art search methods to published data and used two recently published phosphorylation site localization tools (PhosphoRS and PTMProphet) to ascertain the phosphorylation sites. To further increase the coverage of the library, we predicted “semi-empirical” spectra for peptides containing known phosphorylation sites from the corresponding unphosphorylated template peptide spectra. The performance of the resulting spectral libraries was evaluated and found to be superior to conventional database searching in terms of sensitivity. The updated spectral libraries of phosphorylated peptides are freely available for use with the spectral search engine SpectraST. The workflow developed here will be used to continuously update the libraries as new data become available.

KEYWORDS: tandem mass spectral libraries, phosphoproteomics, collision-induced dissociation (CID)



Journal of Proteome Research, J. Proteome Res. 2013, 12, 5971−5977, dx.doi.org/10.1021/pr4007443
Received: July 17, 2013. Published: October 14, 2013. © 2013 American Chemical Society

INTRODUCTION

Phosphorylation is one of the most studied post-translational modifications (PTMs) of proteins and is important in a wide range of cellular processes. It often functions as a switch to activate or inactivate a protein, and as such is heavily involved in signal transduction and regulation. Phosphorylation typically occurs at the side-chain hydroxyl groups of serine, threonine, and tyrosine residues.1,2 The identification of phosphorylated proteins remains a challenge in proteomics practice, partially due to the difficulty in assigning a tandem mass (MS/MS) spectrum to the correct phosphorylated peptide sequence.3−6 Sequence database searching of low-energy collision-induced dissociation (CID) tandem mass spectra of peptides is currently the most common computational method for this purpose.7−12 However, this approach has two shortcomings. First, the fragmentation patterns predicted by sequence database search engines are often simplistic and lack intensity information. Because of the labile phosphate group, phosphoserine- and phosphothreonine-containing peptides may exhibit atypical fragmentation patterns with intense neutral losses of the phosphate moiety, which sequence database search engines usually do not consider.13 Second, serines and threonines are relatively common; therefore, considering all permutations of unmodified/modified states of these residues vastly expands the search space. This leads to long and sometimes impractical search times, as well as diminished sensitivity and specificity.

Addressing both of these shortcomings, spectral library searching has emerged as a promising complementary approach to sequence database searching.14−17 For identifying previously observed peptides, including phosphorylated peptides, spectral library searching is a sensible way of capitalizing on prior knowledge to improve the chance of redetecting these peptides in the future and across different samples. By performing spectrum-to-spectrum matching, spectral library searching can take full advantage of the rich information contained in a spectrum to maximize discrimination, provided the reference spectrum is of high quality and captures the fragmentation pattern accurately.18 By limiting the search space to previously observed and identified spectra, spectral library searching can greatly improve search speed and sensitivity. Spectral libraries are collections of reference spectra for which the identifications are known; a detailed review of spectral library building and searching methods can be found in ref 19. Although newer fragmentation methods are available,20,21 low-energy CID remains the most commonly used method in many laboratories. The largest publicly available CID MS/MS spectral library of phosphorylated peptides thus far, consisting of 42 303 SEQUEST8-identified spectra from human and three model organisms (Saccharomyces cerevisiae, Drosophila melanogaster, and Caenorhabditis elegans), was released as part of the Phosphopep database in 2008.4,6,22−24 At the time, the identification process was not well-optimized, and the false identification rate and false localization rate of that library were not established.

Since then, significant advances in sequence search engines and statistical validation methods have been made, and much more phosphorylated peptide MS/MS data have been accumulated. It is therefore an opportune time to update the phosphorylated peptide spectral libraries to improve their coverage and accuracy. We applied newer database search engines to the same data and to several new data sets, and used two newly published phosphorylation site localization tools to ascertain the phosphorylation sites. To further increase the coverage of the library, we also predicted “semi-empirical” spectra25 for peptides with known phosphorylation sites from available template spectra of their unphosphorylated counterparts. Performance tests of the new spectral libraries indicate that spectral library searching can greatly outperform sequence database searching in identifying phosphorylated peptides. These spectral libraries are freely available for download on PeptideAtlas26 (http://www.peptideatlas.org/speclib/) and should be a useful resource for the proteomics community. Unless otherwise specified, the following sections describe and evaluate the spectral library of human phosphorylated peptides.

Table 1. Data Sets Used in This Study(a)

name | species | tissue/cell type | instrument | reference
Data-HsP-2008 | H. sapiens | HeLa, NSCLC | LTQ-FT/Orbitrap | 4, 6, 22, 23
Data-iPRG2010 | H. sapiens | K562 | LTQ-Orbitrap | 28
Data-HEK293 | H. sapiens | HEK293 | LTQ-Orbitrap | 27
Data-TEST-HEK293 | H. sapiens | HEK293 | LTQ-Orbitrap | 42
Data-TEST-U2OS | H. sapiens | U2OS | LTQ-Orbitrap | 43
Data-DmP-2008 | D. melanogaster | Kc167 | LTQ-FT/Orbitrap | 6
Data-ScP-2008 | S. cerevisiae | whole organism | LTQ-FT/Orbitrap | 6, 24
Data-CeP-2008 | C. elegans | whole organism | LTQ-FT/Orbitrap | 6
Data-MmP | Mus musculus | brain, brown fat, heart, kidney, liver, lung, pancreas, spleen, testis | LTQ-Orbitrap | 29

(a) NSCLC: non-small-cell lung carcinoma.
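The semi-empirical spectra mentioned in the Introduction are derived from an unphosphorylated template spectrum by shifting annotated fragment peaks (ref 25; summarized under Semi-empirical Spectrum Prediction). The sketch below illustrates that mass-shift idea in simplified form. It assumes that each template peak carries a simple b/y annotation (ion type, fragment length, charge). `Peak`, `contains_site`, and `shift_to_phospho` are hypothetical names. Neutral-loss peaks, intensity adjustments, and the other refinements of the published method are not modeled.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Monoisotopic mass added by phosphorylation (HPO3), in Da.
PHOSPHO_DELTA = 79.96633


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    ion_type: Optional[str] = None  # "b" or "y"; None means unannotated
    length: int = 0                 # residues in the fragment, e.g. 5 for b5
    charge: int = 1


def contains_site(peak: Peak, peptide_length: int, site: int) -> bool:
    """True if the annotated fragment includes the residue at 1-based `site`."""
    if peak.ion_type == "b":  # b_k spans residues 1..k
        return peak.length >= site
    if peak.ion_type == "y":  # y_k spans residues n-k+1..n
        return peak.length >= peptide_length - site + 1
    return False  # assumption: unannotated peaks are left unchanged


def shift_to_phospho(peaks: List[Peak], peptide_length: int, site: int,
                     precursor_mz: float,
                     precursor_charge: int) -> Tuple[List[Peak], float]:
    """Predict a phosphopeptide spectrum from its unphosphorylated template."""
    if not 1 <= site <= peptide_length:
        raise ValueError("site must lie within the peptide")
    predicted = [
        replace(p, mz=p.mz + PHOSPHO_DELTA / p.charge)
        if contains_site(p, peptide_length, site) else p
        for p in peaks
    ]
    # The precursor m/z moves by the modification mass divided by its charge.
    return predicted, precursor_mz + PHOSPHO_DELTA / precursor_charge


template = [Peak(400.0, 10.0, "b", 4), Peak(500.0, 20.0, "b", 5),
            Peak(350.0, 5.0, "y", 3), Peak(450.0, 7.0, "y", 4),
            Peak(300.0, 1.0)]
shifted, new_precursor = shift_to_phospho(template, peptide_length=8, site=5,
                                          precursor_mz=450.0,
                                          precursor_charge=2)
```

Consider an 8-residue peptide phosphorylated at position 5. Here b4 and y3 keep their m/z because they do not contain residue 5. Both b5 and y4 do contain it, so each moves by 79.966/z. Intensities are copied unchanged, which is the simplest reading of a "mass-shift" transfer.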



METHODOLOGY

Data Source

We included the original Phosphopep data sets from H. sapiens, S. cerevisiae, D. melanogaster, and C. elegans used to build the 2008 library (referred to as Data-HsP-2008, Data-ScP-2008, Data-DmP-2008, and Data-CeP-2008, respectively)4,6,22−24 and three additional large data sets from different researchers: (i) a phosphorylated peptide-enriched sample from the HEK293 cell line (H. sapiens) (Data-HEK293),27 (ii) a phosphorylated peptide-enriched sample from the K562 cell line (H. sapiens) used in the iPRG 2010 study (Data-iPRG2010),28 and (iii) phosphorylated peptide-enriched samples collected from nine mouse (Mus musculus) tissues, donated by Dr. Edward Huttlin (Data-MmP).29 The data sets used are listed in Table 1.

Library Building from Real Data

All data sets were searched by three open-source sequence database search engines: OMSSA11 (version 2.1.9), X!Tandem10 with the K-score plugin30,31 (packaged with the Trans-Proteomic Pipeline, TPP,32 version 2010.10.01.1), and MS-GFDB12 (version dated 2012.01.06). The searches were run against the UniProt/SwissProt33 protein sequence databases of the respective organism (SwissProt version dated 2012.04.18, UniProt version dated 2012.11.20), employing a target-decoy search strategy.34 In addition to phosphorylation, methionine oxidation, protein N-terminal acetylation, and peptide N-terminal pyroglutamate formation were considered as variable modifications. The search results were validated by PeptideProphet35 (in decoy-assisted semiparametric mode36) and combined by iProphet37 in the TPP software suite. The identified spectra were filtered at a strict global false discovery rate (FDR) cutoff of 0.1%. Two site localization tools, PhosphoRS38 and PTMProphet,39 were used to verify, and where necessary correct, the phosphorylation sites that iProphet assigned based on the search engine results. A voting scheme was employed to refine the phosphorylation site assignments of the identified peptides. Under this scheme, the site assignment shared by the majority of the two tools and the original database search engine was deemed correct. For human, 109 578 identified spectra, corresponding to 18 066 distinct phosphorylated peptide ions, had phosphorylation sites confirmed by this voting scheme; of these, 4060 spectra were reassigned phosphorylation sites. Finally, SpectraST40 was used to merge replicate spectra into a consensus spectral library. The error rate of the library spectrum identifications is 0.06%, estimated by decoy counting.

Semi-empirical Spectrum Prediction

Tryptic peptides with known phosphorylation sites in the UniProt database were extracted as prediction candidates. For each candidate, the existing NIST ion-trap spectral library was searched for a spectrum of the corresponding unphosphorylated peptide to serve as a prediction template. If such a template was available, the prediction was made by a simple mass-shift strategy based on peak annotations, as previously described.25 The predicted spectra were compiled into a separate spectral library, which is also available for download.

Performance Evaluation

Three evaluation scenarios were set up to evaluate the performance of the newly built libraries:

(i) repeated search, in which a data set included in building the spectral libraries is searched;
(ii) included cell line search, in which the searched data set was excluded from library building but was acquired from a cell line represented in the spectral libraries;
(iii) independent cell line search, in which the searched data set comes from a cell line not represented in the spectral libraries.

For statistical validation, an equal-sized decoy library generated by the shuffle-and-reposition method41 was appended to SPLIB-HsP-2013 prior to searching. The data sets Data-iPRG2010, Data-TEST-HEK293,42 and Data-TEST-U2OS43 were chosen as test data sets for scenarios (i) to (iii), respectively. The search parameters were set to their default values. For comparison, results are also shown for a commonly used database search engine, X!Tandem (with the K-score plugin), run on the same data sets with typical search parameters. These were tryptic cleavage on both termini and a precursor tolerance of [−2, 4] Da, with carbamidomethylation of cysteine as a fixed modification and oxidation of methionine and phosphorylation of serine, threonine, and tyrosine as variable modifications. This search was run against all human sequences in the SwissProt database (version dated 2012-04-18) with shuffled decoys appended. PeptideProphet in decoy-assisted semiparametric mode was used to assign posterior probabilities to the spectrum identifications made by SpectraST or X!Tandem. The global FDR at each probability cutoff was estimated from the residuals (1 − p) of the posterior probabilities p of the spectrum identifications retained at that cutoff. The sum of these residuals gives the expected number of false identifications, and dividing that sum by the number of retained identifications gives the FDR. In the case of the X!Tandem search, identifications of unphosphorylated peptides were not counted.

Table 2. Statistics of Newly Built CID MS/MS Spectral Libraries of Phosphorylated Peptides(a)

Homo sapiens
library | DPI | DPS
SPLIB-HsP-2008 | 5093 | 4178
SPLIB-HsP-2013 (SwissProt) | 18066 | 9595
SPLIB-HsP-SEMI-2013 | 35099 | 7032

Model organisms
species | database | DPI | DPS
Drosophila melanogaster | SwissProt | 6276 | 3036
Drosophila melanogaster | UniProt | 16177 | 8162
Caenorhabditis elegans | SwissProt | 2348 | 1159
Caenorhabditis elegans | UniProt | 9225 | 4453
Mus musculus | SwissProt | 51420 | 20660
Saccharomyces cerevisiae | SwissProt | 18412 | 6902

(a) DPI: number of distinct phosphorylated peptide ions; DPS: number of distinct phosphorylated peptide sequences. SPLIB-HsP-2008: the original phosphopeptide spectral library released with the Phosphopep database in 2008. SPLIB-HsP-2013: the newly built spectral library compiled from real data. SPLIB-HsP-SEMI-2013: the newly built spectral library of semi-empirical predicted spectra. For C. elegans and D. melanogaster, the SwissProt database (the reviewed subset of UniProt) is quite small and incomplete, so we also built spectral libraries from search results against the entire UniProt protein databases. For the other species, the SwissProt and UniProt search results are similar.

Figure 1. (A) Distribution of phosphosites covered in SPLIB-HsP-2013. pS, pT, and pY represent phosphoserine, phosphothreonine, and phosphotyrosine, respectively. (B) SwissProt phosphosite coverage. SPLIB-HsP-2013 included 25% of the phosphosites reported in SwissProt, and SPLIB-HsP-SEMI-2013 covered 35%. 11% of the phosphosites reported in SwissProt were covered by both libraries; the overlap is due to different forms (e.g., different charge states) of the same phosphorylated peptides. In sum, almost half of the phosphosites in SwissProt (25% + 35% − 11% = 49%) were covered by our spectral libraries.
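The probability-based FDR estimate described under Performance Evaluation can be sketched as below. The same sketch gives the identification counts at a fixed FDR that Figures 2 to 4 plot. It assumes the input is a plain list of PeptideProphet-style posterior probabilities, one per spectrum identification. The function names are illustrative only and are not part of TPP or SpectraST.

```python
from typing import Iterable, Tuple


def fdr_at_cutoff(probs: Iterable[float], cutoff: float) -> Tuple[float, int]:
    """Estimated global FDR among identifications with probability >= cutoff.

    The sum of the residuals (1 - p) is the expected number of false
    identifications; dividing by the number retained gives the FDR.
    """
    kept = [p for p in probs if p >= cutoff]
    if not kept:
        return 0.0, 0
    expected_false = sum(1.0 - p for p in kept)
    return expected_false / len(kept), len(kept)


def ids_at_fdr(probs: Iterable[float], max_fdr: float) -> int:
    """Largest number of top-ranked identifications with estimated FDR <= max_fdr."""
    best = 0
    expected_false = 0.0
    # Walk down the list from the most to the least confident identification.
    for n, p in enumerate(sorted(probs, reverse=True), start=1):
        expected_false += 1.0 - p
        if expected_false / n <= max_fdr:
            best = n
    return best


probs = [1.0, 0.99, 0.9, 0.5]
fdr, n_kept = fdr_at_cutoff(probs, 0.9)
```

In this toy list, keeping the three identifications with p ≥ 0.9 gives an expected (0 + 0.01 + 0.1) = 0.11 false identifications, an FDR of about 3.7%. Only the top two identifications survive a 1% FDR threshold. Tied probabilities are ranked arbitrarily by `ids_at_fdr`, which is a simplification.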

RESULTS AND DISCUSSION

Spectral Library Overview

The statistics for the newly built spectral libraries are given in Table 2. For human, the newly built spectral library (SPLIB-HsP-2013) was compiled from Data-HsP-2008, Data-iPRG2010, and Data-HEK293 and excludes predicted spectra. It contains 18 066 CID tandem mass spectra of distinct peptide ions, which belong to 9595 unique peptide sequences. In total, 8785 phosphoserine, 2025 phosphothreonine, and 2047 phosphotyrosine sites are recorded in the spectral library, which covers 25% of all known phosphorylation sites reported in SwissProt (Figure 1). The precursor charge-state distribution was: 20 for 1+, 8108 for 2+, 7793 for 3+, 1873 for 4+, 251 for 5+, and 21 for 6+ or higher. To improve the coverage further, we focused on the remaining known phosphorylation sites in the SwissProt database that were not yet covered by real spectra in SPLIB-HsP-2013. For these phosphorylation sites, we added 35 099 semi-empirical spectra, predicted from their corresponding unphosphorylated peptide spectra by a previously described method.25 This nearly doubled the coverage, to almost half of all known human phosphorylation sites in the SwissProt database.

Improvement in Data Processing

The strategy of searching with multiple database search engines significantly improved the identification rate. Compared with the 2008 human spectral library (SPLIB-HsP-2008), the new search workflow identified 93% more phosphorylated peptide ions and 42% more unique peptide sequences from the same raw data (Data-HsP-2008) and search space. In the new search result of Data-HsP-2008, ∼35% of the validated peptide ions were found by all three database search engines, and 27% were found by two. In total, 8079 distinct phosphorylated peptide ions were imported into the new library (SPLIB-HsP-2013) from the iProphet result, filtered at an estimated global FDR of 0.1%. In terms of site localization, we found that most (83%) site assignments of phosphorylated peptide-spectrum matches (PSMs) in the search results used to build SPLIB-HsP-2013 were agreed upon by all three site-localization methods. PhosphoRS and PTMProphet, voting together, corrected the site assignments of only ∼4% of all PSMs assigned by the original search engine. We believe that this is partially due to the extremely stringent FDR cutoff we employed, which filtered out many low-quality spectra for which the site assignment might be ambiguous. Fewer than 2% of PSMs were given different site assignments by all three localization methods; in such cases, we kept the original peptide identification assigned by the search engines.
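The sketch below illustrates the majority-vote rule described under Library Building from Real Data. The assignment shared by at least two of the search engine, PhosphoRS, and PTMProphet is accepted, and if all three disagree the search-engine assignment is kept. Site assignments are represented here as tuples of 1-based phosphosite positions, a hypothetical encoding. Parsing the actual PhosphoRS or PTMProphet output is not shown, and `vote_sites` and `summarize` are illustrative names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Sites = Tuple[int, ...]  # sorted 1-based positions of phosphorylated residues


def vote_sites(engine: Sites, phosphors: Sites, ptmprophet: Sites) -> Sites:
    """Return the site assignment shared by at least two of the three methods."""
    assignment, count = Counter([engine, phosphors, ptmprophet]).most_common(1)[0]
    # With three voters, count == 1 means all three disagree: keep the engine's call.
    return assignment if count >= 2 else engine


def summarize(psms: Iterable[Tuple[Sites, Sites, Sites]]) -> Dict[str, int]:
    """Tally PSMs by voting outcome (categories loosely follow the text above)."""
    tally = Counter()
    for engine, phosphors, ptmprophet in psms:
        final = vote_sites(engine, phosphors, ptmprophet)
        if engine == phosphors == ptmprophet:
            tally["unanimous"] += 1
        elif final != engine:
            tally["reassigned"] += 1  # both tools outvoted the search engine
        elif len({engine, phosphors, ptmprophet}) == 3:
            tally["no_majority"] += 1  # all three differ; engine call kept
        else:
            tally["confirmed_by_majority"] += 1
    return dict(tally)
```

With three voters a strict majority always exists unless all three disagree, so no further tie-breaking is needed. This matches the fallback to the search engine's assignment described in the text.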

Figure 2. Performance Evaluation 1: Repeated Search. The number of phosphorylated peptide spectrum identifications at different FDRs in Data-iPRG2010, which was included in building SPLIB-HsP-2013, by different methods. iProphet: combined search results of X!Tandem (K-score plugin), OMSSA, and MS-GFDB by iProphet. HsP2013: result of searching Data-iPRG2010 against SPLIB-HsP-2013 by SpectraST.

Performance Evaluation

We conducted the performance evaluation using the human spectral library, SPLIB-HsP-2013, under three scenarios. The first scenario is a repeated search, previously proposed as a mechanism to improve sensitivity.44 Here we compared the search results from three database search engines (OMSSA, X!Tandem, and MS-GFDB) with the results of spectral library searching against our new library SPLIB-HsP-2013 on the same data set, Data-iPRG2010 (Figure 2). Combining the three database search engines by iProphet provided a 17% increase in identifications (at FDR 1%) compared with the best-performing engine, MS-GFDB. The spectral library search alone obtained 13% more identifications (at FDR 1%) than the iProphet-combined results of the three database search engines, indicating improved sensitivity. This repeated search shows that spectral library searching can identify fainter matches that were not identified by the sequence database search in the first place.

In the second scenario, a data set acquired from HEK293 cells that was not used to build the library was searched with SpectraST against SPLIB-HsP-2008, SPLIB-HsP-2013, and SPLIB-HsP-SEMI-2013 (Figure 3). The results of an X!Tandem (with K-score plugin) search of the same data set against SwissProt are also shown as a benchmark of what one might obtain with a typical sequence search. Note that Data-TEST-HEK293 was not used to build the spectral library. However, another data set (Data-HEK293), acquired in a different laboratory from the same HEK293 cell line, was used to build SPLIB-HsP-2013. Compared with the search against SPLIB-HsP-2008, the number of identifications increased drastically from 251 to 1635 (at FDR 1%) with SPLIB-HsP-2013. At the same FDR, X!Tandem assigned only 654 spectra to a phosphorylated peptide identification. This comparison indicates that spectral library searching has a substantial sensitivity advantage over sequence database searching if peptides from the sample in question are already well-represented in the spectral library. This is not surprising, because most of the identifications would essentially be redetections of the same peptide ions observed in previous profiling experiments of the same cell type.

In the third scenario, we tested the ability of our library to identify phosphorylated peptides from a cell line not yet represented in the spectral library (Figure 4). This provides some indication of the general applicability of the library to all human samples. For this purpose, another independent data set, from the human osteosarcoma cell line U2OS (Data-TEST-U2OS), was searched against SPLIB-HsP-2013 in the same manner as in the second scenario (Figure 3). The U2OS cell line was not sampled in any data used to build SPLIB-HsP-2013. Even so, one would still expect to identify many phosphorylated peptides, because different cell types share many proteins and pathways. Compared with the search against SPLIB-HsP-2008, the number of phosphorylated peptide identifications again increased substantially, from 251 to 794 spectra (at FDR 1%), with SPLIB-HsP-2013. The spectral library search result was still superior to that of X!Tandem, although the gap was not as large as in the second scenario. Contrasting this with the results of the second scenario, we concluded that the observable phosphoproteomes of different cell types overlap only partially, as expected. Therefore, including additional data from different cell types should further improve the coverage, and hence the performance, of the library for general use. Moreover, SPLIB-HsP-SEMI-2013 in this scenario identified nearly 30% (294) more spectra at the same FDR, which suggests that the predicted spectral library is also a practical way to improve coverage.

Lastly, many known phosphorylation sites have been reported in the literature and compiled into databases such as SwissProt, yet this information is rarely used in proteomics data analysis. This is because it is practically impossible to ask a sequence database search engine to consider phosphorylation on some sites but not on others. As shown here, searching against semi-empirical spectra of known phosphorylated peptides is a simple and effective way to incorporate existing knowledge. It detects these peptides in real samples without expanding the search space exponentially.

Figure 3. Performance Evaluation 2: Included Cell Line Search. The number of phosphorylated peptide spectrum identifications at different FDRs in Data-TEST-HEK293, which was not included in building SPLIB-HsP-2013; another data set from HEK293 cells was included in the library. X!Tandem-SwissProt-2012: X!Tandem (K-score plugin) search against SwissProt. SPLIB-HsP-2008: SpectraST search against the original Phosphopep spectral library from 2008. SPLIB-HsP-2013: SpectraST search against the newly built spectral library from real data only. SPLIB-HsP-2013+SEMI: summed results from SpectraST searches against SPLIB-HsP-2013 and SPLIB-HsP-SEMI-2013.

Figure 4. Performance Evaluation 3: Independent Cell Line Search. The number of phosphorylated peptide spectrum identifications at different FDRs in Data-TEST-U2OS, which was not included in building SPLIB-HsP-2013; no data set from U2OS cells was included in the library. X!Tandem-SwissProt-2012: X!Tandem (K-score plugin) search against SwissProt. SPLIB-HsP-2008: SpectraST search against the original Phosphopep spectral library from 2008. SPLIB-HsP-2013: SpectraST search against the newly built spectral library from real data only. SPLIB-HsP-2013+SEMI: summed results from SpectraST searches against SPLIB-HsP-2013 and SPLIB-HsP-SEMI-2013.

CONCLUSIONS

We improved the coverage and accuracy of the real phosphorylated peptide spectral library by using multiple database search engines and phosphorylation site localization tools. Semi-empirical prediction of phosphorylated peptide spectra was shown to help extend the spectral library coverage and increase the number of identifications. Moreover, we have set up an effective and automated data analysis pipeline to build spectral libraries of phosphorylated peptides, with which it is relatively straightforward to update the libraries incrementally over time. The bottleneck to improving the spectral libraries further is collecting high-quality data from different samples and conditions. It is therefore imperative that the community continues to share data, so that better libraries of greater coverage can be compiled for all to use.

AUTHOR INFORMATION

Corresponding Author

*Phone: +852 2358 7133. Fax: +852 2358 0054. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work was fully funded by the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKUST 601909). We also thank the authors of the software and data sets for access to these resources and tools.



REFERENCES

(1) Khoury, G. A.; Baliban, R. C.; Floudas, C. A. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Sci. Rep. 2011, 1, 90.
(2) Derouiche, A.; Cousin, C.; Mijakovic, I. Protein phosphorylation from the perspective of systems biology. Curr. Opin. Biotechnol. 2012, 23 (4), 585−590.
(3) Engholm-Keller, K.; Larsen, M. R. Technologies and challenges in large-scale phosphoproteomics. Proteomics 2013, 13 (6), 910−931.
(4) Beausoleil, S. A.; Villen, J.; Gerber, S. A.; Rush, J.; Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24 (10), 1285−1292.


(5) Lu, B.; Ruse, C.; Xu, T.; Park, S. K.; Yates, J. Automatic validation of phosphopeptide identifications from tandem mass spectra. Anal. Chem. 2007, 79 (4), 1301−1310.
(6) Bodenmiller, B.; Campbell, D.; Gerrits, B.; Lam, H.; Jovanovic, M.; Picotti, P.; Schlapbach, R.; Aebersold, R. PhosphoPep–a database of protein phosphorylation sites in model organisms. Nat. Biotechnol. 2008, 26 (12), 1339−1340.
(7) Eng, J. K.; Searle, B. C.; Clauser, K. R.; Tabb, D. L. A face in the crowd: recognizing peptides through database search. Mol. Cell. Proteomics 2011, 10 (11), R111.009522.
(8) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976−989.
(9) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551−3567.
(10) Craig, R.; Beavis, R. C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20 (9), 1466−1467.
(11) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass spectrometry search algorithm. J. Proteome Res. 2004, 3 (5), 958−964.
(12) Kim, S.; Mischerikow, N.; Bandeira, N.; Navarro, J. D.; Wich, L.; Mohammed, S.; Heck, A. J. R.; Pevzner, P. A. The generating function of CID, ETD and CID/ETD pairs of tandem mass spectra: Applications to database search. Mol. Cell. Proteomics 2010, 9 (12), 2840−2852.
(13) Boersema, P. J.; Mohammed, S.; Heck, A. J. R. Phosphopeptide fragmentation and analysis by mass spectrometry. J. Mass Spectrom. 2009, 44 (6), 861−878.
(14) Craig, R.; Cortens, J. C.; Fenyo, D.; Beavis, R. C. Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 2006, 5 (8), 1843−1849.
(15) Frewen, B. E.; Merrihew, G. E.; Wu, C. C.; Noble, W. S.; MacCoss, M. J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 2006, 78 (16), 5678−5684.
(16) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007, 7 (5), 655−667.
(17) Dasari, S.; Chambers, M. C.; Martinez, M. A.; Carpenter, K. L.; Ham, A. J.; Vega-Montoto, L. J.; Tabb, D. L. Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment. J. Proteome Res. 2012, 11 (3), 1686−1695.
(18) Zhang, X.; Li, Y.; Shao, W.; Lam, H. Understanding the improved sensitivity of spectral library searching over sequence database searching in proteomics data analysis. Proteomics 2011, 11 (6), 1075−1085.
(19) Lam, H. Building and searching tandem mass spectral libraries for peptide identification. Mol. Cell. Proteomics 2011, 10 (12), R111.008565.
(20) Marx, H.; Lemeer, S.; Schliep, J. E.; Matheron, L.; Mohammed, S.; Cox, J.; Mann, M.; Heck, A. J. R.; Kuster, B. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat. Biotechnol. 2013, 31 (6), 557−564.
(21) Palumbo, A. M.; Smith, S. A.; Kalcic, C. L.; Dantus, M.; Stemmer, P. M.; Reid, G. E. Tandem mass spectrometry strategies for phosphoproteome analysis. Mass Spectrom. Rev. 2011, 30 (4), 600−625.
(22) Rikova, K.; Guo, A.; Zeng, Q.; Possemato, A.; Yu, J.; Haack, H.; Nardone, J.; Lee, K.; Reeves, C.; Li, Y.; Hu, Y.; Tan, Z.; Stokes, M.; Sullivan, L.; Mitchell, J.; Wetzel, R.; MacNeill, J.; Ren, J. M.; Yuan, J.; Bakalarski, C. E.; Villen, J.; Kornhauser, J. M.; Smith, B.; Li, D.; Zhou, X.; Gygi, S. P.; Gu, T.-L.; Polakiewicz, R. D.; Rush, J.; Comb, M. J. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell 2007, 131 (6), 1190−1203.

(23) Beausoleil, S. A.; Jedrychowski, M.; Schwartz, D.; Elias, J. E.; Villen, J.; Li, J.; Cohn, M. A.; Cantley, L. C.; Gygi, S. P. Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (33), 12130−12135.
(24) Li, X.; Gerber, S. A.; Rudner, A. D.; Beausoleil, S. A.; Haas, W.; Villén, J.; Elias, J. E.; Gygi, S. P. Large-scale phosphorylation analysis of α-factor-arrested Saccharomyces cerevisiae. J. Proteome Res. 2007, 6 (3), 1190−1197.
(25) Hu, Y.; Li, Y.; Lam, H. A semi-empirical approach for predicting unobserved peptide MS/MS spectra from spectral libraries. Proteomics 2011, 11 (24), 4702−4711.
(26) Deutsch, E. W.; Lam, H.; Aebersold, R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 2008, 9 (5), 429−434.
(27) Franz-Wachtel, M.; Eisler, S. A.; Krug, K.; Wahl, S.; Carpy, A.; Nordheim, A.; Pfizenmaier, K.; Hausser, A.; Macek, B. Global detection of protein kinase D-dependent phosphorylation events in nocodazole-treated human cells. Mol. Cell. Proteomics 2012, 11 (5), 160−170.
(28) Rudnick, P. A.; Clauser, K. R.; Lane, W. S.; Martens, L.; McDonald, W. H.; Meyer-Arendt, K.; Searle, B. C.; Kowalak, J. A. iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization. Association of Biomolecular Resource Facilities (ABRF) Annual Meeting, Sacramento, CA, 2010.
(29) Huttlin, E. L.; Jedrychowski, M. P.; Elias, J. E.; Goswami, T.; Rad, R.; Beausoleil, S. A.; Villen, J.; Haas, W.; Sowa, M. E.; Gygi, S. P. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 2010, 143 (7), 1174−1189.
(30) MacLean, B.; Eng, J. K.; Beavis, R. C.; McIntosh, M. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics 2006, 22 (22), 2830−2832.
(31) Keller, A.; Eng, J.; Zhang, N.; Li, X.-j.; Aebersold, R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 2005, 1, 2005.0017.
(32) Deutsch, E. W.; Mendoza, L.; Shteynberg, D.; Farrah, T.; Lam, H.; Tasman, N.; Sun, Z.; Nilsson, E.; Pratt, B.; Prazen, B.; Eng, J. K.; Martin, D. B.; Nesvizhskii, A. I.; Aebersold, R. A guided tour of the Trans-Proteomic Pipeline. Proteomics 2010, 10 (6), 1150−1159.
(33) UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2013, 41 (D1), D43−D47.
(34) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4 (3), 207−214.
(35) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383−5392.
(36) Choi, H.; Ghosh, D.; Nesvizhskii, A. I. Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J. Proteome Res. 2008, 7 (1), 286−292.
(37) Shteynberg, D.; Deutsch, E. W.; Lam, H.; Eng, J. K.; Sun, Z.; Tasman, N.; Mendoza, L.; Moritz, R. L.; Aebersold, R.; Nesvizhskii, A. I. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 2011, 10 (12), M111.007690.
(38) Taus, T.; Köcher, T.; Pichler, P.; Paschke, C.; Schmidt, A.; Henrich, C.; Mechtler, K. Universal and confident phosphorylation site localization using phosphoRS. J. Proteome Res. 2011, 10 (12), 5354−5362.
(39) Shteynberg, D. D. E.; Mendoza, L.; Slagel, J.; Lam, H.; Nesvizhskii, A.; Moritz, R. PTMProphet: TPP Software for Validation of Modified Site Locations on Post-Translationally Modified Peptides. 60th American Society for Mass Spectrometry (ASMS) Annual Conference, Vancouver, Canada, 2012.


(40) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; Stein, S. E.; Aebersold, R. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 2008, 5 (10), 873−875.
(41) Lam, H.; Deutsch, E. W.; Aebersold, R. Artificial decoy spectral libraries for false discovery rate estimation in spectral library searching in proteomics. J. Proteome Res. 2010, 9 (1), 605−610.
(42) Christensen, G. L.; Kelstrup, C. D.; Lyngso, C.; Sarwar, U.; Bogebo, R.; Sheikh, S. P.; Gammeltoft, S.; Olsen, J. V.; Hansen, J. L. Quantitative phosphoproteomics dissection of seven-transmembrane receptor signaling using full and biased agonists. Mol. Cell. Proteomics 2010, 9 (7), 1540−1553.
(43) Blasius, M.; Forment, J. V.; Thakkar, N.; Wagner, S. A.; Choudhary, C.; Jackson, S. P. A phospho-proteomic screen identifies substrates of the checkpoint kinase Chk1. Genome Biol. 2011, 12 (8), R78.
(44) Ahrne, E.; Masselot, A.; Binz, P. A.; Muller, M.; Lisacek, F. A simple workflow to increase MS2 identification rate by subsequent spectral library search. Proteomics 2009, 9 (6), 1731−1736.
