Increased Confidence in Large-Scale Phosphoproteomics Data by Complementary Mass Spectrometric Techniques and Matching of Phosphopeptide Data Sets

Maria P. Alcolea,† Oliver Kleiner,‡ and Pedro R. Cutillas*,†

Analytical Signalling Group, Centre for Cell Signalling, Institute of Cancer, Bart’s and the London School of Medicine, QMUL, London, United Kingdom, and Eisai London Research Laboratories Limited, Bernard Katz Building, University College London, Gower Street, London WC1E 6BT, United Kingdom

Received November 4, 2008

Large-scale phosphoproteomics studies are of great interest due to their potential for the dissection of signaling pathways controlled by protein kinases. Recent advances in mass spectrometry (MS)-based phosphoproteomic techniques offer new opportunities to profile protein kinase activities in a comprehensive manner. However, this increasingly used approach still poses many analytical challenges. On the one hand, high stringency criteria for phosphopeptide identification based on MS/MS data are needed in order to avoid false positives; on the other hand, these stringent criteria also result in the introduction of many false negatives. In the current report, we employ different mass spectrometric techniques for large-scale phosphoproteomics in order to reduce the presence of false negatives and enhance data confidence. An LTQ-Orbitrap LC-MS/MS platform identified ∼3 times more phosphopeptides than Q-TOF LC-MS/MS instrumentation (4308 versus 1485 identifications, respectively). In both cases, collision-induced dissociation (CID) was used to fragment peptides. Interestingly, the two platforms produced complementary data, as many of the low-scoring phosphopeptide ions identified by LTQ-Orbitrap MS/MS gave rise to high-scoring identifications by Q-TOF MS/MS analysis, and vice versa. In fact, approximately 450 phosphopeptides identified by the Q-TOF instrument were not identified by the LTQ-Orbitrap. Further data comparison revealed the extent of the problem: in one experiment, the estimated number of false negatives (1066) was close to the number of identified phosphopeptides (1485). This work demonstrates that, when standard procedures for phosphopeptide identification are used, the number of false negatives can be even greater than the number of false positives. We propose using historical phosphoproteomic data and spectral matching algorithms in order to efficiently minimize false negative rates.
Keywords: LC-MS/MS • false positive rate • false negative rate • spectral matching • cell signaling

Introduction

Protein kinases play key roles in fundamental cell physiological processes by controlling signaling pathways involved in energy metabolism, proliferation, cell cycle progression, apoptosis, and migration.1,2 More than 500 protein kinase genes are present in the human genome, and in accordance with their roles in cell biology, aberrant regulation of protein kinase activity is associated with the molecular pathophysiology of several major diseases including neurodegeneration, diabetes, and cancer.3,4 Inhibitors of kinase activity are already licensed for the treatment of cancer and many other kinase inhibitors are in different stages of clinical development.2,3,5

* To whom correspondence should be addressed: Centre for Cell Signalling, Institute of Cancer, Barts and the London and Queen Mary Medical School, Third Floor John Vane Science Building, Charterhouse Square, London, EC1M 6BQ, U.K. Tel: +44 (0)20 7882 8264. E-mail: p.cutillas@qmul.ac.uk.
† Institute of Cancer Bart’s and the London School of Medicine, QMUL.
‡ Eisai London Research Laboratories.

3808 Journal of Proteome Research 2009, 8, 3808–3815 Published on Web 06/22/2009

Measuring the amount of protein kinases in cells is uninformative because these enzymes show a wide range of activation, and therefore, their relative abundance is a poor indication of their activation status.1 Thus, methods that can be used to follow protein kinase activity are needed to explore the mechanisms of pathways controlled by these enzymes.6 These methods can be applied to pharmacodynamic studies and to the identification of off-target effects of kinase inhibitors that may complicate their clinical development. Moreover, kinase activities may represent ideal prognostic and theranostic markers for patient stratification and personalized therapies.7 The most widely used approach to measure protein kinase activity is to perform in vitro kinase reactions using synthetic substrates, whose products are measured by incorporation of radioactive phosphate onto substrates.6 However, when the products of such reactions are measured by mass spectrometry (MS), increased specificity and sensitivity allow the use of total cell lysate as the enzyme source and make the use of radioactivity unnecessary.8 Another approach to measure

10.1021/pr800955n CCC: $40.75 © 2009 American Chemical Society

research articles


Figure 1. Overlap of phosphoproteomics data identified by two different LC-MS/MS platforms. (A) The same phosphopeptide fractions were analyzed by two different LC-MS/MS platforms based on Q-TOF (platform 1) and LTQ-Orbitrap (platform 2) mass spectrometers. Platform 1 resulted in 1485 nonredundant phosphopeptide identifications with scores above the significance threshold, whereas platform 2 identified 4308. Identifications were considered statistically significant if the expectancy scores were < 0.05.

Table 1. Phosphopeptides returned by Mascot

                             platform 1 (Q-TOF)   platform 2 (LTQ-Orbitrap)
significant (p < 0.05)       1485                 4308
nonsignificant (p > 0.05)    37785                39588
total                        39270                43896

We explored different approaches to enhance the confidence of phosphoproteomics data. For this purpose, we aimed at identifying phosphopeptides that could be measured from NIH3T3 fibroblasts, a cell model commonly used in cell signaling studies. Phosphopeptides were enriched by a combination of SCX, IMAC, and TiO2 chromatography (see Materials and Methods). Enriched phosphopeptide fractions were then analyzed by two different LC-MS/MS platforms based on Q-TOF (platform 1) and LTQ-Orbitrap (platform 2) mass spectrometers. Mascot searches of data obtained from platform 1 resulted in the identification of 1485 nonredundant phosphopeptides with scores above the significance threshold (p < 0.05; Table 1), whereas searches of data obtained from platform 2 returned 4308 nonredundant phosphopeptide spectra with expectancy scores above the threshold of statistical significance (p < 0.05; Table 1).

Experimental conditions used to operate the two different platforms were set up so that data acquisition resulted in approximately the same number of MS/MS spectra from each platform (the gradient runs for platforms 1 and 2 were 100 and 40 min, respectively; Table 1). These analyses resulted in the identification of 1485 phosphorylated and 966 nonphosphorylated peptides from platform 1 and 4308 and 2640 phosphopeptides and nonphosphopeptides from platform 2, respectively. In each case, the criterion for identification was based on peptides having Mascot scores above the statistically significant threshold (expectancy, p < 0.05), which resulted in a false positive identification rate of about 3%. These data indicate that the LTQ-Orbitrap platform was more effective at phosphopeptide identification than the Q-TOF (Table 1). However, it is noteworthy that about a third of the total number of phosphopeptides identified by the Q-TOF (i.e., 408) were not identified by the LTQ-Orbitrap platform (Figure 1A), even though this platform identified 3 times more phosphopeptides than the former (Table 1). This finding could be related to the normal variability between LC-MS/MS runs because of undersampling. Additionally, there might be intrinsic differences between the two instruments that could account for these results. To investigate these two possibilities, we compared replicate LC-MS/MS runs from the LTQ-Orbitrap platform. After normalizing to the total number of phosphopeptides identified from each run, the overlap in replicate LC-MS/MS runs (Figure 1B) was 25% greater than that obtained when comparing individual data sets from the two different platforms (Figure 1A). In addition, it was interesting to observe that, for the total fraction of phosphopeptide ions detected by the two platforms, there was a large difference between the Mascot scores of data obtained from LTQ-Orbitrap and Q-TOF platforms. As illustrated in Figure 2A, many of the low-scoring phosphopeptides from one platform gave rise to high-scoring peptides from the other, and vice versa. This is illustrated by the poor correlation observed between the scores obtained from Q-TOF and LTQ-Orbitrap data (Figure 2A, r2 = 0.30) when compared to the strong correlation observed in replicate LC-MS/MS runs from the LTQ-Orbitrap platform (Figure 2B, r2 = 0.68). Hence, the differences in Mascot scores do not seem to be a mere consequence of the normal variability between identical runs. Thus, we can conclude that the difference between the numbers of phosphopeptides identified with significant scores by the two platforms is, at least in great part, due to the intrinsic differences of the two instruments studied, meaning that the two MS techniques are complementary for phosphopeptide identification.

The reasons for the different results obtained from the two studied platforms are not clear. They may be related to mechanistic differences in how these mass spectrometers detect and fragment the ions, which results in a greater number of b ions detected in ion trap MS/MS data than in Q-TOF data. However, variations in the LC settings, duty cycle, sensitivity, and mass accuracy may represent additional factors contributing to the differences observed. In agreement with our results, a previous study showed that, when analyzing shotgun proteomics data, linear ion trap (LTQ) and Q-TOF platforms produced complementary data.21 The overlap of peptide identification between consecutive runs from the same platforms was reported to be greater than the overlap from runs on different platforms. They also showed that peptides exclusively identified by the linear ion trap were on average twice the length of Q-TOF identified peptides.21
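The score comparison described above (Figure 2) can be sketched in a few lines. This is a minimal illustration that assumes r2 denotes the squared Pearson correlation of paired Mascot scores; the function names and the toy scores are hypothetical and are not taken from the study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must not be constant")
    return (sxy * sxy) / (sxx * syy)


def paired_scores(scores_a, scores_b):
    """Return the two score lists for peptide ions present in both runs."""
    shared = sorted(set(scores_a) & set(scores_b))
    return [scores_a[p] for p in shared], [scores_b[p] for p in shared]


# Hypothetical toy data: peptide ion -> Mascot score (not values from the paper).
qtof = {"pepA": 12.0, "pepB": 45.0, "pepC": 30.0, "pepD": 8.0}
orbitrap = {"pepA": 50.0, "pepB": 20.0, "pepC": 33.0, "pepE": 41.0}

xs, ys = paired_scores(qtof, orbitrap)
print(round(r_squared(xs, ys), 3))
```

Only ions detected in both runs enter the comparison, which mirrors the restriction to phosphopeptide ions detected by the two platforms; a low value for cross-platform pairs next to a higher value for replicate runs is the pattern reported in Figure 2.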


However, in this study, Elias et al.21 did not examine whether these observations also apply to phosphopeptide data. We decided to investigate whether our ion trap data also favor the identification of larger phosphopeptides. Mascot scores were compared as a function of peptide molecular weight. We found that the LTQ-Orbitrap data returned larger scores for larger peptides than the Q-TOF data did (Figure 3). The ratio of Mascot scores obtained from LTQ-Orbitrap data for peptides in the molecular weight range of 4000-4500 Da versus 1000-12000 Da was 2.8, which was twice the ratio observed for Q-TOF data (1.4). The data in Figure 3 also show that the trend of Mascot scores as a function of peptide molecular weight is much more pronounced for LTQ-Orbitrap data (slope = -3.2) than for Q-TOF data (slope = -0.88). Taken together, these results reinforce the notion that there are intrinsic differences in the way these two LC-MS platforms fragment and detect phosphopeptides. Such differences would explain, at least in part, the limited overlap in phosphopeptide identification by the studied MS platforms.

Further data analysis revealed that the presence of false negatives represents a major issue in phosphopeptide identification. Figure 4 shows that 5753 ions were returned by Mascot as phosphopeptides in both platforms (taking into account both those with significant and those with nonsignificant scores). Of these common 5753 ions, Q-TOF data returned 1439 significant identifications (p < 0.05) and 4314 nonsignificant identifications (p > 0.05). Interestingly, a large number of these nonsignificant identifications (1066 in total) were found to have scores above the statistically significant threshold in data obtained from the LTQ-Orbitrap platform (see Figure 4). Similarly, 470 nonsignificant hits from the LTQ-Orbitrap platform gave rise to significant identifications when analyzed by the Q-TOF platform.
These results imply that there were at least 1066 and 470 false negatives in the data obtained from Q-TOF and LTQ-Orbitrap platforms, respectively, with a false positive rate of about 3% of the peptides identified with significant scores (as indicated by searches against a decoy database). To our knowledge, this is the first time that estimates of false negatives in phosphoproteomics have been reported, and the data show that this is a previously underappreciated problem in phosphoproteomic experiments. It is noteworthy that the Q-TOF platform returned 1485 phosphopeptides with significant scores, which represents only 3.8% of the total number of phosphopeptides returned by Mascot (Table 1). Thus, considering that we estimate the presence of at least 1066 false negatives, which equates to 2.7% of phosphopeptides returned by Mascot, we can say that standard procedures for phosphopeptide identification, such as those employed here, produce a large number of false negatives and do not maximize the amount of information that can potentially be obtained from these experiments. As for the LTQ-Orbitrap, our results suggest that at least 1% of phosphopeptides returned by Mascot with scores below the statistically significant threshold were false negatives. Thus, even though this platform identified 9.8% of the total detected phosphopeptides (4308), there were still at least 470 false negatives. It should be noted that we almost certainly underestimate true false negative rates because our calculations are based on what the other LC-MS/MS technique identified. Nevertheless, irrespective of the actual numbers, our results indicate that, from the analytical standpoint, false negatives may be even more of a problem than false positives. However, our data also indicate that the confidence of phosphopeptide identification of low-scoring ions can be boosted by the employment of complementary mass spectrometric platforms, which results in a decreased number of false negatives.

These conclusions suggest an approach for alleviating the problem of false negatives and underreporting in phosphoproteomics experiments, which consists of using the historical data obtained from one or several phosphoproteomic experiments to enhance the identification of phosphopeptides in LC-MS/MS database searches. Phosphopeptide hits with low scores as returned by Mascot and other search engines are normally regarded as nonsignificant and not considered for further experiments. However, according to our data, many of these low-scoring ions are ‘real’ phosphopeptides and are thus false negatives (Figures 2 and 4), the presence of which is impossible to avoid because of the need to place stringent search parameters and acceptance criteria in order to minimize false positives. Our data (Figures 2 and 4) indicate that boosting phosphopeptide identification (that is, minimizing the presence of false negatives and false positives) could be achieved by exploiting the complementarity of different mass spectrometric platforms. However, not all laboratories have access to different instruments, and therefore, a more practical approach would be to search LC-MS/MS data against databases listing peptide sequences that have been previously identified as being phosphorylated with good confidence.
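The false negative estimates discussed above follow from a simple cross-platform comparison, which can be sketched as follows. The counts are those reported in Table 1 and Figure 4; the function names, the dictionary layout, and the use of expectancy values keyed by ion are assumptions made for illustration only.

```python
def cross_platform_false_negatives(expect_a, expect_b, alpha=0.05):
    """Ions that are nonsignificant on platform A but significant on platform B.

    expect_a and expect_b map an ion identifier to its expectancy value;
    an identification counts as significant when the value is below alpha.
    """
    shared = set(expect_a) & set(expect_b)
    return {
        ion
        for ion in shared
        if expect_a[ion] >= alpha and expect_b[ion] < alpha
    }


def percent(part, total):
    """Express part as a percentage of total."""
    return 100.0 * part / total


# Counts reported in the text (Table 1 and Figure 4).
reported = {
    "Q-TOF": {"significant": 1485, "total": 39270, "false_negatives": 1066},
    "LTQ-Orbitrap": {"significant": 4308, "total": 43896, "false_negatives": 470},
}

for platform, counts in reported.items():
    significant_pct = percent(counts["significant"], counts["total"])
    false_negative_pct = percent(counts["false_negatives"], counts["total"])
    print(platform, round(significant_pct, 1), round(false_negative_pct, 1))
```

Because an ion can only be flagged when the other platform identifies it with a significant score, counts obtained this way are lower bounds, which is consistent with the caveat above that the true false negative rates are almost certainly underestimated.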
To test this concept, we investigated peptide matching to experimental MS/MS spectra as an alternative strategy to standard searches that use the theoretical fragmentation of proteins listed in databases.22,23 To explore whether unsupervised spectral matching of experimental data to a peptide library could be used to further enhance the analysis of phosphoproteomics data, we constructed a library including the ion trap CID spectra of the phosphopeptides identified with good confidence when using the LTQ-Orbitrap platform. We then used NIST software to match experimental quadrupole CID MS/MS spectra from the Q-TOF platform to the created phosphopeptide library. As shown in Figure 5, the software was able to match experimental data (in red) obtained from the Q-TOF platform to that in the phosphopeptide spectral library (in blue) constructed using data from the LTQ-Orbitrap. Figure 5A shows the case of a spectral match of phosphopeptides for which original Mascot scores were significant and similar in both platforms. Figure 5B,C shows cases of false negatives, as returned by Mascot searches, that were positively identified by spectral matching. The latter two examples reflect the great potential of spectral matching for increasing the confidence of large-scale phosphoproteomics experiments. Taken together, these results indicate that searching against historical phosphoproteomics data can be used to enhance the reliability and depth of phosphoproteomics experiments.
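The idea of matching an experimental spectrum to a library spectrum can be illustrated with a simplified similarity measure. The sketch below uses a cosine (normalized dot product) score on binned peaks; this is an assumption chosen for illustration and is not the scoring implemented in the NIST software used in this work. All names, the 1 Da bin width, and the toy peak lists are hypothetical.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins; peaks is a list of (mz, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0 = no shared bins)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library, bin_width=1.0):
    """Return (name, score) of the library spectrum most similar to the query.

    The library must contain at least one entry.
    """
    scored = (
        (name, cosine_similarity(query, peaks, bin_width))
        for name, peaks in library.items()
    )
    return max(scored, key=lambda item: item[1])


# Hypothetical toy spectra: (m/z, intensity) pairs, not data from the study.
library = {
    "phosphopeptide_1": [(175.1, 20.0), (402.2, 100.0), (615.3, 60.0)],
    "phosphopeptide_2": [(147.1, 80.0), (330.2, 40.0), (780.4, 100.0)],
}
query = [(175.2, 15.0), (402.3, 90.0), (615.4, 70.0), (900.5, 5.0)]

name, score = best_library_match(query, library)
print(name, round(score, 3))
```

A score of this kind depends only on how well the observed fragment pattern agrees with a previously recorded one, so a spectrum that scored below the significance threshold in a database search can still be matched if its main fragments coincide with a library entry.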

Conclusions

Analysis of large-scale protein phosphorylation by LC-MS/MS is becoming ‘routine’ in many proteomic laboratories. This is despite the fact that phosphopeptide identification is more challenging than the identification of proteins.15 For the identification of proteins, several low or medium scoring peptides have an additive effect on the total protein score. Thus, even when the scores of individual peptides are low, protein identification is still possible provided a sufficient number of peptides match a given protein.13,14 In contrast, identification of phosphopeptides relies on having good quality MS/MS spectra to achieve scores above the statistically significant threshold for each of the identified peptides (as only one or two ion species are normally detected per phosphorylated peptide). Thus, confidence in these data is sometimes low, and the desire to minimize the number of false positives makes researchers choose search and validation parameters that result in the introduction of many false negatives. The results presented here indicate that these problems can be circumvented, or at least minimized, by employing different and complementary mass spectrometers for analysis, and perhaps more intriguingly, by matching experimental data to historical phosphoproteomics data.

As the number of published phosphoproteomics experiments increases, the sizes of known phosphoproteomes will also increase. At some point, the discovery of novel phosphorylated peptides will be very unlikely. Therefore, for studies aimed at the large-scale and deep comparison of phosphorylated sites across biological samples, it seems reasonable to search MS/MS data against databases that only list the relevant sequences (that is, restricted to contain peptides already known to be phosphorylated). The smaller sizes of these databases will therefore result in fewer random hits and greater confidence in the results. A caveat of this approach is that these types of searches are based on statistical assumptions which can result in many random matches if large numbers of peptides are searched against small databases.14 A parallel approach that can help to minimize the likelihood of random matches and further increase the confidence of the results is to construct spectral libraries of phosphopeptides, which are then matched to experimental data. Because the spectral matching approach is less dependent on statistical assumptions, it allows confident identification of peptides even when the size of the database is small.22,23

In conclusion, here we provide a proof-of-concept for approaches to minimize the occurrence of false negatives in phosphoproteomics experiments, which include repeated LC-MS/MS analyses, preferably using different instruments, and searching MS/MS data against historical phosphoproteomics data. In the future, it would be important to develop comprehensive spectral libraries of phosphopeptides and user-friendly search engines for proteomics capable of querying these databases with experimental data (good progress in this respect is being made22-24). Searches against phosphopeptide spectral libraries would complement searches against conventional protein databases and the use of new fragmentation methods based on electron transfer dissociation.25-28 We therefore call for the proteomics community to collaborate in the creation of these bioinformatic tools, as they would represent an important resource for increasing the depth and reliability of large-scale phosphoproteomics experiments.

Abbreviations: ACN, acetonitrile; FA, formic acid; IMAC, immobilized metal affinity chromatography; LC, liquid chromatography; MS/MS, tandem mass spectrometry; NIST, National Institute of Standards and Technology; TFA, trifluoroacetic acid; TiO2, titanium dioxide.

Acknowledgment. We gratefully acknowledge support from The Ludwig Institute for Cancer Research (to P.R.C.) and Bart’s and the London Charity (to P.R.C. and M.P.A.). We are also grateful to Richard Jacobs (MatrixScience) for suggesting the use of NIST spectral libraries, Neil Torbett for proofreading the manuscript, and Bart Vanhaesebroeck and members of the Centre for Cell Signalling for help and support.


References

(1) Hunter, T. Protein kinases and phosphatases: the yin and yang of protein phosphorylation and signaling. Cell 1995, 80 (2), 225–36.
(2) Mackay, H. J.; Twelves, C. J. Targeting the protein kinase C family: are we there yet? Nat. Rev. Cancer 2007, 7 (7), 554–62.
(3) Faivre, S.; Kroemer, G.; Raymond, E. Current development of mTOR inhibitors as anticancer agents. Nat. Rev. Drug Discovery 2006, 5 (8), 671–88.
(4) Hennessy, B. T.; Smith, D. L.; Ram, P. T.; Lu, Y.; Mills, G. B. Exploiting the PI3K/AKT pathway for cancer drug discovery. Nat. Rev. Drug Discovery 2005, 4 (12), 988–1004.
(5) Wilhelm, S.; Carter, C.; Lynch, M.; Lowinger, T.; Dumas, J.; Smith, R. A.; Schwartz, B.; Simantov, R.; Kelley, S. Discovery and development of sorafenib: a multikinase inhibitor for treating cancer. Nat. Rev. Drug Discovery 2006, 5 (10), 835–44.
(6) Johnson, S. A.; Hunter, T. Kinomics: methods for deciphering the kinome. Nat. Methods 2005, 2 (1), 17–25.
(7) Sawyers, C. L. The cancer biomarker problem. Nature 2008, 452 (7187), 548–52.
(8) Cutillas, P. R.; Khwaja, A.; Graupera, M.; Pearce, W.; Gharbi, S.; Waterfield, M.; Vanhaesebroeck, B. Ultrasensitive and absolute quantification of the phosphoinositide 3-kinase/Akt signal transduction pathway by mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 2006, 103 (24), 8959–64.
(9) Trinidad, J. C.; Thalhammer, A.; Specht, C. G.; Lynn, A. J.; Baker, P. R.; Schoepfer, R.; Burlingame, A. L. Quantitative analysis of synaptic phosphorylation and protein expression. Mol. Cell. Proteomics 2008, 7 (4), 684–96.
(10) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127 (3), 635–48.
(11) Beausoleil, S. A.; Villen, J.; Gerber, S. A.; Rush, J.; Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24, 1285–1292.
(12) Paradela, A.; Albar, J. P. Advances in the analysis of protein phosphorylation. J. Proteome Res. 2008, 7 (5), 1809–18.
(13) Ducret, A.; Van Oostveen, I.; Eng, J. K.; Yates, J. R., III; Aebersold, R. High throughput protein characterization by automated reverse-phase chromatography/electrospray tandem mass spectrometry. Protein Sci. 1998, 7 (3), 706–19.
(14) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551–67.
(15) Bradshaw, R. A.; Burlingame, A. L.; Carr, S.; Aebersold, R. Reporting protein identification data: the next generation of guidelines. Mol. Cell. Proteomics 2006, 5 (5), 787–8.
(16) Bradford, M. M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 1976, 72, 248–54.
(17) Thingholm, T. E.; Jensen, O. N.; Robinson, P. J.; Larsen, M. R. SIMAC (sequential elution from IMAC), a phosphoproteomics strategy for the rapid separation of monophosphorylated from multiply phosphorylated peptides. Mol. Cell. Proteomics 2008, 7 (4), 661–71.
(18) Thingholm, T. E.; Jorgensen, T. J.; Jensen, O. N.; Larsen, M. R. Highly selective enrichment of phosphorylated peptides using titanium dioxide. Nat. Protoc. 2006, 1 (4), 1929–35.
(19) Sano, A.; Nakamura, H. Chemo-affinity of titania for the column-switching HPLC analysis of phosphopeptides. Anal. Sci. 2004, 20 (3), 565–6.
(20) Ishihama, Y.; Rappsilber, J.; Andersen, J. S.; Mann, M. Microcolumns with self-assembled particle frits for proteomics. J. Chromatogr., A 2002, 979 (1-2), 233–9.
(21) Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods 2005, 2 (9), 667–75.
(22) Wu, X.; Tseng, C. W.; Edwards, N. HMMatch: peptide identification by spectral matching of tandem mass spectra using hidden Markov models. J. Comput. Biol. 2007, 14 (8), 1025–43.
(23) Hummel, J.; Niemann, M.; Wienkoop, S.; Schulze, W.; Steinhauser, D.; Selbig, J.; Walther, D.; Weckwerth, W. ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites. BMC Bioinf. 2007, 8, 216.
(24) Frewen, B. E.; Merrihew, G. E.; Wu, C. C.; Noble, W. S.; MacCoss, M. J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 2006, 78 (16), 5678–84.
(25) Wiesner, J.; Premsler, T.; Sickmann, A. Application of electron transfer dissociation (ETD) for the analysis of posttranslational modifications. Proteomics 2008, 8 (21), 4466–83.
(26) Swaney, D. L.; McAlister, G. C.; Coon, J. J. Decision tree-driven tandem mass spectrometry for shotgun proteomics. Nat. Methods 2008, 5 (11), 959–64.
(27) Molina, H.; Matthiesen, R.; Kandasamy, K.; Pandey, A. Comprehensive comparison of collision induced dissociation and electron transfer dissociation. Anal. Chem. 2008, 80 (13), 4825–35.
(28) McAlister, G. C.; Berggren, W. T.; Griep-Raming, J.; Horning, S.; Makarov, A.; Phanstiel, D.; Stafford, G.; Swaney, D. L.; Syka, J. E.; Zabrouskov, V.; Coon, J. J. A proteomics grade electron transfer dissociation-enabled hybrid linear ion trap-orbitrap mass spectrometer. J. Proteome Res. 2008, 7 (8), 3127–36.
