H-Score, a Mass Accuracy Driven Rescoring Approach for Improved Peptide Identification in Modification Rich Samples

Mikhail M. Savitski,* Toby Mathieson, Isabelle Becher, and Marcus Bantscheff*

Cellzome AG, Meyerhofstrasse 1, 69117 Heidelberg, Germany

Received July 2, 2010

Abstract: Currently, the scoring algorithms of many popular search engines for tandem mass spectrometry (MS/MS) data use only part of the information content of high mass accuracy MS/MS data. We have developed a new rescoring scheme, H-score, that matches all detected fragment ions to candidate peptide sequences with high mass accuracy, independent of fragment ion abundance. Peptides for which b or y ions are found for all or almost all backbone fragmentation sites are rewarded. For peptide hits generated by Mascot, rescoring proved particularly beneficial when applied to samples containing many different potential modifications. For a histone sample acquired on an Orbitrap Velos using HCD for peptide fragmentation, the H-score identified 24% more spectra at a false positive rate of 0.01 than Mascot scoring of spectra processed according to state-of-the-art methods, and 61% more than Mascot scoring of unprocessed MS/MS spectra. For a low-abundance sample, in which many weak spectra were detected, these gains rose to 53% and 190%, respectively. When applied to a kinase-enriched sample containing only a few modifications, a smaller but still significant gain of 5% was observed.

Keywords: mass accuracy • HCD • identification

* To whom correspondence should be addressed. Mikhail Savitski, Cellzome AG, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Ph.: 49-6221-13757318. Fax: 49-6221-13757203. Marcus Bantscheff, Cellzome AG, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Ph.: 49-6221-13757310. Fax: 49-6221-13757210.

DOI: 10.1021/pr1006813. © 2010 American Chemical Society

Introduction

Mass spectrometry-based protein identification relies on algorithms that match experimental tandem MS spectra to in silico generated spectra of a reference database, using either probability-based or cross-correlation-based scoring.1 Popular search engines include Mascot,2 Sequest,3 X-Tandem,4,5 OMSSA,6 and many others. A scheme common to most of these search engines, including Mascot, is that the experimental m/z of the precursor ion is deconvoluted to the molecular mass and matched, with ppm or Da accuracy, to the theoretical masses of the peptides derived from an in silico digest (typically with trypsin) of a reference database (e.g., the human IPI database). Subsequently, the m/z values of the ions in the experimental MS/MS spectrum are matched to the theoretical fragment ions of these peptides with a set accuracy, and a score is calculated that reflects the quality of the match. Mascot,2 one of the most widely used search engines in the field, in particular favors MS/MS spectra in which most of the abundant ions are explained by theoretical peptide fragments. A balance is struck between the number of most abundant ions considered for the match and the percentage of those ions that are matched. Consequently, ions below a certain abundance are not matched to the theoretical fragment ions. This strategy is appropriate for the low mass accuracy MS/MS spectra produced, for example, by ubiquitous ion trap and older Q-TOF instruments. Ion trap MS/MS spectra in particular are characterized by low mass accuracy and relatively high noise levels. Consequently, confident assignment of low-abundance fragment ions is impaired by the high probability of false positive fragment ion matches to chemical and electronic noise. This is because low-abundance noise ions cover a large part of the spectrum: for example, assuming a mass accuracy of 0.8 Da, 25 singly charged noise signals within a 100 Th range cover 20% of that range (25 × 0.8 Th = 20 Th) and hence have a high probability of matching any theoretical ion at random. However, for high mass accuracy (0.02 Da) MS/MS spectra, such as HCD spectra generated on the Orbitrap,7,8 even low-abundance peaks have a very low probability of a random match (e.g., 25 noise ions within a 100 Th range with a mass accuracy of 0.02 Da cover only 0.5% of the range). Previous work has described efficient spectrum filtering approaches for improving the Mascot score for high mass accuracy data.9-11 This is achieved by deisotoping the isotopic clusters, that is, removing all but the monoisotopic peak of each cluster, and by deconvoluting higher charge state ions (>1+) to the m/z of the corresponding 1+ charge state.
This filtering yields cleaner spectra and allows Mascot to consider ions further down the abundance scale. The problem of ignoring low-abundance peaks is thereby mitigated but not entirely solved, because valid and reliable fragment ions that Mascot does not consider often remain in the spectrum. In this report, we show how a rescoring scheme independent of fragment ion abundance (H-score) outperforms traditional Mascot ion scoring for high mass accuracy MS/MS spectra acquired with higher-energy collisional dissociation (HCD)11-13 in an Orbitrap Velos8 mass spectrometer. The H-score assigns equal significance to all fragment ions and exploits the fact, described in previous work on de novo sequencing,14,15 that completely or almost completely sequenced peptides are more reliable than their partially sequenced counterparts. In this study, we have focused on the identification of peptides from purified histone samples. Because histones are heavily modified, Mascot searches need to allow for many modifications and a large number of missed cleavage sites. The search space is consequently large, and reliable identification of peptides is more challenging than in searches aimed at standard protein identification. Applying the H-score led to superior identification results, with gains in identified spectra as high as 53% compared to Mascot at a false positive rate (FPR) of 0.01.

Journal of Proteome Research 2010, 9 (11), 5511–5516. Published on Web 09/13/2010.
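The noise-coverage argument in the Introduction can be checked with a few lines of Python. This is a minimal sketch under simplifying assumptions that are ours, not the authors': noise peaks are placed uniformly and independently over the m/z range, and each noise peak "covers" a window whose width equals the fragment mass tolerance (which is what the 25 × 0.8 Th = 20 Th arithmetic implies; a symmetric ±tolerance window would double the coverage). The function names are hypothetical.

```python
"""Back-of-the-envelope estimate of random fragment-ion matches.

Simplified model (an assumption mirroring the arithmetic in the text):
each noise peak covers a window whose width equals the fragment mass
tolerance, and noise peaks are spread uniformly over the m/z range.
"""


def coverage_fraction(n_noise: int, tolerance: float, mz_range: float) -> float:
    """Fraction of the m/z range covered by noise windows, ignoring overlaps."""
    return n_noise * tolerance / mz_range


def random_match_probability(n_noise: int, tolerance: float, mz_range: float) -> float:
    """Probability that one theoretical ion falls inside at least one noise
    window, assuming independent, uniformly placed noise peaks."""
    p_single = tolerance / mz_range
    return 1.0 - (1.0 - p_single) ** n_noise


if __name__ == "__main__":
    for tol in (0.8, 0.02):  # ion-trap-like vs. Orbitrap HCD-like tolerance, in Da
        cov = coverage_fraction(25, tol, 100.0)
        prob = random_match_probability(25, tol, 100.0)
        print(f"tol={tol} Da: coverage={cov:.1%}, P(random match)={prob:.2%}")
```

The linear coverage estimate reproduces the 20% and 0.5% figures quoted in the text; the overlap-aware probability is slightly lower (about 18% at 0.8 Da) because noise windows can overlap.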

Methods

Purification of Histone Proteins. Histones were extracted and purified as previously described.16 Briefly, 4 × 10⁷ K562 cells were washed with phosphate-buffered saline (PBS), centrifuged, and resuspended in 4 mL of hypotonic lysis buffer (10 mM Tris, 1 mM KCl, 1.5 mM MgCl₂, 1 mM DTT + 1 tablet of protease inhibitor cocktail, Roche, Switzerland) per 1 × 10⁷ cells. The cells were divided into four tubes to obtain four comparable samples and incubated for 30 min at 4 °C. The intact nuclei were spun down in a cooled tabletop centrifuge, resuspended in 400 µL of 0.4 N H₂SO₄, and incubated for 30 min at 4 °C. After centrifugation, the supernatant containing the histones was used for acetone precipitation. The histone pellet was redissolved in 100 µL of 100 mM triethylammonium bicarbonate buffer containing 1% SDS. For the in-gel digest, 5 µL of the reduced and carbamidomethylated samples was loaded onto a 10% Bis-Tris gel (Invitrogen). Coomassie-stained histone bands were cut into five slices, digested with trypsin, and labeled with isobaric tags (TMT, Thermo Fisher) according to the manufacturer’s instructions. Two samples, sample 1 and sample 2, were prepared in this way; for sample 2, a 10-fold lower amount of material was used.

Kinase-Enriched Sample. Kinase enrichment using mixed kinase inhibitor resins was performed as described previously.17 Briefly, a 1:1 mixture of Jurkat and Ramos cell lysates was used at a final protein concentration of 5 mg/mL, and 1 mL lysate aliquots were incubated with 35 µL of Kinobead slurry for 1 h. After washing, bound kinases were eluted using 50 µL × 2× SDS sample buffer. Samples were run on a denaturing gel for 20 min to remove reagents incompatible with tryptic digestion. Tryptic digestion and TMT labeling were performed as described above. This sample is referred to as sample 3 in the remainder of the text.

Mass Spectrometry.
The samples were measured on an LTQ Orbitrap Velos (Thermo Fisher) coupled online to an Eksigent 1D+ nano-LC system (Eksigent, Dublin, CA). Peptides were separated for 120 min using a 75 µm ID tip column filled with 3 µm Reprosil C18 AQ material (Dr. Maisch, Ammerbuch-Entringen, Germany). Eluting peptides were detected in the Orbitrap at 30 000 resolution and subjected to HCD fragmentation with the following instrument settings: FT target value, 1E5 ions; collision energy, 48%; maximum FT fill time, 300 ms; isolation width, 2.5 Da. Fragment ions were detected in the Orbitrap at a resolution of 7500, and a noise filter was automatically applied by the instrument acquisition software. This filter removes all ions (electronic and chemical noise) below an intensity cutoff of 2.4 standard deviations of all detected signals within the spectrum (personal communication with Thermo Scientific). On average, each HCD spectrum contained 81 peaks (standard deviation 22).

Peptide Identification using Mascot. The acquired raw data were processed with in-house software. MGF files were created, optionally processed as described below, and submitted to the Mascot search engine (Matrix Science, London, U.K.). The following settings were used: 7.5 ppm precursor mass accuracy (monoisotopic mass) and 0.02 Da fragment ion mass accuracy. For the histone samples, the following modifications were selected: variable modifications acetylation (K), Acetyl (Protein N-term), citrullination (R), Dimethyl (KR), Methyl (KR), Phospho (STY), TMT6plex (K), TMT6plex (N-term), and Trimethyl (K); fixed modification carbamidomethyl (C); the maximum number of missed cleavages was set to 6. For the kinase-enriched sample 3, the following modifications were selected: variable modifications oxidation (M), TMT6plex (N-term), and acetyl (protein N-term); fixed modifications TMT6plex (K) and carbamidomethyl (C); the maximum number of missed cleavages was set to 3. The instrument type was set to ESI-TRAP and the enzyme specificity to Trypsin/P for both the histone and kinase samples. All data were searched against a nonredundant, in-house curated version of the human International Protein Index database (versions 1.0-3.54), supplemented with the protein sequences of bovine serum albumin, porcine trypsin, and mouse, rat, sheep, and dog keratins, and combined with a decoy version thereof. The database contains a total of 163 476 protein sequences (50% forward, 50% reversed). The FPR was calculated using the reversed database approach as previously described.18

Processing of HCD Spectra. The HCD spectra were deisotoped and deconvoluted as previously described.9,11 Briefly, the isotopes of all clusters with a charge state equal to or less than that of the selected precursor were removed, and the m/z of the monoisotopic peak was converted (if the charge state was greater than 1+) to the calculated m/z of the singly charged ion. In addition, TMT reporter ions were removed from the spectrum. An accuracy of 0.02 Da was used for all operations.
The filtering procedure reduced the average number of peaks in an HCD spectrum to 62 (standard deviation 14).

H-Score Calculation. All rank 1 (bold red) peptide hits were extracted from the Mascot-generated .dat files. A new score was calculated for these hits by comparing the Mascot-suggested sequence (including suggested modifications) with the spectrum in the MGF file. Theoretical m/z values of singly charged b and y ions were calculated for the suggested sequence, and the ions in the deconvoluted and deisotoped experimental spectrum were compared with them. An experimental ion matching a theoretical b or y ion within 0.02 Da (or, optionally, 20 ppm) was counted as a match. Each time an ion matched a b or y ion corresponding to a cleavage site that had not yet been explained, the H-score was incremented by 1. After matching, the score of a peptide therefore lies in the range 0 to L − 1, where L is the length of the peptide. If the score was L − 1, that is, all cleavage sites were explained, an additional 3 score points were awarded to the peptide. If L − 2 cleavage sites were explained, that is, only a single cleavage site was unexplained, a single additional score point was awarded.
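The H-score calculation described above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' Supporting Information script: the function names, the monoisotopic residue-mass table, and the way modifications are passed in (a hypothetical `{residue index: mass shift}` mapping plus an N-terminal shift) are our assumptions. Cleavage site k (between residues k and k + 1) counts as explained if either b_k or the complementary y_(L−k) is matched; the bonus values default to the 3 and 1 points given in the text.

```python
from bisect import bisect_left

PROTON = 1.007276   # Da
WATER = 18.010565   # Da

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ladders(sequence, mod_deltas=None, nterm_delta=0.0):
    """Return singly charged (b, y) m/z lists; b[k-1] is b_k and y[k-1] is y_k, k = 1..L-1.

    mod_deltas: optional {0-based residue index: mass shift in Da} (hypothetical input format).
    nterm_delta: mass shift of an N-terminal modification, added to the first residue.
    """
    mod_deltas = mod_deltas or {}
    masses = [RESIDUE[aa] + mod_deltas.get(i, 0.0) for i, aa in enumerate(sequence)]
    masses[0] += nterm_delta
    b, running = [], PROTON
    for m in masses[:-1]:            # b_1 .. b_(L-1)
        running += m
        b.append(running)
    y, running = [], WATER + PROTON
    for m in reversed(masses[1:]):   # y_1 .. y_(L-1)
        running += m
        y.append(running)
    return b, y


def h_score(sequence, spectrum_mzs, mod_deltas=None, nterm_delta=0.0,
            tol_da=0.02, tol_ppm=None, full_bonus=3, one_missing_bonus=1):
    """Abundance-independent H-score for one peptide-spectrum match.

    spectrum_mzs: m/z values of the deisotoped, 1+-deconvoluted spectrum (intensities unused).
    tol_ppm: if given, a relative tolerance replaces the absolute tol_da.
    """
    observed = sorted(spectrum_mzs)

    def matched(theoretical):
        tol = theoretical * tol_ppm * 1e-6 if tol_ppm is not None else tol_da
        j = bisect_left(observed, theoretical - tol)  # first peak >= lower bound
        return j < len(observed) and observed[j] <= theoretical + tol

    b, y = fragment_ladders(sequence, mod_deltas, nterm_delta)
    n_sites = len(sequence) - 1
    # One point per cleavage site explained by b_k or by the complementary y_(L-k).
    explained = sum(
        1 for k in range(1, n_sites + 1)
        if matched(b[k - 1]) or matched(y[n_sites - k])
    )
    if explained == n_sites:          # every cleavage site explained
        return explained + full_bonus
    if explained == n_sites - 1:      # exactly one site unexplained
        return explained + one_missing_bonus
    return explained
```

For example, `b, y = fragment_ladders("PEPTIDE")` followed by `h_score("PEPTIDE", b)` scores a spectrum containing every b ion: all 6 cleavage sites are explained, giving 6 + 3 = 9. Because each cleavage site contributes at most one point and intensities are never read, the score is abundance independent in the sense described in the text.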

Results and Discussion

Need for High Mass Accuracy MS/MS. Resonance-activated collision-induced dissociation (CID) is currently the most widespread method for acquiring tandem MS spectra in high-throughput proteomics applications using ion trap, hybrid ion trap-ICR, and LTQ-Orbitrap mass spectrometers. Because of superior scan speed and sensitivity, fragment spectra are typically analyzed in the ion trap part of modern LTQ-Orbitrap instruments rather than in the Orbitrap analyzer. Hence, fragment ions are detected with low mass accuracy. This approach works well for standard protein identification and unmodified peptides. However, when searching for many different modifications at once and allowing for a large number of missed cleavages, as required, for example, for the analysis of histone modifications, high mass accuracy detection of fragment ions is desirable. A particular challenge is the distinction of positional isomers, as exemplified in Figure 1. Here the N-terminally TMT6plex-labeled peptide KSTGGKAPR (m/z = 405.585834, charge = 3+) of histone H3.3, positions 10-19, was found modified on lysines K1 and K6, each with either trimethylation or acetylation. Trimethylation and acetylation are very close in mass (42.047 and 42.011 Da; ΔM = 0.036 Da), and high mass accuracy on the precursor ion clearly indicated the presence of one trimethylation and one acetylation in the same peptide. However, fragment ions for both candidate assignments, K1(Ac), K6(Tri) and K1(Tri), K6(Ac), match the theoretical fragment ions within 100 ppm or 0.05 Da accuracy. To make an unambiguous site assignment of the modifications in this case, an MS/MS accuracy of 18 corresponding to a 31% improvement. For the spectrum in Figure 1, the Mascot score improved from 10 (FPR of 0.055) to 25 (FPR of 0.004), thus becoming significant.

Figure 1. HCD spectrum of the N-terminally TMT6plex-labeled peptide KSTGGKAPR (m/z = 405.585834, charge = 3+), modified on lysines K1 and K6 with trimethylation and acetylation.

H-Score Fundamentals. In the next step, we devised a simple rescoring procedure for Mascot hits using a fragment ion abundance-independent approach that we dubbed H-score (Figure 2). It is based on two observations: acquisition of HCD spectra in the Orbitrap mass analyzer provides high mass accuracy (20 ppm), and the spectra contain low noise levels. The latter is due to an automatic noise filtering procedure applied by the instrument acquisition software (see Methods section).
Consequently, the probability of false positive matches for low-abundance signals is strongly reduced compared with ion trap CID data (fragment mass tolerances typically 0.5-0.8 Da), and the benefits of the fragment ion abundance-based feature selection currently employed by Mascot are likely to be minimized, if not eliminated. Hence, in our scoring strategy, each detected fragment ion is given the same weight regardless of its abundance. We have also observed that, with high mass accuracy (20 ppm or 0.02 Da) matching of fragments, confidence in peptide identifications for which all or almost all cleavage sites are explained by b or y ions is much higher than for peptides with lower sequence coverage (Figure 3a). This observation was also made in previous publications on high mass accuracy de novo sequencing.14,15 In sample 1, we found that spectra corresponding to peptides in which all, or all but one, cleavage sites are explained have an FPR of 0.008 and 0.05, respectively, regardless of the Mascot ion score. Because the distribution of fragment ion abundances is not uniform but exponential,19 peptides with all or almost all cleavage sites explained are on average shorter (Figure 3b,c). To highlight the effect of mass accuracy on the reliability of completely or almost completely sequenced peptides, we re-searched sample 1 using a 0.05 Da fragment ion tolerance for both Mascot and the subsequent H-score method, and also using 0.5 Da (a fragment ion tolerance typical for low-resolution ion trap data). The FPR of spectrum identification as a function of the number of unassigned cleavage sites shows a clear trend with decreasing mass accuracy (Figure 3d). For the 0.5 Da mass accuracy data, the FPR for spectra in which all or all but one cleavage sites are explained is four times higher than in the 0.02 Da and 0.05 Da cases. These observations led to the inception of a “reward scheme” for completely and almost completely sequenced peptides. The score increments for completely and almost completely sequenced peptides were optimized to obtain the best separation between hits from the reversed and forward databases. The figure of merit for the separation was the number of spectra identified at 0.01 and 0.005 FPR in a training data set (a replicate analysis of sample 1). For fully sequenced peptides, the highest number of spectra at the 0.01 FPR level was achieved with a score increment of 3 or higher. Because several increments gave the same maximum, the one with the highest number of spectra at the 0.005 FPR level was selected: 3 (Supplementary Figure 2a, Supporting Information). The same procedure was used to optimize the reward for peptides with one unexplained cleavage site, which turned out to be 1 (Supplementary Figure 2b, Supporting Information).

Figure 3. (a) Distributions of peptides with different numbers of unassigned cleavage sites in the forward (blue) and reversed (red) databases. (b) Distributions of peptide lengths (in amino acids) for hits in the forward database: green line, peptides with more than one unexplained cleavage site; orange line, peptides with one unexplained cleavage site; purple line, peptides with all cleavage sites explained. (c) Same as (b), but for hits in the reversed database. (d) FPR as a function of the number of unassigned cleavage sites for different fragment mass tolerances: solid line, 0.02 Da; dashed line, 0.05 Da; dotted line, 0.5 Da.

H-Score, Identification.
All Mascot-suggested rank 1 peptides for sample 1 (best peptide-to-sequence matches at 0.02 Da fragment mass tolerance) from the reversed and forward databases (Figure 4a) were rescored using the H-score (Figure 4b). The distribution of the forward hits is bimodal, with two partially resolved components. True and false hits are better separated than with the Mascot scores alone. Consequently, more spectra led to reliable identifications (Figure 4c). In total, 10887 spectra were identified at the 0.01 FPR level (H-score > 6), compared with 8809 spectra at the same level using Mascot ion scoring (> 18) on processed HCD spectra (24% gain) and 6718 spectra using Mascot scoring on unprocessed spectra (62% gain). The spectrum in Figure 1 received an H-score of 11, which in our data set corresponds to an FPR of 0, since no reversed database hit has an H-score above 10. The H-score scheme was tested on an additional sample (sample 2), for which the amount of starting material was 10 times lower and more weak spectra were acquired. The relation between the number of identified spectra and the FPR is shown in Supplementary Figure 3 (Supporting Information). Because of the larger number of weak spectra, the H-score procedure had an even greater effect: 1362 spectra were identified at the 0.01 FPR level using the H-score (> 7), compared with 891 spectra at the same level using Mascot scoring (> 21) on the processed spectra (53% gain) and 420 spectra using Mascot scoring (> 19) on the unprocessed spectra (190% gain).

Effect of MS/MS Accuracy on the FPR. We evaluated the effect of MS/MS accuracy on the FPR by repeating the Mascot search of the processed MS/MS spectra from sample 1, and the subsequent H-score analysis, with 0.05 Da and 0.5 Da fragment ion mass tolerances. The effect on the FPR was significant (Figure 4d, Table 1).
For the 0.05 Da case, the Mascot 0.01 FPR threshold rose to > 27 (compared with > 18 for the 0.02 Da search), and the number of identified spectra at 0.01 FPR dropped from 8809 to 6777 (23% reduction). The effect on the H-score (using 0.05 Da for fragment ion matching) was similar: the 0.01 FPR threshold rose to > 7 and the number of identified spectra dropped from 10887 to 9303 (14.5% reduction). At 0.05 Da, the accuracy was still high enough for the H-score to significantly outperform Mascot, with 37% more spectra identified at 0.01 FPR. For the search using 0.5 Da fragment ion mass accuracy, the effect on the FPR was even greater (Figure 4d). The Mascot 0.01 FPR threshold rose to > 36 and the number of identified spectra at 0.01 FPR dropped to 4987. For the H-score, the 0.01 FPR threshold increased to > 10 and the number of spectra at 0.01 FPR dropped to 4533. At this mass accuracy, the H-score performs worse than the Mascot ion score (9% fewer spectra identified at 0.01 FPR), which is in line with the higher probability of false positive fragment ion assignments for low mass accuracy data. For calculating the H-score, fragment ions can be matched with either relative (ppm) or absolute (Da) mass accuracy. When comparing the H-score using 20 ppm accuracy with 0.02 Da accuracy on sample 1, we found no significant impact on false discovery rates (Supplementary Figure 4, Supporting Information). However, the total number of matched fragment ions increased by 4.3%. This should have a positive effect on the confidence of PTM site allocations and will be the subject of a subsequent study.

Figure 4. Analysis of data from a Mascot search with a high number of variable modifications (9) and allowed missed cleavages (6), sample 1. (a) Distributions of Mascot scores (0.02 Da accuracy) for the forward (blue line) and reversed (red line) databases. (b) Distributions of H-scores (0.02 Da accuracy) for the forward (blue line) and reversed (red line) databases. (c) Number of correctly identified spectra as a function of the false positive rate for the Mascot scores (0.02 Da accuracy, solid line) and for the H-score rescored spectra (dashed line), using 0.02 Da mass accuracy for fragment ion matching. (d) Same as (c) but using 0.05 Da for the Mascot search (solid orange line) and for fragment ion matching in the H-score (dashed orange line). For 0.5 Da accuracy: solid violet line, Mascot; dashed violet line, H-score.

Table 1. Performance of Mascot and H-Score at Different MS/MS Mass Accuracy Settings

Fragment mass tolerance                          0.02 Da       0.05 Da      0.5 Da
# spectra (threshold) at 0.01 FPR, H-score       10887 (>6)    9303 (>7)    4533 (>10)
# spectra (threshold) at 0.01 FPR, Mascot        8809 (>18)    6777 (>27)   4987 (>36)

Effect of H-Score on Data Searched with Few Modifications. We also tested the H-score on a kinase-enriched sample20 for which the Mascot search was performed with few modifications and a lower number of allowed missed cleavages (see Methods section). These settings restrict the search space considerably, which should lead to fewer false positives and a more moderate identification gain from using the H-score. Indeed, this is reflected in the relatively smaller distributions of hits in the reversed database for both Mascot and the H-score (Figure 5a,b) and in a more moderate, but still significant, gain of 5% in reliably identified spectra at an FPR of 0.01 (Mascot, 5027 spectra, threshold > 7; H-score, 5276 spectra, threshold > 4), as depicted in Figure 5c. The number of proteins identified with more than one confident peptide match (FPR below 0.01) increased from 391 to 402, a 2.8% gain.

Figure 5. Analysis of data from a Mascot search with a limited number of variable modifications (3) and allowed missed cleavages (3). (a) Distributions of Mascot scores for the forward (blue line) and reversed (red line) databases. (b) Distributions of H-scores for the forward (blue line) and reversed (red line) databases. (c) Number of correctly identified spectra as a function of the false positive rate for the Mascot scores (solid line) and for the H-score rescored spectra (dashed line), using 0.02 Da mass accuracy for fragment ion matching.
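The score thresholds reported at 0.01 FPR throughout the Results, and the reward optimization described under H-Score Fundamentals, can be illustrated with a short sketch. Two assumptions are ours: the FPR above a score cut is estimated as the number of reversed-database hits divided by the number of forward hits scoring above that cut (one common target-decoy estimate; the exact formula the authors applied following ref 18 may differ), and each rank 1 hit is summarized as a hypothetical `(explained_sites, n_sites, is_decoy)` tuple. The candidate bonus range 0-6 is likewise illustrative.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[int, int, bool]  # (explained cleavage sites, total cleavage sites, is_decoy)


def spectra_at_fpr(target: Sequence[float], decoy: Sequence[float],
                   max_fpr: float = 0.01) -> Tuple[int, float]:
    """Return (number of forward hits, threshold) for the 'score > threshold' cut
    that keeps the most forward hits while decoy/forward <= max_fpr."""
    best_n, best_t = 0, float("inf")
    candidates = [float("-inf")] + sorted(set(target) | set(decoy))
    for t in candidates:
        n_target = sum(s > t for s in target)
        n_decoy = sum(s > t for s in decoy)
        # n_target > best_n >= 0 guarantees the division below is safe.
        if n_target > best_n and n_decoy / n_target <= max_fpr:
            best_n, best_t = n_target, t
    return best_n, best_t


def rescore(hit: Hit, full_bonus: int, one_missing_bonus: int) -> int:
    """Explained sites plus the reward for complete or almost complete coverage."""
    explained, n_sites, _ = hit
    if explained == n_sites:
        return explained + full_bonus
    if explained == n_sites - 1:
        return explained + one_missing_bonus
    return explained


def choose_full_bonus(hits: List[Hit], candidates=range(7), one_missing_bonus: int = 1) -> int:
    """Pick the full-coverage bonus maximizing spectra at 0.01 FPR, breaking ties at 0.005 FPR."""
    def n_at(bonus: int, fpr: float) -> int:
        target = [rescore(h, bonus, one_missing_bonus) for h in hits if not h[2]]
        decoy = [rescore(h, bonus, one_missing_bonus) for h in hits if h[2]]
        return spectra_at_fpr(target, decoy, fpr)[0]

    # max() returns the first (smallest) candidate among ties on both criteria.
    return max(candidates, key=lambda bonus: (n_at(bonus, 0.01), n_at(bonus, 0.005)))
```

The tuple key mirrors the selection rule in the text: maximize identifications at 0.01 FPR first, then use the 0.005 FPR count to break ties. The same loop, run over `one_missing_bonus` with the full bonus fixed, would correspond to the second optimization step.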

Conclusions

We have developed a new rescoring scheme for high mass accuracy HCD spectra acquired in the Orbitrap mass analyzer. The H-score uses all fragment ions detected in a tandem MS spectrum, independent of their abundance, and counts all matches equally. Peptides for which b or y ions are found for all or almost all cleavage sites are rewarded. For samples searched with a high number of variable modifications, the rescoring scheme identifies 24% more spectra at an FPR of 0.01 than Mascot scoring of spectra processed according to state-of-the-art methods, and 61% more than Mascot scoring of unprocessed MS/MS spectra. For samples with low protein amounts, where many weak spectra are present, these gains increase to 53% and 190%, respectively. The H-score is most likely open to further development; adding more complex parameters might lead to even larger gains. In this report, we focused mainly on the factors that led to large improvements in the identification of modified peptides from high mass accuracy MS/MS data: assigning equal weights to fragment ions and rewarding identifications in which all or almost all cleavage sites of a peptide are explained. Additionally, if one considers solely the FPR for modified peptides (as reported previously21), the gains would be even higher. Although all data presented here were acquired on an Orbitrap mass spectrometer, we expect the method to be applicable, with similar benefits, to all instruments capable of acquiring high mass accuracy tandem MS spectra. Similarly, the H-score is not restricted to the widely used Mascot search engine and can easily be evaluated for rescoring hits from other search algorithms. An implementation of the H-score algorithm and the spectrum filtering procedure is available as a Python script in the Supporting Information.
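The spectrum filtering procedure mentioned above (and described under Processing of HCD Spectra in Methods) can be approximated by the greedy sketch below. It is not the authors' Supporting Information script: the greedy cluster search, the 13C isotope spacing constant, the TMT6plex reporter ion m/z values, and the function names are our assumptions, and peaks without a detectable isotope partner are simply treated as 1+.

```python
from bisect import bisect_left

PROTON = 1.007276       # Da
C13_SPACING = 1.003355  # mass difference between 13C and 12C isotope peaks, Da
# Assumed TMT6plex reporter ion m/z values; check against the reagent documentation.
TMT6_REPORTERS = (126.127726, 127.124761, 128.134436, 129.131471, 130.141145, 131.138180)


def remove_reporter_ions(peaks, tol_da=0.02):
    """Drop (mz, intensity) peaks within tol_da of a TMT reporter ion."""
    return [(mz, inten) for mz, inten in peaks
            if all(abs(mz - r) > tol_da for r in TMT6_REPORTERS)]


def deisotope_to_singly_charged(peaks, precursor_charge, tol_da=0.02):
    """Keep the monoisotopic peak of each isotope cluster and move it to its 1+ m/z.

    Charges from precursor_charge down to 1 are tried for every unassigned peak;
    the first charge for which at least one isotope partner is found wins.
    """
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]
    used = [False] * len(peaks)
    result = []
    for i, (mz, inten) in enumerate(peaks):
        if used[i]:
            continue                      # already removed as a heavier isotope
        charge = 1
        for z in range(precursor_charge, 0, -1):
            cluster, expected = [], mz + C13_SPACING / z
            while True:
                j = bisect_left(mzs, expected - tol_da)
                if j < len(mzs) and not used[j] and abs(mzs[j] - expected) <= tol_da:
                    cluster.append(j)
                    expected += C13_SPACING / z
                else:
                    break
            if cluster:
                charge = z
                for j in cluster:
                    used[j] = True        # remove the isotope peaks
                break
        # Convert the m/z at charge z to the m/z of the corresponding singly charged ion.
        result.append(((mz - PROTON) * charge + PROTON, inten))
    return sorted(result)
```

For example, a 2+ pair at m/z 500.0 and 500.5017 collapses to a single peak at m/z ≈ 998.99, while an isolated peak at m/z 700.0 is kept unchanged as 1+. A production implementation would also need to handle overlapping clusters more carefully than this greedy pass does.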

Acknowledgment. This study was supported by the Bundesministerium für Bildung und Forschung, Spitzencluster BioRN, Verbundprojekt Inkubator, BioRN-INE-TP01 and BioRN-IND-TP02. The authors thank Frank Weisbrodt for help with preparing the figures, the Cellzome biology department for supplying K562 cells, and Gerard Drewes and David Simmons for helpful discussions and support.

Supporting Information Available: Supplementary Figures 1-4, as described in the main text; a table of identified peptides for samples 1-3, with Mascot scores and H-scores; and the H-score algorithm as a Python script. This material is available free of charge via the Internet at http://pubs.acs.org.


References

(1) Steen, H.; Mann, M. The ABC’s (and XYZ’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 2004, 5 (9), 699–711.
(2) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551–67.
(3) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976–89.
(4) Craig, R.; Beavis, R. C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20 (9), 1466–7.
(5) Craig, R.; Beavis, R. C. A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 2003, 17 (20), 2310–6.
(6) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass spectrometry search algorithm. J. Proteome Res. 2004, 3 (5), 958–64.
(7) Olsen, J. V.; de Godoy, L. M.; Li, G.; Macek, B.; Mortensen, P.; Pesch, R.; Makarov, A.; Lange, O.; Horning, S.; Mann, M. Parts per million mass accuracy on an Orbitrap mass spectrometer via lock mass injection into a C-trap. Mol. Cell. Proteomics 2005, 4 (12), 2010–21.
(8) Olsen, J. V.; Schwartz, J. C.; Griep-Raming, J.; Nielsen, M. L.; Damoc, E.; Denisov, E.; Lange, O.; Remes, P.; Taylor, D.; Splendore, M.; Wouters, E. R.; Senko, M.; Makarov, A.; Mann, M.; Horning, S. A dual pressure linear ion trap-Orbitrap instrument with very high sequencing speed. Mol. Cell. Proteomics 2009, 8, 2759–69.
(9) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Improving protein identification using complementary fragmentation techniques in Fourier transform mass spectrometry. Mol. Cell. Proteomics 2005, 4 (6), 835–45.
(10) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. New database-independent, sequence tag-based scoring of peptide MS/MS data validates Mowse scores, recovers below threshold data, singles out modified peptides, and assesses the quality of MS/MS techniques. Mol. Cell. Proteomics 2005, 4 (8), 1180–8.
(11) Zhang, Y.; Ficarro, S. B.; Li, S.; Marto, J. A. Optimized Orbitrap HCD for quantitative analysis of phosphopeptides. J. Am. Soc. Mass Spectrom. 2009, 20 (8), 1425–34.
(12) Kocher, T.; Pichler, P.; Schutzbier, M.; Stingl, C.; Kaul, A.; Teucher, N.; Hasenfuss, G.; Penninger, J. M.; Mechtler, K. High Precision Quantitative Proteomics Using iTRAQ on an LTQ Orbitrap: A New Mass Spectrometric Method Combining the Benefits of All. J. Proteome Res. 2009, 8 (10), 4743–52.
(13) Olsen, J. V.; Macek, B.; Lange, O.; Makarov, A.; Horning, S.; Mann, M. Higher-energy C-trap dissociation for peptide modification analysis. Nat. Methods 2007, 4 (9), 709–12.
(14) Savitski, M. M.; Nielsen, M. L.; Kjeldsen, F.; Zubarev, R. A. Proteomics-grade de novo sequencing approach. J. Proteome Res. 2005, 4 (6), 2348–54.
(15) Frank, A. M.; Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A.; Pevzner, P. A. De novo peptide sequencing and identification with precision mass spectrometry. J. Proteome Res. 2007, 6 (1), 114–23.
(16) Shechter, D.; Dormann, H. L.; Allis, C. D.; Hake, S. B. Extraction, purification and analysis of histones. Nat. Protoc. 2007, 2 (6), 1445–57.
(17) Bantscheff, M.; Eberhard, D.; Abraham, Y.; Bastuck, S.; Boesche, M.; Hobson, S.; Mathieson, T.; Perrin, J.; Raida, M.; Rau, C.; Reader, V.; Sweetman, G.; Bauer, A.; Bouwmeester, T.; Hopf, C.; Kruse, U.; Neubauer, G.; Ramsden, N.; Rick, J.; Kuster, B.; Drewes, G. Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat. Biotechnol. 2007, 25 (9), 1035–44.
(18) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4 (3), 207–14.
(19) Zubarev, R. A.; Zubarev, A. R.; Savitski, M. M. Electron capture/transfer versus collisionally activated/induced dissociations: solo or duet. J. Am. Soc. Mass Spectrom. 2008, 19 (6), 753–61.
(20) Savitski, M. M.; Fischer, F.; Mathieson, T.; Sweetman, G.; Lang, M.; Bantscheff, M. Targeted data acquisition for improved reproducibility and robustness of proteomic mass spectrometry assays. J. Am. Soc. Mass Spectrom., available online 25 January 2010.
(21) Choudhary, C.; Kumar, C.; Gnad, F.; Nielsen, M. L.; Rehman, M.; Walther, T. C.; Olsen, J. V.; Mann, M. Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 2009, 325 (5942), 834–40.
