SALAMI (Spectrum ALignments Using High Accuracy Mass and HIgh Sensitivity Data): How To Make the Best out of Hybrid MS/MS Data

Flavio Monigatti* and Hanno Steen*

Department of Pathology, Harvard Medical School and Children’s Hospital Boston, Boston, Massachusetts 02115

Received October 23, 2007

We present a software algorithm that combines ion trap and Orbitrap product ion spectra acquired in parallel. The hybrid product ion spectra identify more peptides than when using two separate searches for the Orbitrap and LTQ data. The program extracts the high-accuracy mass data from the Orbitrap mass analyzer and combines it with the high-sensitivity data analyzed in the LTQ linear ion trap; the m/z values of the high-confidence fragment ions are corrected to Orbitrap mass accuracies and the fragment ion intensities are amplified. This approach utilizes the parallel spectrum measurement capabilities of the LTQ-Orbitrap. We present our approach to handling this type of hybrid data, explain our alignment program, and discuss the advantages of the chosen methodology.

Keywords: spectrum-alignments • high-accuracy data • high-sensitivity data • hybrid data

* Flavio Monigatti, Children’s Hospital Boston, Department of Pathology/Enders 308, 320 Longwood Ave., Boston, MA 02115; tel, +1 617 919 2709; fax, +1 617 730 0168; e-mail, [email protected]. Hanno Steen, Children’s Hospital Boston, Department of Pathology/Enders 1130, 320 Longwood Ave., Boston, MA 02115; tel, +1 617 919 2629; fax, +1 617 730 0168; e-mail, [email protected].

Journal of Proteome Research 2008, 7, 1984–1993. Published on Web 04/12/2008.

Introduction

The significant technological advances in mass spectrometry over the last two decades have made it the preferred technology for the analysis of proteins, that is, identification and characterization. Initially, peptide mass fingerprinting (PMF) was developed in the early 1990s for rapid and sensitive protein identification.1–3 An alternative protein identification strategy was established in the mid-1990s utilizing sequence-revealing peptide fragmentation data that were generated in tandem mass spectrometers.4,5 Initially, low-resolution/low-accuracy quadrupole-based mass spectrometers were used for peptide fragmentation-based protein identification routines. The subsequent introduction of quadrupole-TOF hybrid tandem mass spectrometers6,7 enabled sensitive peptide fragmentation, that is, protein identifications based on high-resolution/high-accuracy data on an LC time scale. The low-resolution/low-accuracy quadrupole-based mass spectrometers were used in a “brute force” approach, acquiring as much data as possible with subsequent database searches to identify the good spectra in the rather extensive background of bad product ion spectra. The high-accuracy/high-resolution instruments allowed a “charge-state dependent” approach, utilizing charge state information and accurate m/z values to make the decision as to whether to fragment an observed precursor.8

The advent of next generation hybrid tandem mass spectrometers with independent mass analyzers and detectors, as first realized in the LTQ-FT and later in the LTQ-Orbitrap (both Thermo Scientific), led to new acquisition strategies as it became possible to simultaneously acquire low- and high-accuracy data in the two independent mass analyzers. However, due to the superior detection limits in the linear quadrupole ion trap as compared to the FT-ICR mass analyzer (LTQ-FT), a highly accurate survey mass spectrum is generated while acquiring low-accuracy product ion spectra in the linear ion trap, generating hybrid data for the database search comprising highly accurate precursor and low-accuracy product ion m/z values. The second implementation of independent mass analyzers and detectors was the LTQ-Orbitrap, which was an improvement over the LTQ-FT as acquiring highly accurate and resolved product ion spectra becomes possible in a time scale and with sensitivity comparable with values observed for the linear quadrupole ion trap. The current realization of the Orbitrap instrument allows the simultaneous acquisition of high- and low-accuracy product ion spectra in the two mass analyzers with minimal loss in the duty cycle, generating two sets of independent product ion mass spectra of the same precursors.

In this paper, we describe an algorithm that extracts the highly accurate fragment ion m/z values acquired in the Orbitrap and combines this information with the high-sensitivity but low-accuracy fragmentation data obtained from a linear ion trap. The resulting hybrid spectrum produced by our algorithm combines the advantages of the two different mass analyzers, that is, the high accuracy and high sensitivity. These data surpass the data obtained from the Orbitrap as well as the LTQ in terms of identifiable spectra, whether the two spectrum categories are considered separately or together. We show the advantage of using such an algorithm in conjunction with the widely used database search program Mascot. Mascot uses only a small number of peaks, selected to have particular properties, from any peak list, and applies an iterative mechanism to reduce the number of peaks to the minimum necessary for a particular identification.
By identifying fragment ions for which we have additional spectral evidence from two independent mass analyzers, we are able to provide Mascot with data that are more relevant to the correct identification of a peptide, which results in a higher score. By providing Mascot with signals that are complementary and overlapping, we reduce the likelihood of supplying Mascot with spurious noise signals.

10.1021/pr7006895 CCC: $40.75 © 2008 American Chemical Society

Figure 1. Scheme of the scan methodology employed to acquire the spectra. A maximum of 11 scan events were recorded per cycle: 1 full MS scan measured in the Orbitrap mass analyzer, followed by 10 MS2 scans in total of the top 5 peptides measured in the MS1 scan. Each peptide was analyzed in the Orbitrap and in the linear ion trap.

Material and Methods

Zebrafish Sample Preparation. One tricaine-euthanized zebrafish was shock-frozen in liquid nitrogen prior to slicing it into 5 mm segments. These segments were suspended in 1% Triton X-100, 1% sodium deoxycholate, 0.1% SDS, 150 mM NaCl, 10 mM sodium phosphate (pH 7.2), 1% Trasylol, and protease inhibitors (Roche Complete Mini) prior to homogenization. The suspension was centrifuged at 15 000 rpm for 15 min at 4 °C. The supernatant was removed and stored at -80 °C. An aliquot of the extract was separated by SDS-PAGE using a 4-12% NuPAGE gel (Invitrogen). The gel lane was divided into 20 slices of similar size, and the proteins were in-gel digested using established protocols. The digests were desalted using STAGE tips9 prior to LC/MS analysis.

Liquid Chromatography Mass Spectrometry (LC/MS). All data were acquired on an LTQ-Orbitrap (Thermo Scientific, San Jose, CA) equipped with an Agilent 1100 nanoflow HPLC system. Dynamic exclusion and charge state screening were enabled, rejecting singly charged precursors and precursors of undefined charge state. Additional details about the acquisition routine are given in the text. The LC parameters are as follows: 20 min of loading at 3% B (600 nL/min); 3-7.5% B for 1 min 50 s (200 nL/min); 7.5-50% B for 38 min (200 nL/min); 50-90% B in 5 min (200 nL/min); hold 90% B for 10 min (600 nL/min); hold 3% B for 15 min (600 nL/min); buffer A, 0.1% formic acid; buffer B, 90% acetonitrile (ACN)/0.1% formic acid.

Acquisition Method. The order of scan events during the LC/MS experiments is shown in Figure 1: first, a high-resolution/high-accuracy survey mass spectrum was acquired in the Orbitrap mass analyzer with a target resolution of 60 000. The achieved average mass accuracy for the m/z values of the precursor was below 3 ppm.
Subsequently, the five most abundant multiply charged precursors were selected for fragmentation, whereby two product ion spectra were acquired for each precursor, one in the Orbitrap and, in parallel, one in the linear ion trap. The sample was separated using a 75 min gradient as we dealt with a highly complex sample, a zebrafish whole cell lysate, fractionated by SDS-PAGE; for this study, we focused on the gel slice covering the 50-60 kDa molecular weight fraction. A dynamic exclusion of 60 s was used during the data-dependent acquisition.

Data Processing. An in-house written program was used to extract the correct precursor mass for the OT-MS2 as well as for the IT-MS2 spectra. Centroided peaks were extracted from profile scans by using local peak maxima plus their flanking points, and peak intensity was determined by integrating between consecutive local minima. For each spectrum, the 200 most intense raw data peaks were taken into account. In total, 1807 product ion spectra pairs were extracted from the 75 min LC/MS experiments. Spectra acquired in the Orbitrap are internally calibrated via the Lock Mass functionality.10 We did not perform a recalibration of the MS/MS spectra that were acquired in the linear ion trap as we designed the algorithm such that no processing of the spectra is required before or after the alignment. However, depending on the calibration state of the instrument, a recalibration of the spectra may be necessary. A variety of solutions and algorithms have been presented in the literature, mainly describing recalibration approaches for tentatively identified peaks.11–14 Once extracted, the different sets of product ion spectra are aligned. This process is detailed in the next section. Upon processing using the alignment program described below, we had three different sets of MS spectra at hand: (i) a set of low-accuracy/low-resolution IT-MS2 spectra with superior sensitivity; (ii) a set of high-resolution/high-accuracy OT-MS2 spectra of inferior sensitivity as compared to the IT data; and (iii) a set of aligned hybrid product ion spectra, combining the sensitivity of the IT and the accuracy of the OT mass analyzer.

Database Search. All data were searched on a 2-CPU server running Mascot 2.1.04 (Matrix Science, London, U.K.). We allowed a false-positive peptide identification rate of <1% by merging the results from a target database search and the results of a decoy database search.
Since we distrust peptide identifications with a reported MOWSE score lower than a cutoff value of 25 for this type of data, we decided to base our analysis only on peptide hits providing a score above the mentioned threshold after merging of the target/decoy database searches.15,16 The precursor ion tolerance we used for all the searches was set to 5 ppm.
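The filtering criteria described above (score cutoff of 25, target/decoy merging, 5 ppm precursor tolerance) can be illustrated with a minimal Python sketch. The function names are hypothetical, and estimating the false-positive rate as the ratio of decoy to target hits above the cutoff is a common convention assumed here, not necessarily the exact procedure used in this study.

```python
def filter_hits(target_scores, decoy_scores, cutoff=25.0):
    """Keep target hits scoring above the cutoff and estimate the false-positive rate.

    Assumption: the rate is estimated as (decoy hits above cutoff) /
    (target hits above cutoff); the paper does not spell out its formula.
    """
    targets = [s for s in target_scores if s > cutoff]
    decoys = [s for s in decoy_scores if s > cutoff]
    rate = len(decoys) / len(targets) if targets else 0.0
    return targets, rate


def within_ppm(observed_mz, theoretical_mz, tol_ppm=5.0):
    """Return True if the relative m/z error is within tol_ppm parts per million."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


# Toy scores (invented for illustration only, not data from the study).
kept, fp_rate = filter_hits([30.0, 26.0, 24.0, 50.0], [10.0, 27.0])
```

With a real search, the score lists would come from the merged target and decoy result files; the toy values above only show the mechanics.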

The Program

The algorithm is written in C++ and run as a .Net assembly using managed extensions. The source code, as well as the binary, is available upon request. The program combines low-accuracy, high-sensitivity linear ion trap (IT) product ion spectra (from now on termed IT-MS2 spectra) and high-accuracy, low-sensitivity Orbitrap (OT) fragmentation data (called OT-MS2 spectra) to produce hybrid or aligned spectra (AL-MS2 spectra) combining the advantages of the two mass analyzers that are hyphenated in the current version of the LTQ-Orbitrap.

Alignment Program Description. The basic idea of our algorithm is a comparison of two fragment ion pair lists: (i) One list contains all the m/z values of fragment ions that were deemed to be part of a complementary ion pair found in the IT-MS2 spectrum (see below). (ii) The second list comprises all fragment ions that were detected in the IT and the OT mass analyzers. This comparison leads to composite IT/OT product ion spectra, which are used for database searches for protein identification purposes. For the processing, the IT-MS2 spectrum is considered as the master template as it is assumed to be richer in information due to the superior sensitivity of the linear ion trap mass analyzer. In total, four different steps are executed:

(I) Step 1 identifies the two product ion spectra that share the same parent ion, as identified in a full MS scan in the OT mass analyzer. As our acquisition method (see below) was designed such that the MS/MS spectra of the same precursor were acquired simultaneously, the scan numbers and the precursor m/z values were used to identify spectra pairs. Although we expect that most of the acquisition methods will follow similar experimental designs, our program allows the definition of spectra pairs based on a variety of parameters.

(II) In the second step, a product ion spectrum acquired in the linear ion trap (IT-MS2) is investigated for all possible combinations of complementary fragment ion pairs. By definition, these complementary ion pairs comprise an N-terminal fragment (b ion) and a C-terminal fragment (y ion) that are generated upon cleavage of the same peptide bond. The sum of the neutral masses of these two fragment ions corresponds to the neutral mass of the precursor. As the parent ion m/z values are derived from the Orbitrap mass analyzer, the mass is highly accurate (<…

…>25). At a fragment ion search tolerance of 0.5 Da, 480 product ion spectra are identified, which is slightly above the number of assigned OT-MS2 spectra (421, see below). The plateau for the IT-MS2 spectra is reached at a product ion search tolerance of 0.6, where 586 scans are successfully assigned to peptide sequences. By increasing the daughter ion search tolerance, more spectra could be assigned to peptides. Although we expect to identify more peptides based on the IT-MS2 data because of the superior sensitivity, we expected a concomitant increase in false-positive identifications, despite the fact that for each search tolerance we employed decoy database searches to minimize the number of false-positive spectra identifications. For the Orbitrap product ion spectra, the number of identified spectra remained fairly constant at approximately 420 for any given product ion search tolerance (blue curve in Figure 4A). That number of identified spectra increases slightly at an employed search tolerance of ≥1 Da. This might be due to deamidation of glutamine and asparagine or the selection of the most intense isotope peak instead of the monoisotopic one. The outcome of this curve was not surprising, since Orbitrap mass analyzed fragments are highly accurate and widening the error tolerance will not allow any additional peptide assignments.
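The complementary ion pair criterion from step II of the program description (the neutral masses of a b ion and its matching y ion sum to the neutral mass of the precursor) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all fragments are treated as singly charged, the tolerance `tol_da` is a hypothetical parameter, each peak is paired at most once, and the function names are not taken from the authors' C++ program.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Convert an m/z value of a protonated ion to its neutral mass."""
    return (mz - PROTON) * charge


def complementary_pairs(fragment_mzs, precursor_mz, precursor_charge, tol_da=0.5):
    """Find fragment pairs whose neutral masses sum to the precursor neutral mass.

    Assumptions: fragments are singly charged; tol_da is a hypothetical
    tolerance; a two-pointer scan pairs each peak at most once.
    """
    target = neutral_mass(precursor_mz, precursor_charge)
    mzs = sorted(fragment_mzs)
    pairs = []
    low, high = 0, len(mzs) - 1
    while low < high:
        total = neutral_mass(mzs[low], 1) + neutral_mass(mzs[high], 1)
        if abs(total - target) <= tol_da:
            pairs.append((mzs[low], mzs[high]))
            low += 1
            high -= 1
        elif total < target:
            low += 1
        else:
            high -= 1
    return pairs


# Precursor m/z 407.2631 (2+) is the AALQGLLK example from Figure 6;
# the fragment list is a toy input (approximate b2/y6 plus two unrelated peaks).
pairs = complementary_pairs([143.08, 300.0, 500.0, 671.45], 407.2631, 2)
```

Because the precursor mass comes from the Orbitrap, the target sum is known accurately, so the tolerance only has to absorb the error of the two ion trap fragment measurements.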

These two curves clearly show the difference between IT and OT data. The OT-MS2 data provide unambiguous, binary-type peptide assignments, that is, either the spectrum is of sufficient quality to provide a good hit or the quality is lacking such that even increasing the search tolerance does not give more assignments. Thus, low quality is not in terms of m/z accuracy but in terms of sparseness and intensity of fragment ions. In contrast, the IT-MS2 data are much more ambiguous, whereby the number of assigned spectra increases with a widening search tolerance. However, this increase comes at the expense of additional false-positive hits.

The number of assigned product ion spectra as a function of the product ion search tolerance for the aligned IT/OT hybrid spectra is hardly sigmoidal (green curve in Figure 4A). For this aligned MS2 data set, we always observe an increased number of assigned spectra in comparison with the number of assigned IT and OT product ion spectra, irrespective of the tested product ion search tolerances. A steady increase in the number of assigned spectra can be observed as the applied search tolerance is increased, as expected. For an allowed m/z error of 0.5 Da, 611 spectra are observed with a peptide assignment score higher than 25 after merging target and decoy database searches. Using an error tolerance of 1.1, 721 peptide spectra are identified. Interestingly, a sudden increase at a search tolerance of 1 Da is observed. This sudden increase coincides with the step-like increase of hits obtained by searching the OT-MS2 data with a product ion search tolerance of 1 Da.

Since the Mascot search engine does not take into account fragment ion accuracy within the set search tolerances for scoring purposes, changing the low-accuracy IT mass measurements to the high-accuracy OT masses is futile. We show the relevance of this statement in Figure 4B: for hybrid spectra where signal amplification (as described in the program outline) and no m/z correction is performed (black curve in Figure 4B), one observes a similar curve as seen for the IT-MS2 data (red curve in Figure 4B) when the number of assigned spectra is plotted against the product ion search tolerance. The curves are distinguished by an offset, that is, by the fact that more hits are observed for the amplified spectra. This aligned spectrum achieves a higher score than that of the IT spectra because the relevant ion signals are amplified in comparison with the IT spectra such that Mascot needed fewer signals to identify the peptide. With increasing product ion search tolerance, the curve coincides with the number of assigned spectra obtained from the aligned data (green curve). This observation confirmed that our approach does not negatively interfere with the Mascot scoring process; on the contrary, it is advantageous. Thus, we perform an intensity amplification of relevant peaks. When carrying out a similar analysis of hybrid data where the m/z values of the relevant peaks are corrected without any signal amplification (blue curve in Figure 4B), the observation that peak intensities are more relevant for Mascot than varying mass accuracies within the search tolerance is confirmed; once the search tolerance corresponds to the experimental error of the fragment ion m/z values (at ∆m/z of approximately 0.7), the curves for the unmodified IT data (red curve) and the m/z-corrected data (blue curve) coincide, that is, no additional spectra are assigned. In summary, we tested the effect of aligning IT-MS2 and OT-MS2 data with and without signal amplification and correction for high-accuracy m/z values.
At a product ion search tolerance of 0.7 Da, we could assign 427 OT-MS2 spectra, 626 IT-MS2 spectra, 641 mass-corrected MS2 spectra, 663 signal-amplified MS2 spectra, and 666 aligned, that is, mass-corrected and signal-amplified, MS2 spectra. For this analysis, any product ion search tolerance could have been chosen since the difference between the original IT spectra and the aligned spectra stays approximately constant for search tolerances above 0.6 Da. The observed increase is mainly due to the amplification of the intensities of the most significant peaks, as Mascot does not account for differences in the m/z deviation as long as they are within the user-defined search tolerance. Nevertheless, having highly accurate fragment ion m/z values available is invaluable for further data validation (see below).

Comparing Peptide Hits. For a more in-depth analysis of the different sets of successfully assigned product ion spectra, we compared the overlap of the identified peptides and proteins, employing the database search results that were obtained with a product ion search tolerance of 0.7 Da. The results are shown in Figure 5A. The majority of the nonoverlapping IT and OT spectra are captured by the alignment, and in addition, the alignment results in an increase in previously unassigned spectra. In total, 408 spectra could be observed in all data sets, which corresponds to 95% of the 427 Orbitrap spectra assigned to a peptide. Furthermore, 90% of all spectra assigned in either the IT-MS2 or the OT-MS2 data set are also identified in the AL-MS2 data set. In addition, 47 extra spectra could be assigned upon merging the high-accuracy OT data with the high-sensitivity IT data into aligned MS2 spectra. These 47 product ion spectra had previously not scored above the threshold of 25. Only 5 OT-MS2 and 21 IT-MS2 spectra remained unexplained after alignment - however, these assignments are still accessible as searching all three data sets will maximize the number of assigned spectra.

Figure 5. Peptide and protein identifications. (A) Venn diagram of all the spectra successfully identified at an ion score cutoff of 25 (false-positive rate of <1%), using a product ion search tolerance of 0.7. (B) Venn diagram of the corresponding protein hits. The numbers in parentheses are the number of single-peptide identifications (Aligned/Orbitrap/Ion Trap).

Comparing Protein Hits. An interesting question that arises is whether the increase in assigned spectra after alignment is a result of an inflated number of single-peptide protein identifications or whether spectra were assigned to peptides derived from proteins identified by other product ion spectra. Considering the size of the zebrafish protein database used for this study (>47 700 protein entries), any peptide identifying an already identified protein can be considered a true hit as the chances are well below 0.1% that an already identified protein is randomly matched by another peptide. The numbers of protein identifications from the same set of Mascot searches are given in the Venn diagram depicted in Figure 5B. We extracted a list of 566 distinct peptides in total and generated a minimal list of proteins. The distinct peptides, which are able to explain all detected proteins, have a Mascot ion score >25, which was applied after merging target and decoy database searches in order to guarantee a low false-positive rate (<1%). The protein grouping step was performed in order to avoid ambiguities due to homologous protein sequences. We took such an approach because we cannot distinguish protein hits arising from the same set of degenerate peptides (peptides that occur in multiple proteins). In case of a tie (proteins identified by exactly the same set of peptides), only one protein is reported. On the basis of the reverse database search, all peptides found with a score above the 1% false-positive rate threshold are supposedly correct (Occam’s razor).
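The generation of a minimal protein list from a set of distinct peptides can be sketched as a greedy set cover. This is an assumed, simplified stand-in for the grouping step described above, not the authors' actual procedure: the function name and the alphabetical tie-breaking rule are hypothetical, and a greedy cover is only an approximation of a truly minimal list.

```python
def minimal_protein_list(protein_to_peptides):
    """Greedily pick proteins until every observed peptide is explained.

    Assumptions: ties (including proteins with identical peptide sets) are
    broken alphabetically, so only one protein of a tied group is reported.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # max() returns the first maximal element, so sorting gives a
        # deterministic, alphabetical tie-break.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Toy input (invented identifiers): P2 is a subset of P1, P3 and P4 are tied.
proteins = minimal_protein_list({
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepA", "pepB"},
    "P3": {"pepD"},
    "P4": {"pepD"},
})
```

In the toy input, P2 adds no peptide beyond P1 and P4 is indistinguishable from P3, so neither is reported, which mirrors the handling of degenerate peptides described in the text.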
A more sophisticated approach using probabilistic measures to eliminate false-positive protein identifications is described by Nesvizhskii and colleagues.20,21 The protein-centric Venn diagram in Figure 5B nicely demonstrates that this increase in assigned spectra upon alignment is indeed not simply an increase in questionable single-peptide protein identifications; this rate is independent of the data type: OT, 6%; AL, 7%; IT, 9% (the absolute numbers are given by the numbers in parentheses in Figure 5B). Instead, the 47 additional identified peptides using the aligned MS2 data set produced only 5 additional single-peptide protein identifications, or in other words, 42 spectra could be assigned to proteins that were already identified and thus correspond to the minimum number of additional true positive peptide assignments (this discussion takes the very conservative view that all single-peptide protein identifications are indeed false positives). Although the discussion as to how to evaluate single-peptide protein identifications is beyond the scope of the paper, it is interesting to note that half of our single-peptide protein identifications are in the same molecular weight range as all the other proteins identified in this example study. Interestingly, even though fewer proteins could be assigned using the aligned spectra set, the total number of matched peptides is higher than the number of total peptide identifications obtained from the IT-MS2 data set; thus, the alignment significantly contributes to confirmation of existing protein assignments by increasing the sequence coverage of the identified proteins. In summary, 101 proteins were identified by the set of aligned product ion spectra as compared to 72 for the OT-MS2 and 103 for the IT-MS2. Similar increases are also observed when the single-peptide protein identifications are not considered: the AL-MS2 data set gives rise to 58 protein identifications as compared to 43 for the OT-MS2 data set and 60 for the IT-MS2 data set.

Improving Mascot Scores. The data presented above clearly demonstrate the advantages of using hybrid spectra by evaluating the peptide and protein identifications. Hybrid spectra generally receive a higher overall score than their counterparts measured in the Orbitrap or in the linear ion trap. This is demonstrated by the numbers shown in Table 1 (all numbers refer to product ion search tolerances of 0.7 Da): from the 408 product ion spectra identified in all three data sets, only 51 linear ion trap product ion spectra scored higher than the corresponding OT- and AL-MS2 spectrum. This is contrasted by 53 top-scoring OT-MS2 spectra and 304 top-scoring AL-MS2 spectra (6-fold improvement over the IT-MS data). In other words, 3/4 of all aligned product ion spectra are top-scoring and 1/4 is second highest scoring. Only a mere 3% of all aligned spectra give scores below the corresponding IT- or OT-MS2 spectrum.

Table 1. Score Distribution of Spectra Identifications for a Product Ion Search Tolerance of m/z 0.7

Score ordering                  Number of spectra
AL > 25, AL > OT, AL > IT       115
AL > 25, AL > IT > OT           189
IT > 25, IT > AL > OT           41
IT > 25, IT > OT > AL           10
OT > 25, OT > AL > IT           43
OT > 25, OT > IT > AL           10
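The totals quoted in the text can be cross-checked against the six entries of Table 1 with a short tally. The sketch below assumes that the six counts (115, 189, 41, 10, 43, 10) map onto the six score orderings in the order in which they are listed; each entry is reduced to its top-scoring data set.

```python
from collections import Counter

# (top-scoring data set, number of spectra), in the order listed in Table 1.
# Assumption: counts and score orderings are paired in listing order.
table1 = [
    ("AL", 115),
    ("AL", 189),
    ("IT", 41),
    ("IT", 10),
    ("OT", 43),
    ("OT", 10),
]

top_scoring = Counter()
for data_set, count in table1:
    top_scoring[data_set] += count

total = sum(top_scoring.values())
al_fraction = top_scoring["AL"] / total  # share of spectra where AL scores best
```

Under this pairing the tally reproduces the 304 top-scoring AL-MS2, 51 IT-MS2, and 53 OT-MS2 spectra stated in the text, which together make up the 408 spectra identified in all three data sets; the aligned spectrum scores best in roughly three quarters of these cases.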
To demonstrate the increase in Mascot scores upon alignment, an example is shown in Figure 6. Figure 6 shows the OT-, IT-, and AL-MS2 spectra of the peptide AALQGLLK derived from database entry IPI00494197, annotated as hydroxyacyl-coenzyme A dehydrogenase. The OT-MS2 and IT-MS2 resulted in Mascot scores of 43 and 39, respectively, for a product ion search tolerance of 0.7 Da. The OT-MS2 spectrum matched 21 fragment ions using the 48 most intense ions and the IT-MS2 matched 14 fragments by using the 30 most intense peaks. Upon aligning, the Mascot score was boosted to 60 by matching 13 out of the 19 most intense fragment ions in the spectrum. This significant increase in the ion score is mainly attributed to the amplification of the relevant fragment ions.

Figure 6. Score improvements. Panels A-C show the OT-MS2 spectrum, the IT-MS2 spectrum, and the aligned spectrum for the peptide with the sequence AALQGLLK (m/z 407.2631, 2+). In all spectra shown, the arrowhead marks the precursor m/z value and the asterisk denotes the water loss peak.

Benefit of High Mass Accuracy. As briefly discussed above, the observed increase in assigned product ion spectra upon aligning product ion spectra is primarily the result of the amplification of the significant product ions, that is, those for which supporting evidence is available, and not of the correction of the fragment ion m/z values to Orbitrap accuracies. This in turn is the direct result of the Mascot algorithm, which does not account for different product ion mass accuracies as long as they are within the user-defined search tolerance. Nevertheless, the corrected, that is, highly accurate fragment ion m/z values are very useful for the validation of peptide identifications, thereby minimizing false peptide assignments. Two examples of peptide assignments are given in Figure 7 where the high mass accuracy of the product fragment ions leads to a different identification than observed with the low-accuracy IT data. Figure 7A-C shows the OT-, IT-, and AL-MS2 spectra of a peptide identified by Mascot as EALELLK (exp. m/z, 408.2468; theoretical, 408.247), derived from enolase 1 (IPI00483215: MW, 47.3 kDa) by the assignment of OT- and AL-MS2 spectra. However, when using the IT spectrum, the peptide assigned is ELAELLK, from the protein IPI00504081 (MW, 32.0 kDa), an unknown protein similar to transmembrane protein 41A. No other peptide matches to this protein. The aligned spectrum (Figure 7C) received an ion score of 35 in contrast to the IT- and OT-MS2 spectra, which scored only 29 and 32, respectively. This increase was again achieved by an amplification of ‘confirmed’ fragment ions and the assignment of additional fragment ions with an error <…

…>10 ppm. As these examples show, even with high-accuracy precursors as provided by FT-ICR instruments and the Orbitrap, similar peptides can easily be assigned to the wrong spectrum with high confidence scores if the mass accuracy of the product ions is not sufficient. The hybrid spectra produced by our algorithm overcome this limitation since the algorithm corrects the low-accuracy ion trap mass values to highly accurate Orbitrap m/z values while maintaining the sensitivity advantage of the linear ion trap.
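The two operations discussed in this section, correcting ion trap m/z values to the matching Orbitrap values and amplifying the intensities of confirmed peaks, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matching window `tol_da`, the amplification factor `gain`, and the function name are hypothetical, and the additional evidence from complementary ion pairs is omitted for brevity. The IT spectrum serves as the master template, as stated in the program description.

```python
import bisect


def align_spectra(it_peaks, ot_peaks, tol_da=0.5, gain=2.0):
    """Build a hybrid peak list from IT and OT peaks given as (m/z, intensity).

    Assumptions: an IT peak is 'confirmed' when an OT peak lies within
    tol_da; confirmed peaks take the OT m/z and have their IT intensity
    multiplied by gain. Unconfirmed IT peaks are kept unchanged.
    """
    ot_mzs = sorted(mz for mz, _ in ot_peaks)
    hybrid = []
    for mz, intensity in it_peaks:
        k = bisect.bisect_left(ot_mzs, mz)
        # Only the OT neighbours directly below and above can be the closest.
        candidates = ot_mzs[max(k - 1, 0):k + 1]
        best = min(candidates, key=lambda m: abs(m - mz), default=None)
        if best is not None and abs(best - mz) <= tol_da:
            hybrid.append((best, intensity * gain))
        else:
            hybrid.append((mz, intensity))
    return hybrid


# Toy peaks (invented for illustration only).
hybrid = align_spectra([(143.2, 100.0), (300.1, 50.0)], [(143.0815, 10.0)])
```

In the toy call, the first IT peak has an OT counterpart within the window, so it takes the OT m/z and a doubled intensity, while the second peak is passed through unchanged.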

Conclusions and Perspectives

We present an algorithm for aligning high-accuracy/high-resolution Orbitrap and high-sensitivity linear ion trap product ion spectra from the same precursor, thereby combining the strengths of two different mass analyzers that are operated simultaneously in the current version of the LTQ-Orbitrap. The algorithm assesses the likelihood of an observed signal in the product ion spectrum based on the presence of complementary fragment ions and the simultaneous presence in the OT- and IT-MS2 spectra. Apart from correcting the observed IT m/z values to the highly accurate OT m/z values, the more relevant fragment ions are emphasized by amplifying their signal intensities. The program is very fast and processes raw data with ease. The aligned IT-/OT-MS2 data provide superior product ion spectra that result in a generally increased Mascot score for the aligned spectra as compared to the pure IT- and/or OT-MS2 data. This increase in Mascot scores results in more peptides identified with a significant score (defined as >25). Interestingly, this increase in assigned peptide identifications results in a concomitant increase in validations of proteins formerly identified based on single peptides and improved sequence coverage for the identified proteins; the number of questionable single-peptide protein identifications is not increased. The latter is especially important for protein characterization studies. The fact that the majority of the additionally assigned peptides are derived from proteins already identified by other MS2 spectra in the data set validates our approach. This also provides credibility to additional protein identifications based on single peptides when searching the aligned hybrid data set. More importantly, the high mass accuracy of the product ions minimizes the likelihood of false-positive peptide assignments.

Currently available sequence database search algorithms are unable to take advantage of the information present in our hybrid data. Current algorithms treat all fragment ions equally as long as they are within the user-defined mass tolerance, that is, the well-defined different error tolerances of the individual fragment ions are ignored. By amplifying the intensity of aligned peaks, our algorithm addresses this limitation in this first implementation for users of the Mascot search algorithm. However, a much more elegant way to integrate and take advantage of this additional data would be a search engine designed to steadily increase the search tolerance until enough ions are acquired to sufficiently evaluate the spectrum. Until such a search engine has been realized in practice, we utilize the expected m/z accuracies of the observed fragment ions for validation purposes. This validation process is greatly facilitated by this additional information.
As the Orbitrap is increasingly the instrument of choice in proteomics, our algorithm should be of use to the community. The concept presented here has been applied to LTQ-Orbitrap data, but is also easily applicable to fragment ion data derived from related precursors. This includes complementary ECD/ETD and CAD data from the same precursors, as well as the alignment of MS2 and MS3 data from phosphopeptides and the corresponding first-generation fragment ions that showed a neutral loss of a phosphoric acid molecule.

Abbreviations: IT-MS2, ion trap product ion spectrum; OT-MS2, orbitrap product ion spectrum; AL-MS2, aligned product ion spectrum.
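For the phosphopeptide MS2/MS3 case mentioned above, any alignment would first need the m/z of the first-generation fragment ion that has lost phosphoric acid. The following Python sketch computes that value; the function name `neutral_loss_mz` is hypothetical, and the sketch shows only the mass arithmetic, not an alignment procedure described in this work.

```python
H3PO4 = 97.976896  # monoisotopic mass of phosphoric acid in Da


def neutral_loss_mz(precursor_mz: float, charge: int, loss_mass: float = H3PO4) -> float:
    """Return the m/z of a precursor after loss of a neutral molecule.

    A neutral loss leaves the charge unchanged, so the m/z shifts by
    loss_mass / charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - loss_mass / charge
```

For a doubly charged precursor, for example, the neutral-loss ion appears about 48.99 m/z units below the precursor.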

Acknowledgment. We thank Joseph Barillari, Shao-En Ong and Thomas Patterson for sample preparation, data acquisition and .raw-file handling. We thank Judith J. Steen for correcting the manuscript and her helpful comments. We also thank Bogdan Budnik and Kenneth Parker for ideas and fruitful discussions. Flavio Monigatti is supported by a fellowship from the Swiss National Science Foundation.

References

(1) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Protein identification by mass profile fingerprinting. Biochem. Biophys. Res. Commun. 1993, 195 (1), 58–64.
(2) Mann, M.; Hojrup, P.; Roepstorff, P. Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 1993, 22 (6), 338–345.
(3) Pappin, D. J. C.; Højrup, P.; Bleasby, A. J. Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 1993, 3 (6), 327–332.
(4) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976–989.
(5) Mann, M.; Wilm, M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 1994, 66 (24), 4390–4399.
(6) Morris, H. R.; Paxton, T.; Dell, A.; Langhorne, J.; Berg, M.; Bordoli, R. S.; Hoyes, J.; Bateman, R. H. High sensitivity collisionally-activated decomposition tandem mass spectrometry on a novel quadrupole/orthogonal-acceleration time-of-flight mass spectrometer. Rapid Commun. Mass Spectrom. 1996, 10 (8), 889–896.
(7) Shevchenko, A.; Chernuschevich, I.; Ens, W.; Standing, K. G.; Thomson, B.; Wilm, M.; Mann, M. Rapid 'de novo' peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer. Rapid Commun. Mass Spectrom. 1997, 11, 1015–1024.
(8) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422 (6928), 198–207.
(9) Rappsilber, J.; Ishihama, Y.; Mann, M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 2003, 75 (3), 663–670.
(10) Olsen, J. V.; de Godoy, L. M.; Li, G.; Macek, B.; Mortensen, P.; Pesch, R.; Makarov, A.; Lange, O.; Horning, S.; Mann, M. Parts per million mass accuracy on an Orbitrap mass spectrometer via lock mass injection into a C-trap. Mol. Cell. Proteomics 2005, 4 (12), 2010–2021.
(11) Bern, M.; Goldberg, D. De novo analysis of peptide tandem mass spectra by spectral graph partitioning. J. Comput. Biol. 2006, 13 (2), 364–378.
(12) Böcker, S.; Mäkinen, V. Combinatorial approaches for mass spectra recalibration. IEEE/ACM Trans. Comput. Biol. Bioinf. 2008, 5 (1), 91–100.
(13) Gobom, J.; Mueller, M.; Egelhofer, V.; Theiss, D.; Lehrach, H.; Nordhoff, E. A calibration method that simplifies and improves accurate determination of peptide molecular masses by MALDI-TOF MS. Anal. Chem. 2002, 74 (15), 3915–3923.
(14) Ma, B.; Zhang, K.; Hendrie, C.; Liang, C.; Li, M.; Doherty-Kirby, A.; Lajoie, G. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2003, 17 (20), 2337–2342.
(15) Moore, R. E.; Young, M. K.; Lee, T. D. Qscore: an algorithm for evaluating SEQUEST database search results. J. Am. Soc. Mass Spectrom. 2002, 13 (4), 378–386.
(16) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2003, 2 (1), 43–50.
(17) Mann, M.; Meng, C. K.; Fenn, J. B. Parent Mass Information From Sequences of Peaks of Multiply Charged Ions. Proceedings of the 36th ASMS Conference on Mass Spectrometry and Allied Topics, San Francisco, CA, June 5–10, 1988; pp 1207–1208.
(18) Foster, L. J.; de Hoog, C. L.; Zhang, Y.; Zhang, Y.; Xie, X.; Mootha, V. K.; Mann, M. A mammalian organelle map by protein correlation profiling. Cell 2006, 125 (1), 187–199.
(19) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4 (3), 207–214.
(20) Nesvizhskii, A. I.; Aebersold, R. Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell. Proteomics 2005, 4 (10), 1419–1440.
(21) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75 (17), 4646–4658.

PR7006895

Journal of Proteome Research • Vol. 7, No. 5, 2008 1993