Technical Note pubs.acs.org/jpr

Spectrum-based Method to Generate Good Decoy Libraries for Spectral Library Searching in Peptide Identifications

Chia-Ying Cheng,† Chia-Feng Tsai,‡,§ Yu-Ju Chen,‡,§ Ting-Yi Sung,*,† and Wen-Lian Hsu†

† Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
‡ Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
§ Department of Chemistry, National Taiwan University, Taipei 106, Taiwan

S Supporting Information *

ABSTRACT: As spectral library searching has received increasing attention for peptide identification, constructing good decoy spectra from the target spectra is the key to correctly estimating the false discovery rate in searching against the concatenated target-decoy spectral library. Several methods have been proposed to construct decoy spectral libraries. Most of them construct decoy peptide sequences and then generate theoretical spectra accordingly. In this paper, we propose a method, called precursor-swap, which constructs decoy spectral libraries directly at the “spectrum level”, without generating decoy peptide sequences, by swapping the precursors of two spectra selected according to a very simple rule. Our spectrum-based method requires no additional effort to deal with ion types (e.g., a, b, or c ions), fragmentation mechanisms (e.g., CID or ETD), or unannotated peaks, yet it preserves many spectral properties. The precursor-swap method is evaluated on different spectral libraries, and the resulting decoy ratios show that it is comparable to other methods. Notably, it is efficient in time and memory usage for constructing decoy libraries. A software tool called Precursor-Swap-Decoy-Generation (PSDG) is publicly available for download at http://ms.iis.sinica.edu.tw/PSDG/.

KEYWORDS: spectral libraries, spectral library searching, target-decoy search, false discovery rate, peptide identification



Received: November 3, 2012. Published: April 8, 2013. © 2013 American Chemical Society
dx.doi.org/10.1021/pr301039b | J. Proteome Res. 2013, 12, 2305−2310

INTRODUCTION

As more high-quality mass spectral data sets acquired from proteomics experiments on different species and different instruments become publicly available, spectral library searching has been shown to be an alternative and complementary approach to sequence database searching for peptide identification.1 Although spectral library searching can only identify peptides that have already been discovered from MS/MS spectra by sequence database searching, it outperforms sequence database searching on spectra of low quality or of highly charged precursors.2 Moreover, it is efficient since the search space is much reduced.

An inevitable and critical issue in spectral library searching is to correctly estimate the false discovery rate (FDR), which is defined as the fraction of false-positive identifications in the reported identification results; such false positives arise from noisy peaks or unanticipated post-translational modifications in large-scale experiments.3−5 The target-decoy search approach has been widely used to estimate the FDR in both sequence database searching and spectral library searching.2,4 Two basic assumptions are considered necessary in the construction of decoy databases or spectral libraries.3,6 First, target and decoy databases or libraries do not overlap. Second, false-positive identifications from the target and the decoy databases/libraries are equally likely.

In sequence database searching, many methods for constructing decoy databases have been proposed to meet these two assumptions, for example, reversing or shuffling peptide sequences. But none captures all significant features of false-positive identifications or outperforms the others in all cases.4 These methods start at the “sequence level”, that is, they generate decoy sequences, and then proceed to the “spectrum level” by generating only theoretical fragment ions (e.g., b and y ions in CID spectra) with equal intensities to construct the decoy spectra. In spectral library searching, however, constructing decoy spectra is less straightforward: to resemble observed spectra, the decoy spectra should contain not only theoretical fragment ions but also some unknown fragment ions, and, more importantly, their peak intensities should not be equal.

Several methods have been proposed to construct decoy spectral libraries. Some suggested using spectra from another organism as the decoy spectra, and this approach seems convenient. However, since identical and homologous peptides are common even between phylogenetically distant species, the target and decoy libraries may overlap. Furthermore, the sizes of different species’ spectral libraries may be quite different. Therefore, this approach has been shown to be inappropriate.3 Later, SpectraST’s shuffle-and-reposition method3 and DeLiberator7 were proposed. Both methods first generate the decoy peptide sequences at the “sequence level” and then generate their spectra at the “spectrum level”. To be specific, SpectraST’s shuffle-and-reposition method first shuffles each target (identified) peptide sequence to obtain its decoy peptide sequence and then repositions all of the annotated fragment ions according to the decoy sequence, that is, it keeps the types and intensities of all fragment ions unchanged but changes their m/z accordingly. Unannotated peaks in the target spectra are kept unchanged in the decoy spectra. However, the decoy spectra constructed by the shuffle-and-reposition method are not necessarily dissimilar to the target spectra,7 as different peptide sequences do not guarantee dissimilar spectra.7,8 Furthermore, leaving the unannotated peaks unchanged contributes to the spectral similarity between a pair of target and decoy spectra, especially when the target spectrum contains many unannotated peaks. Therefore, with shuffle-and-reposition decoys it would be more difficult to confidently identify target peptides, and false-negative identifications would be unavoidable.

DeLiberator proposed the following two procedures to tackle the above limitations of shuffle-and-reposition. First, if the spectral similarity of a pair of target and decoy spectra is higher than 0.5, DeLiberator iteratively reshuffles the decoy peptide sequence until the spectral similarity is lower than 0.5 or cannot be further reduced. Second, DeLiberator repositions each unannotated peak by calculating the probability that it is present at the given precursor m/z and charge, independent of peptide sequence.

In addition to the above methods, which generate decoy spectra from the “sequence level” to the “spectrum level”, an intuitive peak-shift (m/z-shift) method was proposed to construct decoy spectra directly at the spectrum level.3 This method shifts each peak of a target spectrum by a fixed m/z and keeps the same precursor m/z to generate the decoy spectrum.
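As an illustration of the peak-shift idea just described, the following is a minimal Python sketch, not the implementation of SpectraST or of any published tool. The `Spectrum` class, the function name `peak_shift_decoy`, and the shift value of 20.0 m/z are hypothetical choices made only for this example; the text specifies only that every peak is shifted by a fixed m/z while the precursor m/z is kept.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical minimal spectrum record used only in this sketch."""
    precursor_mz: float
    charge: int
    peaks: Tuple[Peak, ...]


def peak_shift_decoy(target: Spectrum, shift_mz: float) -> Spectrum:
    """Shift every peak by a fixed m/z; precursor m/z and charge stay unchanged."""
    shifted = tuple((mz + shift_mz, intensity) for mz, intensity in target.peaks)
    return replace(target, peaks=shifted)


target = Spectrum(precursor_mz=500.0, charge=2,
                  peaks=((100.0, 10.0), (250.5, 40.0)))
# 20.0 is an arbitrary illustrative shift, not a value taken from the paper.
decoy = peak_shift_decoy(target, shift_mz=20.0)
```

Because the precursor is untouched, the decoy competes with its own target for the same queries, which is the main difference from the precursor-swap approach proposed in this paper.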
In this paper, we propose a method, called precursor-swap, which generates decoy spectra directly at the “spectrum level”. In an evaluation of different decoy-generation methods, including our precursor-swap, SpectraST’s shuffle-and-reposition, DeLiberator, and peak-shift, the precursor-swap method is shown to be comparable to the other methods and more efficient in time and memory. We also tested, on a human data set, how the decoy fractions of the different decoy-generation methods affect the calculated FDR; with the decoy library generated by the precursor-swap method, the calculated FDR was close to the reference FDR. Finally, we provide a software tool for automatically generating decoy libraries, and the tool is publicly available for download.



MATERIALS AND METHODS

Constructing Decoy Spectral Libraries

Our proposed method, called precursor-swap, constructs decoy spectral libraries directly at the spectrum level. Given a predetermined swap distance d, precursor-swap selects two spectra for precursor swapping as follows. For a spectrum A with precursor M m/z, we select a spectrum B with the same charge and precursor M′ = M + Δ m/z, where Δ ≥ d. We simply swap the precursors of these two spectra to obtain their respective decoy spectra. In other words, the decoy spectrum of A is the spectrum B with its precursor set to M m/z, and similarly the decoy spectrum of B is the spectrum A with its precursor set to M′ m/z. To add randomness to our algorithm, we generate a random number r when considering swapping two spectra. If r mod 2 is equal to 0, the algorithm swaps the chosen two spectra. The randomness of the algorithm allows more decoy libraries to be generated by running the algorithm multiple times on a given target library. This mechanism is especially useful for generating a much larger target-decoy library when a small target library is searched and a large decoy library is required.

In addition, we integrate the frequently used peak-shift operation into precursor-swap and name this method precursor-swap-peak-shift to compare it with precursor-swap and the other methods. The precursor-swap-peak-shift method selects a pair of spectra A and B as in the above method. Then the decoy spectrum of A is obtained from the spectrum B by setting the precursor to M m/z and shifting each peak in the spectrum by −Δ. Similarly, the decoy spectrum of B is obtained by shifting each peak in A by +Δ and setting the precursor to M′ m/z. If the number of spectra in the target library is odd, we first swap three spectra of the same charge with pairwise precursor m/z differences of at least d and then process the remaining spectra.

Spectral Searching in Target-Decoy Libraries

To evaluate different decoy-generation methods, we followed the protocol used in Lam et al.,3 that is, searching a human data set from PeptideAtlas against the E. coli, yeast, and chicken libraries of the same instrument type, each concatenated with its decoy libraries generated by the different methods. This unusual search ensures that there is no correct identification in either the target library or the decoy library.3,7 The decoy fraction, that is, the fraction of incorrect identifications obtained from the decoy library, is defined as the number of identifications from the decoy library divided by the number of all identifications and is used as the performance evaluation measure. When the target and decoy libraries have the same size and the search is not biased toward either library, the decoy fraction should ideally be very close to 0.5 for a good decoy-generation method.

Before spectral matching, we first group peaks into 1 m/z bins and take the square root of each peak intensity to stabilize the peak intensity variation and improve spectral matching accuracy.9 In the spectral library searching, only spectra with the same charge and with precursors within the precursor m/z tolerance (default ±3 m/z, the same as SpectraST’s default setting) were considered as search candidates. To determine the spectral similarity measure, we compared the dot product and SpectraST’s F-value in Supplementary Figure 1, Supporting Information, which shows that these two measures are quite similar. For the sake of efficiency and without loss of generality, we adopted the dot-product similarity measure. Furthermore, we tested the effect of including only the top-N peaks, that is, the N most intense peaks, in each spectrum when calculating the spectral similarity. The results of including the top-50 peaks and all peaks in each spectrum are shown in Supplementary Figure 2, Supporting Information. Although using the top-50 peaks achieved slightly better decoy fractions, that is, closer to 0.5, we still used all of the peaks in each spectrum to calculate the spectral similarity, since in most applications of spectral library searching it is suggested to use all peaks, rather than the top-50 peaks, to avoid losing sensitivity and accuracy.1,10
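The spectral matching steps described under “Spectral Searching in Target-Decoy Libraries” (1 m/z binning, square-root intensity scaling, candidate selection by charge and precursor tolerance, dot-product scoring, and the decoy fraction) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LibraryEntry` class and all function names are hypothetical, intensities are summed within a bin before the square root is taken, and the dot product is normalized to unit length (a cosine score), none of which is specified in the text.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LibraryEntry:
    """Hypothetical record for a query or library spectrum."""
    precursor_mz: float
    charge: int
    peaks: Tuple[Tuple[float, float], ...]  # (m/z, intensity)
    is_decoy: bool = False


def preprocess(peaks, bin_width: float = 1.0) -> Dict[int, float]:
    """Group peaks into m/z bins, then take the square root of each bin intensity.
    Summing before the square root is an assumption of this sketch."""
    bins: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return {b: math.sqrt(v) for b, v in bins.items()}


def dot_product(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Normalized dot product of two binned spectra (normalization is assumed)."""
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    if norm == 0.0:
        return 0.0
    return sum(v * b.get(k, 0.0) for k, v in a.items()) / norm


def search(query: LibraryEntry, library: List[LibraryEntry],
           tolerance: float = 3.0) -> List[Tuple[float, LibraryEntry]]:
    """Score every same-charge entry within the precursor tolerance; best hit first."""
    qvec = preprocess(query.peaks)
    hits = [(dot_product(qvec, preprocess(e.peaks)), e)
            for e in library
            if e.charge == query.charge
            and abs(e.precursor_mz - query.precursor_mz) <= tolerance]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


def decoy_fraction(identified: List[LibraryEntry]) -> float:
    """Number of identifications from the decoy library over all identifications."""
    return sum(1 for e in identified if e.is_decoy) / len(identified)
```

In an evaluation like the one described here, `decoy_fraction` would be applied to the hits at a given rank over all query spectra; a value near 0.5 indicates that target and decoy entries attract incorrect matches equally often.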


DATA SETS

We downloaded seven spectral libraries of different species, including human, E. coli, yeast, and chicken, from the National Institute of Standards and Technology (NIST) Libraries of Peptide Tandem Mass Spectra available at http://peptide.nist.gov/ (dated May 24, 2011), and seven human data sets from PeptideAtlas (http://www.peptideatlas.org/repository/), as listed in Table 1. The seven libraries from NIST were also used by SpectraST and DeLiberator.3,7 Since the human data sets from PeptideAtlas did not provide peptide identification results, we used the human data sets from NIST instead to filter out overlapping peptides in the nonhuman libraries. Furthermore, we filtered out peptides with length shorter than 7 from the nonhuman libraries since these short peptides will likely generate more similar decoy spectra.3,6 The resulting four nonhuman libraries were then used as target libraries and also used to construct decoy libraries with the different decoy-generation methods. The HUPO34 human data set and the concatenation of the remaining six data sets from PeptideAtlas were used to search against the above nonhuman target libraries, concatenated with their constructed decoy libraries, from the ion trap and Q-TOF instruments, respectively.

Table 1. Summary of the Libraries and Data Sets Downloaded from the NIST Libraries and PeptideAtlas, Respectively

library     instrument   species      number of spectra
Human       Ion Trap     H. sapiens   310 688
Human       Ion Trap     H. sapiens   29 109
Human       Q-TOF        H. sapiens   15 560
E. coli     Ion Trap     E. coli      49 343
Chicken     Ion Trap     Chicken      3125
Yeast       Ion Trap     Yeast        92 609

sample title       organism   instrument        number of spectra
HUPO34_b1-serum    Human      LCQ DECA XP       432 662
Breakfast_qtof08   Human      Micromass/Q-TOF   5855
Caex_qtof          Human      Micromass/Q-TOF   36 740
Cat_ex_qtof        Human      Micromass/Q-TOF   24 966
HUPO12_run32       Human      Micromass/Q-TOF   54 310
HUPO12_run32       Human      Micromass/Q-TOF   46 117
HUPO12_run333      Human      Micromass/Q-TOF   40 338

exp./sample (six values recovered for the seven data sets; row alignment not recoverable): 561, 40, 78, 53, 45, 111

SOFTWARE DEVELOPMENT

We implemented in C# a software tool, called Precursor-Swap-Decoy-Generator (PSDG), which offers the precursor-swap, precursor-swap-peak-shift, and peak-shift methods as user choices for decoy library construction. PSDG accepts input in .msp and .sptxt formats and generates output in .msp and .splib formats. For a 2.03 GB spectral library of 310 688 spectra downloaded from the NIST libraries, PSDG needs about 30 MB of memory and 8.0 min to perform precursor-swap and generate the decoy library on a PC with an Intel(R) Xeon(R) CPU (E5420 @2.50 GHz) and 16.0 GB of RAM, showing that PSDG is quite efficient in time and memory. The constructed decoy library can be integrated into the Trans-Proteomic Pipeline (TPP) or other spectral library searching tools for FDR estimation of the peptide identification results of spectral library searching. PSDG is available for download at http://ms.iis.sinica.edu.tw/PSDG/.

RESULTS

Our Two Precursor-swap Methods Achieve Nonoverlapping Target and Decoy Search Spaces

The assumption that the target and decoy search spaces do not overlap is essential for constructing good decoy libraries since otherwise the FDR calculation will be incorrect. At first glance, the precursor-swap method seems to have overlapping target and decoy search spaces, since two identical spectra are present in the target-decoy library: one is the original spectrum, and the other is its duplicate used as a decoy with the precursor swapped. However, since these two spectra have different precursor m/z, they cannot both be candidates in the same target-decoy library search when the swapping distance d (set to 8 m/z here) is greater than the precursor tolerance (usually ±3 m/z). The precursor-swap-peak-shift method involves shifting the peaks in each spectrum, and it is very unlikely that the shifted spectrum will coincide with an observed spectrum.3 Therefore, this method also fulfills the assumption.
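The precursor-swap pairing rule and the non-overlap argument for the target and decoy search spaces can be sketched in Python as follows. This is a hedged illustration, not the PSDG implementation (which is written in C#). The `Spectrum` class, the function names, the greedy partner search in sorted precursor order, and the handling of spectra that find no partner are all assumptions of this sketch; the paper specifies only the same-charge condition, the distance condition Δ ≥ d, and the random test r mod 2 = 0. The three-spectrum swap for odd-sized libraries and the peak-shift variant are omitted.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical minimal spectrum record used only in this sketch."""
    name: str
    precursor_mz: float
    charge: int
    peaks: Tuple[Tuple[float, float], ...]  # (m/z, intensity)


def precursor_swap(library: List[Spectrum], d: float = 8.0,
                   rng: Optional[random.Random] = None):
    """Return (decoys, unpaired). Each decoy keeps one spectrum's peaks but
    carries the precursor m/z of its same-charge partner at least d away."""
    rng = rng or random.Random()
    decoys: List[Spectrum] = []
    unpaired: List[Spectrum] = []
    by_charge: Dict[int, List[Spectrum]] = {}
    for s in library:
        by_charge.setdefault(s.charge, []).append(s)
    for group in by_charge.values():
        group.sort(key=lambda s: s.precursor_mz)
        used = [False] * len(group)
        for i, a in enumerate(group):
            if used[i]:
                continue
            for j in range(i + 1, len(group)):
                b = group[j]
                if used[j] or b.precursor_mz - a.precursor_mz < d:
                    continue
                # Random test from the text: swap only if r mod 2 == 0.
                if rng.getrandbits(16) % 2 == 0:
                    # Decoy of A: peaks of B, precursor of A (and vice versa).
                    decoys.append(replace(b, name="DECOY_" + a.name,
                                          precursor_mz=a.precursor_mz))
                    decoys.append(replace(a, name="DECOY_" + b.name,
                                          precursor_mz=b.precursor_mz))
                    used[i] = used[j] = True
                    break
            if not used[i]:
                unpaired.append(a)  # left for another pass (sketch assumption)
    return decoys, unpaired


def can_share_window(mz_a: float, mz_b: float, tolerance: float = 3.0) -> bool:
    """True if a single query precursor could lie within ±tolerance of both."""
    return abs(mz_a - mz_b) <= 2 * tolerance
```

With a ±3 m/z tolerance, a query can be within tolerance of two precursors only if they differ by at most 6 m/z, so the default d = 8 keeps an original spectrum and its swapped duplicate out of the same candidate list. The factor of two in `can_share_window` is an inference made for this sketch; the text states the condition only as d being greater than the precursor tolerance.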

Precursor-swap Achieves Less Bias in Target-decoy Searching

The assumption that false-positive identifications are equally likely to occur in the target and decoy libraries is also required for correct FDR estimation in a target-decoy search, since otherwise the FDR would be overestimated or underestimated. Figure 1 illustrates the results of searching the human ion-trap data set against the E. coli target-decoy library, showing the decoy fractions of rank 1 to rank 10 hits in the identification candidate lists for the different decoy-generation methods. As Figure 1 shows, the simple precursor-swap method performed well; however, the decoy fractions of all compared methods still deviated from 0.5. This is probably because, even though peptides identical to those in the human library have been removed from the nonhuman libraries, some peptides in the target (nonhuman) library may still be homologous to human peptides.3 We further evaluated the stability of precursor-swap by running it 30 times on the E. coli library to generate 30 different decoy libraries. After searching the human data set against the 30 E. coli target-decoy libraries, the variations of the decoy fractions of rank 1 to rank 10 hits were about 0.001−0.003, as shown in Supplementary Figure 3, Supporting Information, showing that precursor-swap is quite stable. Furthermore, Figure 2 shows the average decoy fractions of rank 1 hits when searching the human data sets against the different ion-trap and Q-TOF libraries; precursor-swap is comparable to the other methods. The decoy fractions obtained from searching different libraries are reported in Supplementary Figures 4−6, Supporting Information.

Figure 1. Decoy fractions of searching the human ion-trap data set against the E. coli target-decoy library. The precursor-swap method performed slightly better than the others, although its decoy fractions were still below 0.5. Note that, except for the peak-shift method, the methods generate a different target-decoy library each time. Thus, we ran these methods three times each and show the average and standard deviation of their respective performances in all figures.

Figure 2. Decoy fractions of rank 1 hits of searching against different target-decoy libraries with decoy libraries constructed by different methods.

Constructing Decoy Spectra Directly at the Spectrum Level Improves the Methods Based on “From Sequence Level to Spectrum Level”

The precursor-swap, precursor-swap-peak-shift, and peak-shift methods generate decoy libraries directly at the spectrum level, without the step of generating decoy peptide sequences, in which different peptide sequences do not necessarily yield very different spectra.8 Since the similarity between target and decoy spectra may affect the accuracy of the calculated FDR, we evaluated the similarities between target spectra and their decoy spectra constructed with the “from sequence level to spectrum level” and the “spectrum level” strategies. Figure 3 illustrates the similarity distributions of all target-decoy pairs of the E. coli library generated by the different methods. The figure shows that the methods based on the “spectrum level” have much lower spectral similarities between target and decoy spectra than the methods based on “from sequence level to spectrum level”, namely the shuffle-and-reposition method and DeLiberator. The precursor-swap method achieves the smallest spectral similarity between target and decoy spectra. We also noticed that in the shuffle-and-reposition method and DeLiberator, even though the peptide sequence similarity between targets and decoys is guaranteed to be smaller than 0.5, about 13% of the generated decoy spectra are still highly similar (with similarity >0.7) to their target spectra.

Figure 3. Distribution of spectral similarity between target and decoy spectra of the E. coli library. The strategies based on the “spectrum level”, that is, the first three methods in the figure, produce more dissimilar decoy spectra than the strategies based on “from sequence level to spectrum level”, that is, the last two methods.

The decoy-construction methods based on “from sequence level to spectrum level” need to consider the annotations of peaks and determine how to reposition both the annotated and the unannotated peaks. Two issues need to be considered. First, depending on the fragmentation technique, particular fragment ions (e.g., mostly b and y ions for CID and c and z ions for ETD),11,12 as well as their neutral losses, are annotated in most cases, while other fragment types, including some types currently unknown to us, are ignored even though they may really come from the peptide.7 Second, if unannotated peaks come from electronic noise or coeluting impurities,3,13 fixing their m/z when constructing decoy spectra is quite reasonable. If not, their m/z may need to be repositioned, as for the annotated fragment ions mentioned in the first issue. For example, DeLiberator repositions the m/z of unannotated peaks by calculating the peaks’ m/z, intensities, and their precursor patterns in all of the target spectra. In contrast, the precursor-swap method and the other two decoy-generation methods based on the “spectrum level” avoid both issues entirely since they do not consider fragment types at all.

Decoy libraries constructed by the three methods based on the spectrum level preserve the following spectral properties of the target spectral library: (1) the same precursor m/z and charge distributions, which make the search spaces in the decoy library and the target library equal; and (2) the same distributions of overall peak intensities and of the number of peaks, which make decoy spectra more realistic. Notably, the precursor-swap method, unlike the precursor-swap-peak-shift and peak-shift methods, generates decoy libraries that preserve the distributions of the unannotated peaks’ m/z and intensities, which avoids the need to analyze the properties of unannotated peaks.

Finally, since these three methods based on the spectrum level do not require peak annotation information, they are directly applicable to PTM spectra or HCD/ETD spectra.

DISCUSSION

Selection of Swap Distance Depends on the Precursor m/z Tolerance

In the precursor-swap method, when the swap distance d (d = 8 by default) is set larger than the precursor m/z tolerance (usually ±3) used in spectral library searching, spectrum A and its decoy spectrum B will not be selected simultaneously in one spectral search. The effect of choosing d larger than 8 is shown in Supplementary Figure 7, Supporting Information. Since there was only a slight difference among the different values of d, we adopted d = 8 in all our experiments.

Swapping Spectra with the Same Charge is Necessary in the Precursor-swap

In the precursor-swap method, only two spectra with the same charge can be considered as each other’s decoy spectrum since otherwise their peptide masses and spectra may be very different. We analyzed the charge effects in Supplementary Figure 8, Supporting Information, which shows that peptides of the same charge have more similar fragment patterns than peptides of different charges.

A Limitation of the Precursor-swap

Since the precursor-swap method needs the swapping distance to be set higher than the precursor m/z tolerance, it is not suitable for “blind” spectral library searching,14 in which no precursor tolerance is applied so that unanticipated post-translational modifications can be found.

The Estimated Decoy Fractions Affect the Calculation of FDR

In peptide identification based on mass spectral data, the FDR has become the most accepted statistical confidence measure for peptide-spectrum matching (PSM) or spectrum-spectrum matching (SSM) in large-scale experiments.4 We investigated how the estimated decoy fractions of different decoy-generation methods affect the FDR, which is given by ((1/decoy fraction) × the number of decoy SSMs)/the total number of SSMs.6 It is hard to know beforehand the exact decoy fraction of a decoy-generation method on a given library. As shown in Figure 2, a decoy-generation method may yield very different decoy fractions on different spectral libraries, and the variation of the estimated decoy fractions affects the calculation of the FDR. For example, to search the HUPO ion-trap data set against a human target-decoy library, Lam et al.3 first searched a HUPO data set against the E. coli target-decoy library and used the decoy ratios of lower-rank hits to determine the decoy fraction of shuffle-and-reposition to be used in calculating the FDR. They then searched the HUPO data set against the human target-decoy library and calculated the FDR.3 However, the decoy fraction obtained on the human target-decoy library may not be close to that obtained on the E. coli target-decoy library. As shown in Figure 2, shuffle-and-reposition obtained decoy fractions varying from 0.448 to 0.503 on different libraries, which will slightly affect the calculated FDR.

To further investigate how different decoy libraries affect the calculated FDR and the percentage of correct identifications, we used the second human spectral library listed in Table 1, containing 29 109 spectra, as the target library and constructed its decoy library with each of the five decoy-generation methods. We used the first human spectral library listed in Table 1, which contains 310 688 spectra, all with peptide annotation, as a query data set to search against the above target-decoy libraries and calculated three kinds of FDR. First, since the query data set has peptide annotation, we calculated the reference FDR (rFDR) introduced in Ahrne et al.,7 given by incorrect target SSM/total target SSM, where total target SSM = correct target SSM + incorrect target SSM. Second, we calculated the FDR based on the decoy fraction, called decoy-fraction-based FDR (dFDR), given by (decoy SSM/decoy fraction)/(total target SSM + decoy SSM). Third, we calculated the FDR based on the theoretical decoy fraction of 0.5, called theoretical-decoy-fraction-based FDR (tFDR), given by (decoy SSM/0.5)/(total target SSM + decoy SSM). We then used the difference between rFDR and dFDR to investigate how decoy libraries affect the estimated FDR.

The decoy fractions of searching the human data set against the five different target-decoy libraries are shown in Supplementary Figure 9, Supporting Information. Because lower-rank results reflect the decoy fraction more accurately, as Lam et al.3 and Elias and Gygi5 suggested, we used the average decoy fractions of rank 8 to rank 10 hits to approximate the “real” decoy fractions of the different decoy libraries. Supplementary Figure 9, Supporting Information, shows that all methods except peak-shift have similar decoy fractions. We searched each spectrum in the query data set against the target-decoy library and obtained the rank 1 hit and the spectral similarity between the query spectrum and the hit. For all rank 1 hits matched to the target library, we determined whether they were correct according to the peptide annotation. We could then determine the threshold of dot-product spectral similarity such that rank 1 hits with spectral similarity equal to or above the threshold are considered “target SSM” and the given rFDR is satisfied. This similarity threshold was also applied to rank 1 hits matched to the decoy library to obtain “decoy SSM”, from which we could calculate dFDR and tFDR. Comparisons of dFDR and tFDR obtained from the different decoy-generation methods with respect to rFDR are shown in Figure 4 and Supplementary Figure 10, Supporting Information, respectively. It can be observed from Figure 4 that precursor-swap slightly overestimated dFDR when rFDR ≥ 0.04, whereas shuffle-and-reposition and DeLiberator underestimated dFDR by about 2%. The dFDR obtained by precursor-swap was slightly closer to rFDR than that obtained by the other methods.

Figure 4. Comparison between rFDR and dFDR of searching the human data set with peptide annotation against the human target-decoy library. dFDR obtained by the decoy library constructed by the precursor-swap was quite close to rFDR.

We further examined the correct identification (ID) ratio of the search results from the different target-decoy libraries, which is given by 1 − rFDR = correct target SSM/total target SSM. The correct ID ratios obtained by the different decoy-generation methods given dFDR = 0.05 and tFDR = 0.05 are shown in Supplementary Figures 11 and 12, Supporting Information, respectively. For an ideal target-decoy library, with a decoy ratio equal to 0.5, the correct ID ratio will be 0.95 when dFDR is 0.05. Precursor-swap is shown to have a correct ID ratio relatively closer to 0.95 when dFDR is 0.05 or tFDR is 0.05, as the decoy fraction obtained by precursor-swap was relatively closer to 0.5. The peak-shift method has the worst correct ID ratio. Furthermore, we show in Supplementary Figure 13, Supporting Information, the numbers of total target SSM and correct target SSM obtained by the different decoy-generation methods. Precursor-swap has relatively fewer correct target SSM, despite its relatively higher correct ID ratio. Peak-shift has the highest number of correct target SSM, despite the lowest correct ID ratio.

CONCLUSION

Decoy-generation methods usually adopt one of two strategies: constructing decoy spectra “from sequence level to spectrum level” or directly at the spectrum level. Methods based on the latter strategy, including precursor-swap and precursor-swap-peak-shift, do not require peak annotation information and are easy to implement. Furthermore, the constructed decoy library can preserve some spectral properties of the target library. Our simple precursor-swap method is shown to generate good decoy spectra with decoy fractions close to the ideal 0.5. The FDR calculated on the basis of the decoy fraction obtained by precursor-swap is also close to the reference FDR. In the future, precursor-swap may be applied to sequence database searching, to construct decoy spectra from a sequence database. This application requires more studies on the FDR calculation at the protein level.

ASSOCIATED CONTENT

Supporting Information

Comparison of using dot product and SpectraST’s F-value as spectral similarity measures. Comparison of using top 50 peaks and all peaks in each library spectrum in spectral library searching. Validation of various shifting distances in the precursor-shifted search. Decoy fractions of using precursor-shifted search to search the human data set against the nonhuman target-decoy library. The charge effect of the precursor-swap method. This material is available free of charge via the Internet at http://pubs.acs.org.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Council of Taiwan under grants NSC101-2221-E-001-022 and NSC100-2628-M-001-003-MY4.



REFERENCES

(1) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; Stein, S. E.; Aebersold, R. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 2008, 5, 873−875.
(2) Zhang, X.; Li, Y.; Shao, W.; Lam, H. Understanding the improved sensitivity of spectral library searching over sequence database searching in proteomics data analysis. Proteomics 2011, 11, 1075−1085.
(3) Lam, H.; Deutsch, E. W.; Aebersold, R. Artificial decoy spectral libraries for false discovery rate estimation in spectral library searching in proteomics. J. Proteome Res. 2010, 9, 605−610.
(4) Nesvizhskii, A. I. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J. Proteomics 2010, 73, 2092−2123.
(5) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for mass spectrometry-based proteomics. Methods Mol. Biol. 2010, 604, 55−71.
(6) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207−214.
(7) Ahrne, E.; Ohta, Y.; Nikitin, F.; Scherl, A.; Lisacek, F.; Muller, M. An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates. Proteomics 2011, 11, 4085−4095.
(8) Sherman, J.; McKay, M. J.; Ashman, K.; Molloy, M. P. How specific is my SRM?: The issue of precursor and product ion redundancy. Proteomics 2009, 9, 1120−1123.
(9) Liu, J.; Bell, A. W.; Bergeron, J. J.; Yanofsky, C. M.; Carrillo, B.; Beaudrie, C. E.; Kearney, R. E. Methods for peptide identification by spectral comparison. Proteome Sci. 2007, 5, 3.
(10) Craig, R.; Cortens, J. C.; Fenyo, D.; Beavis, R. C. Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 2006, 5, 1843−1849.
(11) Neuhauser, N.; Michalski, A.; Cox, J.; Mann, M. Expert system for computer assisted annotation of MS/MS spectra. Mol. Cell. Proteomics 2012, 11 (11), 1500−1509.
(12) Frese, C. K.; Altelaar, A. F.; Hennrich, M. L.; Nolting, D.; Zeller, M.; Griep-Raming, J.; Heck, A. J.; Mohammed, S. Improved peptide identification by targeted fragmentation using CID, HCD and ETD on an LTQ-Orbitrap Velos. J. Proteome Res. 2011, 10, 2377−2388.
(13) Wang, J.; Bourne, P. E.; Bandeira, N. Peptide identification by database search of mixture tandem mass spectra. Mol. Cell. Proteomics 2011, 10, 12.
(14) Ye, D.; Fu, Y.; Sun, R. X.; Wang, H. P.; Yuan, Z. F.; Chi, H.; He, S. M. Open MS/MS spectral library search to identify unanticipated post-translational modifications and increase spectral identification rate. Bioinformatics 2010, 26, i399−i406.






AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: +886-2-27883799#1711. Fax: 886-2-2782-4814.

Notes

The authors declare no competing financial interest.