
Unbiased False Discovery Rate Estimation for Shotgun Proteomics Based on the Target-Decoy Approach

Lev I. Levitsky,†,‡ Mark V. Ivanov,†,‡ Anna A. Lobas,†,‡ and Mikhail V. Gorshkov*,†,‡

†Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
‡V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119991, Russia




ABSTRACT: The target-decoy approach (TDA) is the dominant strategy for false discovery rate (FDR) estimation in mass-spectrometry-based proteomics. One of its main applications is direct FDR estimation based on counting of decoy matches above a certain score threshold. The corresponding equations are widely employed for filtering of peptide or protein identifications. In this work we consider a probability model describing the filtering process and find that, when decoy counting is used for q value estimation and subsequent filtering, a correction has to be introduced into these common equations for TDA-based FDR estimation. We also discuss the scale of variance of the false discovery proportion (FDP) and propose using confidence intervals for more conservative FDP estimation in shotgun proteomics. The necessity of both the correction and the use of confidence intervals is especially pronounced when filtering small sets (such as in proteogenomics experiments) and when using very low FDR thresholds.

KEYWORDS: proteomics, false discovery rate, target-decoy approach



INTRODUCTION

In shotgun proteomics, a search engine typically produces a large set of peptide-spectrum matches (PSMs), which raises the problem of their quality assessment.1 While the p value has traditionally been central to determining the statistical significance of experimental measurements,1 it does not in itself account for the multiple testing scenario of database searches and thus must be corrected.2 For this reason, the notion of false discovery rate (FDR)3−5 was readily adopted, as it is inherently well-suited for shotgun proteomics.2 A typical FDR controlling procedure starts with a list of PSMs sorted by search score and considers all possible score thresholds while estimating the corresponding values of FDR. A number of estimation procedures have been proposed, including ones based on empirical p values4 and posterior error probabilities (PEPs).6,7 The introduction of the target-decoy approach (TDA)8−10 provided new means of estimating the statistical significance of PSMs. While major postsearch validation algorithms assume certain score distributions and employ decoy PSMs for model training and subsequent estimation of PEPs,6,7,11 a number of popular search engines12−16 implement a simpler algorithm based solely on TDA. As of version 2.10, Percolator6 also uses the "target-decoy competition" mode by default. This algorithm is agnostic of score distributions17 and uses PSM scores for ranking only.18 The decoy counting approach has been shown to be consistent with PEP-based FDR estimation using several major scoring schemes.11 In this work, we discuss the naïve method of FDR estimation and filtering based on PSM sorting and decoy counting.

Throughout this paper, we use the following terminology adopted from the earlier works.19,20 By false discovery proportion (FDP), we mean the actual proportion of false positive matches above a certain score threshold.19 By false discovery rate (FDR), we mean the expected value of FDP: FDR = \(\mathbb{E}(\mathrm{FDP})\),19 where the expectation is taken with regard to a fixed score threshold.20 To denote the estimated value of FDR, we use \(\widehat{\mathrm{FDR}}\), which can be considered a function of the score threshold. We refer to the q value as the minimal FDR threshold at which a given PSM is accepted, following the definition by Käll et al.21,22 Note that making a distinction between "real" and estimated FDR naturally leads to making the same distinction between "real" and estimated q values, and any method of q value estimation is based on the corresponding method of FDR estimation. The main assumption behind TDA is that the probability of a false match coming from the target or decoy database is proportional to their relative sizes. If the size of the decoy database is equal to that of the target database, then there are equal probabilities for a false PSM to originate from the target or decoy database. With this assumption in mind, the numbers of decoy and target false PSMs are usually considered equal, and FDR is estimated using the following formula:18,23

\[ \widehat{\mathrm{FDR}} = \frac{d}{t} \tag{1} \]

Received: February 17, 2016
Published: November 28, 2016

DOI: 10.1021/acs.jproteome.6b00144 J. Proteome Res. 2017, 16, 393−397




where d and t are the numbers of decoy and target PSMs in the set, respectively. In the more general case, when the sizes of the decoy and target databases are not equal,18 eq 1 takes the form:

\[ \widehat{\mathrm{FDR}} = \frac{d}{rt} \tag{2} \]

where r is the size ratio between the decoy and target databases. The procedure of filtering a set of PSMs to a desired FDR level can be modeled in the following way. First, the list of PSMs is sorted according to their scores. Then, FDR is estimated for each sublist \(S_i\) containing the top i PSMs:

\[ \hat{f}_i = \widehat{\mathrm{FDR}}(S_i) \tag{3} \]

After that, q values for all PSMs can be estimated using the above-mentioned definition as follows:

\[ \hat{q}_i = \min_{j \geq i} \hat{f}_j \tag{4} \]

By definition, the calculated q values grow monotonically in the list of sorted PSMs. The last step in the filtering procedure is to locate the last position in the list where the estimated q value does not yet exceed the desired FDR level F:

\[ n = \max\{\, j : \hat{q}_j \leq F \,\} \tag{5} \]

The sublist \(S_n\) is the filtered PSM list corresponding to the desired FDR threshold F. The described procedure is abstract in the sense that it does not specify how FDR is estimated in eq 3. If the estimation is performed using eq 2, the sequence \(\hat{f}_i\) is not monotonic, because \(\hat{f}_i\) decreases with each target PSM and increases with each decoy PSM. In the subsequent step (eq 4), the sequence is "monotonized", but the estimated q values also increase at decoy PSMs only. As a result, the score threshold is always set so that a decoy PSM occurs next in the list, immediately below the threshold. Intuitively, this fact must introduce an imbalance between target and decoy false matches above the score threshold, which is not accounted for in the equations above. This effect may contribute to certain drawbacks of TDA pointed out elsewhere,24 such as the estimated zero FDR value for PSM sets without any decoy matches. It was also mentioned in the cited work that TDA yields inaccurate results for data sets with small numbers of spectra.24 It should also be emphasized that, because of the random nature of false matches, the number of false target PSMs above the threshold (and, accordingly, the FDP) is a random variable and as such should not be characterized exclusively by its expected value. The random variance of FDP typically remains unaddressed in proteomics studies, although error estimations have been performed both theoretically1,25 and using simulations.8 Herein, we analyze the filtering process and propose a probability model of TDA-based filtering which allows direct calculation of the expected value and variance of FDP and introduces the necessary correction to eq 2 as an attempt to address the above-mentioned drawbacks of TDA. We also propose using confidence intervals for conservative FDR estimation and present a freely available implementation of the proposed algorithm. While we use PSM-level FDR for the following discussion, the same reasoning applies equally to peptide- and protein-level FDR filtering if it is based on TDA.
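To make the filtering procedure of eqs 1 and 3−5 concrete, the following sketch implements it in Python with NumPy. The function and variable names are ours, chosen for illustration only; the reference implementation is in the Pyteomics library discussed later in the text.

```python
import numpy as np

def tda_filter(scores, is_decoy, fdr=0.01, higher_is_better=True):
    """Naive TDA filtering by decoy counting (eqs 1, 3-5).

    scores   : array of PSM scores
    is_decoy : boolean array, True for decoy PSMs
    fdr      : desired FDR threshold F
    Returns the indices of target PSMs passing the threshold.
    """
    order = np.argsort(scores)
    if higher_is_better:
        order = order[::-1]
    decoy_sorted = np.asarray(is_decoy)[order]

    # eq 1 applied to every top-i sublist: f_i = d_i / t_i
    d = np.cumsum(decoy_sorted)
    t = np.arange(1, len(order) + 1) - d
    with np.errstate(divide='ignore', invalid='ignore'):
        f = np.where(t > 0, d / t, np.inf)

    # eq 4: monotonize from the bottom of the list to obtain q values
    q = np.minimum.accumulate(f[::-1])[::-1]

    # eq 5: last position where the estimated q value is still <= F
    passing = np.nonzero(q <= fdr)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    n = passing[-1] + 1
    kept = order[:n]
    return kept[~np.asarray(is_decoy)[kept]]
```

Note that only the ranking induced by the scores matters here, which is why the procedure is agnostic of the score distributions.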

RESULTS AND DISCUSSION

Probability Model

During the spectrum matching process, each spectrum is considered and assigned independently from the others. According to TDA, false PSMs are distributed uniformly across the target and decoy databases, so for any given spectrum that is not assigned the correct peptide sequence, the probability p of this spectrum being assigned a peptide from the target database is given by the decoy-to-target database size ratio r as follows:

\[ p = \frac{1}{1 + r} \tag{6} \]

Thus, each false PSM can be represented by a Bernoulli trial, and if one iterates over the filtered PSM list \(S_n\), then all occurring false PSMs can be viewed as a Bernoulli process, i.e., a series of independent Bernoulli trials, or black and white balls being drawn from an urn (with replacement). Black and white balls correspond to decoy PSMs and false PSMs from the target database, respectively. According to the filtering procedure described above, if the stoppage criterion is tied to a specific FDR value estimated using eq 2, the FDR threshold is always set immediately before a decoy PSM. Let d be the number of decoy PSMs above the FDR threshold and T be the random variable denoting the number of false target PSMs above the same threshold. Although T has been modeled before using the binomial distribution,25 that approach requires artificial normalization to obtain the probability distribution of T. Moreover, it does not seem to take into account that the iteration stops when the (d + 1)-th decoy PSM is found (and not included in the output). This dictates the following formulation of our urn problem: how many white balls will be drawn from the urn before the (d + 1)-th black ball is drawn, given the constant probability p of drawing a white ball in a trial? In this case, the number T of false PSMs from the target database above the threshold follows the negative binomial distribution with parameters d + 1 and p: \(T \sim \mathrm{NB}(d + 1,\, p)\). Its probability mass function is then:

\[ \mathbb{P}(T = k) = \binom{k + d}{k}\, p^{k} (1 - p)^{d + 1} \tag{7} \]

where \(\mathbb{P}\) denotes probability. The expected value of T, \(\mathbb{E}(T)\), is then given by:

\[ \mathbb{E}(T) = (d + 1) \cdot \frac{p}{1 - p} = \frac{d + 1}{r} \tag{8} \]

This means that, because we consider a sublist of PSMs that is "cut" just before a decoy, we should expect that, on average, it contains \((d + 1)/r\) false targets rather than \(d/r\), as implied by the common formula. Interestingly, this is consistent with the results reported by He et al.26 and Huttlin et al.25 Indeed, the normalization procedure used by Huttlin et al. effectively converts the binomial distribution into a negative binomial one (see the Supporting Information for the formal proof). However, eqs 7 and 8 imply that T can assume any value from 0 to infinity. In fact, the range of possible values for T is limited by the number t of all target matches above the applied FDR threshold. The corresponding corrections are discussed in the Supporting Information, where we show that, in the typically used range of FDR ≤ 5%, eq 8 can be safely used for FDR estimation (Figure S1).
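The model of eqs 7 and 8 can be checked numerically; a minimal sketch using scipy.stats.nbinom is shown below. In SciPy's parameterization, the distribution counts "failures" before the n-th "success", so the "success" probability here is 1 − p, the probability of drawing a decoy.

```python
from scipy.stats import nbinom

d, r = 10, 1.0           # decoys above the threshold; decoy-to-target database size ratio
p = 1.0 / (1.0 + r)      # eq 6: probability that a false PSM is a target match

# T ~ NB(d + 1, p): false target PSMs above the threshold (eq 7).
# In SciPy, nbinom(n, p_success) counts failures before the n-th success,
# so the "success" here is drawing a decoy, with probability 1 - p.
T = nbinom(d + 1, 1.0 - p)

print(T.pmf(5))                   # P(T = 5), eq 7
print(T.mean(), (d + 1) / r)      # both give 11.0, eq 8
```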

By substituting d/r with the corrected estimation from eq 8, we obtain the more correct form of eq 2:

\[ \widehat{\mathrm{FDR}} = \frac{\mathbb{E}(T)}{t} \approx \frac{d + 1}{rt} \tag{9} \]

Eq 9 represents the main outcome of the considered model. In the most common case of r = 1, it takes the form:

\[ \widehat{\mathrm{FDR}} \approx \frac{d + 1}{t} \tag{10} \]

Thus, with r = 1, the model predicts that the expected number of false target PSMs above the FDR threshold does not equal the number of decoys d but is actually closer to d + 1. This means that eqs 1 and 2 have an optimistic bias and should be corrected if employed for FDR filtering. Eq 10 can also be obtained if FDR is estimated using decoy-based empirical p values adjusted to get the correct type I error rate.2,27
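As a small illustration of eqs 9 and 10 (the numbers are hypothetical and only meant to show the relative size of the bias for small filtered sets):

```python
def fdr_uncorrected(d, t, r=1.0):
    """Eq 2: the common decoy-counting estimate."""
    return d / (r * t)

def fdr_corrected(d, t, r=1.0):
    """Eq 9: estimate with the +1 correction for threshold-tied filtering."""
    return (d + 1) / (r * t)

# For a small filtered set the relative difference is large:
print(fdr_uncorrected(1, 100), fdr_corrected(1, 100))           # 0.01 vs 0.02
# For a large set it is negligible:
print(fdr_uncorrected(100, 10000), fdr_corrected(100, 10000))   # 0.01 vs 0.0101
```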

Effect of Monotonization

Eq 4 can be rewritten as \(\hat{q}_i = \min(\hat{f}_i,\ \hat{q}_{i+1})\). This has the following implications: (1) \(\hat{q}_i \leq \hat{q}_{i+1}\) (monotonicity) and (2) \(\hat{q}_i \leq \hat{f}_i\). The latter means that, in general, q values have an optimistic bias relative to the FDR estimation for the corresponding subset of PSMs. Thus, if q values are reported for a list of PSMs, they generally present a biased estimation of the corresponding FDR levels. This holds as long as eq 4 is used for q value calculation, regardless of the FDR estimation method. The scale of this bias depends on the score distributions (more specifically, on the local FDR). However, if the PSM list is filtered using q values, then the threshold position k is by construction chosen so that \(\hat{q}_k < \hat{q}_{k+1}\), which means that \(\hat{q}_k = \hat{f}_k\). Thus, filtering based on q values does not introduce any additional bias into FDR estimation. Figure 1 shows the difference between \(\hat{q}_i\) and \(\hat{f}_i\) when the \(\hat{f}_i\) values are calculated using eq 1.

Figure 1. FDR and q value estimation based on eqs 1, 3, and 4. The sequence \(\hat{f}_i\) is not monotonic: it increases at decoy PSMs and decreases at target PSMs. \(\hat{q}_i = \hat{f}_i\) when \(\hat{q}_i < \hat{q}_{i+1}\) (before decoy PSMs).

Calculation of Confidence Intervals

If the number of decoy matches above the threshold is low, the effect of the correction becomes more apparent. This is often the case in proteogenomics studies and other settings aimed at detecting rare events. However, even with the correction, the relative deviation of FDP from the unbiased FDR estimation may be very significant. For example, if there is one decoy PSM above the threshold (corresponding to 100 PSMs at 1% FDR or 1000 PSMs at 0.1% FDR, based on the uncorrected estimation), then the uncorrected expected number of false target PSMs in the filtered set is 1, while the correct expected value is 2. Direct calculation of \(\mathbb{P}(T \leq 2)\) with d = 1 using eq 7 shows that there is a ∼31% probability that 3 or more false target PSMs are in the filtered set, which means a 3-fold deviation of FDP from the FDR estimation without correction, or a 1.5-fold deviation in case the correct expected value is used. For this reason, it may be more practical to consider confidence intervals of T rather than just its expected value. Although Huttlin et al. propose using pseudosymmetric confidence intervals,25 we argue that it may be more sensible to consider intervals with their left boundary fixed at 0, so that they correspond to the probability of FDP exceeding a certain value. Given the already calculated probability distribution (eq 7), we can introduce the confidence value \(C_{\alpha}(T)\) as the right border of such an interval for confidence level α:

\[ C_{\alpha}(T) \equiv \inf\{\, k \in \mathbb{Z}^{+}_{0} : \mathbb{P}(T \leq k) \geq \alpha \,\} \tag{11} \]

where \(\mathbb{Z}^{+}_{0}\) is the set of all non-negative integers. Thus, \(C_{\alpha}(T)\) is the lowest non-negative integer k such that \(\mathbb{P}(0 \leq T \leq k) \geq \alpha\). The confidence value is an upper bound for the number of false targets above a certain threshold. In the example above (d = 1), the 95% confidence value for T is \(C_{0.95}(T) = 6\), i.e., with probability ≥95%, there are no more than 6 false targets in the filtered set when there is 1 decoy above the threshold. Here, 6 is a more conservative and meaningful estimation of the possible number of false targets than the expected number 2, and especially than the uncorrected expected number 1. Direct implementation of eq 11 implies calculating and summing the probabilities \(\mathbb{P}(T = k)\) (e.g., using eq 7) for consecutive integers k, starting from zero, until the sum exceeds α; the last k is then the confidence value \(C_{\alpha}(T)\). This procedure, as well as the corrected FDR estimation, is implemented in the Pyteomics library.28

To obtain a simple estimate of the scale of FDP variance without convoluted iterative calculations, one can use the known expression for the standard deviation of the negative binomial distribution \(\mathrm{NB}(r,\, p)\), \(\sigma = \sqrt{rp}/(1 - p)\):

\[ \sigma(T) = \frac{\sqrt{(d + 1)\,p}}{1 - p} \tag{12} \]

or \(\sigma(T) = \sqrt{2(d + 1)}\) for p = 1/2. Eq 12 is based on the negative binomial distribution and is thus applicable only when FDR does not significantly exceed 5% (see the Supporting Information for details).
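A minimal sketch of eqs 11 and 12 follows (the function names are ours, not the Pyteomics API): the confidence value is simply a quantile of the negative binomial distribution, so it can be obtained either by the iterative summation described above or directly with scipy.stats.nbinom.ppf.

```python
from scipy.stats import nbinom

def confidence_value(d, alpha=0.95, r=1.0):
    """Eq 11: smallest k with P(T <= k) >= alpha, for d decoys above the threshold."""
    p = 1.0 / (1.0 + r)                        # eq 6
    return int(nbinom.ppf(alpha, d + 1, 1.0 - p))

def fdp_sigma(d, r=1.0):
    """Eq 12: standard deviation of the number of false target PSMs."""
    p = 1.0 / (1.0 + r)
    return ((d + 1) * p) ** 0.5 / (1.0 - p)

print(confidence_value(1))   # 6, the example from the text (d = 1, r = 1)
print(fdp_sigma(1))          # 2.0 == sqrt(2 * (d + 1)) for p = 1/2
```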

The correction discussed in this work applies when filtering is based on a predefined FDR value and eq 1 or a similar formula is used for FDR estimation (which results in a decoy PSM directly following the calculated threshold). In other cases, e.g., when using a predefined search score threshold or simply considering a certain number of top matches, the +1 correction is not applicable. For instance, Percolator v. 2.10 uses eqs 1 and 4 by default for q value estimation (see the Supporting Information for a demonstration). In this case, if one considers the top 500 results and the 501st PSM is not a decoy, \(\hat{q}_{500}\) reported by Percolator will be a biased estimation of FDR, while a more correct estimation would be \(\hat{f}_{500} = (d/t)_{500}\) (the +1 correction is not needed). However, if one filters the list by q value, e.g., \(\hat{q}_i < 0.01\) (the case described by the proposed model), then the threshold q value will be \(\hat{q}_k = \hat{f}_k = (d/t)_k\), while the correct FDR estimation for the filtered list will be \(\bigl((d + 1)/t\bigr)_k\). For another example, MaxQuant v. 1.5.2.8 reports q values for identified protein groups. Analysis shows that these values are in fact not monotonic and correspond to FDR estimations, \(\hat{f}_i = (d/t)_i\), rather than to q values as defined by eq 4. This means that they do not have the monotonization bias, but the correction proposed in this work still applies.

Simulated Experiments

The following experiment was performed to test the above-mentioned theoretical conclusions. Two score distributions were chosen arbitrarily to model "true" and "false" PSMs; normal distributions were used for convenience. One thousand "true" PSMs were generated by drawing random numbers from the true score distribution. They were all labeled as true and target. Then, 5000 "false" PSMs were generated by drawing random numbers from the false distribution. They were all labeled as false; labels target or decoy were assigned randomly to each false PSM with equal probabilities, corresponding to p = 1/2 or r = 1. The resulting set of PSMs was then filtered to 1% FDR using the standard TDA approach. This corresponds to t/d = 100, which allows using eq 10 for FDR estimation, as mentioned above. After filtering, the number of false PSMs from the target database (i.e., the actual value of T) was calculated and compared to the number of decoy PSMs, d. The actual values of FDP = T/t and the estimated \(\widehat{\mathrm{FDR}} = d/t\) were averaged over the repeated experiments. The results are shown in Figure 2. Expectedly, the mean difference tends to 1/t after a sufficient number of repeats.

Figure 2. (a) Result of a single simulated experiment: distribution of simulated PSMs by score (lower is better) and the calculated 1% FDR threshold. (b) The average true FDP converges to the corrected estimator \(\widehat{\mathrm{FDR}} = (d + 1)/t\) rather than to the commonly used estimator \(\widehat{\mathrm{FDR}} = d/t\).

The experimental workflow described above was also employed to evaluate the effect of using confidence values instead of expected values of FDP. For this purpose, a set of PSMs was generated once. Apart from regular q values, a more conservative estimation of q values was made by using \(C_{0.95}(T)\) instead of \(\mathbb{E}(T)\) in FDR estimation. The results are shown in Figure 3. The q values calculated using expected values oscillate around the true values, whereas the curve calculated using the confidence values provides a more conservative estimate. ROC curves plotted using both sets of q values are shown in Figure S2. The results of the simulation do not depend on the shape of the score distributions or the extent of overlap between them, because the employed filtering method considers only PSM ranking. Thus, the overlap affects only the number of PSMs above the threshold corresponding to a certain FDR setting, but the error in the number of false positive target PSMs always tends to 1. However, in the case of high discrimination between true and false PSMs, the sorting-based procedure itself can be substantially suboptimal. This effect is illustrated in the Supporting Information (Figure S2).

Figure 3. True q values compared to their estimations. The expected value can be higher or lower than the true q value; confidence value estimation is much more conservative.
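A condensed sketch of the simulation described above (distribution parameters are arbitrary, as in the original setup; the IPython Notebook in the Supporting Information is the authoritative version):

```python
import numpy as np

rng = np.random.default_rng(0)
n_true, n_false, fdr = 1000, 5000, 0.01

diffs = []
for _ in range(1000):
    # "true" PSMs are all targets; "false" PSMs get random target/decoy labels (p = 1/2)
    scores = np.concatenate([rng.normal(0, 1, n_true),      # true (better, lower) scores
                             rng.normal(4, 1, n_false)])    # false scores
    is_false = np.concatenate([np.zeros(n_true, bool), np.ones(n_false, bool)])
    is_decoy = is_false & (rng.random(n_true + n_false) < 0.5)

    order = np.argsort(scores)                  # lower score is better here
    d = np.cumsum(is_decoy[order])
    t = np.arange(1, len(order) + 1) - d
    f = np.where(t > 0, d / np.maximum(t, 1), np.inf)
    q = np.minimum.accumulate(f[::-1])[::-1]
    passing = np.nonzero(q <= fdr)[0]
    if passing.size == 0:
        continue
    n = passing[-1] + 1
    top = order[:n]

    T = np.sum(is_false[top] & ~is_decoy[top])  # false target PSMs above the threshold
    diffs.append(T - d[n - 1])

print(np.mean(diffs))   # approaches 1, as predicted by eq 8 with r = 1
```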



CONCLUSIONS

Several major search engines and postsearch algorithms employ a naïve approach to FDR filtering, using only ranked lists of PSMs and not the score values for q value calculation. If these q values are used for FDR filtering, a decoy identification is always immediately below the calculated threshold. Careful examination of the corresponding filtering process shows that, in this case, an important correction to the commonly used equation is needed for unbiased estimation of FDR in filtered sets of peptide and protein identifications. When filtering small sets or using very low FDR thresholds, the random deviation of the actual FDP from the FDR threshold becomes very significant. Confidence values of FDP can be effectively used instead of point estimations of FDR. The amended FDR calculation procedures proposed in this work have been implemented in the open-source Pyteomics library,28 allowing more accurate TDA-based filtering, especially in the case of small sets of PSMs.




ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.6b00144. Corrected form of eq 8 (section S1) and visualization of the effect of correction (Figure S1), source code for the simulated experiments and the generation of all figures in this work, in the form of an IPython Notebook (section S2), comparison of ROC curves based on expected value and confidence value (Figure S2), proof of equivalence between the equations used by Huttlin et al.25 and the negative binomial distribution used in this work (section S3), analysis of q values reported by Percolator v. 2.10 (section S4) (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]; Phone: +7 499 1378257.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the Russian Science Foundation Grant 14-14-00971. The authors thank Anton Goloborodko and Julia Bubis for useful discussion of the presented results.



REFERENCES

(1) Granholm, V.; Käll, L. Quality assessments of peptide-spectrum matches in shotgun proteomics. Proteomics 2011, 11, 1086−1093.
(2) Granholm, V.; Navarro, J. F.; Noble, W. S.; Käll, L. Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics. J. Proteomics 2013, 80, 123−131.
(3) Soric, B. Statistical "Discoveries" and Effect-Size Estimation. J. Am. Stat. Assoc. 1989, 84, 608−610.
(4) Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B: Statistical Methodology 1995, 57, 289−300.
(5) Storey, J. D.; Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 9440−9445.
(6) Käll, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 2007, 4, 923−925.
(7) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, 5383−5392.
(8) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207−214.
(9) Moore, R. E.; Young, M. K.; Lee, T. D. Qscore: An algorithm for evaluating SEQUEST database search results. J. Am. Soc. Mass Spectrom. 2002, 13, 378−386.
(10) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: The yeast proteome. J. Proteome Res. 2003, 2, 43−50.
(11) Käll, L.; Storey, J. D.; Noble, W. S. Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 2008, 24, 42−48.
(12) Wenger, C. D.; Coon, J. J. A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. J. Proteome Res. 2013, 12, 1377−1386.
(13) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011, 10, 1794−1805.
(14) Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, 1367−1372.
(15) Dorfer, V.; Pichler, P.; Stranzl, T.; Stadlmann, J.; Taus, T.; Winkler, S.; Mechtler, K. MS Amanda, a universal identification algorithm optimized for high accuracy tandem mass spectra. J. Proteome Res. 2014, 13, 3679−3684.
(16) Kim, S.; Pevzner, P. A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 2014, 5, 5277.
(17) Nesvizhskii, A.; Vitek, O.; Aebersold, R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods 2007, 4, 787−797.
(18) Elias, J. E.; Gygi, S. P. In Methods Mol. Biol.; Hubbard, S. J., Jones, A. R., Eds.; Humana Press: Totowa, NJ, 2010; Vol. 604, pp 55−71.
(19) Pawitan, Y.; Calza, S.; Ploner, A. Estimation of false discovery proportion under general dependence. Bioinformatics 2006, 22, 3025−3031.
(20) Fan, J.; Han, X.; Gu, W. Estimating false discovery proportion under arbitrary covariance dependence. J. Am. Stat. Assoc. 2012, 107, 1019−1035.
(21) Käll, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S. Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J. Proteome Res. 2008, 7, 29−34.
(22) Käll, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S. Posterior error probabilities and false discovery rates: Two sides of the same coin. J. Proteome Res. 2008, 7, 40−44.
(23) Jeong, K.; Kim, S.; Bandeira, N. False discovery rates in spectral identification. BMC Bioinf. 2012, 13, S2.
(24) Gupta, N.; Bandeira, N.; Keich, U.; Pevzner, P. A. Target-decoy approach and false discovery rate: when things may go wrong. J. Am. Soc. Mass Spectrom. 2011, 22, 1111−1120.
(25) Huttlin, E. L.; Hegeman, A. D.; Harms, A. C.; Sussman, M. R. Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy. J. Proteome Res. 2007, 6, 392−398.
(26) He, K.; Fu, Y.; Zeng, W.-F.; Luo, L.; Chi, H.; Liu, C.; Qing, L.-Y.; Sun, R.-X.; He, S.-M. A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics. arXiv:1501.00537 [stat.AP], 2015; http://arxiv.org/abs/1501.00537.
(27) Davison, A.; Hinkley, D. Bootstrap Methods and their Application; Cambridge University Press: Cambridge, UK, 1997; pp 175−180.
(28) Goloborodko, A. A.; Levitsky, L. I.; Ivanov, M. V.; Gorshkov, M. V. Pyteomics − a Python framework for exploratory data analysis and rapid software prototyping in proteomics. J. Am. Soc. Mass Spectrom. 2013, 24, 301−304.