Correction to “Improved False Discovery Rate ... - ACS Publications

Oct 21, 2016
Addition/Correction pubs.acs.org/jpr

Correction to “Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics”

Uri Keich,* Attila Kertesz-Farkas, and William Stafford Noble*

J. Proteome Res. 2015, 14 (8), 3148−3161. DOI: 10.1021/acs.jproteome.5b00081



We have determined that the analysis of real data sets reported in “Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics” is problematic because of the way that precursor charge state was handled. Correcting the error leads to systematic changes in some of those results; however, the overall trends that we observe and the main conclusions of our study remain unchanged.

We would also like to take this opportunity to clarify that we implemented TDC by comparing two separate, competing searches (one against the target database and one against the decoy database). This approach coincides with our model, and, because all scores are calibrated using spectrum-specific empirical distributions constructed from 10K randomly drawn decoy databases, it is equivalent to searching the concatenated target-decoy database. We also clarify that, in the corrected analysis in which the precursor charge states are merged, we estimated π0, the proportion of foreign spectra, using the R package qvalue with the bootstrap option.

Finally, in the published paper we defined a score as calibrated as follows: “More precisely, if $S_i$ is the score of the best match to spectrum $\sigma_i$ in a randomly drawn database, then the score is calibrated if for any spectra $\sigma_i$ and $\sigma_j$, $P(S_i \ge S_j) = P(S_j \ge S_i)$.” However, in practice, we needed a slightly stronger definition: “More precisely, if $S_i$ is the score of the best match to spectrum $\sigma_i$ in a randomly drawn database, then the score is calibrated if for any spectra $\sigma_i$ and $\sigma_j$ and $\alpha \in \mathbb{R}$, $P(S_i \ge \alpha) = P(S_j \ge \alpha)$. In other words, the null distribution of the optimal PSM score is invariant of the spectrum.”

This Correction includes updated versions of Table 2, Figure 6, and Supplementary Figure 6.
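The TDC implementation clarified above (two separate searches whose per-spectrum results compete) can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the authors' code: each spectrum has one best calibrated score from the target search and one from the decoy search (higher is better), ties are awarded to the decoy, and the number of discoveries at a threshold uses the common (decoy wins + 1) / (target wins) form of the TDC FDR estimate; the exact T-TDC estimator in the paper may differ. The function names `compete` and `tdc_discoveries` are hypothetical.

```python
from typing import List, Tuple


def compete(target: List[float], decoy: List[float]) -> List[Tuple[float, bool]]:
    """Per-spectrum winner of the two separate searches.

    Returns (score, is_target) pairs; ties go to the decoy (our assumption).
    The correction states that, because the scores are calibrated, this is
    equivalent to searching the concatenated target-decoy database.
    """
    return [(t, True) if t > d else (d, False) for t, d in zip(target, decoy)]


def tdc_discoveries(target: List[float], decoy: List[float], fdr: float) -> int:
    """Number of target wins accepted at the given FDR threshold.

    Scans winners from best to worst score and keeps the deepest cutoff
    whose estimated FDR, (decoy wins + 1) / target wins, is <= fdr.
    """
    winners = sorted(compete(target, decoy), key=lambda w: w[0], reverse=True)
    best, n_target, n_decoy = 0, 0, 0
    for _score, is_target in winners:
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        # estimated FDR among all winners scoring at least this high
        if n_target > 0 and (n_decoy + 1) / n_target <= fdr:
            best = n_target
    return best
```

For example, with target scores [10, 9, 8, 7, 6, 1] and decoy scores [1, 1, 1, 1, 1, 5], five spectra are won by the target and one by the decoy, and `tdc_discoveries(..., 0.5)` accepts all five target wins.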

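To see why the weaker definition quoted above is not sufficient, consider a hypothetical pair of spectra whose null optimal-PSM scores are normal with mean 0 but different spreads (the distributions are our own illustrative choice, not taken from the paper). The pairwise condition $P(S_i \ge S_j) = P(S_j \ge S_i)$ then holds by symmetry, yet the tail probabilities $P(S_i \ge \alpha)$ and $P(S_j \ge \alpha)$ differ, so the scores are not calibrated in the stronger sense. A short Python check:

```python
import math


def normal_sf(x: float, sigma: float) -> float:
    """P(X >= x) for X ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


# Hypothetical null best-match scores: S_i ~ N(0, 1), S_j ~ N(0, 2^2), independent.
SIGMA_I, SIGMA_J = 1.0, 2.0

# Weak definition holds: S_i - S_j ~ N(0, SIGMA_I^2 + SIGMA_J^2) is symmetric
# about 0, so P(S_i >= S_j) = P(S_i - S_j >= 0) = 1/2.
p_i_beats_j = normal_sf(0.0, math.hypot(SIGMA_I, SIGMA_J))
p_j_beats_i = 1.0 - p_i_beats_j  # ties have probability 0 for continuous scores

# Strong definition fails: the tails differ at alpha = 2.
alpha = 2.0
tail_i = normal_sf(alpha, SIGMA_I)  # about 0.023
tail_j = normal_sf(alpha, SIGMA_J)  # about 0.159

print(f"P(S_i >= S_j) = {p_i_beats_j:.3f}, P(S_j >= S_i) = {p_j_beats_i:.3f}")
print(f"P(S_i >= {alpha}) = {tail_i:.3f}, P(S_j >= {alpha}) = {tail_j:.3f}")
```

A threshold $\alpha$ applied to both spectra would thus admit spectrum $j$ by chance roughly seven times as often as spectrum $i$, which is exactly the spectrum dependence the stronger definition rules out.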
ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.6b00853. Supplementary Figure 6. Median number of T-TDC and C-TDC discoveries in the yeast data set. (PDF)

Table 2. Discrepancy in PSM Discoveries Reported by Different Applications of T-TDC and Mix-Max(a)

Entries are the percentage of PSM discoveries reported by only one of two runs; q0.05 and q0.95 denote the 0.05 and 0.95 quantiles.

% only in one    yeast                   worm                    Plasmodium
set       FDR    q0.05   median  q0.95   q0.05   median  q0.95   q0.05   median  q0.95
T-TDC     0.01   0.00    0.05    3.6     0.00    0.09    5.2     0.00    0.08    2.9
T-TDC     0.05   0.1     0.3     1.9     0.00    0.2     3.5     0.00    0.2     1.7
T-TDC     0.10   0.4     0.6     2.2     0.07    0.4     3.3     0.1     0.4     2.4
mix-max   0.01   0.00    0.00    3.2     0.00    0.00    4.8     0.00    0.00    2.7
mix-max   0.05   0.00    0.00    1.6     0.00    0.00    3.0     0.00    0.00    1.5
mix-max   0.10   0.00    0.00    1.7     0.00    0.00    2.8     0.00    0.00    1.7

(a) For each of 2000 pairs of T-TDC (respectively, mix-max) applications to the Tide search results of the target database, each coupled with one of two independently drawn decoy databases, we computed the percentage of PSM discoveries (pooled across the two largest charge-state sets of each species' spectrum set) that were reported at the given FDR by only one of the two runs. The table gives the 0.05 quantile, median, and 0.95 quantile of these percentages. The results show that mix-max consistently exhibits less decoy-dependent variability than T-TDC.
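The statistic summarized in Table 2 can be computed along the following lines. In this sketch the discovery set of each run is a set of PSM identifiers, and the percentage is taken relative to the union of the two runs' discoveries. The footnote does not state the denominator or the quantile convention, so both choices, as well as the function names `pct_only_in_one` and `summarize`, are our assumptions.

```python
import statistics
from typing import Iterable, Set, Tuple


def pct_only_in_one(run_a: Set[str], run_b: Set[str]) -> float:
    """Percentage of discoveries reported by exactly one of two runs.

    The denominator is the union of both runs' discoveries (our assumption).
    """
    union = run_a | run_b
    if not union:
        return 0.0
    return 100.0 * len(run_a ^ run_b) / len(union)


def summarize(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """0.05 quantile, median, and 0.95 quantile over all run pairs.

    Uses linear interpolation between order statistics ("inclusive"); the
    convention used for Table 2 is not stated, so this is an assumption.
    """
    pcts = [pct_only_in_one(a, b) for a, b in pairs]
    cuts = statistics.quantiles(pcts, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return cuts[0], statistics.median(pcts), cuts[-1]
```

With 2000 decoy pairs per species and FDR level, `summarize` would yield one row segment of the table; a median of 0.00, as for mix-max, means that in at least half of the pairs both runs reported identical discovery sets.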

© 2015 American Chemical Society


DOI: 10.1021/acs.jproteome.6b00853 J. Proteome Res. XXXX, XXX, XXX−XXX

Journal of Proteome Research


Figure 6. Median ratios of the number of discoveries in the yeast data set. Each panel plots, as a function of estimated FDR, the median ratio of the number of mix-max discoveries to the number of T-TDC, STDS, or STDS-PIT discoveries. (For reference, the number of T-TDC discoveries is given in Supplementary Figure 6.) The spectrum sets are the yeast, worm, and malaria data sets. In the corresponding figure of the published paper, the yeast data were separated by charge state and hence are correct as is. For completeness, we now provide the analysis of the entire yeast data set (charges 1−3). In each plot, the medians were taken over 1000 discovery ratios, each estimated using a different randomly drawn decoy database. Each pair of target-decoy databases was searched using two different search engines: Tide and MS-GF+. In all cases the scores were calibrated using spectrum-specific empirical distributions constructed from 10K randomly drawn decoy databases, as described in ref 6. Overall, the graphs are qualitatively similar to the results from the simulated data (Figure 4).
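The spectrum-specific calibration mentioned in the caption can be sketched as an empirical p-value. The details below are assumptions for illustration and may differ from ref 6: for a single spectrum we assume its best raw score against each of N randomly drawn decoy databases is available (N = 10K in the caption), and the calibrated score is the fraction of those null scores at least as large as the observed one, with a +1 pseudocount so that it is never zero. Smaller p-values are better. The class name `EmpiricalCalibrator` is hypothetical.

```python
import bisect
from typing import List


class EmpiricalCalibrator:
    """Empirical null for one spectrum's best-match score (hypothetical helper)."""

    def __init__(self, null_best_scores: List[float]) -> None:
        # Best raw score of this spectrum in each randomly drawn decoy database.
        self._null = sorted(null_best_scores)

    def p_value(self, raw_score: float) -> float:
        """Estimate P(S >= raw_score) under this spectrum's null, with +1 pseudocount."""
        n = len(self._null)
        # Null scores >= raw_score lie at or after the leftmost insertion point.
        n_at_least = n - bisect.bisect_left(self._null, raw_score)
        return (1 + n_at_least) / (1 + n)
```

Because each spectrum is compared only with its own null sample, a raw score of 3.0 against the null sample [1, 2, 3, 4] and a raw score of 30.0 against [10, 20, 30, 40] both map to 0.6, which illustrates the spectrum invariance that the stronger calibration definition asks for.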
