LETTER pubs.acs.org/jpr

Response to: The Problem with Peptide Presumption and Low Mascot Scoring

John S. Cottrell*,† and David M. Creasy†

Matrix Science Limited, London, United Kingdom

ABSTRACT: A letter published in January 2011, “The Problem with Peptide Presumption and Low Mascot Scoring”, raised concerns about the reporting of peptide identifications based on mass spectrometry data with high precursor mass accuracy. We explain why we believe these concerns are unfounded.

Received: August 2, 2011. Published: September 27, 2011.
© 2011 American Chemical Society

Cooper (1) states, both in the text and in Table 1, that the Mascot identity score threshold drops to 0 in the limit of a single candidate sequence. This is not how Mascot results are reported. Mindful of the dangers of accepting matches on the sole basis of molecular mass, Mascot never allows the threshold score to drop below 13, equivalent to 20 candidate sequences. Anyone using the Mascot Parser API to query the result file programmatically will observe the same behavior. This has been the case since March 2007, when Mascot 2.2 was released. Hence, it is incorrect to say that “As both scores approach zero, the Mascot rule for assessing match significance is no longer meaningful ...”.

We share the author’s concern about low-scoring matches being accepted as meaningful by nonexpert users. However, this concern does not apply to the examples cited in Cooper’s references 1–5. All of these publications originate from the same group of collaborators, who are very experienced in database searching. Their publications explain that Mascot was used as a prefilter to create an initial short list of matches, which was then filtered by their own software to achieve a peptide match false discovery rate (FDR) of 1%.

Cooper raises concerns about the use of target-decoy searches with a narrow precursor mass tolerance and presents data, in his Table 2, for matches to a single spectrum. Clearly, the number of candidate sequences for a single spectrum will fluctuate widely between target and decoy; the numbers can only be expected to balance out when a large number of spectra are considered. The accuracy of the target-decoy approach depends on the total number of significant matches, not on the total number of candidates, and a certain number of significant matches is required to obtain useful precision on the FDR estimate. For example, if the only source of error were counting statistics, and there were 1000 significant matches from the target and 10 significant matches from the decoy, the estimated FDR would be between 0.7 and 1.3% (the standard deviation on a count of 10 is 10^0.5 ≈ 3.2).

Trying to estimate FDR by target-decoy when there are insufficient matches is a serious mistake, particularly when there are no matches at all from the decoy, but this is not a consequence of using a tight precursor mass tolerance. Using a wider tolerance will give more candidates, but it will not give significant matches to more spectra unless the original tolerance was too narrow compared with the experimental mass error. The common observation is that FDR increases rather than decreases for very narrow precursor tolerances, because the small number of candidates reduces the reliability of the scoring (2).

In summary, we agree that using a narrow mass tolerance and a small database can result in very few candidate sequences for a given spectrum, which is not a good basis for scoring the quality of the peptide-spectrum match. If the analyte sequence is missing from the database, there is a theoretical danger of accepting a match on molecular mass alone. We cannot find evidence in the literature that this is a widespread problem, and Mascot safeguards against it with a minimum significance threshold of 13. We have confidence in the target-decoy approach as a means of estimating FDR as long as the number of significant matches is large enough to deliver acceptable precision. Anyone who performs a target-decoy search, gets no matches from the decoy, and believes this means their FDR is zero is sadly mistaken.
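The arithmetic behind the threshold floor and the counting-statistics example can be checked with a minimal Python sketch. It is not Mascot code and does not use the Mascot Parser API; the function names are hypothetical. The threshold function assumes the identity threshold scales as 10·log10(N) for N candidate sequences, a form inferred only from the two figures the letter quotes (0 for a single candidate, 13 for 20 candidates), and clamps it at the stated floor of 13. The FDR function treats the decoy count as a Poisson count, as in the letter's example. The zero-decoy helper adds a technique the letter does not use: the one-sided 95% Poisson upper limit for zero observed events, about 3.0.

```python
import math


def identity_threshold(n_candidates: int, floor: float = 13.0) -> float:
    """Simplified identity threshold for N candidate sequences.

    ASSUMPTION: the 10 * log10(N) form is inferred from the letter's figures
    (0 for a single candidate, 13 for 20 candidates); it is not Mascot's code.
    The floor of 13 is the minimum stated in the letter.
    """
    if n_candidates < 1:
        raise ValueError("need at least one candidate sequence")
    unclamped = 10.0 * math.log10(n_candidates)
    return max(unclamped, floor)


def fdr_one_sigma(target_hits: int, decoy_hits: int) -> tuple[float, float, float]:
    """Decoy/target FDR estimate with a +/- 1 sigma band.

    Counting statistics only: the decoy count is treated as Poisson, so its
    standard deviation is sqrt(count). Other error sources are ignored.
    """
    if target_hits < 1:
        raise ValueError("need at least one significant target match")
    sigma = math.sqrt(decoy_hits)
    estimate = decoy_hits / target_hits
    low = max(decoy_hits - sigma, 0.0) / target_hits
    high = (decoy_hits + sigma) / target_hits
    return low, estimate, high


def fdr_upper_95_zero_decoy(target_hits: int) -> float:
    """Hypothetical helper (not from the letter): one-sided 95% upper limit on
    the FDR when no decoy matches are observed.

    For a Poisson count of 0, the 95% upper limit on the mean is -ln(0.05),
    about 3.0 (the "rule of three").
    """
    return -math.log(0.05) / target_hits


if __name__ == "__main__":
    print(identity_threshold(1))                 # 13.0 (floor applies)
    print(f"{identity_threshold(20):.2f}")       # 13.01
    low, est, high = fdr_one_sigma(1000, 10)
    print(f"{low:.2%} {est:.2%} {high:.2%}")     # 0.68% 1.00% 1.32%
    print(fdr_one_sigma(50, 0))                  # (0.0, 0.0, 0.0)
    print(f"{fdr_upper_95_zero_decoy(50):.1%}")  # 6.0%
```

For 1000 target and 10 decoy matches the ±1σ band is about 0.68% to 1.32%, matching the 0.7 to 1.3% quoted in the letter. With 50 target and 0 decoy matches, the naive estimate and its band both collapse to 0, while the 95% upper limit is about 6%. This illustrates why a zero decoy count does not mean a zero FDR.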

dx.doi.org/10.1021/pr200726c | J. Proteome Res. 2011, 10, 5272–5273


AUTHOR INFORMATION

Corresponding Author
*John S. Cottrell, Matrix Science Ltd., 64 Baker Street, London W1U 7GB, U.K. Phone: +44 20 7486 1050. Fax: +44 20 7224 1344. E-mail: [email protected].

Notes
†The authors declare financial and employment interests in Matrix Science Ltd., publisher of Mascot.

REFERENCES

(1) Cooper, B. The Problem with Peptide Presumption and Low Mascot Scoring. J. Proteome Res. 2011, 10, 1432–5.
(2) Brosch, M.; Swamy, S.; Hubbard, T.; Choudhary, J. Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold. Mol. Cell. Proteomics 2008, 7, 962–70.

NOTE ADDED AFTER ASAP PUBLICATION

This paper was published on the Web on Sep 29, 2011, with the Conflict of Interest statement missing. The corrected version was reposted on Oct 6, 2011.
