
news

Two are not always better than one

We’ve all heard the saying “more is better”, but is that always true? In a paper recently published in JPR (DOI 10.1021/pr9004794), Nitin Gupta and Pavel Pevzner at the University of California San Diego argue that, for protein identifications, a single peptide hit with a high peptide-spectrum matching (PSM) score sometimes can be better than two hits with mediocre or low PSM scores.

However, the conventional wisdom among some experimentalists has been that two hits are better than one. This “two-peptide rule” has its origins in a paper known as the “MCP guidelines” (Mol. Cell. Proteomics 2004, DOI 10.1074/mcp.T400006-MCP200). The proteomics field was exploding in 2004, and most papers included long lists of hundreds or thousands of identified proteins. Some skeptics grew concerned about the quality of the identifications. To address the issue of data quality, a group of scientists formulated the MCP guidelines, which describe a set of publication standards.

Over the years, guideline #3 became known as the “two-peptide rule”; it states that if a protein is identified on the basis of a single peptide, then additional information supporting the identification must be provided. “The two-peptide rule was discussed in the context of studies where there was little or no analysis done at all,” says Alexey Nesvizhskii of the University of Michigan, a coauthor of the MCP guidelines. “The two-peptide rule was better than just throwing all data in a publication.”

Like a game of telephone, in which a line of people repeat a phrase to the person sitting beside them, the two-peptide message got garbled. In practice, some researchers erroneously interpreted it to mean that only those proteins identified with two or more peptides should be published, and those identified with only one peptide should be disregarded. To counter that notion, investigators such as Eugene Kolker of the Seattle Children’s Hospital provided experimental evidence that these “one-hit wonders” can have value. In a paper published online in 2006, Kolker and Roger Higdon of the BIATECH Institute demonstrated that 68-98% of correct one-hit wonders could be salvaged by applying their statistical approach of searching random decoy databases in combination with logistic regression models and cross-validation (Bioinformatics 2006, DOI 10.1093/bioinformatics/btl595). With this strategy, Kolker and Higdon calculated the probabilities that single-hit proteins were valid, taking into account several variables, such as the certainty of identified spectra and the protein length.

Now, Gupta and Pevzner offer their two cents. “Error rates need to be computed more rigorously instead of resorting to intuitive rules that sound reasonable but have no theoretical justification,” says Gupta. In the JPR work, the researchers analyzed a human data set and a Shewanella oneidensis data set with two popular database search tools. “We show that in every case, the one-peptide rule outperforms the two-peptide rule,” notes Gupta. So, when the error rate is the same, including only proteins with ≥2 peptide hits is detrimental compared with keeping high-quality single-peptide hits.

Gupta and Pevzner calculate a measure of individual protein reliability (the protein false-positive rate) with special consideration for the number of identified spectra and the protein length, as Kolker and Higdon did. However, Gupta and Pevzner’s method is an extension of the generating-function approach and involves spectral dictionaries and lexicons. The method outperformed a decoy database strategy and ProteinProphet, which combines peptide scores to assess protein identifications.

Millions of spectra were analyzed in this work. “It would be interesting to see how their numbers behave for smaller-scale, more typical experiments, e.g., 10,000-100,000 spectra against 200,000-2 million proteins,” notes John Cottrell of Matrix Science, Inc. Gupta says that all of the Shewanella spectra are publicly available at the Pacific Northwest National Laboratory website. The human data set is part of another manuscript and will be made available after that paper is published, he adds.

In the end, all of the evidence indicates that the two-peptide rule should be abandoned, say researchers. “Blindly applying the two-peptide rule, without taking PSM score into account, may cause good protein hits to be discarded,” Cottrell states. “So, if the rule isn’t doing anything useful, the net result can be negative.” Nesvizhskii explains, “If you have a really good single-hit protein identification, and it receives a high probability, then it’s as good as a protein identified by two peptides. In that sense, what [Gupta and Pevzner] say completely agrees with what our message has been for a long time.”

Katie Cottingham

[Figure: A case when 2 < 1. More correct human proteins were identified with a one-peptide rule than with the two-peptide rule. Credit: Nitin Gupta and Pavel Pevzner, DOI 10.1021/pr9004794]

4172

Journal of Proteome Research • Vol. 8, No. 9, 2009

10.1021/pr900703w

© 2009 American Chemical Society