U.S. HUPO Fourth Annual Conference - Journal of Proteome

U.S. HUPO Fourth Annual Conference. Katie Cottingham. J. Proteome Res. , 2008, 7 (6), pp 2190–2191. DOI: 10.1021/pr083742g. Publication Date (Web): ...

meeting news

CAMSI: a word of caution about false discovery rates

Katie Cottingham reports from the U.S. HUPO Fourth Annual Conference in Bethesda, Md.

What started as a simple competition to compare the performances of search engines for peptide identifications turned out to be a cautionary lesson regarding the calculation of false discovery rates (FDRs). As Stephen Master at the University of Pennsylvania pointed out in his talk at the U.S. HUPO conference, peptide identification algorithms that include protein-level information in their analyses can bias the calculation of the peptide FDR.

The competition, called Critical Assessment of Mass Spectral Identifications (CAMSI), was first discussed at the inaugural meeting of the Statistical Proteomics Initiative at last year’s U.S. HUPO conference in Seattle. “The idea behind CAMSI is that we recognized that there was a rapidly expanding group of search engines that needed to be compared,” explains Master. Although comparisons have been reported previously, they usually included only the most popular, established search engines. Also, high-accuracy data from FT mass spectrometers and Orbitraps had not been analyzed in these reports.

Therefore, Master, Alexey Nesvizhskii of the University of Michigan, and William Stafford Noble and Lukas Kall of the University of Washington devised CAMSI, a competition in which participants could use any search algorithm they wanted on three data sets that were provided as mzXML files. The first set consisted of C. elegans data that were obtained with a linear ion trap with low mass accuracy at the MS1 and MS2 stages. The second set was Orbitrap data with high mass accuracy at the MS1 stage but low mass accuracy at the MS2 stage. This information was obtained for a known 12-protein mixture. Finally, a human neuroblastoma cell line was analyzed with an Orbitrap instrument with high mass accuracy in the MS1 and MS2 scans.

A total of 11 groups participated, and each could submit up to five rankings per data set. The researchers also gave the participants the same decoy database, which comprised forward peptide sequences and scrambled versions of those sequences. Hits to a decoy sequence allow scientists to estimate the FDRs for peptide identifications. CAMSI participants didn’t know which sequences were forward or scrambled. Only Master had access to the key, and he scored the hits as correct (a forward sequence) or incorrect (a scrambled sequence).

Figure: Diversity in algorithm performance. The number of peptides that competitors successfully identified from data set 1 is shown for various FDR cutoffs. (Plot of peptides identified, on a scale of 0 to 10,000, versus FDR cutoff at 0.005, 0.01, and 0.05. Credit: Stephen Master.)

“We realized in retrospect that a number of groups and a number of algorithms are importing protein information into their peptide identifications,” says Master. “One of the take-home messages was how pervasive this actually was.”

Algorithms, such as X!Tandem or Percolator, that include protein-level information can bias the calculation of FDRs. In some cases, programs initially search a peptide database for possible identifications. Then, the best hits are pulled, and the proteins from which these peptides originated are identified. Finally, the algorithm searches the database a second time, but only for additional peptides from those identified proteins. Search engines that perform a second pass on a subset of a database bias the process, and the FDR is skewed.

Another way that a researcher can introduce bias is by eliminating “one-hit wonders”, or those peptides that are the sole representatives of a particular protein in an identification list. “If you had a decoy hit and then you filtered out one-hit wonders, you might realize that was a decoy hit,” explains Master.

Taking protein information into account isn’t always a bad thing, if the ultimate goal is to identify the proteins that are present in a sample, Master points out. “You can be misled if you think you’re estimating peptide false positive rates by using a decoy database and you’re using protein information,” he says. “However, I think one of the ways you could look at the whole CAMSI experience is that if you’re looking at protein false positive rates and use protein information, it can be very valuable.”

Overall, the results were varied. No single algorithm produced the best results, but in general, those teams that included protein information did somewhat better than the others. “At the end of the day, this exercise is not so much about having a competition and finding a winner as it is about figuring out how we all compare with each other and what we can do as a community to improve the performance of search methods,” says Master.

Now that the organizers know how the use of protein data impacts the peptide FDR, they plan to refine the decoy database strategy in the next round of CAMSI. In addition, they will obtain new data sets and provide these in raw formats so that researchers will have more flexibility. Also, data from the current CAMSI competition may be made public to facilitate discussion in the bioinformatics and proteomics fields.

2190 Journal of Proteome Research • Vol. 7, No. 6, 2008
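The effect Master describes can be illustrated with a small sketch. It assumes one common convention for decoy-based estimation, the number of decoy hits divided by the number of forward (target) hits in an accepted list; the report does not say which formula the CAMSI organizers used. The `PSM` record, the function names, and the toy hit list are all hypothetical and invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class PSM(NamedTuple):
    """One accepted peptide identification (hypothetical record layout)."""
    peptide: str
    protein: str
    is_decoy: bool


def estimate_fdr(hits):
    """Decoy-based estimate: decoy hits / target hits (an assumed convention)."""
    decoys = sum(h.is_decoy for h in hits)
    targets = len(hits) - decoys
    return decoys / targets if targets else 0.0


def drop_one_hit_wonders(hits):
    """Keep only hits whose protein is supported by more than one distinct peptide."""
    distinct_pairs = {(h.protein, h.peptide) for h in hits}
    peptides_per_protein = Counter(protein for protein, _ in distinct_pairs)
    return [h for h in hits if peptides_per_protein[h.protein] > 1]


# Toy data (invented): 8 forward hits on 3 proteins, 2 scrambled (decoy) hits.
hits = (
    [PSM(p, "P1", False) for p in ("A", "B", "C", "D")]
    + [PSM(p, "P2", False) for p in ("E", "F", "G")]
    + [PSM("H", "P3", False)]
    + [PSM("X", "decoy_1", True), PSM("Y", "decoy_2", True)]
)

filtered = drop_one_hit_wonders(hits)

print(f"before filtering: {estimate_fdr(hits):.2f}")      # 2 decoys / 8 targets
print(f"after filtering:  {estimate_fdr(filtered):.2f}")  # 0 decoys / 7 targets
```

In this toy list, both decoy hits are one-hit wonders, so the protein-level filter removes them and the decoy-based estimate drops from 0.25 to 0.00. The estimate now says nothing about how many of the seven remaining forward hits are wrong, which is the kind of skew in the peptide FDR that the article warns about.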

Imaging MS for the clinical toolbox

Over the years, Richard Caprioli’s group at Vanderbilt University and other research teams have refined and optimized imaging MS (IMS). At the recent U.S. HUPO conference, Caprioli explained that IMS may soon be ready for the clinic. The method is compatible with histological stains; can be applied to formalin-fixed, paraffin-embedded (FFPE) samples; and can help clinicians visualize molecular changes in tissue before these alterations are visible with conventional microscopic techniques.

Most histological stains bind a broad range of molecules, Caprioli points out. Also, even when antibodies are used in a clinical lab, they typically are not specific. With IMS, however, no stains or antibodies are necessary. Instead, molecules are analyzed directly from a piece of tissue by MALDI MS. “What IMS brings to the histologists is a molecular dimension,” he says. “So, now we’re actually looking at the molecules themselves in the tissues.” The MALDI laser desorbs proteins and other molecules in consecutive spots as it is moved across a tissue or cells, thereby preserving spatial information. “This technology fits very well in a clinical setting because we produce images of tissue, exactly what pathologists and histologists are used to looking at,” says Caprioli.

Figure: MS imaging. A coronal section of a mouse brain with a tumor is visualized with IMS. The color intensity indicates the abundance of each protein. (a) Histone H4 is an abundant protein in tumors. (b) Guanine nucleotide binding protein γ7 and (c) cytochrome c oxidase polypeptide VIIC are more abundant in normal brain tissue. (d) Overlay of individual images. (Color scale: 0% to 100%. Credit: Richard Caprioli.)

In head-to-head comparisons, most IMS images were similar to those produced by histologists at a basic level. However, at the molecular level, one striking difference in the analysis of a tumor sample was that the margin between normal and diseased tissue for some soft-tissue sarcoma samples was not the same with IMS and conventional images. The IMS results indicated that molecular changes were occurring outside the margin that appeared to be normal tissue by histological methods. “We found that the tissue was already being compromised,” explains Caprioli. “It was in the early stages of transformation, but if you looked under a microscope, you couldn’t tell.” He says that these data could help clinicians determine how much diseased tissue to remove from a patient.

To discover molecular signatures of disease for diagnosis and prognosis, Caprioli and colleagues wanted to examine samples with known disease outcomes at various time points. However, these types of well-documented samples often are kept in large tissue banks and have been preserved with formalin and embedded in paraffin blocks. While the researchers were working on a protocol to analyze FFPE samples, Alan Solomon of the University of Tennessee approached Caprioli with a 109-year-old tissue sample obtained from a medical museum in Sweden. Caprioli figured that “if we can go in and get useful information out of that, then maybe there is hope of progress,” he says. Sure enough, the team identified proteins from the sample that corroborated the diagnosis of β-amyloidosis. They also successfully applied their protocol to more recently preserved FFPE specimens arranged in tissue arrays.

The next step is to bring IMS into the pathology lab and, perhaps, directly to the bedside, says Caprioli. “It’s not impossible to imagine a nurse sticking a needle in a vein or scraping the skin and then placing that into a portable mass spectrometer to do specific molecular analyses,” he says.
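The way spot-by-spot acquisition preserves spatial information can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of the Vanderbilt group's software: each laser spot is assumed to yield a grid position and a peak list, and an image for one molecule is built by summing the intensity near a chosen m/z at every spot and scaling to the strongest spot. The function name `ion_image`, the m/z values, the tolerance, and the toy data are all hypothetical.

```python
def ion_image(spots, target_mz, tol=0.5):
    """Build a relative-intensity image (percent) for one m/z value.

    spots: iterable of (x, y, peaks), where peaks is a list of (mz, intensity).
    Returns a list of rows (indexed by y), each a list of values indexed by x.
    """
    spots = list(spots)
    width = max(x for x, _, _ in spots) + 1
    height = max(y for _, y, _ in spots) + 1
    grid = [[0.0] * width for _ in range(height)]

    # Sum the signal within the m/z tolerance at each laser spot.
    for x, y, peaks in spots:
        grid[y][x] = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)

    # Scale to the strongest spot so values run from 0% to 100%.
    strongest = max(max(row) for row in grid)
    if strongest > 0:
        grid = [[100.0 * v / strongest for v in row] for row in grid]
    return grid


# Toy 2 x 2 raster (invented values): (x, y, [(m/z, intensity), ...]).
spots = [
    (0, 0, [(1000.1, 50.0), (750.0, 10.0)]),
    (1, 0, [(999.9, 100.0)]),
    (0, 1, [(750.0, 20.0)]),
    (1, 1, [(1000.0, 25.0)]),
]

image = ion_image(spots, target_mz=1000.0)
for row in image:
    print(row)
```

Repeating the same call with a different `target_mz` would give a separate image for another molecule from the same raster, which is roughly how per-protein panels such as those described in the figure could be derived from one acquisition.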

people

New associate editor Setsuko Komatsu is a laboratory head at the National Institute of Crop Science and is a professor at the University of Tsukuba (both in Japan). She obtained her Ph.D. from Meiji Pharmaceutical University (Japan), and her dissertation focused on the role of protein kinase-dependent phosphorylation during mammalian fertilization. She was previously employed at Meiji Pharmaceutical University and then at the Keio University School of Medicine (Japan). In 1993, Komatsu was a senior researcher at the National Institute of Agrobiological Sciences. Since 1990, she has used protein sequencing and MS to study plant proteomics. Her main research interests are in the fields of crop proteomics, biochemistry, and molecular biology, with a special focus on signal transduction in cells. She joined the JPR Editorial Advisory Board in January 2008.
