news
Proteomics researchers now agree on some aspects of data sharing

Katie Cottingham reports from the U.S. National Cancer Institute's International Summit on Proteomics Data Release and Sharing Policy in Amsterdam.

Early on, genomics researchers decided that their data should be freely accessible, either as the data were generated or soon after publication. Once the Human Genome Project was in full swing, this notion was solidified by a group of leaders in the field who met in 1996 in Bermuda to formally discuss the issue. The outcome was the Bermuda principles, which instructed genomicists to make data available as soon as possible and to deposit sequences into public databases.

Proteomics scientists, however, have not been as quick to jump on the data-sharing bandwagon as their genomics counterparts were. Rolf Apweiler, who is at the European Bioinformatics Institute (EBI; U.K.), remembers co-chairing a U.S. National Institutes of Health meeting on proteomics data sharing in early 2005. "There was quite a strong faction that had objections to shared proteomics data," he recalls.

But times have changed, a bit. On August 14, 2008, ~25 proteomics researchers, journal editors, and representatives of funding agencies gathered in Amsterdam, just a few blocks from where the HUPO Seventh Annual World Congress would take place 4 days later. The International Summit on Proteomics Data Release and Sharing Policy was organized by Apweiler; Henry Rodriguez of the U.S. National Cancer Institute; Michael Snyder of Yale University; Henning Hermjakob of EBI; and Mathias Uhlén of AlbaNova University Center, KTH-Royal Institute of Technology (Sweden) to determine where proteomics researchers stand on the issue and to define quality metrics for shared data.

As the meeting got under way, Rodriguez realized that the community wasn't ready to discuss the technical issues associated with sharing data, such as how to determine whether data are of sufficient quality to be posted. He says that proteomics researchers still seemed to be grappling with some of the same issues they were debating years ago, including whether the data should be made public at all and what types of data should be released. So the aim of the meeting shifted from defining quality to "answering some of those 10,000-foot-level questions," says Rodriguez.

Apweiler points out that, if nothing else, the mood has changed since 2005. "There's much more willingness to share the data because people see more and more the value of comparing their own data to [those] of other people," he explains.

Stepping back a bit, the attendees focused on the types of data that should be made public and when those data should be made available. One key outcome was agreement that raw data should be shared. "Raw data is extremely important because it is the information that comes off the instrumentation that has not gone through some sort of algorithm developed by an individual," explains Rodriguez. That the attendees agreed that raw data, not just protein or peptide lists, should be accessible was "the single best outcome, which actually took me by surprise," he says.

Another key point of agreement was the timing of data release. Rodriguez says that attendees made a distinction between data generated by large consortia, which should be made available as soon as possible, and data produced by individual investigators, which could wait until publication.

Also discussed was where researchers could put raw data. Currently, the only database that can accept large, raw data files is Tranche (http://tranche.proteomecommons.org), developed by Phil Andrews and Jayson Falkner of the University of Michigan. Attendees were concerned that only one database of this type exists.

Journal editors were not yet ready to mandate submission of processed or raw data to databases, but they strongly encouraged deposition. Rodriguez says, "One of the things [editors] did recommend was that the funding organizations should be applying the policy to the grantees." In addition, editors suggested that standards be revisited every few years to keep up with technological developments.

As the organizers discovered, not all issues can be resolved in a 1-day meeting. Additional workshops on data sharing are already being planned by various groups for 2009, and Rodriguez is working on a white paper to inform the larger community of the topics discussed at the meeting. "The scientists, the funding agencies, and the journals see [data sharing] as a very important thing now to move the field forward, and we should all work together to make this happen," says Apweiler.

Standards update

Several standards efforts have been announced in a recent issue of Nature Biotechnology (2008, 26, 860-866). In a series of letters to the editor, members of various committees of the HUPO Proteomics Standards Initiative describe four new tools to help researchers report the minimum information about a proteomics experiment (MIAPE). With the MIAPE-MS module, scientists can provide information about an MS experiment, including the instrument manufacturer, the ion source used, and the results. The MIAPE-MSI (MS informatics) checklist specifies the software applied to analyze MS results, as well as the inputs, outputs, and interpretations. Gel electrophoresis (GE) information, such as the gel matrix components and the image acquisition process, is covered by MIAPE-GE. Finally, PSI-MOD is proposed as a community-standard ontology for natural and nonnatural protein modifications. For more information about MIAPE, visit www.psidev.info/miape.
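As a purely illustrative sketch, the kind of minimum information that MIAPE-MS and MIAPE-MSI ask for could be captured in a simple structured record like the Python snippet below. The field names and values here are hypothetical and do not reflect the official MIAPE checklists or any PSI file format; see www.psidev.info/miape for the actual reporting guidelines.

```python
# Hypothetical example only: a minimal, MIAPE-inspired metadata record.
# Field names and values are illustrative and do NOT come from the official
# MIAPE documents; consult www.psidev.info/miape for the real checklists.

miape_style_record = {
    "ms": {  # in the spirit of MIAPE-MS: how the spectra were acquired
        "instrument_manufacturer": "ExampleVendor",        # hypothetical
        "ion_source": "electrospray ionization",           # hypothetical
        "results": "raw spectra deposited alongside the manuscript",
    },
    "msi": {  # in the spirit of MIAPE-MSI: how the spectra were analyzed
        "analysis_software": "ExampleSearchEngine v1.0",   # hypothetical
        "inputs": ["raw spectra", "protein sequence database"],
        "outputs": ["peptide-spectrum matches", "protein list"],
        "interpretation": "results filtered at an illustrative FDR threshold",
    },
}

# A submission script could verify that no required field is left empty
# before data are released to a public repository.
missing = [key
           for section in miape_style_record.values()
           for key, value in section.items()
           if not value]

if missing:
    print("Missing fields:", missing)
else:
    print("All illustrative fields are filled in.")
```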