International Summit on Proteomics Data Release and Sharing Policy


On August 14, 2008, the National Cancer Institute (NCI) of the National Institutes of Health (NIH) sponsored a summit in Amsterdam that included members from the international proteomics community. The summit had one goal: to begin defining policies and practices that would govern the release of proteomics data into the public domain.
Data sharing has become standard practice in the large-scale genomic sequencing community, as outlined in the Bermuda principles (www.ornl.gov/sci/techresources/Human_Genome/research/bermuda.shtml). These principles, which were developed at a 1996 Bermuda gathering sponsored by the Wellcome Trust and ultimately endorsed by all of the major parties involved in the Human Genome Project, state that the primary genomic sequence should be “freely available and in the public domain as soon as possible in order to encourage research and development, and to maximize its benefit to society.” This widespread sharing of prepublication data greatly accelerated the pace of scientific discovery in the field of genomics, and the organizers of this Amsterdam summit (myself; Rolf Apweiler of the European Bioinformatics Institute [EBI]; Mike Snyder of Yale University; Henning Hermjakob of EBI; and Mathias Uhlén of AlbaNova University Center, KTH-Royal Institute of Technology [Sweden]) believe the same will hold true for proteomics. In fact, the lack of widely followed data-release policies in proteomics is currently seen by many as a stumbling block for the progress and support of the field as a whole.
When should data be released? It was agreed that investigators taking part in community resource projects (where the goal is to produce resources for the broader scientific community) should be required to submit data once they are produced. In contrast, investigators working on individual projects should be required to submit data upon publication.
However, the issue was raised that if proteomic data are released prior to publication, data producers may not have an opportunity to publish the first analyses of their own work because other investigators will have access to the data. So far, this issue has been successfully handled within the genomics community by clearly defining tripartite responsibility: that of data producers, users, and funders. These responsibilities also are applicable to the proteomics community.
What types of proteomic data should be released? What types of metrics define data quality for proteomics, including MS and protein/affinity array data? These are difficult questions, given that this is a complex and burgeoning field, but they must be addressed now if proteomics is ever to live up to its promise. Taking part in these discussions were representatives from European Union funding agencies, NCI/NIH, EBI, the Wellcome Trust, Genome Canada, the National Center for Biotechnology Information, the National Institute of Standards and Technology, proteomics journals, and many international universities. It was agreed that, at a minimum, what the community both wants and needs are high-quality, well-annotated raw data (MS and protein/affinity array). Raw data were also deemed to be the most reliable interchange format for a repository. Access to these data will require the proper infrastructure: community-supported standardized formats, controlled vocabularies and ontologies, minimal reporting requirements, and publicly available online repositories. In addition to raw data, it is essential to include metadata, information on data quality, and identification quality control.
So how can we make this happen? Data sharing cannot be purely voluntary. Scientists, journals, and funding agencies will need to take the necessary steps to ensure that all parties adhere to the standards for data release. The central repositories, in turn, will need to reduce the burden on data producers by clearly defining minimum requirements while encouraging rich annotation. The submission process will need to be seamless. With respect to defining metrics for data quality, the consensus was that it will be important for central repositories to come up with their own thresholds for metrics, in a coordinated manner with users and one another to ensure interoperability. Over time, we believe that the value of these central resources will become obvious and create momentum.
This international 1-day summit was a major step forward for the proteomics community. We anticipate that after the release of a white paper resulting from the summit, the principles developed at this meeting can be readily adopted by the field as guidelines for releasing and sharing proteomics data.

HENRY RODRIGUEZ
Director, Clinical Proteomic Technologies for Cancer, NCI, NIH

10.1021/pr800779q

Not subject to U.S. Copyright. Publ. 2008 Am. Chem. Soc.

Journal of Proteome Research • Vol. 7, No. 11, 2008 4609