Ending the “Publish and Vanish” Culture: How the Data Standardization Process Will Assist in Data Harvesting

Jul 6, 2009

Proteomics experiments generate rich data sets listing, and increasingly quantitating, protein expression from a variety of subcellular locations and produced under any experimental conditions the researcher chooses to designate. This information can collectively be used to understand how the changing patterns of protein expression drive cell differentiation, the cell cycle, and the cellular response to a particular agonist, or to explain the difference between healthy and diseased tissue. In isolation, however, these data represent only a list of proteins at a single moment in time and one, moreover, which is impossible to search when the data are scattered across the PDF files and supplementary materials of a number of journal articles. This makes the simple question “In which tissues is my protein expressed?” impossible to answer. Increasingly there is a move to gather such data in repositories such as the PRIDE database (1), annotated with detailed metadata to describe the precise conditions under which they were gathered. Not only do these public domain databases enable any bench scientist to ask questions about the protein complement which drives a specific cellular process, but the deposition of the accompanying spectra used to generate protein lists ensures that data sets can be updated as the protein sequence databases, on which the initial identifications depend, improve in both sequence quality and completeness. To facilitate the move to make all published data easily available to the community, there are two contributions which can be made by all authors while preparing manuscripts for submission.
First, the Minimum Information About a Proteomics Experiment (MIAPE) reporting guidelines should be consulted, and adhered to, during the writing process (2). These guidelines were prepared by the HUPO-Proteomics Standards Initiative (PSI) group to ensure that a paper contains all the relevant information required for an eventual reader to fully understand how the experiment was performed. Second, authors need to be encouraged to regard data submission to a public domain resource as an integral part of the publication process. The time is now judged to be ripe to encourage all authors to follow this route. To this end, the editors of most of the major journals publishing in this domain met with the HUPO-PSI and Publication Committees to further this aim (Orchard et al., in preparation). The field of proteomics is now reaching the level of maturity such that it is no longer seen as acceptable that the data from any experiment not be readily available for computational data mining.

Over the last 2 years, the databases have been requested to play their part in this process by making the submission process simpler for the average research worker, and much progress has been made in this area: robust, open-source tools are becoming increasingly available to assist in importing data generated by proteomics laboratories into public repositories, although it is widely accepted that there is still room for improvement. The MIAPE reporting guidelines are currently being further refined, in consultation with the domain-specific journals, to differentiate between the information required within a manuscript and the additional information which should be added to a database submission. The onus is now on the scientific community to respond to this challenge and no longer bury its valuable data beyond the reach of even the most dedicated search engine. As this journal moves to positively encourage both the adoption of the HUPO-PSI standards by its submitting authors and the deposition of the accompanying data into the public domain, it is hoped that leading proteomics laboratories will recognize the benefits of this move and actively contribute to the process. Increased data availability will enable more complex queries to be both asked and answered, enriching our understanding of the intricate protein expression patterns which drive cellular biology. Increased visibility of a data set through database searching will also result in an improved citation rate for the original submitter, ensuring a direct reward for making the data publicly available. Continuing to ‘publish and vanish’ and to restrict public access to valuable data should no longer be considered an option by this community.

SANDRA ORCHARD
EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, U.K.

10.1021/pr900527r

© 2009 American Chemical Society

References

(1) Jones, P.; Cote, R. G.; Cho, S. Y.; Klie, S.; Martens, L.; Quinn, A. F.; Thorneycroft, D.; Hermjakob, H. PRIDE: new developments and new datasets. Nucleic Acids Res. 2008, 36, D878–D883.

(2) Taylor, C. F.; Paton, N. W.; Lilley, K. S.; Binz, P.-A.; Julian, R. K., Jr.; Jones, A. R.; Zhu, W.; Apweiler, R.; Aebersold, R.; Deutsch, E. W.; Dunn, M. J.; Heck, A. J.; Leitner, A.; Macht, M.; Mann, M.; Martens, L.; Neubert, T. A.; Patterson, S. D.; Ping, P.; Seymour, S. L.; Souda, P.; Tsugita, A.; Vandekerckhove, J.; Vondriska, T. M.; Whitelegge, J. P.; Wilkins, M. R.; Xenarios, I.; Yates, J. R., III; Hermjakob, H. The minimum information about a proteomics experiment (MIAPE). Nat. Biotechnol. 2007, 25, 887–893.

Journal of Proteome Research • Vol. 8, No. 7, 2009 3219