The Proteome Browser Web Portal - Journal of ... - ACS Publications

Dec 5, 2012 - In this article, we describe the development of an open-source platform containing an interactive web browser as a dynamic portal to the...
Article pubs.acs.org/jpr

The Proteome Browser Web Portal

Robert J. A. Goode,†,‡ Simon Yu,†,‡ Anitha Kannan,† Jeffrey H. Christiansen,§ Anthony Beitz,† William S. Hancock,∥ Edouard Nice,*,† and A. Ian Smith*,†

† Monash University, Clayton, Victoria, Australia
§ Australian National Data Service (ANDS), Melbourne, Victoria, Australia
∥ Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States; Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW, Australia; Department of Integrated Omics for Biomedical Science, World Class University, Yonsei University, Seoul, Korea

Supporting Information

ABSTRACT: In 2010, the Human Proteome Organization launched the Human Proteome Project (HPP), aimed at identifying and characterizing the proteome of the human body. To support complete coverage, one arm of the project will take a gene- or chromosomal-centric strategy (C-HPP) aimed at identifying at least one protein product from each protein-coding gene. Despite multiple large international biological databases housing genomic and protein data, there is currently no single system that integrates updated pertinent information from each of these data repositories and assembles the information into a searchable format suitable for the type of global proteomics effort proposed by the C-HPP. We have undertaken the goal of producing a data integration and analysis software system and browser for the C-HPP effort and of making data collections from this resource discoverable through metadata repositories, such as Australian National Data Service’s Research Data Australia. Here we present our vision and progress toward the goal of developing a comprehensive data integration and analysis software tool that provides a snapshot of currently available proteomic related knowledge around each gene product, which will ultimately assist in analyzing biological function and the study of human physiology in health and disease.

KEYWORDS: Protein Browser, HUPO, C-HPP, database integration, database search engine, chromosome



Special Issue: Chromosome-centric Human Proteome Project
Received: October 24, 2012
© XXXX American Chemical Society
dx.doi.org/10.1021/pr3010056 | J. Proteome Res. XXXX, XXX, XXX−XXX

INTRODUCTION

The Human Proteome Project (HPP) was announced at the Human Proteome Organization (HUPO) 2010 meeting in Sydney as a HUPO initiative with the primary aim of characterizing the proteins in the human body with respect to their function, role in disease, physiology, site and level of expression. To achieve this goal, the project has been divided into two orthogonal arms: one focused on biology and disease (B/D-HPP) and the other taking a gene- or chromosome-centric approach (C-HPP). This second approach aims to link the proteome with the underlying genome, thus extending the Human Genome Project of the previous decade into the future, and has been identified as a hot topic of 2013 by Wired UK.1 Both arms of the HPP are based upon three operational pillars: mass spectrometry-based data, antibody-based data and the scientific knowledgebase.

The aims of the C-HPP have recently been outlined by both Legrain et al.2 and Paik et al.,3 with a key component being a web portal to access and interrogate our current knowledge of the human proteome, a so-called “parts list”. This parts list is a key deliverable of the C-HPP,3 and in addition to defining which proteins are expressed, it will catalogue known isoforms or alternative splice variants (ASVs), single amino acid variants (SAAVs) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs), post-translational modifications (PTMs) and transcriptomic evidence for gene expression. This is critical for defining the “missing proteins list” that will aid targeted experimentation (e.g., IHC, antibody-based proteomics) to achieve a goal of the C-HPP, which is to experimentally identify at least one representative isoform for each protein-coding gene.4 To achieve this, the plethora of data that currently resides in numerous databases around the world needs to be collated into a coherent and searchable catalogue of the entire human proteome, acknowledging the need to assess data quality and to provide regular updates integrated from all supported and characterized data sites. This is in contrast to the majority of protein databases, which currently provide relatively simple information about individual proteins, often gathered from single experimental modalities.

In this article, we describe the development of an open-source platform containing an interactive web browser as a dynamic portal to the current status of the human proteome. Its underlying extendable framework delineates the proteome in the context of the linear genome (i.e., chromosome format) by drawing biological data from numerous sources, mapping it to the genome, categorizing the data based on quality and information content, and collating similar data types to answer common questions central to the aims of the C-HPP. This approach, first implemented for chromosome 7 as part of the Australian and New Zealand arm of the C-HPP initiative,5 has subsequently been extended to include all chromosomes to maximize its utility in the C-HPP.

DESIGN APPROACH

Our overarching aim was to develop a web browser (“The Proteome Browser”, TPB) to access an up-to-date picture of the scientific community’s current knowledge of the proteome across large gene sets, such as chromosomes, to aid in the C-HPP’s central goal of characterizing the human proteome. Achieving this goal requires accessing numerous data sources available in the community that contain multiple types of data (e.g., mass spectrometry-based, antibody-based, structural) and provide information on a range of gene and protein properties (e.g., protein and gene expression, tissue localization, post-translational modifications, interactions) in both a quantitative and qualitative manner. Thus, to ensure robustness and data quality in the system as well as to meet the C-HPP guidelines,4 we have only imported data from curated and normalized public data repositories, such as neXtProt,6 PeptideAtlas,7 GPM8 and Human Protein Atlas.9 Ensembl gene identifiers (currently GRCh37 build, version 68)10 have been employed as the basis of the gene set, although additional genes can be inserted into the gene set where evidence exists from data sources that do not map to genes annotated in Ensembl. To ensure the ongoing validity of data presented in TPB, only data deposited in public repositories will be maintained within the database.

DATA STRUCTURE

Recently Uhlen et al.11 presented a pilot study for the C-HPP based on chromosome 21 in which the group presented a status matrix to demonstrate the known evidence for each gene from UniProtKB/Swiss-Prot and the Uhlen-developed Human Protein Atlas.9 This presented several different pieces of evidence for each gene in a simple matrix, with each point color coded in a traffic light system (green, yellow, red or black) to qualify the level and quality of evidence. We have endeavored to extend this traffic light matrix to incorporate multiple data sources and types by creating a tiered hierarchy of data types and applying defined thresholds to provide a reliability classification based on the particular data and source.

The implementation of hierarchical data types enables grouping of similar information from different experiment types to address key questions such as: what evidence is there for protein expression from genes, and from what experimental modality does it originate? Thus a primary (parent) data type is protein expression, with evidence for this sourced, for example, from mass spectrometry data, antibody-based experiments and curated databases, which each form subtypes (children) of the primary protein expression data type. Similarly, antibody-based evidence for protein expression may be sourced from various experimental processes, such as immunohistochemistry, immunofluorescence or Western blot analysis. These are therefore classified as tertiary data types, stored within the secondary protein expression/antibody data type. Within the framework each child data type must be a wholly subsumed subset of the parent data type; therefore antibody validation experiments conducted using recombinant protein would not contribute to the protein expression arm of the hierarchy. The currently planned data types (Figure 1 and Supplementary Figure 1, Supporting Information) span all aspects of the initial C-HPP deliverables (Supporting Information3), though this can be easily extended within the framework to address any relevant questions, such as whether there are homologous genes in other species or whether suitable reagents are available for detecting the protein. While currently implemented with up to a four-tiered hierarchy, the framework’s tiered data structure ultimately allows for any number of tiers to maximize the extendibility of the system and enable rapid customization of the data structure.

Figure 1. Example of TPB hierarchical data structure and types. Primary data types provide high-level information about proteins from each gene, while greater detail is derived from secondary and tertiary data types. The data structure is extensible, allowing for unlimited data types and levels.

Figure 2. Application component diagram of TPB. TPB is implemented as a Java Server application built using MVC, DAO and IoC design patterns and employs Struts2 and AJAX technology as well as Freemarker templates for generating interactive webpage content, with persistence to a database achieved using Hibernate JPA. TPB is currently deployed on an Apache Tomcat server using a PostgreSQL database.

In addition to representing the presence of data in the various categories, a quality score is generated for each piece of data as it is imported into the database, to address the reliability of the data. This quality score can theoretically take any value, though at present for the traffic light display it takes integer values from 1 to 4, representing absence of data (1, black), presence of poor or undefined quality data (2, red), reasonable data that should be treated with caution (3, yellow) or good quality data that may be taken as reliable (4, green). Translation of data quality is based on criteria or threshold scores defined for each combination of data type and source. For example, probability-based mass spectrometry protein expression evidence from GPM is defined using thresholds on the protein log(e) values from individual GPM protein identifications. A full definition of both data type and data quality mappings is available in the documentation section of the TPB Web site (proteomebrowser.org) and as Supporting Information.

As each child data type is a subset of its parent, the majority of data is inserted into the lowest tier (and most specific data type) of the hierarchy. Data types in higher tiers then take a quality score based on any data within the specific data type and all of its child data types. This data can be compiled into a single quality score using any rule-based collation, though most commonly it is the maximum quality score of all underlying data.

For each TPB data type, numerous data sources may be utilized to provide evidence. Although any data format can be incorporated, we have routinely utilized XML formats from the data sources as well-defined and relatively stable sources to parse data into data types adhering to a clearly defined set of rules. In addition, the server actively checks for updates to the underlying databases, using either regular cron jobs or listening to RSS feeds published by the sources, to ensure the data in TPB is current. All the data in TPB is versioned, which allows comparison between versions and reporting of the progress being made toward the goals of the C-HPP.
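The tiered roll-up of quality scores described in this section can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from the TPB code base (which is a Java application); the class names, the example data types and the scores assigned to them are assumptions made purely for illustration. Only the 1 to 4 traffic light scale and the maximum-score collation rule come from the text.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class Evidence(IntEnum):
    """Traffic light levels as described in the text (1 = black ... 4 = green)."""
    BLACK = 1   # no data
    RED = 2     # poor or undefined quality
    YELLOW = 3  # reasonable, treat with caution
    GREEN = 4   # good quality


@dataclass
class DataType:
    """A node in the tiered data type hierarchy (hypothetical structure)."""
    name: str
    children: List["DataType"] = field(default_factory=list)
    # Scores attached directly to this data type, keyed by source name.
    scores: Dict[str, Evidence] = field(default_factory=dict)

    def rolled_up(self) -> Evidence:
        """Collate with the maximum rule: best score here or in any descendant."""
        candidates = list(self.scores.values())
        candidates += [child.rolled_up() for child in self.children]
        return max(candidates, default=Evidence.BLACK)


# Hypothetical evidence for a single gene; the scores are invented.
ihc = DataType("immunohistochemistry", scores={"Human Protein Atlas": Evidence.YELLOW})
western = DataType("Western blot")
antibody = DataType("antibody", children=[ihc, western])
mass_spec = DataType("mass spectrometry", scores={"GPM": Evidence.RED})
expression = DataType("protein expression", children=[antibody, mass_spec])

print(expression.rolled_up().name)  # YELLOW
print(western.rolled_up().name)     # BLACK
```

Because the collation rule is isolated in `rolled_up`, a different rule-based collation (for example, requiring agreement between two sources) could be substituted without changing the hierarchy itself.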



TECHNICAL DESIGN

The overall architectural goals of the system are to provide a readily available and scalable data integration and analysis system for researchers both within and beyond the proteomics community. The overall system architecture (Supplementary Figure 2, Supporting Information) enables the visualization of the proteome by collating data from multiple sources, which is analyzed and indexed to generate the traffic light matrix and create high-level metadata about these data collections that can be published to metadata repositories.

To facilitate the discovery and reuse of notable data collections within the human proteome, high-level descriptive metadata (conforming to the Registry Interchange Format - Collections and Services (RIF-CS) schema) is created to describe data collections and associated researchers and grants; the metadata is then made openly accessible for harvesting through a data provider using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Metadata repository harvesters associated with national and community research data collections can then harvest this metadata, index it and put it into a searchable format. Current TPB collections are published using this system at the Australian National Data Service’s Research Data Australia12 (Supplementary Figure 3, Supporting Information) and consist of chromosome-level data sets.

Figure 3. Overview of the web browser of TPB. TPB displays a traffic light matrix of proteome information for hierarchical data types across chromosome-ordered genes in a scrollable upper panel. The hierarchical data types are collapsible to enable visualization at different levels of detail. Selection of any data point launches the lower panel, which displays the underlying source-separated data that can be drilled through the data hierarchy, with links to the original source.

One of the key architectural goals was to leverage industry best practices to design and develop a scalable enterprise-wide J2EE application. To meet this goal, the design of the TPB project is based on core J2EE patterns as well as industry standard development guidelines. The design patterns used in the design and development of the project include MVC (Model-View-Controller), DAO (Data Access Object), and Spring IoC (Inversion of Control). The application architecture (Figure 2) shows the various layers within the application that enable flexibility in the implementation of each layer and also provide for greater reusability and integration by loosely coupling application logic with infrastructure. The use of the Struts2 framework and FreeMarker templates enables rapid reconfiguration of webpage content without the need to reprogram large sections of the application. The interactivity of the browser is provided by the use of AJAX together with CSS.

The current implementation of TPB utilizes Hibernate JPA for persistence to a PostgreSQL database, is run on an Apache Tomcat server and uses Monash’s Large Research Data Store (LaRDS) as its repository. LaRDS is Monash University’s “petascale” research data store (www.monash.edu.au/eresearch/services/lards/). TPB is viewable in all major browsers (Chrome, Safari, Firefox, IE 8 and above) and requires JavaScript to be enabled.

TPB is being developed as an open-source project (source code available at http://code.google.com/p/hupohpp), and the involvement of the proteomic community and data repositories will facilitate the development of a richer platform for the analysis and visualization of the proteome. To enable community involvement, input and suggestions are encouraged through the project wiki (www.ozhupohpp7.com, Supplementary Figure 4, Supporting Information), which provides background information about the project, a link to the current version of TPB, and a forum/mailing list.
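To make the loose coupling between application logic and infrastructure more concrete, the following sketch shows the DAO and inversion-of-control patterns named in this section in their simplest form. It is an illustrative Python analogue only: TPB itself is a Java application using Spring and Hibernate JPA, and every class, method and identifier below is hypothetical rather than part of the TPB source.

```python
from typing import Dict, List, Protocol


class GeneEvidenceDao(Protocol):
    """Data access interface; the service layer depends only on this."""

    def scores_for_gene(self, gene_id: str) -> Dict[str, int]:
        ...


class InMemoryGeneEvidenceDao:
    """Stand-in for a database-backed DAO (hypothetical; no real persistence)."""

    def __init__(self, rows: Dict[str, Dict[str, int]]) -> None:
        self._rows = rows

    def scores_for_gene(self, gene_id: str) -> Dict[str, int]:
        # Return a copy so callers cannot mutate the stored rows.
        return dict(self._rows.get(gene_id, {}))


class MatrixService:
    """Application logic; receives its DAO from outside (inversion of control)."""

    def __init__(self, dao: GeneEvidenceDao) -> None:
        self._dao = dao

    def matrix_row(self, gene_id: str, data_types: List[str]) -> List[int]:
        scores = self._dao.scores_for_gene(gene_id)
        # Missing data types default to 1, the "no data" level.
        return [scores.get(dt, 1) for dt in data_types]


# Wiring is done by hand here; an IoC container would normally perform it.
dao = InMemoryGeneEvidenceDao({"ENSG_EXAMPLE": {"protein expression": 4}})
service = MatrixService(dao)
row = service.matrix_row("ENSG_EXAMPLE", ["protein expression", "PTM"])
print(row)  # [4, 1]
```

Because `MatrixService` only sees the DAO interface, the in-memory stand-in could be swapped for a database-backed implementation without touching the service, which is the flexibility the layered architecture is intended to provide.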


TPB DEVELOPMENT

At the time of writing, TPB is finalizing phase one of development and has developed an enterprise framework to compile data from a variety of sources and deployed a first version of the interactive Web site containing six data types from three data sources (neXtProt,6 GPM8 and Human Protein Atlas9,13). While already providing coverage from all three pillars of the HPP, data from PeptideAtlas (www.peptideatlas.org)7 is currently being implemented to complete the coverage of the gold standard C-HPP data repositories. While TPB was developed originally for human chromosome 7 (i.e., the focus of the Australian/New Zealand consortium of the C-HPP) and some other individual chromosome teams are developing browsers for their national efforts (e.g., Russia (chromosome 18), www.kb18.ru, and China’s CAPER, http://61.50.138.124/), it has been noted that a multichromosome informatics effort will be necessary to maximize the output of the C-HPP.14 Therefore, data from all human chromosomes, including the mitochondrial genome, has been incorporated into TPB to provide complete coverage of the proteome and ensure it has the potential as a universal tool for the C-HPP.

Both simple and advanced filtering and search functionalities on data types and genes are currently being implemented. Simple filtering will be enabled on the main report to filter individual data types for particular evidence levels (e.g., genes with no (black) evidence for protein expression), while simple searches will enable rapid discovery of genes of interest in the report from any related gene or protein name. Additionally, advanced search and filtering capabilities are being added to enable targeted analysis of classes of proteins (such as Gene Ontology classes), enable complex filtration on multiple data types, and provide guidance for targeted experimentation.
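The simple filtering and searching described above can be sketched as follows. This is a hedged Python illustration, not TPB's implementation: the matrix layout, the gene names and the function names are all assumptions, and only the idea of filtering a data type by evidence level (for example, level 1, black) comes from the text.

```python
from typing import Dict, List

# Hypothetical matrix rows: gene name -> {data type -> evidence level 1..4}.
MATRIX: Dict[str, Dict[str, int]] = {
    "GENE_A": {"protein expression": 4, "PTM": 2},
    "GENE_B": {"protein expression": 1, "PTM": 1},
    "GENE_C": {"protein expression": 1, "PTM": 3},
}


def filter_by_level(matrix: Dict[str, Dict[str, int]],
                    data_type: str, level: int) -> List[str]:
    """Return genes whose evidence for data_type equals level, in matrix order."""
    # A data type with no entry is treated as level 1 (no data).
    return [gene for gene, row in matrix.items() if row.get(data_type, 1) == level]


def search_by_name(matrix: Dict[str, Dict[str, int]], query: str) -> List[str]:
    """Case-insensitive substring match on the gene name."""
    needle = query.lower()
    return [gene for gene in matrix if needle in gene.lower()]


missing = filter_by_level(MATRIX, "protein expression", 1)
print(missing)  # ['GENE_B', 'GENE_C']
print(search_by_name(MATRIX, "gene_a"))  # ['GENE_A']
```

A production search would presumably also match synonyms and protein names, as the text indicates; the substring match here is only the simplest stand-in.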
Phase two development will be underway at the time of publication and will include data from several additional data sources including structural- and PTM-specific databases (e.g., PDB15 and PhosphoSitePlus,16 respectively) and mRNA expression data (i.e., RPKM) along with additional relevant data types. To make use of these additional data sets, various analysis functionalities will be implemented in TPB which will, for example, enable correlation of various data types, such as transcript and protein expression data, to determine genes under-represented at the protein level based on their transcript expression. Additionally, future integration of data from the Encyclopedia of DNA Elements (ENCODE) into TPB would provide a powerful mechanism to examine the interplay between gene regulation and protein expression, as recently discussed.17 In the future, we envisage that registered users would have the ability to add user-defined data as a separate data type and apply analysis to user-defined gene sets (rather than chromosome-based gene sets) to assist in rapid characterization of laboratory data sets and identify novel protein identifications or expression patterns. However, to ensure integrity of the data quality, user uploaded data sets would only be viewable to the user and not stored permanently in the database. Beyond the C-HPP, we have designed the framework to be independent of any particular database, species or data type and therefore we anticipate its potential implementation, and thus utility, in other HUPO projects, like the B/D-HPP and various tissue-specific projects, as well as in other model organisms.
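As a rough illustration of the transcript-versus-protein comparison planned for phase two, the sketch below flags genes that show transcript expression but little or no protein evidence. It is a simple threshold comparison rather than a statistical correlation, it is written in Python rather than TPB's Java, and the cutoffs (`min_rpkm`, `max_level`), gene names and values are hypothetical.

```python
from typing import Dict, List, Tuple


def under_represented(rpkm: Dict[str, float],
                      protein_level: Dict[str, int],
                      min_rpkm: float = 1.0,
                      max_level: int = 2) -> List[Tuple[str, float]]:
    """Genes with RPKM >= min_rpkm but protein evidence <= max_level.

    Genes absent from protein_level are treated as level 1 (no data).
    Results are sorted by RPKM, highest first.
    """
    hits = [
        (gene, value)
        for gene, value in rpkm.items()
        if value >= min_rpkm and protein_level.get(gene, 1) <= max_level
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Invented example values.
rpkm = {"GENE_A": 25.0, "GENE_B": 0.2, "GENE_C": 8.5, "GENE_D": 3.0}
protein_level = {"GENE_A": 4, "GENE_B": 1, "GENE_C": 1}

print(under_represented(rpkm, protein_level))
# [('GENE_C', 8.5), ('GENE_D', 3.0)]
```

Genes returned by such a comparison would be natural candidates for the targeted experimentation that the "missing proteins list" is meant to guide.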



FUNCTIONAL WALKTHROUGH

The TPB browser interface (www.proteomebrowser.org, Figure 3) is launched by selecting a chromosome in the basic search tool from any page within the TPB site. This launches the traffic light matrix that displays summary evidence for primary data types for all chromosome-ordered genes from the default set of data sources. A simple filter is provided on both the home page and browser interface pages to alter the chromosome selection or limit the data sources used in the matrix generation. Each primary data type can be expanded or collapsed to observe or hide summary evidence from the underlying data types. Selection of the data type title launches a pop-up window with a brief description of the data type and the relevant thresholds for the evidence levels displayed.

To observe the underlying data for a given gene, selection of any traffic light item in the matrix launches the secondary report below the traffic light matrix. This secondary report follows the same tiered data structure as the matrix and provides information about the child data type’s summary evidence in a source-specific manner. It allows drilling through the tiered summary data to the underlying data, which is displayed for each available protein accession together with a direct link to the original data in the data source’s web resource.


CONCLUSION/SUMMARY

The Human Proteome Project faces numerous challenges in attempting to characterize the human proteome in the context of normal and disease biology. However, attempting to define a baseline proteome, and our current knowledge of it, is a fundamental starting point and also a key deliverable of the recently launched C-HPP. To assist this endeavor, we have developed and implemented a framework, termed The Proteome Browser (TPB), to unite data from numerous proteomic and scientific data repositories. Data is imported into a hierarchical data structure and provided with a quality score to generate a coherent overview of the current knowledge of the human proteome. The framework also enables automatic updating of data, and its extendibility allows the addition of new data types and sources, both of which are critical as the rate of data generation increases and new technologies develop. Together, these features will provide an up-to-date record of our progress toward identifying the human proteome and will assist in designing experiments to complete it.



ASSOCIATED CONTENT

Supporting Information

Supplemental figures. This material is available free of charge via the Internet at http://pubs.acs.org. Additionally, a video demonstration of the web browser functionality along with additional documentation can be viewed through the TPB Wiki page at www.ozhupohpp7.com.

AUTHOR INFORMATION

Corresponding Author

*Ian Smith, e-mail: [email protected]. Ph: +61 3 9902 4050. Fax: +61 3 9902 0894. Edouard Nice, e-mail: ed.nice@monash.edu. Ph: +61 3 9905 8905. Fax: +61 3 9905 8070.

Author Contributions

‡ These authors contributed equally to this manuscript.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors wish to thank: each of the data sources for discussions and assistance with data formats, especially Ron Beavis from GPM; Young-Ki Paik and Seul-Ki Jeong (Yonsei Proteome Research Center) for ongoing support and discussions during conceptualization; members of the various national C-HPP teams, particularly Fernando Corrales (Spain) and Andrey Lisitsa (Russia); and all other participants at various TPB and C-HPP workshops held in Australia and at the 2012 AOHUPO and HUPO Congresses. This work was supported by the Applications Program (AP32) of the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative. W.S.H. was supported by the World Class University program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (R31-2008-000-10086-0 to W.S.H.).

ABBREVIATIONS

ANDS, Australian National Data Service; C-HPP, chromosome-centric Human Proteome Project; TPB, The Proteome Browser; MVC, Model-View-Controller; DAO, Data Access Object; IoC, Inversion of Control; GPM, Global Proteome Machine; J2EE, Java Platform, Enterprise Edition; RIF-CS, Registry Interchange Format - Collections and Services; OAI-PMH, Open Archives Initiative Protocol for Metadata Harvesting; PTM, post-translational modification; nsSNP, nonsynonymous single nucleotide polymorphism; ASV, alternative splice variant; SAAV, single amino acid variant

REFERENCES

(1) Herring, A. The Human Proteome. Special issue of Wired UK Nov. 2012, 39. Print.

(2) Legrain, P.; Aebersold, R.; Archakov, A.; Bairoch, A.; Bala, K.; Beretta, L.; Bergeron, J.; Borchers, C. H.; Corthals, G. L.; Costello, C. E.; Deutsch, E. W.; Domon, B.; Hancock, W.; He, F.; Hochstrasser, D.; Marko-Varga, G.; Salekdeh, G. H.; Sechi, S.; Snyder, M.; Srivastava, S.; Uhlen, M.; Wu, C. H.; Yamamoto, T.; Paik, Y. K.; Omenn, G. S. The Human Proteome Project: current state and future direction. Mol. Cell. Proteomics 2011, 10 (7), M111 009993.

(3) Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S. The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol. 2012, 30 (3), 221−3.

(4) Paik, Y. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H. J.; Na, K.; Jeong, S. K.; He, F.; Binz, P. A.; Nishimura, T.; Keown, P.; Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S. Standard guidelines for the chromosome-centric human proteome project. J. Proteome Res. 2012, 11 (4), 2005−13.

(5) O’Neill, G. Open book. Aust. Life Sci. 2012, 9 (4), 30−4.

(6) Lane, L.; Argoud-Puy, G.; Britan, A.; Cusin, I.; Duek, P. D.; Evalet, O.; Gateau, A.; Gaudet, P.; Gleizes, A.; Masselot, A.; Zwahlen, C.; Bairoch, A. neXtProt: a knowledge platform for human proteins. Nucleic Acids Res. 2012, 40 (Database issue), D76−83.

(7) Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R. The PeptideAtlas project. Nucleic Acids Res. 2006, 34 (Database issue), D655−8.

(8) Craig, R.; Cortens, J. P.; Beavis, R. C. Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 2004, 3 (6), 1234−42.

(9) Berglund, L.; Bjorling, E.; Oksvold, P.; Fagerberg, L.; Asplund, A.; Szigyarto, C. A.; Persson, A.; Ottosson, J.; Wernerus, H.; Nilsson, P.; Lundberg, E.; Sivertsson, A.; Navani, S.; Wester, K.; Kampf, C.; Hober, S.; Ponten, F.; Uhlen, M. A genecentric Human Protein Atlas for expression profiles based on antibodies. Mol. Cell. Proteomics 2008, 7 (10), 2019−27.

(10) Flicek, P.; Amode, M. R.; Barrell, D.; Beal, K.; Brent, S.; Carvalho-Silva, D.; Clapham, P.; Coates, G.; Fairley, S.; Fitzgerald, S.; Gil, L.; Gordon, L.; Hendrix, M.; Hourlier, T.; Johnson, N.; Kahari, A. K.; Keefe, D.; Keenan, S.; Kinsella, R.; Komorowska, M.; Koscielny, G.; Kulesha, E.; Larsson, P.; Longden, I.; McLaren, W.; Muffato, M.; Overduin, B.; Pignatelli, M.; Pritchard, B.; Riat, H. S.; Ritchie, G. R.; Ruffier, M.; Schuster, M.; Sobral, D.; Tang, Y. A.; Taylor, K.; Trevanion, S.; Vandrovcova, J.; White, S.; Wilson, M.; Wilder, S. P.; Aken, B. L.; Birney, E.; Cunningham, F.; Dunham, I.; Durbin, R.; Fernandez-Suarez, X. M.; Harrow, J.; Herrero, J.; Hubbard, T. J.; Parker, A.; Proctor, G.; Spudich, G.; Vogel, J.; Yates, A.; Zadissa, A.; Searle, S. M. Ensembl 2012. Nucleic Acids Res. 2012, 40 (Database issue), D84−90.

(11) Uhlen, M.; Oksvold, P.; Algenas, C.; Hamsten, C.; Fagerberg, L.; Klevebring, D.; Lundberg, E.; Odeberg, J.; Ponten, F.; Kondo, T.; Sivertsson, A. Antibody-based protein profiling of the human chromosome 21. Mol. Cell. Proteomics 2012, 11 (3), M111 013458.

(12) Australian National Data Service, Research Data Australia. Australian National Data Service [Online], 2012. doi:10.4225/14/4F79169D5911B

(13) Uhlen, M.; Oksvold, P.; Fagerberg, L.; Lundberg, E.; Jonasson, K.; Forsberg, M.; Zwahlen, M.; Kampf, C.; Wester, K.; Hober, S.; Wernerus, H.; Bjorling, L.; Ponten, F. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 2010, 28 (12), 1248−50.

(14) Hancock, W.; Omenn, G.; Legrain, P.; Paik, Y. K. Proteomics, human proteome project, and chromosomes. J. Proteome Res. 2011, 10 (1), 210.

(15) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235−42.

(16) Hornbeck, P. V.; Chabra, I.; Kornhauser, J. M.; Skrzypek, E.; Zhang, B. PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 2004, 4 (6), 1551−61.

(17) Paik, Y. K.; Hancock, W. S. Uniting ENCODE with genome-wide proteomics. Nat. Biotechnol. 2012, 30 (11), 1065−7.
