Article pubs.acs.org/jpr

Cite This: J. Proteome Res. XXXX, XXX, XXX−XXX

Proteomics INTegrator (PINT): An Online Tool To Store, Query, and Visualize Large Proteomics Experiment Results
Salvador Martínez-Bartolomé,† Tom Casimir Bamberger,† Mathieu Lavallée-Adam,‡ Daniel B. McClatchy,† and John R. Yates, III*,†


† Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California, 92037, United States
‡ Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario K1H 8M5, Canada
Supporting Information

ABSTRACT: The characterization of complex biological systems based on high-throughput protein quantification through mass spectrometry commonly involves differential expression analysis between replicate samples originating from different experimental conditions. Here we present Proteomics INTegrator (PINT), a new user-friendly Web-based platform-independent system to store, visualize, and query proteomics experiment results. PINT provides an extremely flexible query interface that allows advanced Boolean algebra-based data filtering of many different proteomics features such as confidence values, abundance levels or ratios, data set overlaps, sample characteristics, as well as UniProtKB annotations, which are transparently incorporated into the system. In addition, PINT allows developers to incorporate data visualization and analysis tools, such as PSEA-Quant and Reactome pathway analysis, for data set enrichment analysis. PINT serves as a centralized hub for large-scale proteomics data and as a platform for data analysis, facilitating the interpretation of proteomics results and expediting biologically relevant conclusions.
KEYWORDS: proteomics, mass spectrometry, data filtering, data query, data visualization, enrichment analysis, Java GWT

1. INTRODUCTION
Mass-spectrometry-based proteomics encompasses many different types of experimental techniques to analyze protein samples. Proteomics has become a very powerful means to perform high-throughput, rapid, and deep proteome sample characterization, and the underlying technology continues to evolve quickly.1−4 For example, in the past 15 years, proteomics experiments have progressed from straightforward qualitative determination of proteomes, to quantitative protein expression profiles, to comprehensive protein interactome analysis.5 A great variety of bioinformatics tools have been developed to accompany each technological advance in proteomics.6,7 Typically, each new bioinformatics tool applies a specific algorithm, designed and implemented for a particular type of data or goal, and generates a different output. Furthermore, downstream data analyses8 often require data integration across replicate sample measurements, additional statistical assessments, as well as annotation enrichment analysis. In these cases, it is common to use in-house-developed scripts or data analysis tools such as Microsoft Excel, MATLAB, or R, with the consequence that the final steps of data analysis often result in custom data tables that are difficult to fully integrate in publicly available proteomics databases such as the ProteomeXchange9 because these public repositories require standard data formats. Other online resources such as Zenodo (https://zenodo.org/), GitHub (https://github.com/), and ResearchGate (https://www.researchgate.net/), among others, allow scientists, following good bioinformatics practices, to share both the output of these custom analyses and the scripts used to generate them. However, this does not solve the problem of how to integrate the results of these analyses with other data sets or databases due to their nonstandard format. Furthermore, because a single proteomics laboratory can generate a large amount of data, the long-term storage and availability of data become problematic. The data storage issue is further complicated by the great variety of experimental designs that are used in proteomics, which makes them very difficult to encode and store in a database. Experimental design or experimental metadata is usually captured in a minimal fashion in public repositories, and the link between the actual proteomics results and the experimental metadata is lost or minimized when stored in these repositories.

Current public databases for MS-based proteomics data sets such as PRIDE,10 GPMDB,11 PeptideAtlas,12 and MassIVE13 mostly focus on hosting the raw data and results files, and some of them reprocess the data using their own pipelines. PRIDE allows one to filter data sets by features such as instrument type, species, tissue, disease, modifications, or experiment types through its PRIDE-Archive browser14 and provides a simple search box in which users can look for data sets containing certain peptides or proteins. However, it does not allow one to query over more details from the data, such as actual scores or relative ratios. Others such as Panorama Public15 and ProteomicsDB16 allow a user to query, to a certain extent, specific details about a data set. Panorama Public, a public repository for quantitative data sets processed in Skyline,17 provides a query system to retrieve data sets containing certain proteins or peptides and to show their target-proteomics-related features, but it is not possible, for example, to identify a set of proteins that have been differentially expressed in a certain tissue and to compare them to the ones reported in another data set. ProteomicsDB enables cross-data-set comparisons of protein abundance and also provides the means to store and analyze proteomics data. However, it is currently focused only on human data sets, and because it is a centralized resource, it is not possible to store any data set without ProteomicsDB’s approval and intervention. Other LIMS (Laboratory Information Management System)-like systems, such as colims (CompOmics-LIMS), an improved version of the ms_lims18 system, provide an SQL panel to directly query over the database, which could go further into the data retrieved. However, this requires knowledge of the SQL syntax as well as the database structure. OpenBIS19 (Open Source Biology Information System) is designed to manage, annotate, and share data measured in biological experiments. However, the integration of the data sets is done on a more general level, dealing with materials, methods, and experiments but not digging into the details of the actual data from the results.

Here we present a complementary tool, PINT (Proteomics INTegrator), that focuses only on proteomics final results and provides a comprehensive integration of the data (proteins, peptides, and all features associated with them) and metadata (experimental design information) so that they can be used together in proteomics-specific queries. Another common task in proteomics is the selection of subsets of proteins that share a certain annotation, like a conformational feature or functional domain, a protein interaction, a cellular process or pathway, or their involvement in any disease or cellular malfunction. PINT is also capable of integrating publicly available protein annotations with the stored proteomics data, improving the ability to mine existing data sets and find additional discoveries. We designed PINT as a local repository for storing, sharing, and querying results originally obtained from different proteomics experiments in a laboratory. PINT is an open-source Java Web application that provides long-term storage of data, including the link between experimental design and results, thus providing a powerful query system designed specifically for proteomics. All data stored in PINT can be visualized through a highly interactive Web interface, facilitating its accessibility, visibility, and dissemination. The tool can be visited through the Web page (http://pint.scripps.edu) and is available to be installed and serve as a local repository for any laboratory at https://github.com/proteomicsyates/PINT.

Received: September 13, 2018 Published: June 20, 2019

DOI: 10.1021/acs.jproteome.8b00711 J. Proteome Res. XXXX, XXX, XXX−XXX

Journal of Proteome Research

Figure 1. PINT architecture schema. The schema of the architecture designed for the PINT platform is shown here. The database is based on MySQL, and it is accessed by Hibernate in Java. Protein annotations and information from the database are represented in our proteomics data model, which becomes accessible to the users by a query engine that translates the objects in the model to paginated lists of lightweight objects ready to be visualized in the client browser using JavaScript generated by GWT.

2. MATERIALS AND METHODS 2.1. Data Model, Database, and Web Interface

One of the main challenges in data management in proteomics is the great heterogeneity of the data due to the huge variety of experimental approaches. Different types of mass spectrometers may generate different types of data, which may need to be analyzed with different computational pipelines. In addition, the existence of different MS-based relative quantification techniques often requires different data analysis approaches, including the averaging of protein expression ratios or the statistical assessment of the significance of the relative abundance changes of the peptides or proteins among multiple conditions. Therefore, a data model for the appropriate handling and understanding of this great variety of proteomics results must be flexible enough to support any experimental setup but at the same time must not lose any information that may play a key role in the interpretation of the experimental results and that may be necessary to filter the data in further analysis. With this in mind, the data model of PINT was designed to represent any experiment including the final results linked to the experimental design, including qualitative and quantitative protein and peptide features as well as statistical confidence values, and any additional manual annotation that the author may want to associate with the data set (Figure 1).

All experimental data are stored in a relational database. It provides a long-term scalable storage system and a way to access and filter the data. Our original database implementation is in MySQL (version 5.7.17, Oracle, https://www.mysql.com), but other PINT users can use other database engines. The database schema, the design of the tables and relations between them, and an SQL script that will automatically create the database structure are shared in the package. They are also accessible from the links available in the “Install PINT database” wiki section (https://github.com/proteomicsyates/PINT/wiki/How-to-install-PINT#install-pint-database). The database design reflects the data model, with some optimizations for improving the performance of the queries. To access the database, we used a combination of predefined static queries and the Hibernate framework (version 5.0), an ORM (object-relational mapping) for Java that maps the tables of the database to Java classes and the rows in the tables to Java objects.

PINT has been designed as a Web application, accessible from any computer in the same network. Its Web interface is based on GWT (Google Web Toolkit, http://www.gwtproject.org/), an open-source project for the development of JavaScript front-end applications in Java compatible with any browser. The core of PINT has been developed in Java 1.8; therefore, it is compatible with any computational platform, and its installation is straightforward. Our PINT server is accessible from http://pint.scripps.edu and is open to the public, providing storage for our finished proteomics projects and open access to the public for projects that are already published. All of the source code is shared under the permissive Apache 2.0 license and is available to download from its GitHub Web page (https://github.com/proteomicsyates/PINT), where a wiki Web page can be found to help users install and use the software.

Figure 2. Graphical user interface from the import data set wizard. (A) The import data set wizard is composed of several steps. (B) The user can go forward and backward through the different steps of the wizard with the button bar at the bottom. (C) The user can save the progress at any time and use the saved configuration file to resume from the point where they left off. In this case, we show the “Samples definition” step in which the user has defined two samples (“sample A” and “sample B”) and has associated each of them with the “Homo Sapiens” organism and the “1205-Lu cell” cell line. (D) The optional association with sample labels has not been performed yet, as can be seen in the “drop isobaric label here” box in each sample.

2.2. Integrated External Tools, Resources, and Annotations

The modular design of the source code implementation in PINT facilitates its maintenance, the development of additional features, and the integration of external tools. As an example, two enrichment analysis tools are integrated into the Web interface of PINT. One of the tabs from the data set view of PINT includes the PSEA-Quant tool, a protein set enrichment analysis for label-free and label-based protein quantification data sets.20,21 Its integration is based on a direct call to the HTTP POST service that PSEA-Quant provides, which submits the quantitative values of a list of proteins across several replicates. The results are sent by e-mail to the user. Another tab in the data set view of PINT consists of the Reactome pathway enrichment analysis and pathway browser.22,23 Reactome is an open-access, manually curated and peer-reviewed pathway database that additionally provides bioinformatics tools for the visualization, interpretation, and analysis of pathway knowledge. In this case, its integration is based on the inclusion of the GWT widgets and the use of the Pathway Analysis service (https://reactome.org/AnalysisService/) provided by the Reactome project. This enables the query and visualization of the latest release of the Reactome database directly from PINT.

Additionally, PINT includes some direct queries to external resources through their respective REST Web services such as the IntAct Molecular Interaction database24 (https://www.ebi.ac.uk/intact/). PINT queries the binary interactions of a protein in the repository. PINT also queries whether a protein is described to be involved in a macromolecular complex, as described by the Complex Portal, a manually curated encyclopedic resource of macromolecular complexes.25 Finally, PINT can also verify whether proteins or peptides have been included in any cluster in the PRIDE Archive,14 using PRIDE Cluster,26,27 a resource and Web service provided by the PRIDE-EBI team that clusters all MS/MS spectra submitted to the PRIDE repository.14

Protein annotations are regularly updated and automatically associated with the proteins that are stored in the database. UniProtKB28 annotations are efficiently retrieved and stored on the server in XML format using the Web Services from EMBL-EBI Services,29 and PINT regularly checks for new releases, which occur approximately every 4 weeks. OMIM (Online Mendelian Inheritance in Man)30 annotations, that is, the gene-disease associations stored in the OMIM database, are retrieved via the programmatic API access using an API key obtained after user registration at https://omim.org/api. Every time a data set is loaded into the data set view of PINT, the proteins will be mapped on the fly to the latest available annotations from these two resources.

Protein entries in PINT are identified by UniProtKB accessions. Non-UniProtKB proteins stored in PINT such as International Protein Index (IPI) or NCBI’s Gene ID (gi) entries will be mapped, if possible, to UniProtKB entries using mapping tables. In every new release of UniProtKB, some protein entries may be marked as obsolete and removed or merged into other entries. PINT takes care of this issue by keeping the accessions of the proteins in its database up-to-date, always showing the latest UniProt protein entry accession, even when the protein was originally submitted with an obsolete accession. In this case, the obsolete accession becomes a secondary accession.

3. RESULTS 3.1. Importing Data into PINT

To import new data sets into its database, the user needs to describe some metadata related to the experimental design associated with the resulting protein and peptide lists. Compared with the metadata that is required by ProteomeXchange to perform a submission, PINT focuses more on the actual experimental design metadata (experimental conditions, replicates, etc.), which could be used in a data query even for a single data set, and not (at least for now) on other types of metadata such as the mass spectrometer used. PINT provides an import data set wizard that guides the user step by step: the user uploads the input data files, then defines certain items and sets the relationships between them (Figure 2). The main items in the experimental design are experimental conditions, samples, experiments or replicates, ratios between experimental conditions, and identification lists and abundance lists. Additional items such as sample origin (tissue/cell type), organisms, and labels need to be directly associated with each sample item. So, to import a data set, the user must: (1) upload the input files and define the elements to extract from them (in the case of Excel tables), (2) define the experimental design items of the project, and (3) define the relationships between the main items in the experimental design and the input files.

Different experimental designs will be represented differently, and some may be more intuitive to build in PINT than others. To guide the users, there are some examples of popular experimental designs. These examples are presented in the wiki Web page, but to illustrate the differences, we briefly describe two examples here:

(1) Suppose that a user wants to import a quantification experiment with three replicates using 6-plex TMT, in which the first three labels are the three replicates from one condition and the last three labels are the three replicates of another condition. The final relative abundance ratios are provided at the protein level, with p values and q values assessing differential expression across the conditions associated with them. The user also has a file listing peptide identifications and another file listing the proteins associated with their final quantitation and statistical values. The user would have to define the following items: two experimental conditions, two samples, one experiment, and one ratio between the two experimental conditions. Then, each sample has to be associated with a different experimental condition, and both input files are associated with both experimental conditions and the experiment. Finally, to define the quantitative ratio, the user must select the input file with the proteins and ratios. When custom Excel tables are used as input files, the user will have to define which columns to use and what their content is, such as the columns describing the peptide sequences, protein accessions, ratio values, or associated scores or statistical values, that is, p values or q values.

(2) As another example, suppose that a user is importing a phosphoproteomics experiment in which a sample was analyzed under two different experimental conditions in triplicate and that a single relative abundance ratio between two conditions was obtained for each identified phosphorylated peptide sequence. Say that the user has one Excel file with the peptide-level ratios for each of the three replicates in three columns. In this case, if the user wants to incorporate the ratios per replicate, then they will have to be split into separate Excel sheets. Then, the items to define would be two experimental conditions, two samples, and three replicates. Then, for each of the Excel sheets, a ratio at peptide level will be defined between the two experimental conditions, and each of the sheets will be associated with a different replicate.

PINT guides the user through these steps in which the semantic information about the data contained in the input files is defined (Figure 2B). The progress of the data set import can be saved at any moment in a file that can be used to start from the configuration that was previously saved (Figure 2C). This greatly accelerates the import of subsequent experiments with similar experimental design but different results. A list of some common experimental design configurations is available to download from the starting import data set page to help new users import new data sets.
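As an illustration of example (1) above, the items and relationships the user defines in the wizard could be captured in a small data structure. This is only a sketch under stated assumptions: the class names, field names, file names, and the `problems` check are hypothetical and are not taken from PINT's source code or its saved configuration file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    name: str
    condition: str  # name of the experimental condition the sample belongs to


@dataclass
class Ratio:
    name: str
    numerator: str    # experimental condition name
    denominator: str  # experimental condition name
    source_file: str  # input file that holds the ratio values


@dataclass
class ImportDesign:
    """Hypothetical container for the items defined during an import."""
    conditions: List[str]
    samples: List[Sample]
    experiments: List[str]
    ratios: List[Ratio]
    input_files: List[str]

    def problems(self) -> List[str]:
        """Return inconsistencies between items; an empty list means none found."""
        found = []
        for sample in self.samples:
            if sample.condition not in self.conditions:
                found.append(f"sample {sample.name!r}: unknown condition {sample.condition!r}")
        for ratio in self.ratios:
            for cond in (ratio.numerator, ratio.denominator):
                if cond not in self.conditions:
                    found.append(f"ratio {ratio.name!r}: unknown condition {cond!r}")
            if ratio.source_file not in self.input_files:
                found.append(f"ratio {ratio.name!r}: unknown input file {ratio.source_file!r}")
        return found


# Example (1): two conditions, two samples, one experiment, one protein-level ratio.
tmt_design = ImportDesign(
    conditions=["condition A", "condition B"],
    samples=[Sample("sample A", "condition A"), Sample("sample B", "condition B")],
    experiments=["TMT 6-plex experiment"],
    ratios=[Ratio("A vs B", "condition A", "condition B", "proteins.xlsx")],
    input_files=["peptides.txt", "proteins.xlsx"],
)
```

Calling `tmt_design.problems()` returns an empty list for this design; a ratio pointing at an undefined condition or file would be reported instead.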


Table 1. List of Filters/Query Commands Available in PINT

filter/query command name | code | syntax | description
amount | AM | AM[Aggregation_level, Amount_type, COND, Numerical_condition] | Proteins or peptides satisfying a condition in the amount value (spectral counts, intensity, NSAF, XIC, ...)
complex annotation | CAN | CAN[Uniprot_version, Uniprot_header_line, Annotation_type, Annotation_name, Annotation_value, Numerical_condition] | Proteins with specified annotations described by parameters
condition/project | COND | COND[Condition_name, Project_tag] | Proteins detected under certain experimental conditions in a specific project
gene name | GN | GN[gene_name] | Proteins with a specified gene name
label | LB | LB[Aggregation_level, Label_name, ONLY] | Proteins and peptides that were labeled with a certain quantitation label
MS run | MSRUN | MSRUN[CSV_MS_run_Ids] | Proteins and peptides detected in a list of MS runs (stated as replicates or experiments in the import data set wizard)
protein accession | ACC | ACC[CSV_accessions] | Proteins from a list of protein accessions
post-translational modification | PTM | PTM[PTM_name, PTM_mass_diff_Dalton, Dalton_tolerance, Numerical_condition] | Peptides containing specified PTMs in the sequence
ratio | RA | RA[Aggregation_level, COND, COND, Ratio_name, Numerical_condition, SC] | Proteins or peptides satisfying a condition over a quantitative relative ratio between two experimental conditions
score | SC | SC[Aggregation_level, Score_type, Score_name, Numerical_condition] | Proteins or peptides satisfying a condition over the value of a score
peptide sequence | SQ | SEQ[Aggregation_level, regular_expression] | Peptides with a sequence matching a regular expression
simple annotation | AN | AN[uniprot_version, Annotation_String, Numerical_condition] | Same as CAN but without the need to specify annotation types or UniProt header lines
taxonomy | TX | TX[Aggregation_level, Organism_name, Ncbi_tax_id, ONLY] | Proteins and peptides from a specified organism
threshold | THR | THR[threshold_name, Boolean_value] | Proteins satisfying a certain condition over a threshold associated with them
tissue | TIS | TIS[CVS_tissue_names] | Proteins detected in certain tissues
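The commands in Table 1 share the shape `CODE[argument, argument, ...]`. As a rough sketch of how such a string could be split into its code and arguments, consider the function below. It is a hypothetical helper, not PINT's own parser; it assumes arguments are separated by top-level commas and does not handle arguments that themselves contain brackets or commas.

```python
from typing import List, Tuple


def parse_filter(command: str) -> Tuple[str, List[str]]:
    """Split a command such as 'GN[ACTB]' into ('GN', ['ACTB']).

    Simplifying assumption: arguments contain neither commas nor brackets.
    """
    text = command.strip()
    if "[" not in text or not text.endswith("]"):
        raise ValueError(f"not of the form CODE[args]: {command!r}")
    code, _, rest = text.partition("[")
    body = rest[:-1]  # drop the closing bracket
    args = [part.strip() for part in body.split(",")]
    return code.strip(), args


# The gene name below is an arbitrary illustrative value.
gene_filter = parse_filter("GN[ACTB]")
accession_filter = parse_filter("ACC[P12345, Q4VXU2]")
```

Here `gene_filter` is `("GN", ["ACTB"])` and `accession_filter` is `("ACC", ["P12345", "Q4VXU2"])`.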

Currently PINT supports most of the common proteomics data files generated in the Integrated Proteomics Pipeline (IP2) software suite (Integrated Proteomics Applications). For protein and peptide identification data, resulting data files from DTASelect31,32 (i.e., DTASelect-filter.txt), a postidentification processing tool for assessing the confidence of the results of search engines such as ProLuCID33 or SEQUEST,34 are supported. For quantitative data, output files from Census,35 a quantification tool, are also supported, including the ones coming from a novel isobaric isotopologue labeling36 (i.e., census-chro.xml). Additionally, the HUPO-PSI37 standard data format for proteomics results, mzIdentML,38 is also supported. However, custom algorithms, often not published, as well as additional and manually applied analysis steps are usually performed to analyze the data as part of the whole data analysis pipeline. Although these additional “non-standard” steps may also be described in the publications associated with the data, they make the data difficult to handle and complicate its automatic interpretation due to their heterogeneity. They also complicate the full integration of the data in public repositories such as the ones in the ProteomeXchange consortium,9 considering that the final results of numerous proteomics studies are published in nonstandard data files or custom formatted tables. Under these circumstances, PINT recovers these results that would have otherwise been kept out of the range of most bioinformatics analysis pipelines by supporting Excel tables as input data files. When using these types of files, PINT shows the headers of the columns of the table so that the user can select which columns need to be imported, and they can be interpreted accordingly (Figure S1).

Once all aspects of the experimental design are defined and input data files and the minimum information for their interpretation are properly provided, the data set is automatically aggregated and incorporated in the database. The new data set will become available through the Web interface as a new project. By default, data sets will remain private, only accessible by an encoded URL, unless the data owner makes them public.

3.2. Querying Proteomics Data Sets

As previously described, PINT stores both actual proteomics data and experimental metadata in its database. That way, a single protein is always linked to the MS runs, experimental conditions, and samples in which it was detected or quantified. Similarly, other entities in an MS experiment such as peptides, peptide-spectrum matches (PSMs), associated confidence scores, and so on are always stored in the context of an experimental design. This information is available to be searched through a query system able to perform proteomics-specific filters. Furthermore, as described in the Materials and Methods, PINT performs an automatic integration of external annotations; therefore, they become searchable with or without the experimental design context of the data. For example, PINT queries allow the rapid identification of proteins that are shared between a set of experimental conditions, have a confidence score above a given threshold, and are also known to be associated with diseases or contain at least one phosphorylated site.

Figure 3. Different views of a project in PINT. (A) The project information view shows information about the selected project, such as experimental conditions, MS runs, and samples. By selecting each of these project items, it will show the number of proteins, genes, peptides, and PSMs associated with it. (B) The data set view shows several tables with the actual data, in this case, a table with the proteins (top) and a table with the peptides of the selected protein (bottom). (C) List of filters/query commands available to use in the queries. In this menu, the user will find more options such as annotation types, score types, ratio names, and experimental condition names to use in the queries but will also be able to customize the tables (D) by adding or removing columns. (E) Panel showing how the peptides mapped to protein Q4VXU2 are shared by other proteins. Different colors mean different levels of evidence at protein and peptide levels, according to the PAnalyzer algorithm.

Queries can be performed from the Web interface through different customized filters constructed to extract the information of interest from the data sets. The filters can be combined using logical operators such as “AND” (intersection), “OR” (union), “XOR” (exclusive union), or “NOT” (negation). Applying multiple filters across the same bottom-up proteomics data set is not a trivial task, especially when filters need to consider different elements in the data sets, that is, proteins or peptides (Figure S2). If the combination of the filters is not properly designed, then the resulting protein set may vary depending on the order in which the filters are performed (Figures S3−S5). Therefore, to properly apply the filters, we developed a method based on the representation of peptide and protein relationships as a bipartite graph $G = (V_p, V_q, E)$, where $V_p$ is the set of peptides, $V_q$ is the set of proteins, and $E$ is the set of all edges $e_{p,q}$ connecting a protein $v_q \in V_q$ to a peptide $v_p \in V_p$. In this representation, the logical combination of all filters $f \in F$ is applied sequentially and iteratively to all edges $e_{p,q}$, until no more edges are discarded in a whole round (Figure S6). Specifically, all filters are first applied to $G$. This may result in the removal of some edges. An edge $e_{p,q}$ will be discarded when the protein $v_q$ and the peptide $v_p$ do not comply with the logical combination of filters $F$. Each filter $f$, by definition, will be applied either to the protein $v_q$ or to the peptide $v_p$. For example, the filter AN will be applied only to proteins, the filter SEQ will be applied only to peptides, and the filter SC will be applied either to proteins or to peptides depending on how the user defines it. (See Table 1 for the list of filters.) Because these removals may cause filters that were previously true, such as a protein being identified by at least three peptides, to become false, the filters are reapplied, and this process is repeated until no more edges can be removed when applying the entire set of filters. This query system allows PINT to establish any possible logical combination of filters specifically designed to be applied to experimental proteomics features of proteins or peptides, to the experimental design, or to specific annotations coming from external resources such as UniProtKB. A comprehensive description of the available query filters is given in Table 1.
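The iterative edge removal described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PINT's implementation: the graph is reduced to a set of (peptide, protein) pairs, the two filters are hypothetical stand-ins for entries of Table 1, and within one round every edge is tested against the edge set as it stood at the start of that round.

```python
from typing import Callable, Set, Tuple

Edge = Tuple[str, str]  # (peptide sequence, protein accession)
EdgeFilter = Callable[[str, str, Set[Edge]], bool]


def prune_edges(edges: Set[Edge], keep: EdgeFilter) -> Set[Edge]:
    """Drop edges failing `keep` round after round until a round removes nothing."""
    current = set(edges)
    while True:
        surviving = {(pep, prot) for pep, prot in current if keep(pep, prot, current)}
        if surviving == current:
            return current
        current = surviving


def peptide_count(protein: str, edges: Set[Edge]) -> int:
    """Number of distinct peptides still linked to `protein`."""
    return len({pep for pep, prot in edges if prot == protein})


# Hypothetical protein-level filter: the protein keeps at least two peptides.
def at_least_two_peptides(pep: str, prot: str, edges: Set[Edge]) -> bool:
    return peptide_count(prot, edges) >= 2


# Hypothetical peptide-level filter: the sequence contains no methionine.
def no_methionine(pep: str, prot: str, edges: Set[Edge]) -> bool:
    return "M" not in pep


# Logical AND of the two filters.
def combined(pep: str, prot: str, edges: Set[Edge]) -> bool:
    return at_least_two_peptides(pep, prot, edges) and no_methionine(pep, prot, edges)


edges = {("AAK", "P1"), ("MLK", "P1"), ("GGR", "P2"), ("TTK", "P2"), ("AAK", "P3")}
result = prune_edges(edges, combined)
```

In this toy data set, the first round removes the methionine-containing peptide of P1 and the single-peptide protein P3. P1 is then left with one peptide, so its remaining edge is removed in the second round, which is exactly the situation in which a filter that was true becomes false after earlier removals. Only the two edges of P2 survive.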

3.3. Visualizing Data Sets

PINT offers an advanced and easy-to-use graphical user interface through the Web, compatible with any Web browser. The interface allows the user to select data sets from a list of available data sets and visualize them in a single page organized in different interactive tabs and tables. In the project information tab (Figure 3A), PINT shows all of the experimental design characteristics associated with each of the selected projects. The user can see all of the details of the experimental design, such as experimental conditions, MS runs, and samples. Upon the selection of one of them, PINT shows the number of proteins, genes, peptides, and PSMs as well as the number of experimental conditions, samples, and MS runs that are associated with that item.

Then, after loading a project or receiving the results of a query, a set of tables is dynamically shown in the data set view tab. Each of these tables represents a different level of aggregation: protein groups, individual proteins, and peptides. When the user selects a row in the protein group or individual protein tables, the associated peptides are dynamically loaded in a peptide table at the bottom. All tables are sortable by any column and paginated so that the browser can efficiently load and display large tables (Figure 3B). Additionally, users can customize the columns of the tables by enabling or disabling them from a list of all available columns in one of the controls on the left side of the page (protein group columns, protein columns, and peptide columns).

At any point, the user can submit a new query of the data from the query tab. The filters available to the users are shown in the top left menu, and some examples of the values allowed for the filters are also shown in menus on the left side (Figure 3C). These include the names of the experimental conditions, ratios, and scores and also the names of the UniProt annotation types used in the filter Complex Annotation (CAN). The results are automatically displayed in the data set view tab after being processed by the server.

Figure 4. Enrichment analysis tools integrated in PINT. (A) PSEA-Quant tool and (B) Reactome pathway enrichment analysis tool are fully integrated into PINT. Whereas PSEA-Quant results are sent by e-mail, Reactome analysis results are loaded automatically in a table sorted by the false discovery rate (FDR) of pathway enrichment significance. Clicking on one pathway automatically selects that pathway in the pathway browser, and clicking again on that pathway opens the detailed view of the pathway, including all of the biological entities involved in it.

Proteomes are built directly upon request, meaning that, depending on the peptide set the user is investigating, the proteome is automatically created according to all of the possible proteins explained by those peptides. Changing the population of peptides in the data sets by applying any of the filters may result in a distinct set of proteins explained by these peptides. PAnalyzer,39 the protein grouping algorithm implemented in PINT, automatically clusters the proteins in protein groups depending on the uniqueness of their peptides and distinguishes different protein group types depending on their evidence (conclusive, ambiguous, indistinguishable, or nonconclusive).40 Thereby, depending on the peptide set retrieved by a query, proteins will be grouped differently, and different types of evidence at the protein level will be shown. PINT also offers a view to display each protein group, showing the set of proteins that share peptides (Figure 3D). This feature is useful for inspecting the analysis of samples containing different proteoforms or members of protein families, which are shown grouped in a single row in the protein group table unless a unique peptide exists, in which case the protein is shown separately and tagged as conclusive.
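The on-request construction of proteomes in Section 3.3 can be illustrated with a short sketch. It is a deliberate simplification and not the PAnalyzer algorithm: it only rebuilds the set of proteins explained by a given peptide set and tags as conclusive those proteins that have at least one unique peptide, without separating the ambiguous, indistinguishable, and nonconclusive types. The peptide-to-protein mapping and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical peptide -> proteins mapping, e.g., as derived from a sequence database.
PEPTIDE_TO_PROTEINS = {
    "AAK": {"P1"},
    "BBK": {"P1", "P2"},
    "CCK": {"P2"},
    "DDK": {"P2", "P3"},
}


def build_proteome(peptides):
    """Return {protein: peptides} for every protein explained by the peptide set."""
    proteome = defaultdict(set)
    for peptide in peptides:
        for protein in PEPTIDE_TO_PROTEINS.get(peptide, ()):
            proteome[protein].add(peptide)
    return dict(proteome)


def conclusive_proteins(proteome):
    """Proteins holding at least one peptide that maps to no other protein."""
    owners = defaultdict(set)
    for protein, peptides in proteome.items():
        for peptide in peptides:
            owners[peptide].add(protein)
    return {next(iter(prots)) for prots in owners.values() if len(prots) == 1}


# Full peptide set: P1 and P2 each have a unique peptide; P3 has only a shared one.
full = build_proteome({"AAK", "BBK", "CCK", "DDK"})
full_conclusive = conclusive_proteins(full)

# After a query that keeps only the shared peptides, no protein remains conclusive.
filtered = build_proteome({"BBK", "DDK"})
filtered_conclusive = conclusive_proteins(filtered)
```

The same three proteins are explained in both cases, but the evidence differs: the full peptide set supports P1 and P2 conclusively, whereas the filtered set supports none of them conclusively, which is why the grouping has to be recomputed for each query result.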

3.4. Enrichment Analyses in PINT

Additionally, PINT includes two different enrichment analysis tools: PSEA-Quant and Reactome pathway analysis (Figure 4). Both are fully integrated in the interface of PINT and are able to send ad hoc queries to the corresponding analysis service with the proteins of interest to the user. PSEA-Quant is an enrichment analysis tool.20,21 It identifies protein sets from the Gene Ontology and Molecular Signature databases that are statistically significantly enriched with abundant proteins, which are measured with high reproducibility across a set of provided replicates. This tool is found in the PSEA-Quant tab (Figure 4A), and once the user selects the replicates and the PSEA-Quant parameters, any data set in PINT containing quantitative values (absolute or relative) across multiple replicates can be analyzed. The results will be sent by e-mail to the e-mail address provided in the input parameters, as PSEA-Quant normally does. PINT facilitates submission to PSEA-Quant by linking the actual experimental data to the experimental design details, thereby allowing the user to rapidly send to PSEA-Quant the list of proteins and also the associated quantitative values from multiple replicates. The pathway enrichment analysis and the pathway visualization tool from Reactome22,23 are available in another tab. Users can quickly identify which pathways are significantly enriched in the list of proteins loaded by sending the query to the Reactome server with a simple click, and the results are received almost instantaneously (Figure 4B). The results can then be explored with the rich Reactome pathway visualization tool that is also embedded in the interface, which shows all of the pathways described in the Reactome repository and highlights the ones considered statistically significantly enriched in the query. With the integration of the Reactome pathway analysis and the Reactome pathway browser, to our knowledge, PINT is the first Reactome partner using their GWT widgets (see Reactome partners at https://reactome.org/community/partners), proving its rich functionality with minor migration efforts.

3.5. Projects Stored in Our PINT Server

PINT is intended to be a local repository. For example, our group, in addition to submitting data sets to public repositories, provides our proteomics data sets through our PINT server at http://pint.scripps.edu. This allows external groups to more easily investigate and query the data sets produced in our group. Any laboratory is encouraged to download and install PINT to host their own data sets and to share them with the public in a richer way than with other public repositories. Nevertheless, PINT allows the users to decide whether they want to share the data sets. PINT can therefore also play the role of a private repository for a laboratory or for a group of laboratories involved in a common proteomics project. Our PINT server hosts a number of data sets associated with works that have been published in recent years, such as “_CFTR_” (http://pint.scripps.edu/?project=03f24aa0391e68e4),41 “PALM” (http://pint.scripps.edu/?project=f302f024b4d5aa1f),42 “DmDshybrids2014” (http://pint.scripps.edu/?project=be0e9d3b59caf982a1c02edef0468d57),43 and “Alzheimer” (http://pint.scripps.edu/?project=3d7c1ac078930a798a07c6a397bd21ef).44 More information about each of these projects can be found in the Supporting Information.

4. DISCUSSION

We presented a user-friendly Web application that is able to store almost any proteomics experiment regardless of its complexity. Unlike currently available public proteomics-specific repositories, PINT keeps the experimental design tightly linked to the proteins and peptides stored, so that richer and more elaborate queries can be run over the stored data sets. In addition, we automatically link each stored protein to its annotations in UniProtKB so that data sets can be filtered by selecting protein annotations of interest. The Web user interface allows the user to efficiently load, query, and explore big data sets, providing fully customizable tables for proteins and peptides that can be sorted by any column.

PINT is not intended to replace any public proteomics data repository. First, PINT is not intended to exist as a unique online instance; rather, it is designed to be a system that can be installed by any laboratory to store, share, and explore in greater detail the data sets created in that laboratory. Second, although one of its main features is to share data sets with the scientific community, PINT does not accept raw data files as all of the proteomics repositories do, not only because it does not perform any data processing with them but also because it is focused on the actual results of the experiments. Furthermore, storing raw files would require a storage infrastructure in the backend that would limit its usage. Additionally, PINT provides a proteomics-specific query system that is not provided by any of the public proteomics repositories. Currently, PINT's persistence module (the database and access to it) limits its ability to manage the number of data sets that are stored, for example, at PRIDE (currently 89 728 assays).

As an open-source project, PINT is constantly evolving. We intend to incorporate new features on a regular basis, such as new query options, new columns in the data view tables, and new protein or peptide annotations. We will support additional available external analysis services, database queries, or portable visualization tools that will make PINT more comprehensive and useful for the exploration of proteomics data sets. Additionally, we also plan to support standard data format files, such as the ones defined in the HUPO Proteomics Standards Initiative,38,45 as input data files, which will also facilitate the incorporation of more data sets from the proteomics community.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.8b00711. Figure S1. Graphical user interface to import a data set in which the user uses Excel files as import files. Different approaches for the application of proteomics-specific filters. Figure S2. Notation and representation of an example of a query. Figure S3. Application of the first approach leading to incorrect results. Figure S4. Application of the second approach leading to incorrect results depending on the order in which proteins are iterated. Figure S5. Application of the third approach leading to incorrect results. Figure S6. Illustration of the proposed method implemented on PINT to apply logical combination of proteomics-specific filters (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: 858-784-8863. Fax: 858-784-8883.

ORCID

Salvador Martínez-Bartolomé: 0000-0001-7592-5612
Tom Casimir Bamberger: 0000-0002-3830-4486
Mathieu Lavallée-Adam: 0000-0003-2124-3872
Daniel B. McClatchy: 0000-0002-0288-5645
John R. Yates, III: 0000-0001-5267-1672

Notes

The authors declare no competing financial interest. The PINT tool can be visited through the Webpage (http://pint.scripps.edu) and is available to be installed and serve as a local repository for any laboratory at https://github.com/proteomicsyates/PINT.

ACKNOWLEDGMENTS

We thank C. Delahunty for critical reading of the manuscript and the Reactome team for helping with the Reactome enrichment analysis integration. We acknowledge the U.S. National Institutes of Health (U54GM114833, P41GM103533, and R01MH067880).

REFERENCES

(1) Huttlin, E. L.; Ting, L.; Bruckner, R. J.; Gebreab, F.; Gygi, M. P.; Szpyt, J.; Tam, S.; Zarraga, G.; Colby, G.; Baltier, K.; Dong, R.; Guarani, V.; Vaites, L. P.; Ordureau, A.; Rad, R.; Erickson, B. K.; Wuhr, M.; Chick, J.; Zhai, B.; Kolippakkam, D.; Mintseris, J.; Obar, R. A.; Harris, T.; Artavanis-Tsakonas, S.; Sowa, M. E.; De Camilli, P.; Paulo, J. A.; Harper, J. W.; Gygi, S. P. The BioPlex Network: A Systematic Exploration of the Human Interactome. Cell 2015, 162 (2), 425−440. (2) Omenn, G. S.; Lane, L.; Lundberg, E. K.; Overall, C. M.; Deutsch, E. W. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. J. Proteome Res. 2017, 16 (12), 4281−4287. (3) Wilhelm, M.; Schlegl, J.; Hahne, H.; Gholami, A. M.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H.; Mathieson, T.; Lemeer, S.; Schnatbaum, K.; Reimer, U.; Wenschuh, H.; Mollenhauer, M.; Slotta-Huspenina, J.; Boese, J. H.; Bantscheff, M.; Gerstmair, A.; Faerber, F.; Kuster, B. Mass-spectrometry-based draft of the human proteome. Nature 2014, 509 (7502), 582−7. (4) Kim, M. S.; Pinto, S. M.; Getnet, D.; Nirujogi, R. S.; Manda, S. S.; Chaerkady, R.; Madugundu, A. K.; Kelkar, D. S.; Isserlin, R.; Jain, S.; Thomas, J. K.; Muthusamy, B.; Leal-Rojas, P.; Kumar, P.; Sahasrabuddhe, N. A.; Balakrishnan, L.; Advani, J.; George, B.; Renuse, S.; Selvan, L. D.; Patil, A. H.; Nanjappa, V.; Radhakrishnan, A.; Prasad, S.; Subbannayya, T.; Raju, R.; Kumar, M.; Sreenivasamurthy, S. K.; Marimuthu, A.; Sathe, G. J.; Chavan, S.; Datta, K. K.; Subbannayya, Y.; Sahu, A.; Yelamanchi, S. D.; Jayaram, S.; Rajagopalan, P.; Sharma, J.; Murthy, K. R.; Syed, N.; Goel, R.; Khan, A. A.; Ahmad, S.; Dey, G.; Mudgal, K.; Chatterjee, A.; Huang, T. C.; Zhong, J.; Wu, X.; Shaw, P. G.; Freed, D.; Zahari, M. S.; Mukherjee, K. K.; Shankar, S.; Mahadevan, A.; Lam, H.; Mitchell, C. J.; Shankar, S. K.; Satishchandra, P.; Schroeder, J. T.; Sirdeshmukh, R.; Maitra, A.; Leach, S. D.; Drake, C. G.; Halushka, M. K.; Prasad, T. S.; Hruban, R. H.; Kerr, C. L.; Bader, G. D.; Iacobuzio-Donahue, C. A.; Gowda, H.; Pandey, A. A draft map of the human proteome. Nature 2014, 509 (7502), 575−81. (5) Nilsson, T.; Mann, M.; Aebersold, R.; Yates, J. R., 3rd; Bairoch, A.; Bergeron, J. J. Mass spectrometry in high-throughput proteomics: ready for the big time. Nat. Methods 2010, 7 (9), 681−5. (6) Villavicencio-Diaz, T. N.; Rodriguez-Ulloa, A.; Guirola-Cruz, O.; Perez-Riverol, Y. Bioinformatics tools for the functional interpretation of quantitative proteomics results. Curr. Top. Med. Chem. 2014, 14 (3), 435−49.


(7) Martens, L.; Kohlbacher, O.; Weintraub, S. T. Managing expectations when publishing tools and methods for computational proteomics. J. Proteome Res. 2015, 14 (5), 2002−4. (8) Karimpour-Fard, A.; Epperson, L. E.; Hunter, L. E. A survey of computational tools for downstream analysis of proteomic and other omic datasets. Hum Genomics 2015, 9, 28. (9) Vizcaino, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Rios, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; Binz, P. A.; Xenarios, I.; Eisenacher, M.; Mayer, G.; Gatto, L.; Campos, A.; Chalkley, R. J.; Kraus, H. J.; Albar, J. P.; Martinez-Bartolome, S.; Apweiler, R.; Omenn, G. S.; Martens, L.; Jones, A. R.; Hermjakob, H. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32 (3), 223−6. (10) Vizcaino, J. A.; Cote, R. G.; Csordas, A.; Dianes, J. A.; Fabregat, A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J.; O’Kelly, G.; Schoenegger, A.; Ovelleiro, D.; Perez-Riverol, Y.; Reisinger, F.; Rios, D.; Wang, R.; Hermjakob, H. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic acids research 2012, 41, D1063−9. (11) Craig, R.; Cortens, J. P.; Beavis, R. C. Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 2004, 3 (6), 1234−42. (12) Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R. The PeptideAtlas project. Nucleic acids research 2006, 34 (90001), D655−D658. (13) Wang, M.; Wang, J.; Carver, J.; Pullman, B. S.; Cha, S. W.; Bandeira, N. Assembling the Community-Scale Discoverable Human Proteome. Cell Syst 2018, 7, 412. (14) Vizcaino, J. A.; Csordas, A.; Del-Toro, N.; Dianes, J. A.; Griss, J.; Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.; Xu, Q. W.; Wang, R.; Hermjakob, H. 
2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 2016, 44 (22), 11033. (15) Sharma, V.; Eckels, J.; Schilling, B.; Ludwig, C.; Jaffe, J. D.; MacCoss, M. J.; MacLean, B. Panorama Public: A Public Repository for Quantitative Data Sets Processed in Skyline. Mol. Cell. Proteomics 2018, 17 (6), 1239−1244. (16) Schmidt, T.; Samaras, P.; Frejno, M.; Gessulat, S.; Barnert, M.; Kienegger, H.; Krcmar, H.; Schlegl, J.; Ehrlich, H. C.; Aiche, S.; Kuster, B.; Wilhelm, M. ProteomicsDB. Nucleic Acids Res. 2018, 46 (D1), D1271−D1281. (17) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26 (7), 966−8. (18) Helsens, K.; Colaert, N.; Barsnes, H.; Muth, T.; Flikka, K.; Staes, A.; Timmerman, E.; Wortelkamp, S.; Sickmann, A.; Vandekerckhove, J.; Gevaert, K.; Martens, L. ms_lims, a simple yet powerful open source laboratory information management system for MS-driven proteomics. Proteomics 2010, 10 (6), 1261−4. (19) Barillari, C.; Ottoz, D. S.; Fuentes-Serna, J. M.; Ramakrishnan, C.; Rinn, B.; Rudolf, F. openBIS ELN-LIMS: an open-source database for academic laboratories. Bioinformatics 2016, 32 (4), 638−40. (20) Lavallee-Adam, M.; Rauniyar, N.; McClatchy, D. B.; Yates, J. R., 3rd PSEA-Quant: a protein set enrichment analysis on label-free and label-based protein quantification data. J. Proteome Res. 2014, 13 (12), 5496−509. (21) Lavallee-Adam, M.; Yates, J. R., 3rd Using PSEA-Quant for Protein Set Enrichment Analysis of Quantitative Mass SpectrometryBased Proteomics. Curr. Protoc Bioinformatics 2016, 53, 13.28.1. (22) Fabregat, A.; Jupe, S.; Matthews, L.; Sidiropoulos, K.; Gillespie, M.; Garapati, P.; Haw, R.; Jassal, B.; Korninger, F.; May, B.; Milacic, M.; Roca, C. 
D.; Rothfels, K.; Sevilla, C.; Shamovsky, V.; Shorser, S.; Varusai, T.; Viteri, G.; Weiser, J.; Wu, G.; Stein, L.; Hermjakob, H.; D’Eustachio, P. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018, 46 (D1), D649−D655. (23) Fabregat, A.; Sidiropoulos, K.; Garapati, P.; Gillespie, M.; Hausmann, K.; Haw, R.; Jassal, B.; Jupe, S.; Korninger, F.; McKay, S.;

Matthews, L.; May, B.; Milacic, M.; Rothfels, K.; Shamovsky, V.; Webber, M.; Weiser, J.; Williams, M.; Wu, G.; Stein, L.; Hermjakob, H.; D’Eustachio, P. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016, 44 (D1), D481−7. (24) Kerrien, S.; Aranda, B.; Breuza, L.; Bridge, A.; Broackes-Carter, F.; Chen, C.; Duesbury, M.; Dumousseau, M.; Feuermann, M.; Hinz, U.; Jandrasits, C.; Jimenez, R. C.; Khadake, J.; Mahadevan, U.; Masson, P.; Pedruzzi, I.; Pfeiffenberger, E.; Porras, P.; Raghunath, A.; Roechert, B.; Orchard, S.; Hermjakob, H. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012, 40, D841−6. (25) Meldal, B. H.; Forner-Martinez, O.; Costanzo, M. C.; Dana, J.; Demeter, J.; Dumousseau, M.; Dwight, S. S.; Gaulton, A.; Licata, L.; Melidoni, A. N.; Ricard-Blum, S.; Roechert, B.; Skyzypek, M. S.; Tiwari, M.; Velankar, S.; Wong, E. D.; Hermjakob, H.; Orchard, S. The complex portal–an encyclopaedia of macromolecular complexes. Nucleic Acids Res. 2015, 43 (Database issue), D479−D484. (26) Griss, J.; Foster, J. M.; Hermjakob, H.; Vizcaino, J. A. PRIDE Cluster: building a consensus of proteomics data. Nat. Methods 2013, 10 (2), 95−6. (27) Griss, J.; Perez-Riverol, Y.; Lewis, S.; Tabb, D. L.; Dianes, J. A.; Del-Toro, N.; Rurik, M.; Walzer, M. W.; Kohlbacher, O.; Hermjakob, H.; Wang, R.; Vizcaino, J. A. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat. Methods 2016, 13 (8), 651−656. (28) The UniProt Consortium. UniProt: a hub for protein information. Nucleic acids research 2015, 43 (D1), D204−D212. (29) Lopez, R.; Cowley, A.; Li, W.; McWilliam, H. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services. Curr. Prot. Bioinf. 2014, 48, 3.12.1−3.12.50. (30) Amberger, J. S.; Bocchini, C. A.; Schiettecatte, F.; Scott, A. F.; Hamosh, A. OMIM.org: Online Mendelian Inheritance in Man (OMIM(R)), an online catalog of human genes and genetic disorders. 
Nucleic acids research 2015, 43 (Databaseissue), D789−D798. (31) Cociorva, D.; Tabb, D. L.; Yates, J. R. Validation of tandem mass spectrometry database search results using DTASelect. Curr. Prot. Bioinf. 2007, 16 (1), 13.4.1−13.4.14. (32) Tabb, D. L.; McDonald, W. H.; Yates, J. R., 3rd DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 2002, 1 (1), 21−6. (33) Xu, T.; Venable, J. D.; Park, S. K.; Cociorva, D.; Lu, B.; Liao, L.; Wohlschlegel, J.; Hewel, J.; Yates, J. R., 3rd ProLuCID, a Fast and Sensitive Tandem Mass Spectra-based Protein Identification Program. Mol. Cell. Proteomics 2006, 5, S174. (34) Eng, J.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976−89. (35) Park, S. K.; Venable, J. D.; Xu, T.; Yates, J. R., 3rd A quantitative analysis software tool for mass spectrometry-based proteomics. Nat. Methods 2008, 5 (4), 319−22. (36) Bamberger, C.; Pankow, S.; Park, S. K.; Yates, J. R., 3rd Interference-free proteome quantification with MS/MS-based isobaric isotopologue detection. J. Proteome Res. 2014, 13 (3), 1494−501. (37) Orchard, S.; Hermjakob, H. Data standardization by the HUPO-PSI: how has the community benefitted? Methods Mol. Biol. 2011, 696, 149−60. (38) Vizcaino, J. A.; Mayer, G.; Perkins, S.; Barsnes, H.; Vaudel, M.; Perez-Riverol, Y.; Ternent, T.; Uszkoreit, J.; Eisenacher, M.; Fischer, L.; Rappsilber, J.; Netz, E.; Walzer, M.; Kohlbacher, O.; Leitner, A.; Chalkley, R. J.; Ghali, F.; Martinez-Bartolome, S.; Deutsch, E. W.; Jones, A. R. The mzIdentML Data Standard Version 1.2, Supporting Advances in Proteome Informatics. Mol. Cell. Proteomics 2017, 16 (7), 1275−1285. (39) Prieto, G.; Aloria, K.; Osinalde, N.; Fullaondo, A.; Arizmendi, J. M.; Matthiesen, R. PAnalyzer: a software tool for protein inference in shotgun proteomics. 
BMC bioinformatics 2012, 13, 288. (40) Nesvizhskii, A. I.; Aebersold, R. Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell. Proteomics 2005, 4 (10), 1419−40.

(41) Pankow, S.; Bamberger, C.; Calzolari, D.; Martinez-Bartolome, S.; Lavallee-Adam, M.; Balch, W. E.; Yates, J. R., 3rd ΔF508 CFTR interactome remodelling promotes rescue of cystic fibrosis. Nature 2015, 528 (7583), 510−6. (42) McClatchy, D. B.; Ma, Y.; Liu, C.; Stein, B. D.; Martinez-Bartolome, S.; Vasquez, D.; Hellberg, K.; Shaw, R. J.; Yates, J. R., 3rd Pulsed Azidohomoalanine Labeling in Mammals (PALM) Detects Changes in Liver-Specific LKB1 Knockout Mice. J. Proteome Res. 2015, 14 (11), 4815−22. (43) Bamberger, C.; Martinez-Bartolome, S.; Montgomery, M.; Lavallee-Adam, M.; Yates, J. R., 3rd Increased proteomic complexity in Drosophila hybrids during development. Sci. Adv. 2018, 4 (2), eaao3424. (44) Savas, J. N.; Wang, Y. Z.; DeNardo, L. A.; Martinez-Bartolome, S.; McClatchy, D. B.; Hark, T. J.; Shanks, N. F.; Cozzolino, K. A.; Lavallee-Adam, M.; Smukowski, S. N.; Park, S. K.; Kelly, J. W.; Koo, E. H.; Nakagawa, T.; Masliah, E.; Ghosh, A.; Yates, J. R., 3rd Amyloid Accumulation Drives Proteome-wide Alterations in Mouse Models of Alzheimer’s Disease-like Pathology. Cell Rep. 2017, 21 (9), 2614−2627. (45) Deutsch, E. W.; Albar, J. P.; Binz, P. A.; Eisenacher, M.; Jones, A. R.; Mayer, G.; Omenn, G. S.; Orchard, S.; Vizcaino, J. A.; Hermjakob, H. Development of data representation standards by the human proteome organization proteomics standards initiative. J. Am. Med. Inform Assoc 2015, 22 (3), 495−506.
