PPIExp: a web-based platform for integration and visualization of protein-protein interaction data and spatiotemporal proteomics data
Xian Liu, Cheng Chang, Mingfei Han, Ronghua Yin, Yiqun Zhan, Changyan Li, Changhui Ge, Miao Yu, and Xiaoming Yang
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00713 • Publication Date (Web): 19 Dec 2018
Downloaded from http://pubs.acs.org on December 21, 2018

Journal of Proteome Research

PPIExp: a web-based platform for integration and visualization of protein-protein interaction data and spatiotemporal proteomics data
Xian Liu†, Cheng Chang†,*, Mingfei Han†, Ronghua Yin†, Yiqun Zhan†, Changyan Li†, Changhui Ge†, Miao Yu†,*, and Xiaoming Yang†,*
†State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, P.R. China
* Cheng Chang, [email protected]; Miao Yu, [email protected]; Xiaoming Yang, [email protected]

ABSTRACT: Integrating spatiotemporal proteomics data with protein-protein interaction (PPI) data can help researchers explore their proteins of interest in depth and in a dynamic manner. However, biologists, who often have few programming skills, still lack proper tools to construct a PPI network for a protein list, visualize active PPI subnetworks and then select key nodes for further study. Here, we propose a web-based platform named PPIExp, which can automatically construct a PPI network, perform clustering analysis according to protein abundances, and perform functional enrichment analysis. More importantly, it provides multiple effective visualization interfaces, such as the interface to display the PPI network map, the interface to display a dendrogram & heatmap for the clustering result, and the interface to display the expression pattern of a selected protein. To visualize the active PPI subnetworks in a specific space or time, it provides buttons to highlight the differentially expressed proteins in each condition on the network map. Additionally, to help researchers determine which proteins are worth further attention, PPIExp provides extensive one-click interactive operations to map node centrality measures to node size on the network and to highlight three types of proteins, i.e., the proteins in an enriched functional term, the co-expressed proteins selected from the dendrogram & heatmap, and the proteins input by users. PPIExp is available at http://www.fgvis.com/expressvis/PPIExp.

KEYWORDS: spatiotemporal proteomics data; protein-protein interaction network; data visualization; interactive visualization; data integration.

INTRODUCTION
In vivo, proteins rarely act alone; they physically interact with each other and form protein complexes to carry out coordinated biological processes1. In a specific space or time, only a subset of protein-protein interactions (PPIs) is active. Thus, dissecting the active PPIs in a specific space or time is a key step in exploring higher-order biological processes and can help researchers further understand their proteins of interest. The activities of PPIs are affected by multiple factors, including protein abundances. In recent years, with the development of proteomics-related technologies (e.g., sample preparation, mass spectrometry and data analysis), researchers can measure protein abundances accurately and on a genome-wide scale. Spatiotemporal proteomics data can be used to temporally profile protein abundances during biological processes that change over time, such as development and disease progression, or to spatially profile protein abundances across different tissues or cell types.2-6 Integrating spatiotemporal proteomics data and PPI data can help researchers learn about the active PPIs in a specific time or space and extract more biological knowledge from their data.7,8

This type of integration poses three problems for researchers: the first is how to conveniently construct a PPI network for a protein list; the second is how to visualize the active PPI subnetworks in a specific space or time; and the third is how to discover key nodes in the PPI network for further study. For constructing PPI networks, researchers have developed multiple PPI databases to collect and store PPI data.9-13 On the websites of these databases, users can browse one protein's partners and visualize their PPI network. For visualizing PPI networks, researchers have developed multiple tools,14 such as Gephi,15 NetworkX,16 Cytoscape17 and NAViGaTOR18. To visualize active PPI subnetworks, users have to take a series of snapshots of the network and merge them into a video using standalone software,19 or generate network maps and merge them into a video using programming scripts.16,20 With these methods, users cannot interactively visualize active PPI subnetworks. To discover key nodes, network parameters, protein expression information and protein annotations are required. However, few tools integrate all of this information and provide extensive interactive visualization operations to map the different kinds of information to network styles. In summary, although multiple tools exist for each problem mentioned above, there is still a critical need for an easy-to-use, integrated platform with extensive interactive visualization operations.

Here, we introduce PPIExp, a lightweight and biologist-friendly web-based platform to integrate spatiotemporal proteomics data and PPI data. It provides user-friendly interfaces to construct a PPI network for a protein list and visualize the active PPI subnetworks in a specific space or time. Additionally, it integrates functional enrichment analysis, network topology characteristics and protein abundances to help users discover key nodes. Specifically, PPIExp has the following new features:
(i) It allows users to conveniently construct a PPI network for the proteins in a user-input protein list or the proteins in a user-selected pathway or gene ontology (GO) term.
(ii) It performs functional enrichment analysis for the proteins used in constructing the PPI network and provides interactive operations for users to highlight proteins in an enriched annotation term on the network map.
(iii) It performs clustering analysis for the proteins used in constructing the PPI network according to their abundances, then simultaneously presents the PPI network and the dendrogram & heatmap of the clustering result, and provides interactive operations for users to select co-expressed proteins from the dendrogram and highlight them on the network map.
(iv) It computes and displays different kinds of node centrality measures, and allows users to map node centrality measures to node size on the network map.
Taking advantage of the visual interfaces and interactive operations implemented in PPIExp, biologists, including those with few programming skills, can conveniently visualize active PPI subnetworks and select proteins that are worth further attention.

MATERIALS AND METHODS
Architecture
The architecture of PPIExp is based on a client-server website design (Figure 1). The front-end client communicates with the back-end using AJAX calls. The front-end website provides two types of interfaces, one to load input data and another to display the results from the back-end. The input data are processed into multiple requests, which are sent to the back-end server with AJAX calls. The back-end stores PPI data and functional annotation data in a database and exposes multiple RESTful APIs, including a hierarchical clustering API, a PPI network construction API and a functional enrichment API. After receiving the requests, these APIs load the necessary data from the database, process the data, and return the results to the front-end. The front-end website displays the results as maps, e.g., a network map, a heatmap and a scatter map. More importantly, the front-end website provides extensive interactive operations within and between the maps.

Figure 1. The architecture and implementation of PPIExp. The lines with solid arrows represent requests or responses. The texts on the lines with hollow arrows represent the library used to develop the corresponding interfaces. The lines with hollow arrows between the interfaces represent the cross-talk interactive operations.

Implementation
We hosted the back-end server with Nginx. The requests from the front-end website are first sent to Nginx and then forwarded to the RESTful APIs. We developed the RESTful APIs based on Django and Django REST framework. For the hierarchical clustering API and the functional enrichment API, we used the SciPy library to implement most of the numerical computation functions except the clustering function. For the clustering function, to make the clustering process faster, we used the function in fastcluster.21 In the functional enrichment API, we implemented a function to calculate the p-values with the hypergeometric distribution. In the hierarchical clustering API, we implemented a function to calculate the distance between two proteins with Pearson correlation. The matrix used in clustering analysis is the abundance value matrix received from the front-end.

We developed the front-end website with Angular version 4.0.0 as the basic framework and used multiple libraries to implement the interfaces. The interface for visualizing the PPI network map was developed on the basis of cytoscape.js;22 the interfaces for visualizing the dendrogram & heatmap and the scatter map were developed on the basis of d3.js; the interface for displaying functional enrichment results was developed on the basis of ngx-datatable (Figure 1). In the front-end of PPIExp, the interactive operations are the most important features. These libraries provide the necessary interactive operations within one interface: cytoscape.js and d3.js provide basic interactive operations to manipulate the maps, such as zooming in or out and dragging; ngx-datatable provides basic interactive operations to manipulate the data table, such as sorting and selection. However, these libraries do not provide cross-talk operations between interfaces, such as selecting proteins from the dendrogram & heatmap and highlighting them on the network map. We used Angular and reactive functional programming (RxJS) to implement the cross-talk operations between interfaces.
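To make the enrichment p-value concrete, the following is a minimal sketch of an upper-tail hypergeometric test. It is an illustration under stated assumptions, not the PPIExp source code: the function name `enrichment_p_value`, its parameters and the example counts are hypothetical, and it uses only the Python standard library rather than SciPy.

```python
from math import comb


def enrichment_p_value(n_background, n_term, n_list, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: proteins in the annotated background
    n_term:       background proteins annotated with the term
    n_list:       proteins in the query list
    n_overlap:    query proteins annotated with the term
    """
    total = comb(n_background, n_list)
    largest = min(n_term, n_list)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, largest + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from the paper).
p = enrichment_p_value(n_background=2000, n_term=40, n_list=50, n_overlap=5)
print(f"p = {p:.3g}")
```

The choice of background set (all annotated proteins versus all quantified proteins) changes the p-value; the sketch leaves that choice to the caller.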

Data Sources
The PPI data used in constructing the PPI network can be those uploaded by the user or those stored in the back-end server. The PPI data stored in the server were downloaded from the Mentha database.9 Mentha is an integrated database; it collects PPI data from the manually curated PPI databases that adhere to the IMEx consortium.23 PPIExp allows users to upload their own PPI data; otherwise, it uses the data from the Mentha database by default. The functional annotation data were downloaded from the HOMER database.24 The annotation types used in the functional enrichment analysis implemented in PPIExp include KEGG pathways,25 Reactome pathways,26 WikiPathways,27 and GO annotations,28 namely biological process, molecular function and cellular component.

The space-dependent proteomics data used in this study were downloaded from https://www.cell.com/cell-metabolism/fulltext/S1550-4131(14)00499-9 under the file name “Table S1. Identified and Quantified Proteins in HCTs”.2 This dataset provides a comprehensive and cell-type-resolved liver proteome; it quantifies proteins across the whole liver, hepatocytes (HCs), Kupffer cells (KCs), liver sinusoidal endothelial cells (LSECs), hepatic stellate cells (HSCs) and cholangiocytes (CHCs). In this dataset, each group contains four replicates. We used this dataset as an example and generated the analysis results shown in Figures 1-4.

The time-dependent proteomics data used in this study were downloaded from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5587857/bin/supp_16_9_1548__index.html under the file names “Supplemental Table S2” and “Supplemental Table S7”. This dataset quantifies proteins across the differentiation process from induced pluripotent stem cells (iPS) to neurons (Neu).4 We used this dataset as a case study and generated the analysis results shown in Supplementary Figures S3 and S4.

Processing of the spatiotemporal proteomics data
Before being sent to the back-end server for clustering analysis, the abundance values of the spatiotemporal proteomics data are first log2-transformed and then z-score normalized.
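The preprocessing step and the correlation-based distance used for clustering can be sketched as follows. This is a hedged illustration, not the PPIExp implementation: it assumes that the z-score is computed per protein across samples with the sample standard deviation, that all abundances are positive, and that the distance is defined as 1 minus the Pearson correlation coefficient (the exact form is not stated in the text). The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log2_zscore(values):
    """Log2-transform one protein's abundances, then z-score them.

    Assumes every abundance is positive and at least two samples are given.
    """
    logged = [math.log2(v) for v in values]
    mu = mean(logged)
    sd = stdev(logged)  # sample standard deviation (assumption)
    if sd == 0:
        # A constant profile carries no pattern; return all zeros.
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]


def pearson_distance(x, y):
    """Distance between two profiles, assumed here to be 1 - Pearson r."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical abundance profiles across four samples.
profile_a = log2_zscore([1.0, 2.0, 4.0, 8.0])
profile_b = log2_zscore([10.0, 20.0, 40.0, 80.0])
print(pearson_distance(profile_a, profile_b))
```

Under this definition, co-expressed proteins have a distance near 0 and anti-correlated proteins a distance near 2, so a pairwise distance matrix of this kind could be passed to any hierarchical clustering routine.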

Results
Workflow
PPIExp is a visual analytics platform that automatically completes most of the work and helps users explore their data through interactive visual interfaces. Before using PPIExp, users should first select the species of the experiment, import spatiotemporal proteomics data, define the samples in each experimental group, and then load a protein list or select an annotation term (Figure 2A). Users can input a protein list in two ways: through annotation or through custom user input. For the annotation route, after users select an annotation type, PPIExp loads the list of annotation terms from the back-end server; users then select an annotation term from the list, and PPIExp loads the proteins in that term from the back-end server. For the custom route, users first select the identifier type and then paste protein identifiers into the input text area. After users input a protein list or obtain a protein list from the selected annotation term, the back-end server of PPIExp automatically constructs a PPI network, clusters the proteins according to their abundances, and performs functional enrichment analysis for the proteins (Figure 2A). The results are then sent to the front-end.

The front-end presents the results from the back-end server as maps and tables (Figure 1, Figure 2B). As shown in Figure 2B, the front-end presents the PPIs as a network map on the left side of the window and the clustering results as a dendrogram & heatmap on the right side. Below the network map, the front-end presents buttons for the experimental groups. Additionally, the front-end presents an “annotation” button and a “network centrality” button. Once the “annotation” button is clicked, the front-end displays the functional enrichment results in an interactive table (Supplementary Figure S1); once the “network centrality” button is clicked, the front-end displays the node centralities in an interactive table (Supplementary Figure S1). After the results and necessary buttons are presented, the front-end provides extensive one-click interactive operations for users to visualize active PPI subnetworks and select key nodes that are worth further attention.

Figure 2. The workflow of PPIExp (A) and an example of the workflow (B). (A) The steps in boxes with solid lines require user operations in the front-end website, while those in boxes with dashed lines are tasks completed automatically in the back-end server. (B) In the example, after the example proteomics data are imported and the Wnt signaling pathway is selected, PPIExp displays the maps and buttons for users to explore the results.

Interactive operations for visualizing active PPI subnetworks
PPIExp provides a convenient way for users to visualize the active PPI subnetworks in a specific space or time. The activities of protein-protein interactions are affected by many factors, such as post-translational modifications and protein abundances. PPIExp focuses on using protein abundances to model the activities of protein interactions. In a specific time or space, interactions whose source and target proteins are both up-regulated are assumed to be active, while those whose source and target proteins are down-regulated are assumed to be inactive. Below the PPI network map interface, PPIExp presents buttons for all the experimental groups (Figure 2B). When users click one button, the up-regulated proteins are highlighted in red and the down-regulated proteins are highlighted in green (Figure 3). Users can click the buttons one by one to highlight differentially expressed proteins across all the experimental groups (Figure 3A-D). By highlighting differentially expressed proteins in sequence, researchers can determine which proteins form a protein complex in a specific condition. As shown in Figure 3A, two proteins, Dvl1 and Vangl1, interacted with each other and were both down-regulated in the KC group, which indicates that the two proteins might not function in KCs; another two proteins, Rac1 and Rhoa, were up-regulated in KC, LSEC, HSC, and CHC, which indicates that Rac1 and Rhoa might function specifically in non-parenchymal cells (i.e., KCs, LSECs, HSCs and CHCs).

The differentially expressed proteins used in visualizing the active PPI subnetworks in a specific space or time are generated automatically based on user-defined fold-change cutoffs. By default, PPIExp uses the first group as the baseline group and considers a protein differentially expressed if its fold change is higher than 2 or lower than 0.5. Users can set the baseline group and fold-change cutoffs according to their data. Additionally, PPIExp allows users to input differentially expressed proteins generated by other tools.
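The fold-change rule and the activity assumption described above can be sketched as follows. This is a simplified illustration rather than the PPIExp code: it assumes that the fold change is the ratio of group mean abundances on the linear scale and that baseline means are nonzero, and the function names, protein identifiers and abundance values are hypothetical.

```python
from statistics import mean

# Highlight colors as described in the text: red for up, green for down.
HIGHLIGHT = {"up": "red", "down": "green"}


def classify_proteins(abundances, groups, baseline, target,
                      up_cutoff=2.0, down_cutoff=0.5):
    """Label each protein as 'up', 'down' or 'unchanged' in the target group.

    abundances: {protein: {sample: abundance}} on the linear scale
    groups:     {group name: [sample names]}
    Assumes the baseline group mean is nonzero for every protein.
    """
    status = {}
    for protein, by_sample in abundances.items():
        base = mean(by_sample[s] for s in groups[baseline])
        cond = mean(by_sample[s] for s in groups[target])
        fold_change = cond / base
        if fold_change > up_cutoff:
            status[protein] = "up"
        elif fold_change < down_cutoff:
            status[protein] = "down"
        else:
            status[protein] = "unchanged"
    return status


def active_edges(edges, status):
    """Keep interactions whose two partners are both up-regulated."""
    return [(a, b) for a, b in edges
            if status.get(a) == "up" and status.get(b) == "up"]


# Made-up toy data: two groups with two replicates each.
groups = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
abundances = {
    "P1": {"a1": 10, "a2": 10, "b1": 30, "b2": 30},
    "P2": {"a1": 8, "a2": 8, "b1": 20, "b2": 20},
    "P3": {"a1": 10, "a2": 10, "b1": 4, "b2": 4},
    "P4": {"a1": 10, "a2": 10, "b1": 12, "b2": 12},
}
edges = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]

status = classify_proteins(abundances, groups, baseline="A", target="B")
active = active_edges(edges, status)
print(status, active)
```

Running `classify_proteins` once per experimental group against the same baseline would give the per-group highlight sets that the group buttons switch between.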

Figure 3. Interactive operations to visualize the active PPI subnetworks in specific space or time. When users click the button for a specific group, the differentially-expressed proteins in this group are highlighted on the network map. Subfigures A-D represent the active PPI subnetwork in KC, LSEC, HSC, and CHC, respectively. Users can also visualize active PPI subnetworks by clicking the play button.

Interactive operations for selecting key nodes
After a PPI network is constructed, the most important task is to extract biological insight from the network, specifically by selecting key nodes for further study. The proteins associated with high-centrality nodes may play key roles in vivo.29,30 Meanwhile, co-expressed proteins or the proteins in a specific annotation term may form a protein complex and function together.31,32 PPIExp therefore provides interactive visualization operations for users to highlight, on the network map, the co-expressed proteins selected from the dendrogram & heatmap or the proteins in an annotation term, and to map node centrality measures to node size (Figure 4). PPIExp also provides interactive operations for users to highlight their proteins of interest on the network map. Users can overview the whole network and then dive into the details to find key nodes that are worth further attention.

Researchers can determine whether co-expressed proteins form a protein complex using the operation that highlights co-expressed proteins on the network map (Figure 4A). For the PPI network in which one protein participates, highlighting the co-expressed proteins can help researchers determine which partners are co-expressed with this protein. As shown in Figure 4A, Dvl1 and Vangl1 interacted with each other and were co-expressed.

Figure 4. Interactive operations for selecting key nodes. The texts under the red lines indicate the network styles that the corresponding operations modify. (A) Select a group of co-expressed proteins in the dendrogram & heatmap to highlight them on the network map. (B) Click one enriched annotation term to highlight the proteins in the annotation term. (C) Click the mapping button to map node centrality parameters to network style. (D) Click one protein set button to highlight the proteins in the set.

PPIExp performs functional enrichment analysis for the proteins in the network and provides operations for users to highlight the proteins in a specific GO/pathway term on the network map. Proteins belonging to a certain GO/pathway term may form a protein complex and function together. When users click one annotation type, PPIExp presents the enriched terms in an interactive table, in which the terms can be ordered by p-value or annotation name. After users click one annotation term, PPIExp highlights the proteins in the annotation on the network map (Figure 4B, Supplementary Figure S1). As shown in Figure 4B, many proteins (nodes with square shape) participate in the Hippo signaling pathway, which indicates that the Hippo signaling pathway and the Wnt signaling pathway have cross-talk.33

PPIExp computes closeness centrality, betweenness centrality, and degree centrality for the nodes in a PPI network. Betweenness centrality measures the total number of nonredundant shortest paths going through a certain node. Degree centrality measures the total number of partners of a certain node. Closeness centrality measures the shortest-path distances from a certain node to all other nodes. Nodes with high centrality are more likely to be essential than nodes with low centrality.29,30 These centralities are presented in tables, and PPIExp provides a one-click operation for users to map centrality to node size: the higher the centrality, the larger the node (Figure 4C, Supplementary Figure S1). In the example dataset, Ctnnb1 has the highest closeness centrality, which is in accordance with the fact that Ctnnb1 is the key regulator in the Wnt signaling pathway.34

Additionally, users can input multiple protein lists, and PPIExp presents the names of the protein lists under the network map. Users can click one protein list to highlight the proteins in the list on the network map (Figure 4D). After filtering nodes and obtaining some candidate proteins, users can double-click one protein in the heatmap, and PPIExp will present a scatter map showing the expression pattern of this protein (Supplementary Figure S2). In the scatter map, users can navigate to other databases, such as NCBI and UniProt35, to learn more about this protein (Supplementary Figure S2).
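Two of the centrality measures described above, and the mapping from centrality to node size, can be sketched with a breadth-first search over an undirected, unweighted network. This is a minimal illustration, not the PPIExp implementation: it assumes closeness is the number of reachable nodes divided by the sum of their shortest-path distances, uses a linear size scale with hypothetical pixel bounds, and omits betweenness centrality for brevity. All function names and the toy network are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-interactions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Number of interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of their shortest-path distances."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def scale_to_size(scores, min_size=20.0, max_size=60.0):
    """Linearly map centrality scores to node sizes (bounds are assumptions)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {n: (min_size + max_size) / 2 for n in scores}
    span = max_size - min_size
    return {n: min_size + (s - lo) / (hi - lo) * span for n, s in scores.items()}


# Toy star-shaped network: hub H with three partners.
adj = build_adjacency([("H", "A"), ("H", "B"), ("H", "C")])
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
sizes = scale_to_size(closeness)
print(degree, closeness, sizes)
```

In the toy network the hub receives the largest node size under both measures, which mirrors how a highly connected regulator would stand out on the network map.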

Comparison with existing tools
Currently, to integrate spatiotemporal proteomics data with PPI data, users have to use different kinds of tools, i.e., PPI database browsers9-11,36, generic network visualization and analysis tools15,17,37, functional enrichment tools38-40 and clustering analysis tools41,42 (Figure 5). These tools can be combined to complete a task. For example, PPI database browsers, such as IntAct, BioGRID and STRING, are often used in combination with Cytoscape for constructing and visualizing PPI networks. Most of these tools focus on a specific task. In contrast, PPIExp contains the main features of these tools. For a protein list, PPIExp can automatically construct a PPI network and perform functional enrichment analysis and clustering analysis. Furthermore, PPIExp not only uses interfaces to present the results from different kinds of analysis, but also provides extensive interactive operations between the interfaces. Although some integrated tools, such as Metascape43 and NetworkAnalyst44, can perform different tasks, they lack cross-talk interactive operations (Figure 5). In PPIExp, users can select a group of co-expressed proteins from the dendrogram & heatmap (showing the results of clustering analysis) and highlight them on the network map; users can click an annotation term in the interactive table (showing the results of functional enrichment analysis) and highlight the proteins in the annotation term on the network map. Additionally, users can map the centrality measures to node parameters on the network map. With these easy-to-use cross-talk operations, researchers can conveniently find key nodes for further study.

Figure 5. Functionality comparison of PPIExp and 15 existing representative tools. The 15 tools are classified into five categories. For each category, at least two widely used tools are selected. PPI database browsers: IntAct, BioGRID, STRING and Mentha; generic network visualization and analysis tools: Cytoscape, Pajek, NAViGaTOR and Gephi; functional enrichment tools: DAVID, Enrichr and g:Profiler; clustering analysis and visualization tools: Java Treeview and Clustergrammer; integration tools: Metascape and NetworkAnalyst. PPI database browsers, such as IntAct, BioGRID and STRING, are often used in combination with Cytoscape.

Conclusion and discussion
In a cell, proteins function in different spaces and at different times by interacting with different partners. Intuitively visualizing the active PPI subnetworks in a specific space or time can help researchers discover key proteins for further study. In this study, we developed PPIExp, a web-based platform for integrating spatiotemporal proteomics data and PPI data. To validate PPIExp, we analyzed a space-dependent proteomics dataset, which quantified proteins across different kinds of liver cells2 (Figures 1-4), and a time-dependent proteomics dataset, which quantified proteins during the neuronal differentiation process in vitro4 (Supplementary Figures S3 and S4). PPIExp could help researchers interpret the growing volume of spatiotemporal proteomics data generated as proteomics technologies develop.

Extracting biological insight requires domain knowledge. However, many bioinformatics tools are not convenient for biologists who have domain knowledge but few programming skills, because these tools are usually packaged as R or Python libraries, or are packaged only for use with Windows or UNIX commands. PPIExp was developed to bridge this gap. Users can use it on the website without installing other tools. More importantly, because of the high heterogeneity and complexity of biomedical data, a tool should provide effective computation and visualization support to help researchers digest the data.45,46 Therefore, in PPIExp, we implemented multiple visualization interfaces and extensive interactive visualization operations. PPIExp is expected to facilitate the analysis of spatiotemporal proteomics data and PPI data and help researchers find their target proteins of interest.

ASSOCIATED CONTENT
Supporting Information
The following files are available free of charge.

Supplementary data.docx. This file contains Figures S1 and S2, which present some operations in PPIExp, and a case study of a time-dependent proteomics dataset, which contains Figures S3 and S4.
Figure S1. The operations to map node centrality to the network map and to highlight proteins annotated by a specific annotation term.
Figure S2. The operations to show the scatter map interface that displays the expression pattern of the selected protein.
Figure S3. The active PPI subnetworks during neuronal differentiation in vitro. iPS, induced pluripotent stem cell; NPC, neural progenitor cell; Neu, neurons.
Figure S4. (A) The network map in which the differentially regulated proteins in Neu are highlighted. (B) The subnetwork map in which the proteins in the synaptic vesicle cycle are highlighted as diamonds. (C) The subnetwork map in which the proteins in anatomical structure morphogenesis are highlighted as diamonds.

PPIExpTutorial.zip. This folder contains seven video tutorials for the basic PPIExp workflow: 1-Import data and construct a PPI network.mp4; 2-Operations to visualize active PPI subnetworks.mp4; 3-Operations to highlight co-expressed proteins.mp4; 4-Operations to highlight proteins in an annotation.mp4; 5-Operations to map nodes centrality to network styles.mp4; 6-Opeartions to highlight custom proteins.mp4; 7-Operations to show the expression of one protein.mp4.

AUTHOR INFORMATION
Corresponding Authors
Cheng Chang. *E-mail: [email protected]. ORCID: https://orcid.org/0000-0002-0361-2438
Miao Yu. *E-mail: [email protected]
Xiaoming Yang. *E-mail: [email protected]. ORCID: https://orcid.org/0000-0003-3629-0946

Author Contributions
X.Y., M.Y., and C.C. designed and supervised the whole study. X.L. and M.H. designed and programmed the web server. R.Y., C.L., and Y.Z. designed the UI and tested the web server. X.L. and C.C. wrote the initial manuscript. All authors have given approval to the final version of the manuscript.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
We thank Ting Hua for his technical help in developing the platform. This work was supported by the State Key Laboratory of Proteomics [SKLP-K201404], the National Basic Research Program of China [2013CB910801], and the National Natural Science Foundation of China [21605159 and 21475150].

ABBREVIATIONS
PPI, protein-protein interaction; NCBI, National Center for Biotechnology Information; REST, representational state transfer; HC, hepatocyte; KC, Kupffer cell; HSC, hepatic stellate cell; LSEC, liver sinusoidal endothelial cell; iPS, induced pluripotent stem cell; Neu, neurons.

REFERENCES
1. De Las Rivas, J.; Fontanillo, C., Protein-protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput Biol 2010, 6, (6), e1000807.
2. Azimifar, S. B.; Nagaraj, N.; Cox, J.; Mann, M., Cell-type-resolved quantitative proteomics of murine liver. Cell Metab 2014, 20, (6), 1076-87.
3. Ding, C.; Li, Y.; Guo, F.; Jiang, Y.; Ying, W.; Li, D.; Yang, D.; Xia, X.; Liu, W.; Zhao, Y.; He, Y.; Li, X.; Sun, W.; Liu, Q.; Song, L.; Zhen, B.; Zhang, P.; Qian, X.; Qin, J.; He, F., A Cell-type-resolved Liver Proteome. Mol Cell Proteomics 2016, 15, (10), 3190-3202.
4. Djuric, U.; Rodrigues, D. C.; Batruch, I.; Ellis, J.; Shannon, P.; Diamandis, P., Spatiotemporal Proteomic Profiling of Human Cerebral Development. Mol Cell Proteomics 2017, 16, (9), 1548-1562.
5. Hosp, F.; Gutierrez-Angel, S.; Schaefer, M. H.; Cox, J.; Meissner, F.; Hipp, M. S.; Hartl, F. U.; Klein, R.; Dudanova, I.; Mann, M., Spatiotemporal Proteomic Profiling of Huntington's Disease Inclusions Reveals Widespread Loss of Protein Function. Cell Rep 2017, 21, (8), 2291-2303.
6. Kim, M. S.; Pinto, S. M.; Getnet, D.; Nirujogi, R. S.; Manda, S. S.; Chaerkady, R.; Madugundu, A. K.; Kelkar, D. S.; Isserlin, R.; Jain, S.; Thomas, J. K.; Muthusamy, B.; Leal-Rojas, P.; Kumar, P.; Sahasrabuddhe, N. A.; Balakrishnan, L.; Advani, J.; George, B.; Renuse, S.; Selvan, L. D.; Patil, A. H.; Nanjappa, V.; Radhakrishnan, A.; Prasad, S.; Subbannayya, T.; Raju, R.; Kumar, M.; Sreenivasamurthy, S. K.; Marimuthu, A.; Sathe, G. J.; Chavan, S.; Datta, K. K.; Subbannayya, Y.; Sahu, A.; Yelamanchi, S. D.; Jayaram, S.; Rajagopalan, P.; Sharma, J.; Murthy, K. R.; Syed, N.; Goel, R.; Khan, A. A.; Ahmad, S.; Dey, G.; Mudgal, K.; Chatterjee, A.; Huang, T. C.; Zhong, J.; Wu, X.; Shaw, P. G.; Freed, D.; Zahari, M. S.; Mukherjee, K. K.; Shankar, S.; Mahadevan, A.; Lam, H.; Mitchell, C. J.; Shankar, S. K.; Satishchandra, P.; Schroeder, J. T.; Sirdeshmukh, R.; Maitra, A.; Leach, S. D.; Drake, C. G.; Halushka, M. K.; Prasad, T. S.; Hruban, R. H.; Kerr, C. L.; Bader, G. D.; Iacobuzio-Donahue, C. A.; Gowda, H.; Pandey, A., A draft map of the human proteome. Nature 2014, 509, (7502), 575-81.

7. Wallach, T.; Schellenberg, K.; Maier, B.; Kalathur, R. K.; Porras, P.; Wanker, E. E.; Futschik, M. E.; Kramer, A., Dynamic circadian protein-protein interaction networks predict temporal organization of cellular functions. PLoS Genet 2013, 9, (3), e1003398.
8. Wang, J.; Peng, X.; Li, M.; Pan, Y., Construction and application of dynamic protein interaction network based on time course gene expression data. Proteomics 2013, 13, (2), 301-12.
9. Calderone, A.; Castagnoli, L.; Cesareni, G., mentha: a resource for browsing integrated protein-interaction networks. Nat Methods 2013, 10, (8), 690-1.
10. Chatr-Aryamontri, A.; Oughtred, R.; Boucher, L.; Rust, J.; Chang, C.; Kolas, N. K.; O'Donnell, L.; Oster, S.; Theesfeld, C.; Sellam, A.; Stark, C.; Breitkreutz, B. J.; Dolinski, K.; Tyers, M., The BioGRID interaction database: 2017 update. Nucleic Acids Res 2017, 45, (D1), D369-D379.
11. Hermjakob, H.; Montecchi-Palazzi, L.; Lewington, C.; Mudali, S.; Kerrien, S.; Orchard, S.; Vingron, M.; Roechert, B.; Roepstorff, P.; Valencia, A.; Margalit, H.; Armstrong, J.; Bairoch, A.; Cesareni, G.; Sherman, D.; Apweiler, R., IntAct: an open source molecular interaction database. Nucleic Acids Res 2004, 32, (Database issue), D452-5.
12. Licata, L.; Briganti, L.; Peluso, D.; Perfetto, L.; Iannuccelli, M.; Galeota, E.; Sacco, F.; Palma, A.; Nardozza, A. P.; Santonico, E.; Castagnoli, L.; Cesareni, G., MINT, the molecular interaction database: 2012 update. Nucleic Acids Res 2012, 40, (Database issue), D857-61.
13. Orchard, S.; Ammari, M.; Aranda, B.; Breuza, L.; Briganti, L.; Broackes-Carter, F.; Campbell, N. H.; Chavali, G.; Chen, C.; del-Toro, N.; Duesbury, M.; Dumousseau, M.; Galeota, E.; Hinz, U.; Iannuccelli, M.; Jagannathan, S.; Jimenez, R.; Khadake, J.; Lagreid, A.; Licata, L.; Lovering, R. C.; Meldal, B.; Melidoni, A. N.; Milagros, M.; Peluso, D.; Perfetto, L.; Porras, P.; Raghunath, A.; Ricard-Blum, S.; Roechert, B.; Stutz, A.; Tognolli, M.; van Roey, K.; Cesareni, G.; Hermjakob, H., The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 2014, 42, (Database issue), D358-63.
14. Agapito, G.; Guzzi, P. H.; Cannataro, M., Visualization of protein interaction networks: problems and solutions. BMC Bioinformatics 2013, 14 Suppl 1, S1.
15. Bastian, M.; Heymann, S.; Jacomy, M., Gephi: an open source software for exploring and manipulating networks. In International AAAI Conference on Weblogs and Social Media, 2009.
16. Hagberg, A. A.; Schult, D. A.; Swart, P. J., Exploring network structure, dynamics, and function using NetworkX. 2008.
17. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13, (11), 2498-504.
18. Brown, K. R.; Otasek, D.; Ali, M.; McGuffin, M. J.; Xie, W.; Devani, B.; Toch, I. L.; Jurisica, I., NAViGaTOR: Network Analysis, Visualization and Graphing Toronto. Bioinformatics 2009, 25, (24), 3327-9.
19. Morris, J. H.; Vijay, D.; Federowicz, S.; Pico, A. R.; Ferrin, T. E., CyAnimator: Simple Animations of Cytoscape Networks. F1000Res 2015, 4, 482.
20. Ono, K.; Muetze, T.; Kolishovski, G.; Shannon, P.; Demchak, B., CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API. F1000Res 2015, 4, 478.
21. Müllner, D., fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python. Journal of Statistical Software 2013, 53, (9), 1-18.
22. Franz, M.; Lopes, C. T.; Huck, G.; Dong, Y.; Sumer, O.; Bader, G. D., Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics 2016, 32, (2), 309-11.

23. Orchard, S.; Kerrien, S.; Abbani, S.; Aranda, B.; Bhate, J.; Bidwell, S.; Bridge, A.; Briganti, L.; Brinkman, F. S.; Cesareni, G.; Chatr-aryamontri, A.; Chautard, E.; Chen, C.; Dumousseau, M.; Goll, J.; Hancock, R. E.; Hannick, L. I.; Jurisica, I.; Khadake, J.; Lynn, D. J.; Mahadevan, U.; Perfetto, L.; Raghunath, A.; Ricard-Blum, S.; Roechert, B.; Salwinski, L.; Stumpflen, V.; Tyers, M.; Uetz, P.; Xenarios, I.; Hermjakob, H., Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat Methods 2012, 9, (4), 345-50.
24. Heinz, S.; Benner, C.; Spann, N.; Bertolino, E.; Lin, Y. C.; Laslo, P.; Cheng, J. X.; Murre, C.; Singh, H.; Glass, C. K., Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 2010, 38, (4), 576-89.
25. Kanehisa, M.; Furumichi, M.; Tanabe, M.; Sato, Y.; Morishima, K., KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017, 45, (D1), D353-D361.
26. Croft, D.; O'Kelly, G.; Wu, G.; Haw, R.; Gillespie, M.; Matthews, L.; Caudy, M.; Garapati, P.; Gopinath, G.; Jassal, B.; Jupe, S.; Kalatskaya, I.; Mahajan, S.; May, B.; Ndegwa, N.; Schmidt, E.; Shamovsky, V.; Yung, C.; Birney, E.; Hermjakob, H.; D'Eustachio, P.; Stein, L., Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 2011, 39, (Database issue), D691-7.
27. Slenter, D. N.; Kutmon, M.; Hanspers, K.; Riutta, A.; Windsor, J.; Nunes, N.; Melius, J.; Cirillo, E.; Coort, S. L.; Digles, D.; Ehrhart, F.; Giesbertz, P.; Kalafati, M.; Martens, M.; Miller, R.; Nishida, K.; Rieswijk, L.; Waagmeester, A.; Eijssen, L. M. T.; Evelo, C. T.; Pico, A. R.; Willighagen, E. L., WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res 2018, 46, (D1), D661-D667.
28. The Gene Ontology Consortium, Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 2017, 45, (D1), D331-D338.
29. Jeong, H.; Mason, S. P.; Barabasi, A. L.; Oltvai, Z. N., Lethality and centrality in protein networks. Nature 2001, 411, (6833), 41-2.
30. Yu, H.; Kim, P. M.; Sprecher, E.; Trifonov, V.; Gerstein, M., The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol 2007, 3, (4), e59.
31. Bhardwaj, N.; Lu, H., Correlation between gene expression profiles and protein-protein interactions within and across genomes. Bioinformatics 2005, 21, (11), 2730-8.
32. Jansen, R.; Greenbaum, D.; Gerstein, M., Relating whole-genome expression data with protein-protein interactions. Genome Res 2002, 12, (1), 37-46.
33. Kim, M.; Jho, E. H., Cross-talk between Wnt/beta-catenin and Hippo signaling pathways: a brief review. BMB Rep 2014, 47, (10), 540-5.
34. Anastas, J. N.; Moon, R. T., WNT signalling pathways as therapeutic targets in cancer. Nat Rev Cancer 2013, 13, (1), 11-26.
35. The UniProt Consortium, UniProt: the universal protein knowledgebase. Nucleic Acids Res 2018, 46, (5), 2699.
36. Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K. P.; Kuhn, M.; Bork, P.; Jensen, L. J.; von Mering, C., STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015, 43, (Database issue), D447-52.

37. Batagelj, V.; Mrvar, A., Pajek — Analysis and Visualization of Large Networks. In Graph Drawing Software, Jünger, M.; Mutzel, P., Eds. Mathematics and Visualization. Springer: Berlin, Heidelberg, 2004.
38. Dennis, G., Jr.; Sherman, B. T.; Hosack, D. A.; Yang, J.; Gao, W.; Lane, H. C.; Lempicki, R. A., DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003, 4, (5), P3.
39. Kuleshov, M. V.; Jones, M. R.; Rouillard, A. D.; Fernandez, N. F.; Duan, Q.; Wang, Z.; Koplev, S.; Jenkins, S. L.; Jagodnik, K. M.; Lachmann, A.; McDermott, M. G.; Monteiro, C. D.; Gundersen, G. W.; Ma'ayan, A., Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res 2016, 44, (W1), W90-7.
40. Reimand, J.; Arak, T.; Vilo, J., g:Profiler--a web server for functional interpretation of gene lists (2011 update). Nucleic Acids Res 2011, 39, (Web Server issue), W307-15.
41. Fernandez, N. F.; Gundersen, G. W.; Rahman, A.; Grimes, M. L.; Rikova, K.; Hornbeck, P.; Ma'ayan, A., Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci Data 2017, 4, 170151.
42. Saldanha, A. J., Java Treeview--extensible visualization of microarray data. Bioinformatics 2004, 20, (17), 3246-8.
43. Tripathi, S.; Pohl, M. O.; Zhou, Y.; Rodriguez-Frandsen, A.; Wang, G.; Stein, D. A.; Moulton, H. M.; DeJesus, P.; Che, J.; Mulder, L. C.; Yanguez, E.; Andenmatten, D.; Pache, L.; Manicassamy, B.; Albrecht, R. A.; Gonzalez, M. G.; Nguyen, Q.; Brass, A.; Elledge, S.; White, M.; Shapira, S.; Hacohen, N.; Karlas, A.; Meyer, T. F.; Shales, M.; Gatorano, A.; Johnson, J. R.; Jang, G.; Johnson, T.; Verschueren, E.; Sanders, D.; Krogan, N.; Shaw, M.; Konig, R.; Stertz, S.; Garcia-Sastre, A.; Chanda, S. K., Meta- and Orthogonal Integration of Influenza "OMICs" Data Defines a Role for UBR4 in Virus Budding. Cell Host Microbe 2015, 18, (6), 723-35.
44. Xia, J.; Gill, E. E.; Hancock, R. E., NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat Protoc 2015, 10, (6), 823-44.
45. Gehlenborg, N.; O'Donoghue, S. I.; Baliga, N. S.; Goesmann, A.; Hibbs, M. A.; Kitano, H.; Kohlbacher, O.; Neuweger, H.; Schneider, R.; Tenenbaum, D.; Gavin, A. C., Visualization of omics data for systems biology. Nat Methods 2010, 7, (3 Suppl), S56-68.
46. Turkay, C.; Jeanquartier, F.; Holzinger, A.; Hauser, H., On Computationally-Enhanced Visual Analysis of Heterogeneous Data and Its Application in Biomedical Informatics. In Interactive Knowledge Discovery and Data Mining in Biomedical Informatics: State-of-the-Art and Future Challenges, Holzinger, A.; Jurisica, I., Eds. Springer Berlin Heidelberg: Berlin, Heidelberg, 2014; pp 117-140.
