Technical Note
XINA: a workflow for the integration of multiplexed proteomics kinetics data with network analysis Lang Ho Lee, Arda Halu, Stephanie Morgan, Hiroshi Iwata, Masanori Aikawa, and Sasha Singh J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00615 • Publication Date (Web): 12 Oct 2018 Downloaded from http://pubs.acs.org on October 16, 2018
Journal of Proteome Research
XINA: a workflow for the integration of multiplexed proteomics kinetics data with network analysis Lang Ho Lee†, Arda Halu†, ‡, Stephanie Morgan†, Hiroshi Iwata†, *Masanori Aikawa†, ‡, §, and *Sasha A. Singh† †Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA. ‡Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA. §Center for Excellence in Vascular Biology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA.
*These authors contributed equally to this study.
Correspondence should be addressed to:
Masanori Aikawa, MD, PhD, The Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women’s Hospital, Harvard Medical School; 3 Blackfan Street, 17th Floor, Boston, MA 02115, USA; Phone: 617-730-7777; Fax: 617-730-7791; E-mail: [email protected]
Sasha Anna Singh, PhD, The Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women’s Hospital, Harvard Medical School; 3 Blackfan Street, 17th Floor, Boston, MA 02115, USA; Phone: 617-730-7702; Fax: 617-730-7791; E-mail: [email protected]
KEYWORDS Proteomics, Multiplexed data, Time-series, Co-regulation, Clustering, Protein-protein interaction network, Alluvial diagram, R, Bioconductor, Macrophages
ABSTRACT Quantitative proteomics experiments, using for instance isobaric tandem mass tagging approaches, are conducive to measuring changes in protein abundance over multiple time points in response to one or more conditions or stimulations. The aim is often to determine which proteins exhibit similar patterns within and across experimental conditions, since proteins with co-abundance patterns may have common molecular functions related to a given stimulation. To facilitate the identification and analysis of co-abundance patterns within and across conditions, we previously developed software inspired by the isobaric mass tagging method itself. Specifically, multiple datasets are tagged in silico and combined for subsequent subgrouping into multiple clusters within a single output depicting the variation across all conditions, thereby converting a typical inter-dataset comparison into an intra-dataset comparison. An updated version of our software, XINA, not only extracts co-abundance profiles within and across experiments, but also incorporates protein-protein interaction databases and integrative resources such as KEGG to infer interactors and molecular functions, respectively, and produces intuitive graphical outputs. In this report we compare the kinetics profiles of >5,600 unique proteins derived from three macrophage cell culture experiments and demonstrate through intuitive visualizations that XINA identifies key regulators of macrophage activation via their co-abundance patterns.
INTRODUCTION Quantitative proteomics experiments generate large amounts of information with great potential both to add to, and to benefit from, network biology resources. Kinetics-based studies that include various stimulation conditions can reveal co-regulated or disparately regulated proteins via their co-abundance profiles.1-2 For instance, co-abundance profiles may indicate co-regulation (transcriptional or translational), which in turn implies common signaling pathways or shared molecular functions,3 and/or possible physical interactions. Established genetic and physical interactions are continually curated by various repositories to build interactomes.4-10 These interactomes, however, are an amalgamation of multiple datasets and sources, and do not readily provide context-specific information. We recently demonstrated that when macrophage protein co-abundance networks are integrated into a generic interactome, the combined network improves therapeutic target prediction.2 While these in silico advances are powerful, they require considerable expertise in the computational sciences. In 2015, we published clustering and visualization software that facilitates the comparison of multiple datasets such as, but not limited to, protein kinetics data collected using the isobaric mass tagging (IMT) strategy.11 The multiple datasets should be derived from the same biological context (common cell or tissue type, and common time points) to ensure the underlying biological variations are comparable. Our workflow multiplexes IMT-generated kinetics data (mIMT) for subsequent clustering, which requires a specialized visualization paradigm to perform high throughput screening (visHTS) of high dimensional (multiple datasets, time points and proteins) co-abundance profiles.11-12 At this early stage of our workflow development (mIMT-visHTS), we reasoned that co-abundance protein profiles could be used to infer co-regulated proteins and common molecular pathways.1-2, 11 In this current version of our software that
incorporates network methods, XINA (multiplexed isobaric mass tagged-based kinetics data for network analysis) queries protein co-abundance profiles against the STRING molecular interaction database6 to map their protein-protein interactions and determine subnetworks of co-abundant proteins, which in turn can be interpreted using the KEGG pathway database.13 While users still require basic knowledge of R and programming, XINA is designed for non-expert users aiming to perform network analyses on their proteomics data. We demonstrate XINA’s capabilities by identifying key molecules of macrophage activation using data from a previous study.14
METHODS Example proteomics kinetics data. This study utilizes previously published kinetic profiles of activated human THP-1 macrophage-like cells (ATCC, catalog no. TIB-202) stimulated with either interferon gamma (M(IFN-γ), 10 ng/mL), interleukin-4 (M(IL-4), 10 ng/mL), or no stimulation (M(-)) for 72 hours.14 Tandem mass tagging (TMT; analyzed on the LTQ-Orbitrap Elite mass spectrometer, Thermo Fisher Scientific) was used to monitor the changes in the proteome for each condition at six time points.14 Master proteins with two or more unique peptides were used for TMT reporter ion intensity quantification. The proteins in each stimulation time series were reference-normalized (to t=0), and the corresponding kinetics profiles were exported as .csv files for import into XINA (Table S1).
Improvement to our pre-existing format for co-abundance studies. XINA is the next generation of our previously published mIMT-visHTS software11 and has been completely redesigned to include the added features of representing co-abundant proteins as networks and determining central proteins within these networks (Figure 1A). Technically, unlike mIMT-visHTS, which was developed using Visual Studio with limited extensibility, XINA is operating system-independent because it is written entirely in R and can be executed on Windows, Linux and macOS, using for example RStudio (https://www.rstudio.com/) (Figure 1B). In addition, XINA has new graphical output options and offers a more extensive analysis of co-abundance patterns through condition composition pie charts and alluvial diagrams that examine protein co-migration patterns. XINA also adds a new function: the STRING database is imported for protein-protein interaction networks and functional enrichment tests.
Data preprocessing. XINA is an R package that provides statistical and graphical tools to investigate the co-abundance patterns of proteins derived from, but not limited to, multiplexed time-series proteomics data. XINA comprises three functional parts: 1) cluster analysis, 2) co-abundance and co-migration analysis, and 3) protein-protein interaction network analysis. The XINA workflow starts from the integration of multiple sets of proteomics data sharing the same data matrix format (i.e. time points to be analyzed, conditions to be compared, etc.). Akin to the IMT method, which adds a chemical tag to each proteome’s peptide pool,15 XINA adds an electronic tag to each dataset’s proteins during the data integration. These tags enable XINA to distinguish the data source throughout the course of subsequent analyses. The integrated data is referred to as a ‘super (data)set’ and is processed together for clustering (Figure 1C).
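The in silico tagging step described above can be illustrated with a minimal sketch. XINA itself is an R package; the Python code below is not XINA's API, and the function name build_superset, the "protein@condition" key format, and the toy values are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical toy input: condition name -> {protein ID: abundance per time point}.
datasets: Dict[str, Dict[str, List[float]]] = {
    "M(-)": {"P1": [1.0, 1.1, 1.2], "P2": [1.0, 0.9, 0.8]},
    "M(IFN-g)": {"P1": [1.0, 2.0, 3.0]},
}


def build_superset(data):
    """Tag every profile with its source condition and stack all profiles."""
    # All datasets must share the same data matrix format (same time points).
    n_points = {len(v) for profiles in data.values() for v in profiles.values()}
    if len(n_points) > 1:
        raise ValueError("all datasets must share the same time points")
    superset = []
    for condition, profiles in data.items():
        for protein, values in profiles.items():
            superset.append({
                "key": f"{protein}@{condition}",  # the electronic (in silico) tag
                "protein": protein,
                "condition": condition,
                "profile": list(values),
            })
    return superset


superset = build_superset(datasets)
for row in superset:
    print(row["key"], row["profile"])
```

Because each row keeps its condition tag, the same protein can appear once per condition in the superset and can later be assigned to a different cluster in each condition.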
Network analysis. XINA conducts protein-protein interaction (PPI) network analysis using the ‘igraph’ (http://igraph.org/) and ‘STRINGdb’4 R packages. In addition to ‘STRINGdb’, XINA also accepts user-curated PPI information, such as ‘edgelists’ consisting of a source node column and a target node column, for the network construction. XINA constructs
PPI networks of co-migrated proteins as well as co-abundant proteins within a cluster with the aim of identifying candidate co-regulated proteins or proteins with a common molecular function. After the PPI network construction, XINA computes network centralities that identify the most inter-connected vertices within a graph, thus potentially identifying key molecules in the network. XINA provides various centrality calculations using ‘igraph’, including betweenness, closeness, hub, and eigenvector centralities. XINA plots the network graph and colors protein vertices by their centralities using ‘igraph’, with darker colors indicating higher centrality in the graph. Finally, XINA provides enrichment tests using Gene Ontology (GO) and KEGG pathways so that investigators can understand the co-regulation in their proteomics data in terms of related biological processes and pathways.
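To make the idea of an edgelist and a centrality score concrete, the following is a minimal sketch of eigenvector centrality computed by plain power iteration on an undirected edgelist. It is written in Python for illustration only and does not reproduce the ‘igraph’ implementation that XINA relies on; the function name, the toy edgelist, and the choice to scale the top score to 1 are assumptions.

```python
def eigenvector_centrality(edges, max_iter=500, tol=1e-10):
    """Power iteration on an undirected edge list; scores are scaled to a maximum of 1."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    if not neighbors:
        return {}
    scores = {node: 1.0 for node in neighbors}
    for _ in range(max_iter):
        # Iterating with (A + I) keeps the same leading eigenvector as A
        # but avoids oscillation on bipartite graphs.
        updated = {
            n: scores[n] + sum(scores[m] for m in neighbors[n]) for n in neighbors
        }
        top = max(updated.values())
        updated = {n: v / top for n, v in updated.items()}
        change = max(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical edgelist: (source node, target node) pairs.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = eigenvector_centrality(edges)
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[0], centrality)
```

In this toy network, node A is connected to every other node and therefore receives the highest score, while the peripheral node D receives the lowest.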
Availability and system requirements. XINA is free software: it can be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version. XINA is available for download from Bioconductor (https://www.bioconductor.org/packages/devel/bioc/html/XINA.html), GitHub (https://github.com/langholee/XINA) and our website (https://cics.bwh.harvard.edu/software). For an optimal experience we recommend the most recent version of RStudio Desktop, whose requirements include Windows 7+, macOS 10.11+, Ubuntu 14+, RedHat 7+, and Arch x64. XINA was developed on macOS 10.13.6 with a 2.6 GHz i7 processor and 16 GB of memory, and fully tested on Windows 10 with a 3.6 GHz i7 processor and 32 GB of memory.
RESULTS AND DISCUSSION Data multiplexing and clustering By default, XINA implements model-based clustering16 to classify proteins according to their abundance profiles. As an additional clustering option, XINA now provides the K-means clustering algorithm (Figures 1D and S1). Before clustering, XINA normalizes the protein abundance profiles via the sum-normalization method to standardize protein abundances to range from 0 to 1.17 As an alternative to sum-normalization, XINA now also provides z-score standardization. For clustering, the model-based algorithm either optimizes the number of clusters at minimum Bayesian information criterion (BIC) in an unbiased manner or accepts a user-defined maximum cluster number. Clustering is performed using the ‘mclust’ R package,16 which determines a covariance structure and the number of clusters at the best BIC.16 XINA prints out the BIC plots of ‘mclust’ to display the optimization results. Because of the optimization steps it employs, model-based clustering takes longer than K-means. For the data below, comprising ≥14,000 kinetics profiles, K-means clustering completed within 10 minutes, whereas model-based clustering took just over two hours. The user can weigh the costs and benefits of one clustering algorithm over the other but, based on our overview, the two algorithms produced comparable results.
In this case study, we compared the TMT-determined proteome kinetics14 of the human macrophage-like cell line THP-1 in response to treatment with either IFN-γ (M(IFN-γ)) or IL-4 (M(IL-4)) over 72 hours. The unstimulated cell culture condition M(-) is the baseline biological noise control (Figure 1C). The initial steps in the XINA pipeline are in common with those in mIMT-visHTS: 1) the multiplexing of the three datasets (M(-), M(IFN-γ) and M(IL-4)) into a single super dataset (Figure 1C) for the subsequent step, 2) model-based clustering, which in this
case resulted in 30 clusters that best describe the variability in the super dataset (Figures 1D and S1). The user can choose the color themes for the experimental conditions and the clusters. All subsequent steps outlined below are novel features provided by XINA.
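The two profile standardizations mentioned in this section can be sketched as follows. This is an illustrative Python sketch rather than XINA's R code: the function names are hypothetical, sum-normalization is assumed to mean dividing each value by the sum of its profile (which yields values between 0 and 1 only for non-negative abundances), and the z-score is assumed to use the sample standard deviation.

```python
import statistics


def sum_normalize(profile):
    """Divide each value by the profile total so the values sum to 1."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile sums to zero and cannot be sum-normalized")
    return [value / total for value in profile]


def z_score(profile):
    """Center on the profile mean and scale by the sample standard deviation (an assumption)."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)
    if sd == 0:
        raise ValueError("flat profile has zero variance")
    return [(value - mean) / sd for value in profile]


# Hypothetical reference-normalized abundances over four time points.
profile = [1.0, 2.0, 3.0, 4.0]
normalized = sum_normalize(profile)
standardized = z_score(profile)
print(normalized)
print(standardized)
```

Both transformations remove differences in absolute abundance so that clustering groups proteins by the shape of their kinetics rather than by their magnitude.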
Cluster composition analysis XINA computes the percentage of each experimental condition or time point within each cluster, and outputs the resulting percentages as pie charts to represent the composition of each cluster. It then uses this condition composition information to detect clusters that are biased (e.g. >60% composition, set by the user) towards one specific condition. The biased clusters can be used to indicate a condition-specific biological variation. Each of the 30 example super dataset clusters comprises a mixture of kinetic profiles from one or more of the three experimental condition datasets. XINA depicts the relative contribution of each dataset to each cluster in the form of pie charts (Figures 1E and S1). For example, clusters 1, 4, 23, 26, and 27, which tend to show changes at earlier time points (up to 24 hours), comprise predominantly M(IFN-γ) kinetics, whereas clusters 5, 22, and 25, which depict changes at the later time points (24 to 72 hours), comprise predominantly M(IL-4) kinetics (Figure S1).
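The composition and bias calculation described above can be sketched in a few lines. The Python code below is an illustration under stated assumptions, not XINA's implementation: the function names and toy cluster assignments are hypothetical, and the 60% cut-off is simply the example threshold quoted in the text.

```python
from collections import Counter, defaultdict


def cluster_composition(assignments):
    """Return {cluster: {condition: percentage of profiles in that cluster}}."""
    counts = defaultdict(Counter)
    for cluster, condition in assignments:
        counts[cluster][condition] += 1
    composition = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        composition[cluster] = {c: 100.0 * n / total for c, n in counter.items()}
    return composition


def biased_clusters(composition, threshold=60.0):
    """Return {cluster: condition} for clusters dominated by a single condition."""
    biased = {}
    for cluster, shares in composition.items():
        condition, share = max(shares.items(), key=lambda item: item[1])
        if share > threshold:
            biased[cluster] = condition
    return biased


# Hypothetical (cluster number, condition tag) pairs, one per kinetics profile.
assignments = [
    (1, "M(IFN-g)"), (1, "M(IFN-g)"), (1, "M(IFN-g)"), (1, "M(IFN-g)"), (1, "M(IL-4)"),
    (2, "M(-)"), (2, "M(IFN-g)"), (2, "M(IL-4)"),
]
composition = cluster_composition(assignments)
biased = biased_clusters(composition, threshold=60.0)
print(composition)
print(biased)
```

Here cluster 1 is flagged as biased because 80% of its profiles carry the same condition tag, whereas cluster 2 is evenly mixed and is not flagged.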
Cluster-specific protein-protein interaction (PPI) networks The proteins contained within each cluster can be queried against the imported R package for the STRING database. The cluster-specific PPI networks can be presented in multiple ways; for instance, Figure 2A displays all PPI network nodes and edges, and Figure 2B displays only significant interactions between high eigenvector centrality proteins. For the PPI networks, the nodes are colored based on experimental condition: green, M(-); orange, M(IFN-γ); and blue,
M(IL-4). The edges are colored according to the cluster. XINA refines the PPI networks by extracting subnetworks that are built on either all combined experimental conditions, or subnetworks built on each experimental condition separately, with the latter being more biologically relevant. For example, cluster 4 contains primarily M(IFN-γ)-protein profiles (Figure 1E) and has the largest PPI network (Figure 2A,C). Although these networks convey the relative densities of the cluster-centric networks, they are cluttered with excess information. XINA therefore reduces the PPI network to the most central nodes (Figure 2B,D), shading them red in relation to eigenvector centrality and coloring the edges grey to emphasize the central nodes. These central nodes can be exported as a protein list.
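Reducing a network to its most central nodes can be sketched as a simple filter on an edgelist. The Python code below is illustrative only: how XINA selects its cut-off is not specified here, so the top_n parameter, the function name, and the toy centrality scores are hypothetical.

```python
def central_subnetwork(edges, centrality, top_n=3):
    """Keep the top_n nodes by centrality and only the edges that connect them."""
    # Sort by descending centrality; ties are broken alphabetically for reproducibility.
    ranked = sorted(centrality, key=lambda node: (-centrality[node], node))
    keep = ranked[:top_n]
    keep_set = set(keep)
    kept_edges = [(s, t) for s, t in edges if s in keep_set and t in keep_set]
    return keep, kept_edges


# Hypothetical edgelist and precomputed centrality scores.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = {"A": 1.0, "B": 0.85, "C": 0.85, "D": 0.46}
central_nodes, central_edges = central_subnetwork(edges, scores, top_n=3)
print(central_nodes)   # the exportable protein list
print(central_edges)
```

The returned node list corresponds to the kind of protein list that can be exported for follow-up, and the retained edges form the decluttered network.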
Cluster-to-cluster protein migrations A major novel feature of XINA is that it traces the movement of proteins across the clusters, or ‘protein migration’, in response to the various conditions. Two or more proteins with the same movement between clusters are said to ‘co-migrate’, and XINA supposes that proteins that co-migrate in response to a given condition are more likely to be co-regulated at the biological level than other proteins within the same clusters. Co-migration is visualized via alluvial diagrams, in which the cluster numbers are assigned to the vertical axis, and the experimental conditions are assigned to the horizontal axis (Figure 3A). The left-most condition is the reference, in this case M(-), and M(IFN-γ)-to-M(IL-4) in tandem are the conditions to be compared. The vertical size of each cluster block on the plot, as well as the width of the alluvium, is proportional to the number of proteins within the cluster. The alluvia are colored based on the reference cluster, and those streaming to and from a cluster block assigned ‘0’ indicate that the proteins were not detected in the condition (Figure 3A). While there is no set limit to the number of conditions to be compared,
we generally recommend two to three until users become more familiar with co-abundance patterns.
Unbiased filtering - If left unfiltered, the alluvial diagram for all co-migrations in the entire superset is overwhelming (Figure 3A). XINA thus finds significant co-migrations in an unbiased manner once the user defines a protein number threshold (e.g. ≥9 proteins); the results are further supported by the adjusted p-value of Fisher’s exact test measuring the significance over all co-migrations. For instance, using a threshold of nine co-migrating proteins within an alluvium, the alluvia are reduced to 50 co-migrations (Figure 3A). As a second step, XINA can further prioritize the significant co-migrations between condition-biased clusters that were determined at the model-based clustering and composition analysis steps (Figures 1E and S1). These alluvia are now colored based on the condition bias (Figure 3B, legend).
Biased filtering and co-migration analyses - XINA also provides the user with options to investigate hypothesis-driven co-regulation patterns or to track co-migrations associated with specific targets. XINA users are expected to be interested in specific clusters or controls/known proteins given their biological question, and thus XINA can be used to selectively track co-migrations.
The PPI networks and KEGG pathways of co-migrated proteins - PPI networks and central proteins, and their associated KEGG pathways, can be extracted by XINA. As an example, we present M(IL-4)-biased clusters whose proteins were not observed in either M(-) or M(IFN-γ) datasets (Figure 3C). These proteins are localized to clusters whose kinetics either increase (clusters 5 and 25) or decrease (cluster 22) late in the stimulation period (Figure S1). One of the central proteins in the combined co-migration PPI network is CD4 (Figure 3D,E), which is most commonly associated with its role as a co-receptor for MHC class II molecules on
T lymphocytes.18 The combined network’s top KEGG pathways include metabolism pathways related to glutathione, retinol, and cytochrome P450 (Figure 3E). A majority of the CD4 protein-protein interactors in the combined network, including the receptors MRC1 (CD206)19 and VSIG4,20-21 were primarily found in the cluster 5-specific network (Figure E).
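The co-migration tracking and size-based filtering described in this section can be sketched as follows. This Python code is an illustration, not XINA's implementation: the function name, the toy cluster assignments, and the min_size parameter are hypothetical, cluster ‘0’ is used for proteins not detected in a condition as in the text, and the Fisher’s exact test step is deliberately omitted because its construction is not detailed here.

```python
from collections import defaultdict


def co_migrations(cluster_by_condition, conditions, min_size=2):
    """Group proteins by their path of cluster numbers across the ordered conditions."""
    groups = defaultdict(list)
    for protein, assigned in cluster_by_condition.items():
        # Cluster 0 marks a protein that was not detected in that condition.
        path = tuple(assigned.get(condition, 0) for condition in conditions)
        groups[path].append(protein)
    # Keep only the alluvia that carry at least min_size proteins.
    return {
        path: sorted(members)
        for path, members in groups.items()
        if len(members) >= min_size
    }


# Hypothetical cluster assignments; the first condition acts as the reference.
conditions = ["M(-)", "M(IFN-g)", "M(IL-4)"]
cluster_by_condition = {
    "P1": {"M(-)": 3, "M(IFN-g)": 4, "M(IL-4)": 5},
    "P2": {"M(-)": 3, "M(IFN-g)": 4, "M(IL-4)": 5},
    "P3": {"M(-)": 3, "M(IFN-g)": 1, "M(IL-4)": 5},
    "P4": {"M(IL-4)": 5},
}
found = co_migrations(cluster_by_condition, conditions, min_size=2)
print(found)
```

In this toy example only the path 3 to 4 to 5 is retained, because it is shared by two proteins; P4, detected only under M(IL-4), follows the path 0 to 0 to 5 and would be retained only with a lower threshold.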
CONCLUSIONS Sample multiplexing strategies are commonly applied at the bench-top in proteomics studies to reduce the variability and cost associated with acquiring mass spectrometry data separately.22-23 XINA extends the principle of multiplexing to the data analysis portion by combining the multiple datasets to be compared and using an intra-dataset strategy in lieu of typical inter-dataset strategies.11 Such high dimensional ‘omics’ data, however, require innovative and intuitive data processing and visualization tools to extract biologically meaningful information.24-27 An underlying assumption in kinetics studies is that proteins with common abundance patterns are likely to be co-expressed, share molecular functions, and/or be involved in similar signaling pathways.1, 17 XINA thus determines protein co-abundance patterns using clustering methods and employs intuitive alluvial diagrams to track the co-migration of protein groups between clusters in response to one or more stimuli. We and others then rely on network medicine and interactome databases to interpret these co-abundance proteomics patterns.14, 28-29 These example studies, however, have employed sophisticated network analyses that non-expert users may not have readily available. Other resources such as pwOmics have provided a broader research community with options to perform pathway and protein interaction analyses of their data.30 However, to our knowledge, no one has provided a visualization paradigm to investigate co-regulated proteins via co-abundance patterns. Moreover, XINA also links to the STRING database (or custom-curated PPI
information) so that the user can query their co-abundance profiles to infer novel molecular pathways relevant to their biology, thereby inciting hypothesis-driven and guided downstream follow-up experiments.
FIGURES Figure 1. XINA facilitates high dimensional quantitative proteomics data analyses. (B) Screenshot of XINA working through the RStudio interface. (C) Multiple IMT datasets are sum-normalized and then combined to create a multiplexed super dataset. Featured experiment: the macrophage-like cell line THP-1 cultured in one of three conditions. (D) Thumbnails of model-based and K-means cluster outputs, composition pie charts, and clusters recolored for condition bias. (E) Cluster no. 4 detailing cluster composition and condition bias.
Figure 2. Cluster-specific protein-protein interaction (PPI) networks from the STRINGdb. (A) The PPI networks for the multiplexed (all experimental conditions) and the individual experimental conditions, retrieved from the super dataset cluster in Fig. 1D. (B) The PPI networks focused on the central nodes (red). (C) PPI insets for cluster 4. (D) Centrality insets for cluster 4.
Figure 3. Co-migration PPI networks. (A) Alluvial diagrams tracking all co-migration patterns across clusters for the three experimental conditions. (B) Co-migrations with nine or more proteins, which can be further filtered down to alluvia moving amongst condition-biased clusters (alluvia are recolored according to the experimental conditions). (C) Alluvial diagrams for M(IL-4)-biased clusters whose proteins were not detected in the other two conditions. These proteins likely indicate the IL-4 or anti-inflammatory state since their baseline levels were too low to be detected in the M(-) and M(IFN-γ) mass spectrometric acquisition experiments. (D) The eigenvector centralized PPI networks of the combined co-migrating proteins. (E) Output of the top-ranked centrality scores and KEGG-enriched pathways. (F) The breakdown of PPI networks for the individual alluvia.
SUPPORTING INFORMATION Supplemental Table S1 The three THP-1 macrophage TMT datasets: M(-), M(IFN-γ) and M(IL-4)
Supplemental Figure 1 XINA exported super dataset clusters, composition pie charts and cluster biases for model-based and K-means clustering presented in Figure 1.
REFERENCES 1. Singh, S. A.; Winter, D.; Kirchner, M.; Chauhan, R.; Ahmed, S.; Ozlu, N.; Tzur, A.; Steen, J. A.; Steen, H., Co-regulation proteomics reveals substrates and mechanisms of APC/Cdependent degradation. The EMBO Journal 2014, 33 (4), 385-99. 2. Halu, A.; Wang, J.; Iwata, H.; Mojcher, A.; Abib, A. L.; Singh, S. A.; Aikawa, M.; Sharma, A., Context-enriched interactome powered by proteomics helps the identification of novel regulators of macrophage activation. eLife 2018, accepted. 3. Kustatscher, G.; Grabowski, P.; Rappsilber, J., Pervasive coexpression of spatially proximal genes is buffered at the protein level. Mol Syst Biol 2017, 13 (8), 937. 4. Franceschini, A.; Szklarczyk, D.; Frankild, S.; Kuhn, M.; Simonovic, M.; Roth, A.; Lin, J.; Minguez, P.; Bork, P.; von Mering, C.; Jensen, L. J., STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013, 41 (Database issue), D808-15. 5. Chatr-Aryamontri, A.; Oughtred, R.; Boucher, L.; Rust, J.; Chang, C.; Kolas, N. K.; O'Donnell, L.; Oster, S.; Theesfeld, C.; Sellam, A.; Stark, C.; Breitkreutz, B. J.; Dolinski, K.; Tyers, M., The BioGRID interaction database: 2017 update. Nucleic Acids Res 2017, 45 (D1), D369-D379. 6. Szklarczyk, D.; Morris, J. H.; Cook, H.; Kuhn, M.; Wyder, S.; Simonovic, M.; Santos, A.; Doncheva, N. T.; Roth, A.; Bork, P.; Jensen, L. J.; von Mering, C., The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 2017, 45 (D1), D362-D368. 7. Orchard, S.; Ammari, M.; Aranda, B.; Breuza, L.; Briganti, L.; Broackes-Carter, F.; Campbell, N. H.; Chavali, G.; Chen, C.; del-Toro, N.; Duesbury, M.; Dumousseau, M.; Galeota, E.; Hinz, U.; Iannuccelli, M.; Jagannathan, S.; Jimenez, R.; Khadake, J.; Lagreid, A.; Licata, L.; Lovering, R. C.; Meldal, B.; Melidoni, A. 
N.; Milagros, M.; Peluso, D.; Perfetto, L.; Porras, P.; Raghunath, A.; Ricard-Blum, S.; Roechert, B.; Stutz, A.; Tognolli, M.; van Roey, K.; Cesareni, G.; Hermjakob, H., The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 2014, 42 (Database issue), D358-63. 8. Keshava Prasad, T. S.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.; Mathivanan, S.; Telikicherla, D.; Raju, R.; Shafreen, B.; Venugopal, A.; Balakrishnan, L.; Marimuthu, A.; Banerjee, S.; Somanathan, D. S.; Sebastian, A.; Rani, S.; Ray, S.; Harrys Kishore, C. J.; Kanth, S.; Ahmed, M.; Kashyap, M. K.; Mohmood, R.; Ramachandra, Y. L.; Krishna, V.; Rahiman, B. A.; Mohan, S.; Ranganathan, P.; Ramabadran, S.; Chaerkady, R.; Pandey, A., Human Protein Reference Database--2009 update. Nucleic Acids Res 2009, 37 (Database issue), D767-72. 9. Das, J.; Yu, H., HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst Biol 2012, 6, 92. 10. Lehne, B.; Schlitt, T., Protein-protein interaction databases: keeping up with growing interactomes. Hum Genomics 2009, 3 (3), 291-7. 11. Ricchiuto, P.; Iwata, H.; Yabusaki, K.; Yamada, I.; Pieper, B.; Sharma, A.; Aikawa, M.; Singh, S. A., mIMT-visHTS: A novel method for multiplexing isobaric mass tagged datasets with an accompanying visualization high throughput screening tool for protein profiling. Journal of Proteomics 2015, 128, 132-40. 12. Nakano, T.; Katsuki, S.; Chen, M.; Decano, J. L.; Halu, A.; Lee, L. H.; Pestana, D. V. S.; Kum, A. S. T.; Kuromoto, R. K.; Golden, W. S.; Boff, M. S.; Guimaraes, G. S.; Higashi, H.;
ACS Paragon Plus Environment
15
Journal of Proteome Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 21
Figure 1. Panels A-E (images not reproduced). Recoverable panel labels: Multiplexed cluster; Condition bias; Cluster composition.
Figure 2. Panels A-D (images not reproduced).
Figure 3. Panels A-F (images not reproduced).
Table of Contents Graphic (84 x 47 mm, 216 x 216 DPI; image not reproduced).